Trees

Introduction

 

   A TREE is a data structure in which the data items are NOT

arranged in a simple list.  This means it is not necessarily natural

to store these items in an array - this makes it significantly

different from a LIST, a STACK, or a QUEUE.

 

   For example, the tree below is used to store all of the 3 letter

words which can be made using the letters A and B:

 

                                  Nothing

                              /             \

                             /               \

                           A                    B

                        /     \              /     \

                      AA        AB         BA         BB

                    /    \    /    \     /    \     /    \

                  AAA   AAB  ABA   ABB  BAA  BAB  BBA   BBB

 

    Notice that each two-letter word "gives birth" to 2 different 3-letter

words, by adding either an A or a B.    Thus, we refer to   ABA and ABB

as the "children" of AB, and we call AB the "parent".

 

    A tree CAN contain more than two children per parent - for example,

draw a tree containing all 3-letter words made from A,B, and C.  In this

case, each small word has 3 children, formed by adding A,B, or C.

 

    The tree above, in which each parent has 2 children, is called a

BINARY tree.  The children are commonly called the LEFT-CHILD and the

RIGHT-CHILD.  The orientation (left or right) is sometimes unimportant,

other times very important.

 

More vocabulary:  The "words" in the tree above are called NODES.

                  The lines are called BRANCHES.

                  A NODE with no children is sometimes called an ATOM.

                  A part of the tree, for example all of the words

                  starting with A, is called a SUB-TREE.

 

     A tree can be used to "think ahead" in a game, such as TIC-TAC-TOE.

Below is a tree starting in the middle of a game of tic-tac-toe.

 

                                   . X .

                                   . O O

                   /~~~~~~~~~~~~   X O X   ~~~~~~~~~~~~\

                  /                  |                  \

                 /                   |                   \

           . X .                   X X .                   . X X

           X O O                   . O O                   . O O

           X O X                   X O X                   X O X

         /       \               /       \               /      \

    O X .         . X O     X X .         X X O     O X X        . X X

    X O O         X O O     O O O         . O O     . O O        O O O

    X O X         X O X     X O X         X O X     X O X        X O X

      |             |      -------          |         |         -------

    O X X         X X O    O-WINS!        X X O     O X X       O-WINS!

    X O O         X O O                   X O O     X O O

    X O X         X O X                   X O X     X O X

   -------       -------                 -------   -------

    DRAW         X-WINS!                 X-WINS!    DRAW

 

 

Storage/Representation

 

Notice that this is NOT a binary tree.  Nor is it a tree with 3 or 4

branches per node.  In this case, there are fewer and fewer branches

at each level (since there are fewer and fewer free squares as the game

progresses), and some branches stop, because the game is over.

 

    Being 2-dimensional, trees are NOT a natural structure for

a digital computer, whose memory locations are arranged in a

simple list (array.)  In order to store a tree in the computer,

we need to make special arrangements.

 

    One way would be to DRAW the tree on the screen, using a mouse,

and then store the picture by storing the pixels.  This is a very

bad method, since a simple question like "How many nodes are there

in this tree?" is extremely difficult to answer.

 

    A better method is to store the tree inside an array.  To do this,

we need to assign each node to a position in the array.  Then, to

represent arrows, we use "pointers" containing the "address" of a node -

this is its position in the array.

 

    The three-letter-AB words tree from the previous page is

represented in arrays here:

 

    Position     Word        Parent    LeftChild   RightChild

    --------   ---------   ---------   ---------   ----------

        1       nothing        0           2           3

        2         A            1           4           5

        3         B            1           6           7

        4         AA           2           8           9

        5         AB           2          10          11

        6         BA           3          12          13

        7         BB           3          14          15

        8         AAA          4           0           0

        9         AAB          4           0           0

       10         ABA          5           0           0

       11         ABB          5           0           0

       12         BAA          6           0           0

       13         BAB          6           0           0

       14         BBA          7           0           0

       15         BBB          7           0           0

 

     Look at the 3rd row, under LeftChild.  This number 6 means that

the left-branch in the tree from B goes to BA.  A ZERO indicates

that there is no branch.  For example, ABB has no children, so both

child pointers are 0.  Similarly, the NOTHING entry (also called the

ROOT) has no parent, so the parent pointer is zero.

 

 

 

Basic Operations

 

  Problem 1: Finish representing the tic-tac-toe tree in

             an array.  You will need a large piece of paper, since

             each node contains a board, rather than a word.

 

   Position   Board      Parent     LeftChild    MiddleChild  RightChild

 

              . X .

      1       . O O        0            2            3           4

              X O X

 

              . X .

      2       X O O        1            5            0           6

              X O X

 

              X X .

      3       . O O        1            7            0           8

              X O X

----------------------------------------------------------------------

      A common example of a binary tree is the process used by young

mathematics students to find the prime-factorization of a number.

The algorithm can be described as follows:

 

    (1) Get a blank sheet of paper                               (Initialize)

    (2) Write the number in the middle at the top of the page.   (Root)

    (3) Think of two numbers which, when you                     (Split)

        multiply them, would equal "the number."

       (a) If you found two numbers, then draw left

           and right branches and put these numbers

           at the ends.

       (b) If you did not find two numbers, then

           there is nothing more to do with this number.

    (4) Do 3 for each number which you wrote in 3.              (Repeat)

        If none of the new numbers can be split, then stop.

    (5) Write down all of the ATOMS - those numbers which

        were not split - this is the prime-factorization.

 

For example, prime factoring 120 can give a variety of trees:

 

                      120                           120

                   /       \                      /     \

                12            10                2        60

              /    \        /     \                    /    \

             3      4      2       5                  2      30

                  /   \                                    /    \

                 2     2                                  2      15

                                                               /    \

    The prime factorization is:                               3      5

          120 = 3*2*2*2*5

 

Oddly enough, although a variety of trees are possible, they all contain

the same atoms - that is, the prime-factorization is unique.

 

The general algorithm for building up a binary tree is -

 

         (1)  Initialize - empty the tree

         (2)  Root - get a value for the root

         (3)  Split an atom

         (4)  Repeat (3) until finished

 


This concept is quite natural, but it is a bit tricky to keep track

of things in arrays.  Thus, it is important to write good procedures

for (1) (2) and (3) - splitting an atom is especially tricky.

 

Problems

 

 Problem (2) :  Write procedures for (1),(2), and (3).  Then use them

    to input a BINARY TREE, for example a prime-factorization tree. The SPLIT

    procedure should work as follows: the user types in the "name" of an atom,

    and the names of the two children, and the appropriate branches are added

    to the arrays. Then write a 4th procedure - print atoms.  This is a very

    simple procedure - it simply searches the list of nodes, looking for zero

    pointers in the left and right children.

 

 Problem (3) :  Automate this process, so that you write a program

    which prime factors a number by using the tree method. After each SPLIT,

    the computer shows the list of atoms. The user types the name of an atom,

    and the computer splits that atom if it is possible.  Thus, the user

    decides which atom to split, but the computer takes care of finding the

    factors.  This means you will need a new procedure which accepts an

    integer as input, and returns two factors.  It could return 0,0 for a

    prime input - that is, a number with no factors.  In this case,

    no split occurs.  Do not allow the following:

 

                   15

                 /    \

                3       5

              /   \

            1       3

                  /   \

                1       3

                ..............  Infinite tree, getting nowhere.

 

Problem (4)  :  Automate the process further: the computer chooses the

   atom which is to be split.  It loops, by itself, until none of the

   atoms can be split.  Then it stops!  This is a bit tricky.  Now you

   should be able to type in a number, and the computer does the prime

   factorization by itself.

 

Problem (5)  :  Although it is of no particular theoretical importance,

   it is nice for people to be able to SEE a tree.  The drawings above are

   quite difficult to produce using a computer program, and have the

   possibility of growing too large for the screen.  Look at the Explorer   

   program in Windows . This is one possible way to represent a tree.  

   Another method, which makes sense in the prime-factorization problem,

   is the following:

 

                          120

                        (12)(10)

                     ((3)(4))((2)(5))

                ((3)((2)(2)))((2)(5))

 

   There is one "formula" for each "level" in the tree.

   Try writing a SHOWTREE procedure, which prints this list of formulas.

 

Problem (6)  :  The prime-factorization problem can produce many different

   trees for the same number.  Can you write a program which would COUNT the

   number of different trees for a given number?