IB Computing 2 Trees (1) Introduction

Trees

Introduction

A TREE is a data structure in which the data items are NOT

arranged in a simple list. This means it is not necessarily natural

to store these items in an array - this makes it significantly

different from a LIST, a STACK, or a QUEUE.

For example, the tree below is used to store all of the 3 letter

words which can be made using the letters A and B:

Nothing

/ \

A B

/ \ / \

AA AB BA BB

/ \ / \ / \ / \

AAA AAB ABA ABB BAA BAB BBA BBB

Notice that each two-letter word "gives birth" to 2 different 3-letter

words, by adding either an A or a B. Thus, we refer to ABA and ABB

as the "children" of AB, and we call AB the "parent".

A tree CAN contain more than two children per parent - for example,

draw a tree containing all 3-letter words made from A,B, and C. In this

case, each small word has 3 children, formed by adding A,B, or C.

The tree above, in which each parent has 2 children, is called a

BINARY tree. The children are commonly called the LEFT-CHILD and the

RIGHT-CHILD. The orientation (left or right) is sometimes unimportant,

other times very important.

More vocabulary: The "words" in the tree above are called NODES.

The lines are called BRANCHES.

A NODE with no children is sometimes called an ATOM.

A part of the tree, for example all of the words

starting with A, is called a SUB-TREE.

A tree can be used to "think ahead" in a game, such as TIC-TAC-TOE.

Below is a tree starting in the middle of a game of tic-tac-toe.

. X .

. O O

/~~~~~~~~~~~~ X O X ~~~~~~~~~~~~\

/ | \

. X . X X . . X X

X O O . O O . O O

X O X X O X X O X

/ \ / \ / \

O X . . X O X X . X X O O X X . X X

X O O X O O O O O . O O . O O O O O

X O X X O X X O X X O X X O X X O X

| | ------- | | -------

O X X X X O O-WINS! X X O O X X O-WINS!

X O O X O O X O O X O O

X O X X O X X O X X O X

------- ------- ------- -------

DRAW X-WINS! X-WINS! DRAW

Storage/Representation

Notice that this is NOT a binary tree. Nor is it a tree with 3 or 4

branches per node. In this case, there are fewer and fewer branches

at each level (since there are fewer and fewer free squares as the game

progresses), and some branches stop, because the game is over.

Being 2-dimensional, trees are NOT a natural structure for

a digital computer, whose memory locations are arranged in a

simple list (array.) In order to store a tree in the computer,

we need to make special arrangements.

One way would be to DRAW the tree on the screen, using a mouse,

and then store the picture by storing the pixels. This is a very

bad method, since a simple question like "How many nodes are there

in this tree?" is extremely difficult to answer.

A better method is to store the tree inside an array. To do this,

we need to assign each node to a position in the array. Then, to

represent arrows, we use "pointers" containing the "address" of a node -

this is its position in the array.

The three-letter-AB words tree from the previous page is

represented in arrays here:

Position Word Parent LeftChild RightChild

-------- --------- --------- --------- ----------

1 nothing 0 2 3

2 A 1 4 5

3 B 1 6 7

4 AA 2 8 9

5 AB 2 10 11

6 BA 3 12 13

7 BB 3 14 15

8 AAA 4 0 0

9 AAB 4 0 0

10 ABA 5 0 0

11 ABB 5 0 0

12 BAA 6 0 0

13 BAB 6 0 0

14 BBA 7 0 0

15 BBB 7 0 0

Look at the 3rd row, under LeftChild. This number 6 means that

the left-branch in the tree from B goes to BA. A ZERO indicates

that there is no branch. For example, ABB has no children, so both

child pointers are 0. Similarly, the NOTHING entry (also called the

ROOT) has no parent, so the parent pointer is zero.

Basic Operations

Problem 1: Finish representing the tic-tac-toe tree in

an array. You will need a large piece of paper, since

each node contains a board, rather than a word.

Position Board Parent LeftChild MiddleChild RightChild

. X .

1 . O O 0 2 3 4

X O X

. X .

2 X O O 1 5 0 6

X O X

X X .

3 . O O 1 7 0 8

X O X

----------------------------------------------------------------------

A common example of a binary tree is the process used by young

mathematics students to find the prime-factorization of a number.

The algorithm can be described as follows:

(1) Get a blank sheet of paper (Initialize)

(2) Write the number in the middle at the top of the page. (Root)

(3) Think of two numbers which, when you (Split)

multiply them, would equal "the number."

(a) If you found two numbers, then draw left

and right branches and put these numbers

at the ends.

(b) If you did not find two numbers, then

there is nothing more to do with this number.

(4) Do 3 for each number which you wrote in 3. (Repeat)

If none of the new numbers can be split, then stop.

(5) Write down all of the ATOMS - those numbers which

were not split - this is the prime-factorization.

For example, prime factoring 120 can give a variety of trees:

120 120

/ \ / \

12 10 2 60

/ \ / \ / \

3 4 2 5 2 30

/ \ / \

2 2 2 15

/ \

The prime factorization is: 3 5

120 = 3*2*2*2*5

Oddly enough, although a variety of trees are possible, they all contain

the same atoms - that is, the prime-factorization is unique.

The general algorithm for building up a binary tree is -

(1) Initialize - empty the tree

(2) Root - get a value for the root

(3) Split an atom

(4) Repeat (3) until finished

This concept is quite natural, but it is a bit tricky to keep track

of things in arrays. Thus, it is important to write good procedures

for (1) (2) and (3) - splitting an atom is especially tricky.

Problems

Problem (2) : Write procedures for (1),(2), and (3). Then use them

to input a BINARY TREE, for example a prime-factorization tree. The SPLIT

procedure should work as follows: the user types in the "name" of an atom,

and the names of the two children, and the appropriate branches are added

to the arrays. Then write a 4th procedure - print atoms. This is a very

simple procedure - it simply searches the list of nodes, looking for zero

pointers in the left and right children.

Problem (3) : Automate this process, so that you write a program

which prime factors a number by using the tree method. After each SPLIT,

the computer shows the list of atoms. The user types the name of an atom,

and the computer splits that atom if it is possible. Thus, the user

decides which atom to split, but the computer takes care of finding the

factors. This means you will need a new procedure which accepts an

integer as input, and returns two factors. It could return 0,0 for a

prime input - that is, a number with no factors. In this case,

no split occurs. Do not allow the following:

/ \

3 5

/ \

1 3

/ \

1 3

.............. Infinite tree, getting nowhere.

Problem (4) : Automate the process further: the computer chooses the

atom which is to be split. It loops, by itself, until none of the

atoms can be split. Then it stops! This is a bit tricky. Now you

should be able to type in a number, and the computer does the prime

factorization by itself.

Problem (5) : Although it is of no particular theoretical importance,

it is nice for people to be able to SEE a tree. The drawings above are

quite difficult to produce using a computer program, and have the

possibility of growing too large for the screen. Look at the Explorer

program in Windows . This is one possible way to represent a tree.

Another method, which makes sense in the prime-factorization problem,

is the following:

120

(12)(10)

((3)(4))((2)(5))

((3)((2)(2)))((2)(5))

There is one "formula" for each "level" in the tree.

Try writing a SHOWTREE procedure, which prints this list of formulas.

Problem (6) : The prime-factorization problem can produce many different

trees for the same number. Can you write a program which would COUNT the

number of different trees for a given number?