Removal - BinarySearchTree: An Unbalanced Binary Search Tree

6.2 BinarySearchTree: An Unbalanced Binary Search Tree

6.2.3 Removal

Deleting a value stored in a node, u, of aBinarySearchTree is a little more difficult. Ifuis a leaf, then we can just detachufrom its parent.

Even better: Ifuhas only one child, then we can spliceufrom the tree by havingu.parentadoptu’s child (see Figure6.8):

BinarySearchTree

BinarySearchTree: An Unbalanced Binary Search Tree §6.2

1 3

4 5

6 7

8 12 14

13 9

Figure 6.8: Removing a leaf (6) or a node with only one child (9) is easy.

s.parent = p;

} n--;

}

Things get tricky, though, whenuhas two children. In this case, the simplest thing to do is to find a node,w, that has less than two children and such that w.x can replace u.x. To maintain the binary search tree property, the valuew.xshould be close to the value ofu.x. For example, choosingwsuch thatw.xis the smallest value greater thanu.xwill work.

Finding the nodewis easy; it is the smallest value in the subtree rooted at u.right. This node can be easily removed because it has no left child (see Figure6.9).

BinarySearchTree void remove(Node u) {

if (u.left == nil || u.right == nil) { splice(u);

} else {

Node w = u.right;

while (w.left != nil) w = w.left;

u.x = w.x;

splice(w);

} }

Figure 6.9: Deleting a value (11) from a node,u, with two children is done by replacingu’s value with the smallest value in the right subtree ofu.

6.2.4 Summary

The find(x), add(x), and remove(x) operations in a BinarySearchTree each involve following a path from the root of the tree to some node in the tree. Without knowing more about the shape of the tree it is difficult to say much about the length of this path, except that it is less thann, the number of nodes in the tree. The following (unimpressive) theorem summarizes the performance of theBinarySearchTreedata structure:

Theorem 6.1. BinarySearchTreeimplements theSSetinterface and sup-ports the operationsadd(x),remove(x), andfind(x)inO(n)time per opera-tion.

Theorem6.1compares poorly with Theorem4.1, which shows that the SkiplistSSetstructure can implement theSSetinterface withO(logn) expected time per operation. The problem with theBinarySearchTree structure is that it can becomeunbalanced. Instead of looking like the tree in Figure6.5it can look like a long chain ofnnodes, all but the last having exactly one child.

There are a number of ways of avoiding unbalanced binary search trees, all of which lead to data structures that haveO(logn) time opera-tions. In Chapter7we show how O(logn)expectedtime operations can be achieved with randomization. In Chapter 8 we show how O(logn) amortizedtime operations can be achieved with partial rebuilding opera-tions. In Chapter9we show howO(logn)worst-casetime operations can be achieved by simulating a tree that is not binary: one in which nodes can have up to four children.

Discussion and Exercises §6.3

6.3 Discussion and Exercises

Binary trees have been used to model relationships for thousands of years.

One reason for this is that binary trees naturally model (pedigree) family trees. These are the family trees in which the root is a person, the left and right children are the person’s parents, and so on, recursively. In more recent centuries binary trees have also been used to model species trees in biology, where the leaves of the tree represent extant species and the internal nodes of the tree represent speciation events in which two populations of a single species evolve into two separate species.

Binary search trees appear to have been discovered independently by several groups in the 1950s [48, Section 6.2.2]. Further references to spe-cific kinds of binary search trees are provided in subsequent chapters.

When implementing a binary tree from scratch, there are several de-sign decisions to be made. One of these is the question of whether or not each node stores a pointer to its parent. If most of the operations simply follow a root-to-leaf path, then parent pointers are unnecessary, waste space, and are a potential source of coding errors. On the other hand, the lack of parent pointers means that tree traversals must be done recursively or with the use of an explicit stack. Some other methods (like inserting or deleting into some kinds of balanced binary search trees) are also complicated by the lack of parent pointers.

Another design decision is concerned with how to store the parent, left child, and right child pointers at a node. In the implementation given here, these pointers are stored as separate variables. Another option is to store them in an array,p, of length 3, so thatu.p[0] is the left child of u, u.p[1] is the right child ofu, andu.p[2] is the parent ofu. Using an array this way means that some sequences ofifstatements can be simplified into algebraic expressions.

An example of such a simplification occurs during tree traversal. If a traversal arrives at a nodeufromu.p[i], then the next node in the traver-sal isu.p[(i+ 1) mod 3]. Similar examples occur when there is left-right symmetry. For example, the sibling ofu.p[i] isu.p[(i+ 1) mod 2]. This trick works whetheru.p[i] is a left child (i= 0) or a right child (i= 1) ofu. In some cases this means that some complicated code that would otherwise need to have both a left version and right version can be

writ-ten only once. See the methodsrotateLeft(u) androtateRight(u) on page163for an example.

Exercise 6.1. Prove that a binary tree havingn≥1 nodes hasn−1 edges.

Exercise 6.2. Prove that a binary tree havingn≥1 real nodes hasn+ 1 external nodes.

Exercise 6.3. Prove that, if a binary tree, T, has at least one leaf, then either (a)T’s root has at most one child or (b)T has more than one leaf.

Exercise 6.4. Implement a non-recursive method, size2(u), that com-putes the size of the subtree rooted at nodeu.

Exercise 6.5. Write a non-recursive method,height2(u), that computes the height of nodeuin aBinaryTree.

Exercise 6.6. A binary tree issize-balancedif, for every nodeu, the size of the subtrees rooted atu.leftandu.rightdiffer by at most one. Write a recursive method,isBalanced(), that tests if a binary tree is balanced.

Your method should run inO(n) time. (Be sure to test your code on some large trees with different shapes; it is easy to write a method that takes much longer thanO(n) time.)

Apre-ordertraversal of a binary tree is a traversal that visits each node, u, before any of its children. Anin-ordertraversal visits uafter visiting all the nodes inu’s left subtree but before visiting any of the nodes inu’s right subtree. Apost-ordertraversal visitsuonly after visiting all other nodes inu’s subtree. The pre/in/post-order numbering of a tree labels the nodes of a tree with the integers 0, . . . ,n−1 in the order that they are encountered by a pre/in/post-order traversal. See Figure6.10for an example.

Exercise 6.7. Create a subclass ofBinaryTree whose nodes have fields for storing pre-order, post-order, and in-order numbers. Write recursive methodspreOrderNumber(),inOrderNumber(), andpostOrderNumbers() that assign these numbers correctly. These methods should each run in O(n) time.

Discussion and Exercises §6.3

2 1

4 3

5 0

8 10 11

9 7

0 4

1 3

2 11

5 7 8

9 6

0 1

2 3

4 5

6 9 11

10 7

Figure 6.10: Pre-order, post-order, and in-order numberings of a binary tree.

Exercise 6.8. Implement the non-recursive functionsnextPreOrder(u), nextInOrder(u), andnextPostOrder(u) that return the node that follows u in a pre-order, in-order, or post-order traversal, respectively. These functions should take amortized constant time; if we start at any node uand repeatedly call one of these functions and assign the return value touuntilu=null, then the cost of all these calls should beO(n).

Exercise 6.9. Suppose we are given a binary tree with pre-, post-, and in-order numbers assigned to the nodes. Show how these numbers can be used to answer each of the following questions in constant time:

1. Given a nodeu, determine the size of the subtree rooted atu.

2. Given a nodeu, determine the depth ofu.

3. Given two nodesuandw, determine ifuis an ancestor ofw

Exercise 6.10. Suppose you are given a list of nodes with pre-order and in-order numbers assigned. Prove that there is at most one possible tree with this pre-order/in-order numbering and show how to construct it.

Exercise 6.11. Show that the shape of any binary tree on n nodes can be represented using at most 2(n−1) bits. (Hint: think about recording what happens during a traversal and then playing back that recording to reconstruct the tree.)

Exercise 6.12. Illustrate what happens when we add the values 3.5 and then 4.5 to the binary search tree in Figure6.5.

Exercise 6.13. Illustrate what happens when we remove the values 3 and then 5 from the binary search tree in Figure6.5.

Exercise 6.14. Implement a BinarySearchTreemethod, getLE(x), that returns a list of all items in the tree that are less than or equal tox. The running time of your method should beO(n⁰+h) wheren⁰is the number of items less than or equal toxandhis the height of the tree.

Exercise 6.15. Describe how to add the elements{1, . . . ,n}to an initially emptyBinarySearchTreein such a way that the resulting tree has height n−1. How many ways are there to do this?

Discussion and Exercises §6.3 Exercise 6.16. If we have someBinarySearchTreeand perform the op-erationsadd(x) followed byremove(x) (with the same value ofx) do we necessarily return to the original tree?

Exercise 6.17. Can aremove(x) operation increase the height of any node in aBinarySearchTree? If so, by how much?

Exercise 6.18. Can anadd(x) operation increase the height of any node in aBinarySearchTree? Can it increase the height of the tree? If so, by how much?

Exercise 6.19. Design and implement a version of BinarySearchTree in which each node,u, maintains values u.size(the size of the subtree rooted at u), u.depth (the depth ofu), and u.height(the height of the subtree rooted atu).

These values should be maintained, even during calls to the add(x) andremove(x) operations, but this should not increase the cost of these operations by more than a constant factor.

Chapter 7 Random Binary Search Trees

In this chapter, we present a binary search tree structure that uses ran-domization to achieveO(logn) expected time for all operations.

7.1 Random Binary Search Trees

Consider the two binary search trees shown in Figure7.1, each of which hasn= 15 nodes. The one on the left is a list and the other is a perfectly balanced binary search tree. The one on the left has a height ofn−1 = 14 and the one on the right has a height of three.

Imagine how these two trees could have been constructed. The one on the left occurs if we start with an emptyBinarySearchTreeand add the sequence

h0,1,2,3,4,5,6,7,8,9,10,11,12,13,14i .

No other sequence of additions will create this tree (as you can prove by induction onn). On the other hand, the tree on the right can be created by the sequence

h7,3,11,1,5,9,13,0,2,4,6,8,10,12,14i . Other sequences work as well, including

h7,3,1,5,0,2,4,6,11,9,13,8,10,12,14i , and

h7,3,1,11,5,0,2,4,6,9,13,8,10,12,14i .

Figure 7.1: Two binary search trees containing the integers 0, . . . ,14.

In fact, there are 21,964,800 addition sequences that generate the tree on the right and only one that generates the tree on the left.

The above example gives some anecdotal evidence that, if we choose a random permutation of 0, . . . ,14, and add it into a binary search tree, then we are more likely to get a very balanced tree (the right side of Figure7.1) than we are to get a very unbalanced tree (the left side of Figure7.1).

We can formalize this notion by studying random binary search trees.

Arandom binary search treeof sizenis obtained in the following way: Take a random permutation,x₀, . . . ,x_n₋₁, of the integers 0, . . . ,n−1 and add its elements, one by one, into aBinarySearchTree. Byrandom permutation we mean that each of the possiblen! permutations (orderings) of 0, . . . ,n−1 is equally likely, so that the probability of obtaining any particular per-mutation is 1/n!.

Note that the values 0, . . . ,n−1 could be replaced by any ordered set of nelements without changing any of the properties of the random binary search tree. The elementx∈ {0, . . . ,n−1}is simply standing in for the element of rankxin an ordered set of sizen.

Before we can present our main result about random binary search trees, we must take some time for a short digression to discuss a type of number that comes up frequently when studying randomized structures.

For a non-negative integer,k, thek-thharmonic number, denotedH_k, is

Random Binary Search Trees §7.1

i=11/iis upper- and lower-bounded by two integrals. The value of these integrals is given by the area of the shaded region, while the value ofH_kis given by the area of the rectangles.

defined as

H_k= 1 + 1/2 + 1/3 +···+ 1/k .

The harmonic numberH_khas no simple closed form, but it is very closely related to the natural logarithm ofk. In particular,

lnk < H_k≤lnk+ 1 .

Readers who have studied calculus might notice that this is because the integralR_k

1(1/x)dx = lnk. Keeping in mind that an integral can be in-terpreted as the area between a curve and thex-axis, the value of H_k can be lower-bounded by the integralR_k

1(1/x)dxand upper-bounded by 1 +R_k

1(1/x)dx. (See Figure7.2for a graphical explanation.)

Lemma 7.1. In a random binary search tree of sizen, the following statements hold:

1. For anyx∈ {0, . . . ,n−1}, the expected length of the search path forxis H_x+1+H_n₋_x−O(1).¹

2. For anyx∈(−1, n)\ {0, . . . ,n−1}, the expected length of the search path forxisH_d_x_e+H_n_−d_x_e.

1The expressionsx+1 andn−xcan be interpreted respectively as the number of elements in the tree less than or equal toxand the number of elements in the tree greater than or equal tox.

We will prove Lemma7.1in the next section. For now, consider what the two parts of Lemma7.1tell us. The first part tells us that if we search for an element in a tree of sizen, then the expected length of the search path is at most 2lnn+O(1). The second part tells us the same thing about searching for a value not stored in the tree. When we compare the two parts of the lemma, we see that it is only slightly faster to search for some-thing that is in the tree compared to somesome-thing that is not.

Dans le document Open Data Structures (Page 159-171)