Steven J. Zeil
July 11, 2013

Contents

1 Definition: Binary Search Trees
    1.1 The Binary Search Tree ADT
2 Implementing Binary Search Trees
    2.1 Searching a Binary Tree
    2.2 Inserting into Binary Search Trees
    2.3 Deletion
        2.3.1 Removing a Leaf
        2.3.2 Removing A Node with a Null Right Child
        2.3.3 Removing a Node with Two Non-Null Children
3 How Fast Are Binary Search Trees?
    3.1 Balancing
    3.2 Performance
    3.3 Average-Case
    3.4 Can We Avoid the Worst Case?
A tree in which every parent has at most 2 children is a binary tree.

The most common use of binary trees is for ADTs that require frequent searches for arbitrary keys, e.g., sets and maps.

For this we use a special form of binary tree, the binary search tree.

1 Definition: Binary Search Trees

A binary tree T is a binary search tree if for each node n with children T_L and T_R:

• The value in n is greater than the values in every node in T_L.
• The value in n is less than the values in every node in T_R.
• Both T_L and T_R are binary search trees.

[Figure: binary tree with root 30; 30 has children 20 and 70; 20 has left child 10; 70 has left child 50; 50 has children 40 and 60]

Question: Is this a BST?

CS361
Answer:

[Figure: binary search tree containing 10, 20, 30, 40, 50, 60, 70]

Yes, this is a binary search tree. Each node is greater than all of its left descendants and less than all of its right descendants.

1.1 The Binary Search Tree ADT

Let's look at the basic interface for a binary search tree.

    template <typename T>
    class stnode                                      // ❶
    {
    public:
        // stnode is used to implement the binary search tree class;
        // making the data public simplifies building the class functions

        T nodevalue;
            // node data
        stnode<T> *left, *right, *parent;
            // child pointers and pointer to the node's parent

        // constructor
        stnode(const T& item, stnode<T> *lptr = null_ptr,
               stnode<T> *rptr = null_ptr, stnode<T> *pptr = null_ptr)
            : nodevalue(item), left(lptr), right(rptr), parent(pptr)
        {}
    };

    template <typename T>
    class stree                                       // ❷
    {
    public:
        typedef stree_iterator<T> iterator;
        typedef stree_const_iterator<T> const_iterator;

        stree();
            // constructor. initialize root to null_ptr and size to 0
        stree(T *first, T *last);
            // constructor. insert the elements from the pointer
            // range [first, last) into the tree
        stree(const stree<T>& tree);
            // copy constructor
        ~stree();
            // destructor
        stree<T>& operator= (const stree<T>& rhs);
            // assignment operator

        iterator find(const T& item);                 // ❸
            // search for item. if found, return an iterator pointing
            // at it in the tree; otherwise, return end()
        const_iterator find(const T& item) const;
            // constant version

        int empty() const;
            // indicate whether the tree is empty
        int size() const;
            // return the number of data items in the tree

        std::pair<iterator, bool> insert(const T& item);   // ❹
            // if item is not in the tree, insert it and
            // return a pair whose iterator component points
            // at item and whose bool component is true. if item
            // is in the tree, return a pair whose iterator
            // component points at the existing item and whose
            // bool component is false
            // Postcondition: the tree size increases by 1 if item
            // is not in the tree

        int erase(const T& item);                     // ❺
            // if item is in the tree, erase it and return 1;
            // otherwise, return 0
            // Postcondition: the tree size decreases by 1 if
            // item is in the tree

        void erase(iterator pos);                     // ❺
            // erase the item pointed to by pos.
            // Preconditions: the tree is not empty and pos points
            // to an item in the tree. if the tree is empty, the
            // function throws the underflowerror exception. if the
            // iterator is invalid, the function throws the referenceerror
            // exception.
            // Postcondition: the tree size decreases by 1

        void erase(iterator first, iterator last);    // ❺
            // erase all items in the range [first, last).
            // Precondition: the tree is not empty. if the tree
            // is empty, the function throws the underflowerror
            // exception.
            // Postcondition: the size of the tree decreases by
            // the number of elements in the range [first, last)

        iterator begin();
            // return an iterator pointing to the first item
            // inorder
        const_iterator begin() const;
            // constant version

        iterator end();
            // return an iterator pointing just past the end of
            // the tree data
        const_iterator end() const;
            // constant version

    private:
        stnode<T> *root;
            // pointer to tree root
        int treesize;
            // number of elements in the tree

        stnode<T> *getstnode(const T& item,
                             stnode<T> *lptr, stnode<T> *rptr, stnode<T> *pptr);
            // allocate a new tree node and return a pointer to it.
            // if memory allocation fails, the function throws the
            // memoryallocationerror exception

        stnode<T> *copytree(stnode<T> *t);
            // recursive function used by copy constructor and assignment
            // operator to assign the current tree as a copy of another tree

        void deletetree(stnode<T> *t);
            // recursive function used by destructor and assignment
            // operator to delete all the nodes in the tree

        stnode<T> *findnode(const T& item) const;
            // search for item in the tree. if it is in the tree,
            // return a pointer to its node; otherwise, return null_ptr.
            // used by find() and erase()

        friend class stree_iterator<T>;
        friend class stree_const_iterator<T>;
            // allow the iterator classes to access the private section
            // of stree
    };

This code is taken from your textbook and is the same code used in our prior discussion of tree iterators. Some points of note:

❶ The stnode template implements individual tree nodes.

❷ The stree template represents the entire tree (the whole collection of related nodes), with functions for searching, insertion, iteration, etc.

Our primary focus in this lecture will be on the find (❸), insert (❹) and erase (❺) functions.

2 Implementing Binary Search Trees

Since you have, presumably, read your text's discussion of how to implement BSTs, I'm mainly going to hit the high points.

2.1 Searching a Binary Tree

We'll start by reviewing the basic searching algorithm.
    template <typename T>
    typename stree<T>::iterator stree<T>::find(const T& item)
    {
        stnode<T> *curr;

        // search tree for item
        curr = findnode(item);

        // if item found, return an iterator positioned at its node;
        // otherwise, return end()
        if (curr != null_ptr)
            return iterator(curr, this);
        else
            return end();
    }

The tree's find operation works by using a private utility function, findnode, to find a pointer to the node containing the desired data. It then uses that pointer to construct an iterator representing the position of that node.
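From the caller's side, the result of find is normally compared against end() before it is used. The sketch below shows that idiom. Because stree is not available outside the textbook code, it substitutes std::set (commonly implemented as a balanced search tree) as a stand-in; that substitution, and the helper names contains and lookup, are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <set>

// Hypothetical helper: report whether item is present, using the
// find()/end() idiom. std::set stands in for stree so that the
// sketch compiles on its own.
template <typename T>
bool contains(const std::set<T>& tree, const T& item)
{
    typename std::set<T>::const_iterator pos = tree.find(item);
    return pos != tree.end();      // end() signals "not found"
}

// Hypothetical helper: return the stored value equal to item.
// Precondition: item is in the tree.
template <typename T>
const T& lookup(const std::set<T>& tree, const T& item)
{
    typename std::set<T>::const_iterator pos = tree.find(item);
    assert(pos != tree.end());     // dereferencing end() is undefined
    return *pos;
}
```

The point of the design is that "not found" needs no special return type: end() is already a position that never refers to data, so it can double as the failure value.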
We search a tree by comparing the value we're searching for to the current node, t. If the value we want is smaller, we look in the left subtree. If the value we want is larger, we look in the right subtree.

    // search for data item in the tree. if found, return its node
    // address; otherwise, return null_ptr
    template <typename T>
    stnode<T>* stree<T>::findnode(const T& item) const
    {
        // cycle t through the tree starting with root
        stnode<T> *t = root;

        // terminate on an empty subtree
        while (t != null_ptr && !(item == t->nodevalue))
            if (item < t->nodevalue)
                t = t->left;
            else
                t = t->right;

        // return pointer to node; null_ptr if not found
        return t;
    }

You may note that this algorithm bears a certain resemblance to the binary search algorithm we studied earlier in the semester. We shall see shortly that the performance of both search algorithms on a collection of N items is O(log N) (for trees, provided the tree is reasonably balanced), but that binary trees support faster insertion operations, allowing us to build the searchable collection in less time than when using binary search over sorted arrays.

You can run this algorithm to see how it works.

2.2 Inserting into Binary Search Trees

    template <typename T>
    std::pair<typename stree<T>::iterator, bool> stree<T>::insert(const T& item)
    {
        // t is current node in traversal, parent the previous node
        stnode<T> *t = root, *parent = null_ptr, *newnode;

        // terminate on an empty subtree
        while (t != null_ptr)
        {
            // update the parent pointer. then go left or right
            parent = t;
            // if a match occurs, return a pair whose iterator
            // component points at item in the tree and whose
            // bool component is false
            if (item == t->nodevalue)
                return std::pair<iterator, bool>(iterator(t, this), false);
            else if (item < t->nodevalue)
                t = t->left;
            else
                t = t->right;
        }

        // create the new leaf node
        newnode = getstnode(item, null_ptr, null_ptr, parent);

        // if parent is null_ptr, insert as root node
        if (parent == null_ptr)
            root = newnode;
        else if (item < parent->nodevalue)
            // insert as left child
            parent->left = newnode;
        else
            // insert as right child
            parent->right = newnode;

        // increment size
        treesize++;

        // return a pair whose iterator component points at
        // the new node and whose bool component is true
        return std::pair<iterator, bool>(iterator(newnode, this), true);
    }

The first part of the insertion function is closely related to the search loop in findnode. In fact, we are searching for the place where the new data would reside, if it were in the tree. We know we have not found it when we reach a null pointer. Since that pointer (as either the left or right child of some parent node) was found by asking "where would this data go if it were in the tree?", we know that we can, in fact, insert the data here.

You might want to run this algorithm and experiment with inserting nodes into binary search trees. Take particular note of what happens if you insert data in ascending or descending order, as opposed to inserting randomly ordered data.
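The experiment suggested above can also be tried in a few lines of code. The following is a minimal sketch, not the textbook's stree: the Node type (ints only, no parent pointer) and the functions insert, build, height and destroy are hypothetical names introduced only for this illustration.

```cpp
#include <algorithm>
#include <cassert>

// Minimal node type for illustration only; unlike stnode it has no
// parent pointer and stores ints.
struct Node {
    int value;
    Node* left;
    Node* right;
    explicit Node(int v) : value(v), left(nullptr), right(nullptr) {}
};

// Iterative insert using the same "search for where it would go" idea
// as stree::insert. Returns the root; duplicates are ignored.
Node* insert(Node* root, int item)
{
    Node* parent = nullptr;
    Node* t = root;
    while (t != nullptr) {
        parent = t;
        if (item == t->value)
            return root;                       // already present
        t = (item < t->value) ? t->left : t->right;
    }
    Node* fresh = new Node(item);
    if (parent == nullptr)
        return fresh;                          // tree was empty
    if (item < parent->value)
        parent->left = fresh;
    else
        parent->right = fresh;
    return root;
}

// Insert the values in [first, last) in the order given.
Node* build(const int* first, const int* last)
{
    assert(first <= last);
    Node* root = nullptr;
    for (const int* p = first; p != last; ++p)
        root = insert(root, *p);
    return root;
}

// Number of nodes on the longest root-to-leaf path (0 if empty).
int height(const Node* t)
{
    if (t == nullptr) return 0;
    return 1 + std::max(height(t->left), height(t->right));
}

// Release every node in the tree.
void destroy(Node* t)
{
    if (t == nullptr) return;
    destroy(t->left);
    destroy(t->right);
    delete t;
}
```

Building from 40, 20, 60, 10, 30, 50, 70 and then from 10, 20, 30, 40, 50, 60, 70 and comparing height for the two trees shows the effect of insertion order: the same seven keys give a tree of height 3 in the first case and a chain of height 7 in the second.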
2.3 Deletion

Our tree class actually provides two distinct approaches to erasing. We can erase the data at a given position (iterator) or erase a given value, if it exists.

Deleting a value is shown here. We simply do a conventional binary search tree findnode call and, if the value actually exists in the tree, erase the node at the position where we found the data.

    template <typename T>
    int stree<T>::erase(const T& item)
    {
        int numbererased = 1;

        // search tree for item
        stnode<T> *p = findnode(item);

        // if item found, delete the node
        if (p != null_ptr)
            erase(iterator(p, this));
        else
            numbererased = 0;

        return numbererased;
    }

In essence, this passes the buck to the "erase at a position" function, which we will look at next.

    template <typename T>
    void stree<T>::erase(iterator pos)
    {
        // dnodeptr = pointer to node D that is deleted
        // pnodeptr = pointer to parent P of node D
        // rnodeptr = pointer to node R that replaces D
        stnode<T> *dnodeptr = pos.nodeptr, *pnodeptr, *rnodeptr = null_ptr;

        if (treesize == 0)
            throw underflowerror("stree erase(): tree is empty");

        if (dnodeptr == null_ptr)
            throw referenceerror("stree erase(): invalid iterator");

        // assign pnodeptr the address of P
        pnodeptr = dnodeptr->parent;

        // If D has a null_ptr pointer, the
        // replacement node is the other child
        if (dnodeptr->left == null_ptr || dnodeptr->right == null_ptr)
        {
            if (dnodeptr->right == null_ptr)
                rnodeptr = dnodeptr->left;
            else
                rnodeptr = dnodeptr->right;

            if (rnodeptr != null_ptr)
                // the parent of R is now the parent of D
                rnodeptr->parent = pnodeptr;
        }
        // both pointers of dnodeptr are non-null_ptr.
        else
        {
            // find and unlink replacement node for D.
            // starting at the right child of node D,
            // find the node whose value is the smallest of all
            // nodes whose values are greater than the value in D.
            // unlink the node from the tree.

            // pofrnodeptr = pointer to parent of replacement node
            stnode<T> *pofrnodeptr = dnodeptr;

            // first possible replacement is right child of D
            rnodeptr = dnodeptr->right;

            // descend down left subtree of the right child of D,
            // keeping a record of current node and its parent.
            // when we stop, we have found the replacement
            while (rnodeptr->left != null_ptr)
            {
                pofrnodeptr = rnodeptr;
                rnodeptr = rnodeptr->left;
            }

            if (pofrnodeptr == dnodeptr)
            {
                // right child of deleted node is the replacement.
                // assign left subtree of D to left subtree of R
                rnodeptr->left = dnodeptr->left;
                // assign the parent of D as the parent of R
                rnodeptr->parent = pnodeptr;
                // assign the left child of D to have parent R
                dnodeptr->left->parent = rnodeptr;
            }
            else
            {
                // we moved at least one node down a left branch
                // of the right child of D. unlink R from tree by
                // assigning its right subtree as the left child of
                // the parent of R
                pofrnodeptr->left = rnodeptr->right;

                // the parent of the right child of R is the
                // parent of R
                if (rnodeptr->right != null_ptr)
                    rnodeptr->right->parent = pofrnodeptr;

                // put replacement node in place of dnodeptr
                // assign children of R to be those of D
                rnodeptr->left = dnodeptr->left;
                rnodeptr->right = dnodeptr->right;
                // assign the parent of R to be the parent of D
                rnodeptr->parent = pnodeptr;
                // assign the parent pointer in the children
                // of R to point at R
                rnodeptr->left->parent = rnodeptr;
                rnodeptr->right->parent = rnodeptr;
            }
        }

        // complete the link to the parent node.

        // deleting the root node. assign new root
        if (pnodeptr == null_ptr)
            root = rnodeptr;
        // attach R to the correct branch of P
        else if (dnodeptr->nodevalue < pnodeptr->nodevalue)
            pnodeptr->left = rnodeptr;
        else
            pnodeptr->right = rnodeptr;

        // delete the node from memory and decrement tree size
        delete dnodeptr;
        treesize--;
    }

This is the erase algorithm. For the moment, concentrate on the code for replacing the node we want to erase, dnodeptr, by a replacement node, rnodeptr. You can see that it is careful to place the address of the replacement into either the tree root, the left child of the erased node's parent, or the right child of the erased node's parent, depending on the data value in the parent.

Most of the code in this function is actually concerned with finding that replacement node.

We can break down the problem of finding a suitable replacement when removing a given node from a BST into cases:

1. Removing a leaf
2. Removing a node that has only one child
   • only a left child
   • only a right child
3. Removing a node that has two children

2.3.1 Removing a Leaf
[Figure: the example binary search tree containing 10, 20, 30, 40, 50, 60, 70]

Case 1: Suppose we wanted to remove the 40 from this tree. What would we have to do so that the remaining nodes would still be a valid BST?

Nothing at all! If we simply delete this node (setting the pointer to it from its parent to null), what's left would still be a perfectly good binary search tree: it would satisfy all the BST ordering requirements.

Now, take another look at the erase code in Section 2.3, which removes the node pointed at by dnodeptr from a BST. Find the leaf case, and you can see that all we do is delete the node. The relevant branch is:

    // If D has a null_ptr pointer, the
    // replacement node is the other child
    if (dnodeptr->left == null_ptr || dnodeptr->right == null_ptr)
    {
        if (dnodeptr->right == null_ptr)
            rnodeptr = dnodeptr->left;
        else
            rnodeptr = dnodeptr->right;

        if (rnodeptr != null_ptr)
            // the parent of R is now the parent of D
            rnodeptr->parent = pnodeptr;
    }

(Note that when we assign dnodeptr->left to rnodeptr, in this leaf case dnodeptr->left is null.) So if we are removing a tree leaf, we "replace" it by a null pointer.

2.3.2 Removing A Node with a Null Right Child

[Figure: the example binary search tree containing 10, 20, 30, 40, 50, 60, 70]

Case 2: Suppose we wanted to remove the 20 or the 70 from this tree. What would we have to do so that the remaining nodes would still be a valid BST?

There is one pointer to the node being deleted, and one pointer from that node to its only child. So this is actually a bit like deleting a node from the middle of a linked list. All we need to do is to reroute the pointer from the parent (30) to the node we want to remove, making that pointer point directly to the child of the node we are going to remove.
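The rerouting just described can be sketched in isolation. This is a simplified illustration rather than the textbook's erase: the PNode type and the functions insert and spliceOut are hypothetical names, the tree stores ints, and the sketch decides which parent branch to update by comparing pointers instead of comparing data values as the textbook code does.

```cpp
#include <cassert>

// Minimal node with a parent pointer, loosely modelled on stnode
// (illustrative only).
struct PNode {
    int value;
    PNode* left;
    PNode* right;
    PNode* parent;
    explicit PNode(int v, PNode* p = nullptr)
        : value(v), left(nullptr), right(nullptr), parent(p) {}
};

// Insert item, maintaining parent pointers. Returns the root;
// duplicates are ignored.
PNode* insert(PNode* root, int item)
{
    PNode* parent = nullptr;
    PNode* t = root;
    while (t != nullptr) {
        if (item == t->value) return root;
        parent = t;
        t = (item < t->value) ? t->left : t->right;
    }
    PNode* fresh = new PNode(item, parent);
    if (parent == nullptr) return fresh;
    if (item < parent->value) parent->left = fresh;
    else                      parent->right = fresh;
    return root;
}

// Remove node d, which must have at most one child, by rerouting the
// pointer that led to d so that it leads to d's only child (or to
// nullptr when d is a leaf). Returns the root of the resulting tree.
PNode* spliceOut(PNode* root, PNode* d)
{
    assert(d != nullptr);
    assert(d->left == nullptr || d->right == nullptr);

    PNode* child = (d->left != nullptr) ? d->left : d->right;
    if (child != nullptr)
        child->parent = d->parent;     // child adopts d's parent

    if (d->parent == nullptr)
        root = child;                  // d was the root
    else if (d->parent->left == d)
        d->parent->left = child;
    else
        d->parent->right = child;

    delete d;
    return root;
}
```

Note that the leaf case needs no separate code here: a leaf's "only child" is simply a null pointer, which is what gets stored in the parent.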
For example, starting from this:

[Figure: BST with root 30; 30 has children 20 and 70; 20 has left child 10; 70 has left child 50; 50 has children 40 and 60]

Verify for yourself that if we remove 20:

[Figure: BST with root 30; 30 has children 10 and 70; 70 has left child 50; 50 has children 40 and 60]

or 70:

[Figure: BST with root 30; 30 has children 20 and 50; 20 has left child 10; 50 has children 40 and 60]

in this manner, the results are still valid BSTs.

Again, take a look at the erase code in Section 2.3 for the case when the node being erased has exactly one child. Notice that its non-null child is chosen as the replacement node, rnodeptr, which is then linked to the parent of the erased node:

    // complete the link to the parent node.

    // deleting the root node. assign new root
    if (pnodeptr == null_ptr)
        root = rnodeptr;
    // attach R to the correct branch of P
    else if (dnodeptr->nodevalue < pnodeptr->nodevalue)
        pnodeptr->left = rnodeptr;
    else
        pnodeptr->right = rnodeptr;

2.3.3 Removing a Node with Two Non-Null Children

[Figure: the example binary search tree containing 10, 20, 30, 40, 50, 60, 70]

Case 3: Suppose we wanted to remove the 50 or the 30 from this tree. What would we have to do so that the remaining nodes would still be a valid BST?

This is a hard case. Clearly, if we remove either the "50" or "30" nodes, we break the tree into pieces, with no obvious place to put the now-detached subtrees.

So let's take a different tack. Instead of deleting this node, is there some other data value that we could put into that node that would preserve the BST ordering (all nodes to the left must be less, all nodes to the right must be greater)?

There are, in fact, two values that we could safely put in there: the smallest value from the right subtree, or the largest value from the left subtree.

• We can find the largest value on the left by taking one step to the left, then running as far down to the right as we can go.
• We can find the smallest value on the right by taking one step to the right, then running as far down to the left as we can go.
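The two walks just described are short enough to write out directly. The sketch below assumes a bare-bones node type; Node, largestInLeft and smallestInRight are hypothetical names used only for this illustration, not part of the textbook's stree.

```cpp
#include <cassert>

// Bare-bones node for illustration (ints, no parent pointer).
struct Node {
    int value;
    Node* left;
    Node* right;
    explicit Node(int v) : value(v), left(nullptr), right(nullptr) {}
};

// One step to the left, then as far down to the right as possible.
// Precondition: n has a left child.
const Node* largestInLeft(const Node* n)
{
    assert(n != nullptr && n->left != nullptr);
    const Node* t = n->left;
    while (t->right != nullptr)
        t = t->right;
    return t;
}

// One step to the right, then as far down to the left as possible.
// Precondition: n has a right child.
const Node* smallestInRight(const Node* n)
{
    assert(n != nullptr && n->right != nullptr);
    const Node* t = n->right;
    while (t->left != nullptr)
        t = t->left;
    return t;
}
```

Each loop stops exactly when the next pointer in its direction is null, so the node returned by largestInLeft has no right child and the node returned by smallestInRight has no left child.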
Now, if we replace 30 by...

[Figure: BST with root 30; 30 has children 20 and 70; 20 has left child 10; 70 has left child 50; 50 has children 40 and 60]

... the largest value from the left:

[Figure: the same tree with the root's value replaced by 20; the original 20 node is still present as the root's left child]

or by the smallest value from the right,

[Figure: the same tree with the root's value replaced by 40; the original 40 node is still present as the left child of 50]

the results are properly ordered for a BST, except possibly for the node we just copied the value from. But since that node is now redundant, we can delete it from the tree.

And here's the best part. Since we find the node to copy from by running as far as we can go in one direction or the other, we know that the node we copied from has at least 1 null child pointer (otherwise we would have kept running past it). So removing it from the tree will always fall into one of the earlier, simpler cases (leaf or only one child).

Again, take a look at the erase code in Section 2.3. It performs the "step to the right, then run to the left" behavior we have just described in order to find the replacement node:

    // pofrnodeptr = pointer to parent of replacement node
    stnode<T> *pofrnodeptr = dnodeptr;

    // first possible replacement is right child of D
    rnodeptr = dnodeptr->right;

    // descend down left subtree of the right child of D,
    // keeping a record of current node and its parent.
    // when we stop, we have found the replacement
    while (rnodeptr->left != null_ptr)
    {
        pofrnodeptr = rnodeptr;
        rnodeptr = rnodeptr->left;
    }

The remaining code is concerned with removing that replacement node from where it currently resides so that we can then link it in to the parent of the node being erased.

Finally, try running this algorithm, available as erase from a position. Try to observe each of the major cases, as outlined here, in action.

3 How Fast Are Binary Search Trees?

Each step in the BST insert and findnode algorithms moves one level deeper in the tree. Similarly, in erase, the only part that is not constant time is running down the tree to find the smallest value to the right.

The number of recursive calls/loop iterations in all these algorithms is therefore no greater than the height of the tree.

But how high can a BST be? That depends on how well the tree is balanced.
3.1 Balancing

A binary tree is balanced if, for every interior node, the heights of its two children differ by at most 1.

Unbalanced trees are easy to obtain. This is a BST.

[Figure: a binary search tree containing 10, 20, 30, 40, 50, 60, 70]
But, so is this!

[Figure: degenerate BST in which 10, 20, 30, 40, 50, 60, 70 form a single chain, each node the right child of the one before it]

The shape of the tree depends upon the order of insertions. The worst case is when the data being inserted is already in order (or in reverse order). In that case, the tree degenerates into a sorted linked list, as shown above.

The best case is when the tree is balanced, meaning that, for each node, the heights of the node's children are nearly the same.
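The balance condition stated in this section (at every node, the heights of the two children differ by at most 1) translates almost word for word into code. The sketch below uses a bare-bones node type; Node, height, heightDifference and isBalanced are hypothetical names chosen for this illustration. It recomputes heights at every node, so it favors clarity over speed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>

// Bare-bones node for illustration (ints, no parent pointer).
struct Node {
    int value;
    Node* left;
    Node* right;
    explicit Node(int v) : value(v), left(nullptr), right(nullptr) {}
};

// Height of the subtree rooted at t, counted in nodes (empty tree: 0).
int height(const Node* t)
{
    if (t == nullptr) return 0;
    return 1 + std::max(height(t->left), height(t->right));
}

// Absolute difference between the heights of n's two subtrees.
// Precondition: n is not null.
int heightDifference(const Node* n)
{
    assert(n != nullptr);
    return std::abs(height(n->left) - height(n->right));
}

// True if the balance condition holds at every node of the tree.
bool isBalanced(const Node* t)
{
    if (t == nullptr) return true;
    return heightDifference(t) <= 1
        && isBalanced(t->left)
        && isBalanced(t->right);
}
```

Applied to a chain such as 10, 20, 30 (each node the right child of the previous one), heightDifference at the root is already 2, so isBalanced rejects the tree without looking any deeper.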
3.2 Performance

Consider the findnode operation (Section 2.1) on a nearly balanced tree with N nodes.

Question: What is the complexity of the best case?

• O(1)
• O(log N)
• O(N)
• O(N log N)
• O(N^2)
Answer: In the best case, we find what we're looking for in the root of the tree. That's O(1) time.

Question: Consider the findnode operation on a nearly balanced tree with N nodes. What is the complexity of the worst case?

• O(1)
• O(log N)
• O(N)
• O(N log N)
• O(N^2)
Answer: The findnode operation starts at the root and moves down one level on each iteration of its loop. So it is, in the worst case, O(h), where h is the height of the tree.

But how high is a balanced tree?

A nearly balanced tree will have height approximately log N. Consider a tree that is completely balanced and has its lowest level full. Since every node on the lowest level shares a parent with one other, there will be exactly half as many nodes on the next-to-lowest level as on the lowest. And, by the same reasoning, each level will have half as many nodes as the one below it, until we finally get to the single root at the top of the tree. A tree of height h with every level full therefore holds 1 + 2 + 4 + ... + 2^(h-1) = 2^h - 1 nodes, and setting N = 2^h - 1 gives h = log2(N + 1).

So a balanced tree has height O(log N), and searching a balanced binary tree would be O(log N).

Question: Consider the findnode operation on a degenerate tree with N nodes. What is the complexity of the worst case?

• O(1)
• O(log N)
• O(N)
• O(N log N)
• O(N^2)
Answer:

[Figure: degenerate BST in which 10, 20, 30, 40, 50, 60, 70 form a single chain]

A degenerate tree looks like a linked list. In the worst case, the value we're looking for is at the end of the list, so we have to search through all N nodes to get there. Thus the worst case is O(N).

There's quite a difference, then, between the worst case behavior of trees, depending upon the tree's shape.

3.3 Average-Case

So the question is, does the "average" binary tree look more like the balanced or the degenerate case?

An intuitive argument is:

• No tree with n nodes has height < log(n).
• No tree with n nodes has height > n.
• Average depth of all nodes is therefore bounded between (log n)/2 and n/2.
• The more unbalanced a tree is, the less likely that a random insertion would increase the tree height.
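[Figure: perfectly balanced BST with root 40; 40 has children 20 and 60; 20 has children 10 and 30; 60 has children 50 and 70]

For example, if we are inserting into this tree, then any insertion will increase the tree's height.

[Figure: degenerate BST in which 10, 20, 30, 40, 50, 60, 70 form a single chain]

But if we were inserting a randomly selected value into this one, then there is only a 2/8 chance that we will increase the height of the tree: the chain has 8 empty child positions, and only the 2 below its deepest node add a new level.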
[Figure: the example binary search tree containing 10, 20, 30, 40, 50, 60, 70]

For trees that are somewhere between those two extremes, the chances of a random insertion actually increasing the height of the tree will fall somewhere between those two probability extremes.

Insertions that don't increase the tree height make the tree more balanced. So, the more unbalanced a tree is, the more likely that a random insertion will actually tend to increase the balance of the tree.

This suggests (but does not prove) that randomly constructed binary search trees tend to be reasonably balanced. It is possible to prove this claim, but the proof is beyond the scope of this class.

But it's not safe to be too sanguine about the height of binary search trees. Although random construction tends to yield reasonable balance, in real applications we often do not get random values.

Question: Which of the following data would, if inserted into an initially empty binary search tree, yield a degenerate tree?

• data that is in ascending order
• data that is in descending order
• both of the above
• none of the above

3.4 Can We Avoid the Worst Case?

Data in either ascending or descending order results in a degenerate tree. (Try it if you are not convinced.)
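One way to "try it" is to build trees from the same keys in different orders and compare their heights. The sketch below is an assumption-laden illustration, not part of the course code: Node, insert, height, destroy, heightAfterInserting, ascending and shuffled are hypothetical names, keys are ints, and the random order comes from std::shuffle driven by a fixed-seed std::mt19937.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

// Bare-bones node for illustration (ints, no parent pointer).
struct Node {
    int value;
    Node* left;
    Node* right;
    explicit Node(int v) : value(v), left(nullptr), right(nullptr) {}
};

// Iterative BST insert; returns the root, ignores duplicates.
Node* insert(Node* root, int item)
{
    Node* parent = nullptr;
    Node* t = root;
    while (t != nullptr) {
        parent = t;
        if (item == t->value) return root;
        t = (item < t->value) ? t->left : t->right;
    }
    Node* fresh = new Node(item);
    if (parent == nullptr) return fresh;
    if (item < parent->value) parent->left = fresh;
    else                      parent->right = fresh;
    return root;
}

// Height counted in nodes (empty tree: 0).
int height(const Node* t)
{
    if (t == nullptr) return 0;
    return 1 + std::max(height(t->left), height(t->right));
}

// Release every node in the tree.
void destroy(Node* t)
{
    if (t == nullptr) return;
    destroy(t->left);
    destroy(t->right);
    delete t;
}

// Insert the values in the order given and report the tree's height.
int heightAfterInserting(const std::vector<int>& values)
{
    Node* root = nullptr;
    for (int v : values) root = insert(root, v);
    int h = height(root);
    destroy(root);
    return h;
}

// Keys 1..n in ascending order.
std::vector<int> ascending(int n)
{
    assert(n >= 0);
    std::vector<int> v(n);
    std::iota(v.begin(), v.end(), 1);
    return v;
}

// The same keys in a pseudo-random order determined by seed.
std::vector<int> shuffled(int n, unsigned seed)
{
    std::vector<int> v = ascending(n);
    std::mt19937 gen(seed);
    std::shuffle(v.begin(), v.end(), gen);
    return v;
}
```

For 1000 keys, heightAfterInserting(ascending(1000)) is exactly 1000, since every insertion extends the chain. The height for a shuffled order depends on the seed and on the library's shuffle, but it can never be below 10, because a tree of height 9 holds at most 2^9 - 1 = 511 nodes.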
It's very common to get data that is in sorted or almost sorted order, so degenerate behavior turns out to be more common than we might expect.

Also, the arguments made so far don't take deletions into account, which tend to unbalance trees.

Later, we'll look at variants of the binary search tree that use more elaborate insertion and deletion algorithms to maintain tree balance.