Binary Search Trees


Steven J. Zeil
July 11, 2013

Contents
1 Definition: Binary Search Trees
   1.1 The Binary Search Tree ADT
2 Implementing Binary Search Trees
   2.1 Searching a Binary Tree
   2.2 Inserting into Binary Search Trees
   2.3 Deletion
      2.3.1 Removing a Leaf
      2.3.2 Removing A Node with a Null Right Child
      2.3.3 Removing a Node with Two Non-Null Children
3 How Fast Are Binary Search Trees?
   3.1 Balancing
   3.2 Performance
   3.3 Average-Case
   3.4 Can We Avoid the Worst Case?

A tree in which every parent has at most 2 children is a binary tree. The most common use of binary trees is for ADTs that require frequent searches for arbitrary keys, e.g., sets and maps. For this we use a special form of binary tree, the binary search tree.

1 Definition: Binary Search Trees

A binary tree T is a binary search tree if, for each node n with left subtree T_L and right subtree T_R:

- The value in n is greater than the values in every node in T_L.
- The value in n is less than the values in every node in T_R.
- Both T_L and T_R are binary search trees.

Question: Is this a BST?

[Figure: a tree with root 30. The root's left child is 20, whose left child is 10. The root's right child is 70, whose left child is 50; 50 has children 40 (left) and 60 (right).]

CS361 2
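The definition can be checked mechanically. Below is a minimal sketch, assuming a hypothetical `Node` struct (not the textbook's `stnode`) and hypothetical helper names `isBSTBetween` and `isBST`. The key point is that each node must respect bounds inherited from *all* of its ancestors, not just its parent. Comparing only parent/child pairs would miss a value that sits deep in the wrong subtree.

```cpp
#include <cassert>

// Hypothetical minimal node type for illustration (not the textbook's stnode).
struct Node {
    int value;
    Node* left;
    Node* right;
};

// True if the subtree rooted at n is a BST whose values all lie strictly
// between *lo and *hi (a null bound means "no limit on that side").
bool isBSTBetween(const Node* n, const int* lo, const int* hi) {
    if (n == nullptr) return true;                       // empty tree is a BST
    if (lo != nullptr && !(*lo < n->value)) return false;
    if (hi != nullptr && !(n->value < *hi)) return false;
    // Every node in T_L must be less than n and every node in T_R greater,
    // on top of the bounds already imposed by n's ancestors.
    return isBSTBetween(n->left, lo, &n->value) &&
           isBSTBetween(n->right, &n->value, hi);
}

bool isBST(const Node* root) {
    return isBSTBetween(root, nullptr, nullptr);
}
```

While testing, a call such as `assert(isBST(root))` after each insert or erase is a cheap way to catch ordering bugs.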

Answer: Yes, this is a binary search tree. Each node is greater than all of its left descendants and less than all of its right descendants.

1.1 The Binary Search Tree ADT

Let's look at the basic interface for a binary search tree.

    template <typename T>
    class stnode                                          // ❶
    {
    public:
        // stnode is used to implement the binary search tree class;
        // making the data public simplifies building the class functions

        T nodevalue;                        // node data
        stnode<T> *left, *right, *parent;   // child pointers and pointer to the node's parent

        // constructor
        stnode(const T& item, stnode<T> *lptr = nullptr,
               stnode<T> *rptr = nullptr, stnode<T> *pptr = nullptr)
            : nodevalue(item), left(lptr), right(rptr), parent(pptr)
        {}
    };

    template <typename T>
    class stree                                           // ❷
    {
    public:
        typedef stree_iterator<T> iterator;
        typedef stree_const_iterator<T> const_iterator;

        stree();
        // constructor. initialize root to nullptr and size to 0
        stree(T *first, T *last);
        // constructor. insert the elements from the pointer
        // range [first, last) into the tree
        stree(const stree<T>& tree);
        // copy constructor
        ~stree();
        // destructor
        stree<T>& operator= (const stree<T>& rhs);
        // assignment operator

        iterator find(const T& item);                     // ❸
        // search for item. if found, return an iterator pointing
        // at it in the tree; otherwise, return end()
        const_iterator find(const T& item) const;
        // constant version

        bool empty() const;
        // indicate whether the tree is empty
        int size() const;
        // return the number of data items in the tree

        std::pair<iterator, bool> insert(const T& item);  // ❹
        // if item is not in the tree, insert it and
        // return a pair whose iterator component points
        // at item and whose bool component is true. if item
        // is in the tree, return a pair whose iterator
        // component points at the existing item and whose
        // bool component is false
        // Postcondition: the tree size increases by 1 if item
        // is not in the tree

        int erase(const T& item);                         // ❺
        // if item is in the tree, erase it and return 1;
        // otherwise, return 0
        // Postcondition: the tree size decreases by 1 if
        // item is in the tree

        void erase(iterator pos);                         // ❺
        // erase the item pointed to by pos.
        // Preconditions: the tree is not empty and pos points
        // to an item in the tree. if the tree is empty, the
        // function throws the underflowerror exception. if the
        // iterator is invalid, the function throws the referenceerror
        // exception.
        // Postcondition: the tree size decreases by 1

        void erase(iterator first, iterator last);        // ❺
        // erase all items in the range [first, last).
        // Precondition: the tree is not empty. if the tree
        // is empty, the function throws the underflowerror
        // exception.
        // Postcondition: the size of the tree decreases by
        // the number of elements in the range [first, last)

        iterator begin();
        // return an iterator pointing to the first item
        // inorder
        const_iterator begin() const;
        // constant version
        iterator end();
        // return an iterator pointing just past the end of
        // the tree data
        const_iterator end() const;
        // constant version

    private:
        stnode<T> *root;   // pointer to tree root
        int treesize;      // number of elements in the tree

        stnode<T> *getstnode(const T& item, stnode<T> *lptr,
                             stnode<T> *rptr, stnode<T> *pptr);
        // allocate a new tree node and return a pointer to it.
        // if memory allocation fails, the function throws the
        // memoryallocationerror exception

        stnode<T> *copytree(stnode<T> *t);
        // recursive function used by copy constructor and assignment
        // operator to assign the current tree as a copy of another tree

        void deletetree(stnode<T> *t);
        // recursive function used by destructor and assignment
        // operator to delete all the nodes in the tree

        stnode<T> *findnode(const T& item) const;
        // search for item in the tree. if it is in the tree,
        // return a pointer to its node; otherwise, return nullptr.
        // used by find() and erase()

        friend class stree_iterator<T>;
        friend class stree_const_iterator<T>;
        // allow the iterator classes to access the private section
        // of stree
    };

This code is taken from your textbook and is the same code used in our prior discussion of tree iterators. Some points of note:

❶ The stnode template implements individual tree nodes.
❷ The stree template represents the entire tree (the whole collection of related nodes), with functions for searching, insertion, iteration, etc.

Our primary focus in this lecture will be on the find (❸), insert (❹), and erase (❺) functions.

2 Implementing Binary Search Trees

Since you have, presumably, read your text's discussion of how to implement BSTs, I'm mainly going to hit the high points.

2.1 Searching a Binary Tree

We'll start by reviewing the basic searching algorithm.

    template <typename T>
    typename stree<T>::iterator stree<T>::find(const T& item)
    {
        stnode<T> *curr;

        // search tree for item
        curr = findnode(item);

        // if item found, return an iterator with value curr;
        // otherwise, return end()
        if (curr != nullptr)
            return iterator(curr, this);
        else
            return end();
    }

The tree's find operation uses a private utility function, findnode, to get a pointer to the node containing the desired data, and then uses that pointer to construct an iterator representing the position of that node.
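The contracts described here (find returns end() when the item is absent, insert reports whether a new value was added, erase by value returns 1 or 0) are the same ones the C++ standard library's `std::set` follows. `std::set` is commonly, though not by mandate of the standard, implemented as a balanced BST (a red-black tree). The sketch below exercises those `std::set` calls to illustrate the shared contract. It uses `std::set` rather than `stree` because the stree iterator classes are not shown in these notes. The function name `setContractDemo` is hypothetical.

```cpp
#include <cassert>
#include <set>

// Illustrates the find/insert/erase contracts with std::set.
void setContractDemo() {
    std::set<int> s;

    // insert returns std::pair<iterator, bool>; the bool is true only
    // when the value was not already present.
    auto added = s.insert(30);
    auto dup = s.insert(30);
    assert(added.second && !dup.second);
    assert(added.first == dup.first);   // both refer to the same element

    // find returns end() when the value is absent.
    assert(s.find(30) != s.end());
    assert(s.find(99) == s.end());

    // erase(value) returns the number of elements removed: 1 or 0.
    assert(s.erase(30) == 1);
    assert(s.erase(30) == 0);
    assert(s.empty());
}
```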

We search a tree by comparing the value we're searching for to the value in the current node, t. If the value we want is smaller, we look in the left subtree. If it is larger, we look in the right subtree.

    // search for data item in the tree. if found, return its node
    // address; otherwise, return nullptr
    template <typename T>
    stnode<T> *stree<T>::findnode(const T& item) const
    {
        // cycle t through the tree starting with root
        stnode<T> *t = root;

        // terminate on an empty subtree
        while (t != nullptr && !(item == t->nodevalue))
            if (item < t->nodevalue)
                t = t->left;
            else
                t = t->right;

        // return pointer to node; nullptr if not found
        return t;
    }

You may note that this algorithm bears a certain resemblance to the binary search (17) algorithm we studied earlier in the semester. We shall see shortly that, provided the tree stays reasonably balanced, the performance of both search algorithms on a collection of N items is O(log N), but that binary trees support faster insertion operations, allowing us to build the searchable collection in less time than when using binary search over sorted arrays.

You can run this algorithm to see how it works.

2.2 Inserting into Binary Search Trees

    template <typename T>
    std::pair<typename stree<T>::iterator, bool> stree<T>::insert(const T& item)
    {
        // t is current node in traversal, parent the previous node
        stnode<T> *t = root, *parent = nullptr, *newnode;

        // terminate on an empty subtree
        while (t != nullptr)
        {
            // update the parent pointer. then go left or right
            parent = t;
            // if a match occurs, return a pair whose iterator
            // component points at item in the tree and whose
            // bool component is false
            if (item == t->nodevalue)
                return std::pair<iterator, bool>(iterator(t, this), false);
            else if (item < t->nodevalue)
                t = t->left;
            else
                t = t->right;
        }

        // create the new leaf node
        newnode = getstnode(item, nullptr, nullptr, parent);

        // if parent is nullptr, insert as root node
        if (parent == nullptr)
            root = newnode;
        else if (item < parent->nodevalue)
            // insert as left child
            parent->left = newnode;
        else
            // insert as right child
            parent->right = newnode;

        // increment size
        treesize++;

        // return a pair whose iterator component points at
        // the new node and whose bool component is true
        return std::pair<iterator, bool>(iterator(newnode, this), true);
    }

The first part of the insertion function is closely related to the search loop in findnode. In fact, we are searching for the place where the new data would reside, if it were in the tree. We know we have not found it when we reach a null pointer. Since that pointer (as either the left or right child of some parent node) was found by asking "where would this data go if it were in the tree?", we know that we can, in fact, insert the data here.

You might want to run this algorithm and experiment with inserting nodes into binary search trees. Take particular note of what happens if you insert data in ascending or descending order, as opposed to inserting randomly ordered data.
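To see the effect of insertion order without the interactive demo, here is a minimal sketch. It assumes a hypothetical `Node` type that owns its children through `std::unique_ptr` (not the textbook's `stree`). Its `insert` follows the same "search until we fall off the tree, then attach a new leaf" logic and ignores duplicates. The hypothetical helper `heightAfterInserting` reports the height (in edges) of the tree built from the keys 0..n-1.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <random>
#include <vector>

// Hypothetical minimal BST node (not the textbook's stnode).
struct Node {
    int value;
    std::unique_ptr<Node> left, right;
    explicit Node(int v) : value(v) {}
};

// Walk down to the null link where item belongs, then attach a new leaf.
void insert(std::unique_ptr<Node>& root, int item) {
    std::unique_ptr<Node>* link = &root;
    while (*link) {
        if (item == (*link)->value) return;          // already present
        link = (item < (*link)->value) ? &(*link)->left : &(*link)->right;
    }
    *link = std::make_unique<Node>(item);
}

// Height in edges; an empty tree has height -1. (Recursive: fine for the
// modest sizes used here.)
int height(const Node* n) {
    if (!n) return -1;
    return 1 + std::max(height(n->left.get()), height(n->right.get()));
}

// Inserts keys 0..n-1 in ascending or shuffled order; returns the height.
int heightAfterInserting(int n, bool shuffled, unsigned seed = 42) {
    assert(n >= 0);
    std::vector<int> keys(n);
    for (int i = 0; i < n; ++i) keys[i] = i;
    if (shuffled) std::shuffle(keys.begin(), keys.end(), std::mt19937(seed));
    std::unique_ptr<Node> root;
    for (int k : keys) insert(root, k);
    return height(root.get());
}
```

With ascending keys, every new node becomes the right child of the previous one, so `heightAfterInserting(1000, false)` is 999: the tree has degenerated into a linked list. A shuffled order typically yields a far shallower tree.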

2.3 Deletion

Our tree class actually provides two distinct approaches to erasing. We can erase the data at a given position (iterator) or erase a given value, if it exists. Deleting a value is shown here. We simply do a conventional binary search tree findnode call and, if the value actually exists in the tree, erase the node at the position where we found the data.

    template <typename T>
    int stree<T>::erase(const T& item)
    {
        int numbererased = 1;
        // search tree for item
        stnode<T> *p = findnode(item);

        // if item found, delete the node
        if (p != nullptr)
            erase(iterator(p, this));
        else
            numbererased = 0;

        return numbererased;
    }

In essence, this passes the buck to the "erase at a position" function, which we will look at next.

    template <typename T>
    void stree<T>::erase(iterator pos)
    {
        // dnodeptr = pointer to node D that is deleted
        // pnodeptr = pointer to parent P of node D
        // rnodeptr = pointer to node R that replaces D
        stnode<T> *dnodeptr = pos.nodeptr, *pnodeptr, *rnodeptr = nullptr;

        if (treesize == 0)
            throw underflowerror("stree erase(): tree is empty");

        if (dnodeptr == nullptr)
            throw referenceerror("stree erase(): invalid iterator");

        // assign pnodeptr the address of P
        pnodeptr = dnodeptr->parent;

        // If D has a null pointer, the
        // replacement node is the other child
        if (dnodeptr->left == nullptr || dnodeptr->right == nullptr)
        {
            if (dnodeptr->right == nullptr)
                rnodeptr = dnodeptr->left;
            else
                rnodeptr = dnodeptr->right;

            if (rnodeptr != nullptr)
                // the parent of R is now the parent of D
                rnodeptr->parent = pnodeptr;
        }
        // both pointers of dnodeptr are non-null.
        else
        {
            // find and unlink replacement node for D.
            // starting at the right child of node D,
            // find the node whose value is the smallest of all
            // nodes whose values are greater than the value in D.
            // unlink the node from the tree.

            // pofrnodeptr = pointer to parent of replacement node
            stnode<T> *pofrnodeptr = dnodeptr;

            // first possible replacement is right child of D
            rnodeptr = dnodeptr->right;

            // descend down left subtree of the right child of D,
            // keeping a record of current node and its parent.
            // when we stop, we have found the replacement
            while (rnodeptr->left != nullptr)
            {
                pofrnodeptr = rnodeptr;
                rnodeptr = rnodeptr->left;
            }

            if (pofrnodeptr == dnodeptr)
            {
                // right child of deleted node is the replacement.
                // assign left subtree of D to left subtree of R
                rnodeptr->left = dnodeptr->left;
                // assign the parent of D as the parent of R
                rnodeptr->parent = pnodeptr;
                // assign the left child of D to have parent R
                dnodeptr->left->parent = rnodeptr;
            }
            else
            {
                // we moved at least one node down a left branch
                // of the right child of D. unlink R from tree by
                // assigning its right subtree as the left child of
                // the parent of R
                pofrnodeptr->left = rnodeptr->right;

                // the parent of the right child of R is the
                // parent of R
                if (rnodeptr->right != nullptr)
                    rnodeptr->right->parent = pofrnodeptr;

                // put replacement node in place of dnodeptr
                // assign children of R to be those of D
                rnodeptr->left = dnodeptr->left;
                rnodeptr->right = dnodeptr->right;
                // assign the parent of R to be the parent of D
                rnodeptr->parent = pnodeptr;
                // assign the parent pointer in the children
                // of R to point at R
                rnodeptr->left->parent = rnodeptr;
                rnodeptr->right->parent = rnodeptr;
            }
        }

        // complete the link to the parent node.

        // deleting the root node. assign new root
        if (pnodeptr == nullptr)
            root = rnodeptr;
        // attach R to the correct branch of P
        else if (dnodeptr->nodevalue < pnodeptr->nodevalue)
            pnodeptr->left = rnodeptr;
        else
            pnodeptr->right = rnodeptr;

        // delete the node from memory and decrement tree size
        delete dnodeptr;
        treesize--;
    }

Here is the erase algorithm. For the moment, concentrate on the code for replacing the node we want to erase, dnodeptr, by a replacement node, rnodeptr. You can see that it is careful to place the address of the replacement into either the tree root, the left child of the erased node's parent, or the right child of the erased node's parent, depending on whether the erased node's value is less than the value in the parent.

Most of the code in this function is actually concerned with finding that replacement node. We can break down the problem of finding a suitable replacement when removing a given node from a BST into cases:

1. Removing a leaf
2. Removing a node that has only one child
   - only a left child
   - only a right child
3. Removing a node that has two children

2.3.1 Removing a Leaf

[Figure: the example tree from Section 1 (root 30).]

Case 1: Suppose we wanted to remove the 40 from this tree. What would we have to do so that the remaining nodes would still be a valid BST?

Nothing at all! If we simply delete this node (setting the pointer to it from its parent to null), what's left would still be a perfectly good binary search tree; it would satisfy all the BST ordering requirements.

Now, look again at the erase(iterator pos) code shown at the start of Section 2.3, which removes a node, pointed at by dnodeptr, from a BST. Find the leaf case, and you can see that all we do is delete the node. (Note that when we assign dnodeptr->left to rnodeptr, in this leaf case dnodeptr->left is null.) So if we are removing a tree leaf, we "replace" it by a null pointer.

2.3.2 Removing A Node with a Null Right Child

[Figure: the example tree from Section 1 (root 30).]

Case 2: Suppose we wanted to remove the 20 or the 70 from this tree. What would we have to do so that the remaining nodes would still be a valid BST?

There is one pointer to the node being deleted, and one pointer from that node to its only child. So this is actually a bit like deleting a node from the middle of a linked list. All we need to do is reroute the pointer from the parent (30) to the node we want to remove, making that pointer point directly to the child of the node we are going to remove.
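The rerouting can be sketched in a few lines if we hold a reference to the parent's link (the parent's left or right pointer, or the root pointer) rather than to the node itself. This is a minimal sketch assuming a hypothetical `Node` type without parent pointers and a hypothetical helper `spliceOut`. The textbook's stree keeps parent pointers, which is why its erase must also update `rnodeptr->parent`.

```cpp
#include <cassert>

// Hypothetical minimal node type (not the textbook's stnode); no parent pointers.
struct Node {
    int value;
    Node* left;
    Node* right;
};

// Unlinks the node that `link` points to, where `link` is the parent's
// left or right pointer (or the root pointer). Precondition: that node has
// at most one child. As when unlinking from the middle of a linked list,
// the parent's pointer is rerouted to the node's only child, or to nullptr
// for a leaf. Returns the unlinked node so the caller can delete it.
Node* spliceOut(Node*& link) {
    Node* doomed = link;
    assert(doomed != nullptr);
    assert(doomed->left == nullptr || doomed->right == nullptr);
    link = (doomed->left != nullptr) ? doomed->left : doomed->right;
    doomed->left = doomed->right = nullptr;   // detach from the tree
    return doomed;
}
```

On the example tree, `spliceOut(root->left)` removes 20 and leaves 10 as the root's left child, and the same call on 40's link covers the leaf case as well.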

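For example, starting from this:

[Figure: the example tree: root 30; 20 (left child 10); 70 (left child 50, with children 40 and 60).]

Verify for yourself, if we remove 20:

[Figure: root 30 with left child 10 and right child 70; 70's left child is 50, with children 40 and 60.]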

or 70:

[Figure: root 30 with left child 20 (whose left child is 10) and right child 50, with children 40 and 60.]

in this manner, that the results are still valid BSTs.

Look again at the erase(iterator pos) code shown at the start of Section 2.3, this time at the case when the node being erased has exactly one child. Notice that its non-null child is chosen as the replacement node, rnodeptr.

2.3.3 Removing a Node with Two Non-Null Children

[Figure: the example tree from Section 1 (root 30).]

Case 3: Suppose we wanted to remove the 50 or the 30 from this tree. What would we have to do so that the remaining nodes would still be a valid BST?

This is a hard case. Clearly, if we remove either the 50 or the 30 node, we break the tree into pieces, with no obvious place to put the now-detached subtrees.

So let's take a different tack. Instead of deleting this node, is there some other data value that we could put into that node that would preserve the BST ordering (all nodes to the left must be less, all nodes to the right must be greater)?

There are, in fact, two values that we could safely put in there: the smallest value from the right subtree, or the largest value from the left subtree.

- We can find the largest value on the left by taking one step to the left, then running as far down to the right as we can go.
- We can find the smallest value on the right by taking one step to the right, then running as far down to the left as we can go.
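The two "step, then run" walks can be sketched as follows. This again assumes a hypothetical parent-pointer-free `Node` type, and the function names `maxOfLeft` and `minOfRight` are illustrative. Each loop stops exactly at a node whose pointer in the running direction is null, which is why the node found always has at most one child.

```cpp
#include <cassert>

// Hypothetical minimal node type (not the textbook's stnode).
struct Node {
    int value;
    Node* left;
    Node* right;
};

// Largest value in n's left subtree: one step left, then run right.
// Precondition: n has a left child.
const Node* maxOfLeft(const Node* n) {
    assert(n != nullptr && n->left != nullptr);
    const Node* p = n->left;
    while (p->right != nullptr) p = p->right;
    return p;
}

// Smallest value in n's right subtree: one step right, then run left.
// This is the walk the stree erase code performs to find R.
// Precondition: n has a right child.
const Node* minOfRight(const Node* n) {
    assert(n != nullptr && n->right != nullptr);
    const Node* p = n->right;
    while (p->left != nullptr) p = p->left;
    return p;
}
```

For the root 30 of the example tree, these walks return the nodes holding 20 and 40, the two candidate replacement values.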

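Now, if we replace 30 by...

[Figure: the example tree (root 30).]

... the largest value from the left:

[Figure: the root now holds 20; its left child is the old 20 node (left child 10) and its right child is 70 (left child 50, with children 40 and 60).]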

or by the smallest value from the right, the results are properly ordered for a BST, except possibly for the node we just copied the value from. But since that node is now redundant, we can delete it from the tree.

And here's the best part. Since we find the node to copy from by running as far as we can go in one direction or the other, we know that the node we copied from has at least one null child pointer (otherwise we would have kept running past it). So removing it from the tree always falls into one of the earlier, simpler cases (a leaf, or a node with only one child).

```cpp
template <typename T>
void stree<T>::erase(iterator pos)
{
    // dnodeptr = pointer to node D that is deleted
    // pnodeptr = pointer to parent P of node D
    // rnodeptr = pointer to node R that replaces D
    stnode<T> *dnodeptr = pos.nodeptr, *pnodeptr, *rnodeptr = nullptr;

    if (treesize == 0)
        throw underflowerror("stree erase(): tree is empty");

    if (dnodeptr == nullptr)
        throw referenceerror("stree erase(): invalid iterator");

    // assign pnodeptr the address of P
    pnodeptr = dnodeptr->parent;

    // If D has a null pointer, the
    // replacement node is the other child
    if (dnodeptr->left == nullptr || dnodeptr->right == nullptr)
    {
        if (dnodeptr->right == nullptr)
            rnodeptr = dnodeptr->left;
        else
            rnodeptr = dnodeptr->right;

        if (rnodeptr != nullptr)
            // the parent of R is now the parent of D
            rnodeptr->parent = pnodeptr;
    }
    else
    {
        // both pointers of dnodeptr are non-null.
        // find and unlink replacement node for D.
        // starting at the right child of node D,
        // find the node whose value is the smallest of all
        // nodes whose values are greater than the value in D.
        // unlink the node from the tree.

        // pofrnodeptr = pointer to parent of replacement node
        stnode<T> *pofrnodeptr = dnodeptr;

        // first possible replacement is right child of D
        rnodeptr = dnodeptr->right;

        // descend down left subtree of the right child of D,
        // keeping a record of current node and its parent.
        // when we stop, we have found the replacement
        while (rnodeptr->left != nullptr)
        {
            pofrnodeptr = rnodeptr;
            rnodeptr = rnodeptr->left;
        }

        if (pofrnodeptr == dnodeptr)
        {
            // right child of deleted node is the replacement.
            // assign left subtree of D to left subtree of R
            rnodeptr->left = dnodeptr->left;
            // assign the parent of D as the parent of R
            rnodeptr->parent = pnodeptr;
            // assign the left child of D to have parent R
            dnodeptr->left->parent = rnodeptr;
        }
        else
        {
            // we moved at least one node down a left branch
            // of the right child of D. unlink R from tree by
            // assigning its right subtree as the left child of
            // the parent of R
            pofrnodeptr->left = rnodeptr->right;

            // the parent of the right child of R is the
            // parent of R
            if (rnodeptr->right != nullptr)
                rnodeptr->right->parent = pofrnodeptr;

            // put replacement node in place of dnodeptr
            // assign children of R to be those of D
            rnodeptr->left = dnodeptr->left;
            rnodeptr->right = dnodeptr->right;
            // assign the parent of R to be the parent of D
            rnodeptr->parent = pnodeptr;
            // assign the parent pointer in the children
            // of R to point at R
            rnodeptr->left->parent = rnodeptr;
            rnodeptr->right->parent = rnodeptr;
        }
    }

    // complete the link to the parent node.

    // deleting the root node. assign new root
    if (pnodeptr == nullptr)
        root = rnodeptr;
    // attach R to the correct branch of P
    else if (dnodeptr->nodevalue < pnodeptr->nodevalue)
        pnodeptr->left = rnodeptr;
    else
        pnodeptr->right = rnodeptr;

    // delete the node from memory and decrement tree size
    delete dnodeptr;
    treesize--;
}
```

Again, take a look at the code for removing a node. It performs the "step to the right, then run to the left" search we have just described in order to find the replacement node. The remaining code removes that replacement node from where it currently resides and then links it in to the parent of the node being erased, in the erased node's place. (The code relinks the replacement node itself rather than copying its value.)

Finally, try running this algorithm, available as "erase from a position". Try to observe each of the major cases, as outlined here, in action.

3 How Fast Are Binary Search Trees?

Each step in the BST insert and findnode algorithms moves one level deeper in the tree. Similarly, in erase, the only part that is not constant time is the run down the tree to find the smallest value to the right. The number of recursive calls or loop iterations in each of these algorithms is therefore at most h + 1, where h is the height of the tree, so all of them are O(h).

But how high can a BST be? That depends on how well the tree is balanced.
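The "one level per step" argument can be checked with a small sketch. It assumes a hypothetical `Node` struct and free functions (`insertWithDepth`, `height`, `destroy`) rather than the `stree`/`stnode` classes above. `insertWithDepth` is an iterative insert that counts its loop iterations. That count is exactly the depth of the new node, so it can never exceed the old height plus one.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical node type for this sketch; it is not the stnode class above.
struct Node {
    int value;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int v) : value(v) {}
};

// Height of a subtree, counting edges; an empty subtree has height -1.
int height(const Node* t) {
    if (t == nullptr) return -1;
    return 1 + std::max(height(t->left), height(t->right));
}

// Iterative BST insert that reports how many loop iterations it took.
// Each iteration moves one level down, so the count equals the depth of the
// new node (the root has depth 0), which is at most the old height + 1.
// Returns -1 and inserts nothing if the value is already present.
int insertWithDepth(Node*& root, int v) {
    Node** link = &root;  // the child pointer we may have to fill in
    int iterations = 0;
    while (*link != nullptr) {
        if (v == (*link)->value) return -1;
        link = (v < (*link)->value) ? &(*link)->left : &(*link)->right;
        ++iterations;
    }
    *link = new Node(v);
    return iterations;
}

// Frees every node of the tree (post-order).
void destroy(Node* t) {
    if (t == nullptr) return;
    destroy(t->left);
    destroy(t->right);
    delete t;
}
```

For example, inserting 30, 20, 50, 10, 40, 70, 60 into an empty tree attaches the nodes at depths 0, 1, 1, 2, 2, 2, 3, and `height` then reports 3.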

3.1 Balancing

A binary tree is balanced if, for every interior node, the heights of its two children differ by at most 1.

Unbalanced trees are easy to obtain. This is a BST:

[Figure: a BST holding the values 10, 20, 30, 40, 50, 60, and 70]

But so is this!

[Figure: the same seven values arranged as a single chain, with 10 at the top and 70 at the bottom]

The shape of the tree depends upon the order of insertions. The worst case is when the data being inserted is already in order (or in reverse order). In that case, the tree degenerates into a sorted linked list, as shown above. The best case is when the tree is balanced, meaning that, for each node, the heights of the node's children are nearly the same.
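A small sketch shows how insertion order decides the shape. It builds trees from the same seven values in three orders and checks each against the balance rule from 3.1. `Node`, `insert`, `build`, `height`, `isBalanced`, and `destroy` are hypothetical helpers, not part of `stree`. Two further assumptions: an empty subtree is treated as having height -1, and the rule is applied at every node, which is equivalent because leaves pass trivially.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical node type for this sketch; it is not the stnode class above.
struct Node {
    int value;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int v) : value(v) {}
};

// Plain iterative BST insert; duplicate values are ignored.
Node* insert(Node* root, int v) {
    Node** link = &root;
    while (*link != nullptr) {
        if (v == (*link)->value) return root;
        link = (v < (*link)->value) ? &(*link)->left : &(*link)->right;
    }
    *link = new Node(v);
    return root;
}

// Builds a BST by inserting the keys in the order given.
Node* build(const std::vector<int>& keys) {
    Node* root = nullptr;
    for (int k : keys) root = insert(root, k);
    return root;
}

// Height of a subtree; an empty subtree has height -1 (assumed convention).
int height(const Node* t) {
    if (t == nullptr) return -1;
    return 1 + std::max(height(t->left), height(t->right));
}

// The balance rule: at every node, the children's heights differ by at most 1.
// Heights are recomputed at each node, which is O(N^2) but keeps the sketch short.
bool isBalanced(const Node* t) {
    if (t == nullptr) return true;
    return std::abs(height(t->left) - height(t->right)) <= 1 &&
           isBalanced(t->left) && isBalanced(t->right);
}

// Frees every node of the tree (post-order).
void destroy(Node* t) {
    if (t == nullptr) return;
    destroy(t->left);
    destroy(t->right);
    delete t;
}
```

Inserting 10 through 70 in ascending order, or in descending order, gives a chain of height 6 that fails the rule. Inserting the same values as 40, 20, 60, 10, 30, 50, 70 gives a tree of height 2 that passes.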

3.2 Performance

```cpp
// search for data item in the tree. if found, return its node
// address; otherwise, return nullptr
template <typename T>
stnode<T>* stree<T>::findnode(const T& item) const
{
    // cycle t through the tree starting with root
    stnode<T> *t = root;

    // terminate on empty subtree
    while (t != nullptr && !(item == t->nodevalue))
        if (item < t->nodevalue)
            t = t->left;
        else
            t = t->right;

    // return pointer to node; nullptr if not found
    return t;
}
```

Consider the findnode operation on a nearly balanced tree with N nodes.

Question: What is the complexity of the best case?

- O(1)
- O(log N)
- O(N)
- O(N log N)
- O(N²)

Answer: In the best case, we find what we're looking for in the root of the tree. That's O(1) time.

Question: Consider the findnode operation on a nearly balanced tree with N nodes. What is the complexity of the worst case?

- O(1)
- O(log N)
- O(N)
- O(N log N)
- O(N²)

Answer: The findnode operation starts at the root and moves down one level with each loop iteration. So it is, in the worst case, O(h), where h is the height of the tree.

But how high is a balanced tree? A nearly balanced tree has height of roughly log N. Consider a tree that is completely balanced and has its lowest level full. Since every node on the lowest level shares a parent with one other, there will be exactly half as many nodes on the next-to-lowest level as on the lowest. And, by the same reasoning, each level will have half as many nodes as the one below it, until we finally get to the single root at the top of the tree. Counting down from the root, the levels hold 1, 2, 4, ..., 2^h nodes, so N = 2^(h+1) - 1 and h = log2(N + 1) - 1, which is roughly log2 N. So a balanced tree has height O(log N), and searching a balanced binary tree is O(log N).

Question: Consider the findnode operation on a degenerate tree with N nodes. What is the complexity of the worst case?

- O(1)
- O(log N)
- O(N)
- O(N log N)
- O(N²)

Answer: A degenerate tree looks like a linked list. In the worst case, the value we're looking for is at the end of the list, so we have to search through all N nodes to get there. Thus the worst case is O(N).

There's quite a difference, then, between the worst-case behavior of trees, depending upon the tree's shape.

3.3 Average-Case

So the question is, does the "average" binary tree look more like the balanced or the degenerate case? An intuitive argument is:

- No tree with n nodes has height less than ⌊log2 n⌋.
- No tree with n nodes has height greater than n - 1.
- The average depth of all nodes is therefore roughly between (log2 n)/2 and n/2.
- The more unbalanced a tree is, the less likely that a random insertion would increase the tree height.
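The gap between the balanced and degenerate cases can be measured directly. The sketch below counts how many nodes a findnode-style search examines and computes the average node depth, for a balanced tree and a chain holding the same seven values. `Node`, `visits`, `worstCaseVisits`, `averageDepth`, and the other helpers are hypothetical and only mirror the loop in `findnode`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical node type for this sketch; it is not the stnode class above.
struct Node {
    int value;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int v) : value(v) {}
};

// Plain iterative BST insert; duplicate values are ignored.
Node* insert(Node* root, int v) {
    Node** link = &root;
    while (*link != nullptr) {
        if (v == (*link)->value) return root;
        link = (v < (*link)->value) ? &(*link)->left : &(*link)->right;
    }
    *link = new Node(v);
    return root;
}

// Builds a BST by inserting the keys in the order given.
Node* build(const std::vector<int>& keys) {
    Node* root = nullptr;
    for (int k : keys) root = insert(root, k);
    return root;
}

// Nodes examined by a findnode-style search, including the one where it stops.
int visits(const Node* t, int item) {
    int count = 0;
    while (t != nullptr) {
        ++count;
        if (item == t->value) break;
        t = (item < t->value) ? t->left : t->right;
    }
    return count;
}

// Largest number of visits needed to find any of the given keys.
int worstCaseVisits(const Node* root, const std::vector<int>& keys) {
    int worst = 0;
    for (int k : keys) worst = std::max(worst, visits(root, k));
    return worst;
}

// Sum of the depths of all nodes below t, where t sits at `depth`.
int depthSum(const Node* t, int depth) {
    if (t == nullptr) return 0;
    return depth + depthSum(t->left, depth + 1) + depthSum(t->right, depth + 1);
}

int countNodes(const Node* t) {
    return t == nullptr ? 0 : 1 + countNodes(t->left) + countNodes(t->right);
}

// Average depth of the nodes (root at depth 0); undefined for an empty tree.
double averageDepth(const Node* t) {
    int n = countNodes(t);
    assert(n > 0);
    return static_cast<double>(depthSum(t, 0)) / n;
}

// Frees every node of the tree (post-order).
void destroy(Node* t) {
    if (t == nullptr) return;
    destroy(t->left);
    destroy(t->right);
    delete t;
}
```

With the keys 10 through 70, the balanced tree built as 40, 20, 60, 10, 30, 50, 70 needs at most 3 visits, while the ascending chain needs 7. The average depths are 10/7 ≈ 1.43 and 21/7 = 3. Both fall inside the rough range from (log2 7)/2 ≈ 1.40 to 7/2 = 3.5 suggested by the intuitive argument.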

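[Figure: a perfectly balanced BST with 40 at the root, 20 and 60 as its children, and leaves 10, 30, 50, and 70]

As an illustration of that last point, if we are inserting into this tree, then any insertion will increase the tree's height.

[Figure: the degenerate tree, with 10, 20, 30, 40, 50, 60, and 70 in a single chain]

But if we were inserting a randomly selected value into this one, then there is only a 2/8 chance that we will increase the height of the tree. The seven stored values leave eight gaps in which a new value can fall, and only the two gaps above 60 (values that end up below 70) add a new level.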
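The 2/8 figure comes from counting empty child slots, and that count is easy to automate. The sketch assumes every gap between stored values is equally likely to receive the new value. Under that assumption, each of the N + 1 gaps corresponds to exactly one null child pointer, and only null pointers just below the deepest level make the tree taller. `Node`, `emptySlotsAtDepth`, `chanceInsertGrowsHeight`, and the other helpers are hypothetical names for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical node type for this sketch; it is not the stnode class above.
struct Node {
    int value;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int v) : value(v) {}
};

// Plain iterative BST insert; duplicate values are ignored.
Node* insert(Node* root, int v) {
    Node** link = &root;
    while (*link != nullptr) {
        if (v == (*link)->value) return root;
        link = (v < (*link)->value) ? &(*link)->left : &(*link)->right;
    }
    *link = new Node(v);
    return root;
}

// Builds a BST by inserting the keys in the order given.
Node* build(const std::vector<int>& keys) {
    Node* root = nullptr;
    for (int k : keys) root = insert(root, k);
    return root;
}

// Height of a subtree; an empty subtree has height -1.
int height(const Node* t) {
    if (t == nullptr) return -1;
    return 1 + std::max(height(t->left), height(t->right));
}

int countNodes(const Node* t) {
    return t == nullptr ? 0 : 1 + countNodes(t->left) + countNodes(t->right);
}

// Counts empty child slots at exactly depth `target`; a null child of a
// node at depth d sits at depth d + 1.
int emptySlotsAtDepth(const Node* t, int depth, int target) {
    if (t == nullptr) return depth == target ? 1 : 0;
    return emptySlotsAtDepth(t->left, depth + 1, target) +
           emptySlotsAtDepth(t->right, depth + 1, target);
}

// Chance that one new value increases the height, assuming all N + 1 gaps
// (one per empty slot) are equally likely.
double chanceInsertGrowsHeight(const Node* root) {
    assert(root != nullptr);
    int h = height(root);
    return static_cast<double>(emptySlotsAtDepth(root, 0, h + 1)) /
           (countNodes(root) + 1);
}

// Frees every node of the tree (post-order).
void destroy(Node* t) {
    if (t == nullptr) return;
    destroy(t->left);
    destroy(t->right);
    delete t;
}
```

For the perfectly balanced tree all 8 slots sit below the bottom level, giving a chance of 1. For the chain, only the 2 slots under 70 do, giving 0.25.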

[Figure: a BST holding the same seven values, shaped somewhere between the two extremes]

For trees that are somewhere between those two extremes, the chances of a random insertion actually increasing the height of the tree will fall somewhere between those two probability extremes. Insertions that don't increase the tree height make the tree more balanced. So, the more unbalanced a tree is, the more likely that a random insertion will actually tend to increase the balance of the tree.

This suggests (but does not prove) that randomly constructed binary search trees tend to be reasonably balanced. It is possible to prove this claim, but the proof is beyond the scope of this class.

But it's not safe to be too sanguine about the height of binary search trees. Although random construction tends to yield reasonable balance, in real applications we often do not get random values.

Question: Which of the following data would, if inserted into an initially empty binary search tree, yield a degenerate tree?

- data that is in ascending order
- data that is in descending order
- both of the above
- none of the above

Answer: Data in either ascending or descending order results in a degenerate tree. (Try it if you are not convinced.)

3.4 Can We Avoid the Worst Case?

It's very common to get data that is in sorted or almost sorted order, so degenerate behavior turns out to be more common than we might expect. Also, the arguments made so far don't take deletions into account, and deletions tend to unbalance trees. Later, we'll look at variants of the binary search tree that use more elaborate insertion and deletion algorithms to maintain tree balance.
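The contrast between random and sorted input can be seen with a rough experiment. The sketch inserts the keys 0 through 999 once in ascending order and once in a pseudo-random order, then compares the resulting heights. `std::mt19937` with `std::shuffle` stands in for "random construction". The exact shuffled height depends on the seed and on the standard library's `std::shuffle` implementation. `Node`, `heightAfterInserting`, `ascendingKeys`, and `shuffledKeys` are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical node type for this sketch; it is not the stnode class above.
struct Node {
    int value;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int v) : value(v) {}
};

// Plain iterative BST insert; duplicate values are ignored.
Node* insert(Node* root, int v) {
    Node** link = &root;
    while (*link != nullptr) {
        if (v == (*link)->value) return root;
        link = (v < (*link)->value) ? &(*link)->left : &(*link)->right;
    }
    *link = new Node(v);
    return root;
}

// Height of a subtree; an empty subtree has height -1.
int height(const Node* t) {
    if (t == nullptr) return -1;
    return 1 + std::max(height(t->left), height(t->right));
}

// Frees every node of the tree (post-order).
void destroy(Node* t) {
    if (t == nullptr) return;
    destroy(t->left);
    destroy(t->right);
    delete t;
}

// Inserts the keys in the order given and returns the resulting height.
int heightAfterInserting(const std::vector<int>& keys) {
    Node* root = nullptr;
    for (int k : keys) root = insert(root, k);
    int h = height(root);
    destroy(root);
    return h;
}

// The keys 0 .. n-1 in ascending (sorted) order.
std::vector<int> ascendingKeys(int n) {
    std::vector<int> keys(n);
    std::iota(keys.begin(), keys.end(), 0);
    return keys;
}

// The same keys in a pseudo-random order. The fixed seed makes a run
// repeatable, but the permutation itself varies between standard libraries.
std::vector<int> shuffledKeys(int n, unsigned seed) {
    std::vector<int> keys = ascendingKeys(n);
    std::mt19937 gen(seed);
    std::shuffle(keys.begin(), keys.end(), gen);
    return keys;
}
```

The sorted input always produces a chain of height 999. The shuffled input lands far below that, much nearer the minimum possible height of 9 for 1000 nodes.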