Balanced BST Balanced BSTs guarantee O(logN) performance at all times the height or left and right sub-trees are about the same simple BST are O(N) in the worst case Categories of BSTs AVL, SPLAY trees: dynamic environment optimal trees: static environment E.G.M. Petrakis Trees in Main Memory 1
AVL Trees AVL (Adelson, Lelskii, Landis): the height of the left and right subtrees differ by at most 1 the same for every subtree Number of comparisons for membership operations best case: completely balanced worst case: 1.44 log(n+2) expected case: logn +.25 <= log(n +1) E.G.M. Petrakis Trees in Main Memory 2
AVL Trees : completely balanced sub-tree / : the left sub-tree is 1 higher \ : the right sub-tree is 1 higher E.G.M. Petrakis Trees in Main Memory 3
AVL Trees E.G.M. Petrakis Trees in Main Memory 4
Non AVL Trees critical node // : the left sub-tree is 2 higher \\ : the right sub-tree is 2 higher E.G.M. Petrakis Trees in Main Memory 5
Single Right Rotations Insertions or deletions may result in non AVL trees => apply rotations to balance the tree Β Α Α T3 T1 Β T1 T2 T2 T3 E.G.M. Petrakis Trees in Main Memory 6
Single Left Rotations Α ΑB T1 Β Α T3 T2 T3 T1 T2 E.G.M. Petrakis Trees in Main Memory 7
4 4 2 2 insert 1 2 single right rotation 1 4 1 4 insert 9 4 single left rotation 4 9 E.G.M. Petrakis Trees in Main Memory 9
Double Left Rotation T4 T2 Composition of two single rotations (one right and one left rotation) ΑC ΑB A T3 T1 E.G.M. Petrakis Trees in Main Memory 9 or T4 ΑC Α Τ2 or Β Α T3 T1 T4 ΑC T2 ΑΒ or T3 A Τ1
Example of Double Left Rotation Critical node 4 4 \\ 2 7 2 = / = 6 / 7 = 9 = 7 4 2 6 9 6 insert 6 9 E.G.M. Petrakis Trees in Main Memory 10
Double Right Rotation Composition of two single rotations (one left and one right rotation) ΑB ΑC ΑΒ Α ΑB ΑA C T1 T2 ΑΒ T3 T4 T1 A T2 T3 T4 T1 T2 or T3 Τ4 E.G.M. Petrakis Trees in Main Memory 11 or or
Insertion (deletion) in AVL 1. Follow the search path to verify that the key is not already there 2. Insert (delete) the key 3. Retreat along the search path and check the balance factor 4. Rebalance if necessary (see next) E.G.M. Petrakis Trees in Main Memory 12
Rebalancing For every node reached coming up from its left sub-tree after insertion readjust balance factor = becomes / => no operation \ becomes = => no operation / becomes // => must be rebalanced!! The // node becomes a critical node Only the path from the critical node to the leaf has to be rebalanced!! Rotation is applied only at the critical node! E.G.M. Petrakis Trees in Main Memory 13
Rebalancing (cont.) The balance factor of the critical node determines what rotation is to take place single or double If the child and the grand child (inserted node) of the critical node are on the same direction (both / ) => single rotation Else => double rotation Rebalance similarly if coming up from the right sub-tree (opposite signs) E.G.M. Petrakis Trees in Main Memory 14
Performance Performance of membership operations on AVL trees: easy for the worst case! An AVL tree will never be more than 45% higher that its perfectly balanced counterpart (Adelson, Velskii, Landis): log(n+1) <= h b (N) <= l.4404log(n+2) 0.302 E.G.M. Petrakis Trees in Main Memory 15
Worst case AVL Sparse tree => each sub-tree has minimum number of nodes E.G.M. Petrakis Trees in Main Memory 16
Fibonacci Trees T h : tree of height h T h has two sub-trees, one with height h-1 and one with height h-2 else it wouldn t have minimum number of nodes T 0 is the empty sub-tree (also Fibonacci) T 1 is the sub-tree with 1 node (also Fibonacci) E.G.M. Petrakis Trees in Main Memory 17
E.G.M. Petrakis Trees in Main Memory 1 Fibonacci Trees (cont.) Average height N h number of nodes of T h N h = N h-1 + N h-2 + 1 N 0 = 1 N 1 = 2 From which h <= 1.44 log(n+1) 1 2 5 1 5 1 2 5 1 5 1 N 2 h 2 h h + = + +
More Examples 7 single rotation 7 7 9 insert 9 9 E.G.M. Petrakis Trees in Main Memory 19
Examples (cont.) 6 double rotation 6 7 6 insert 7 7 E.G.M. Petrakis Trees in Main Memory 20
Examples (cont.) double rotation 7 6 insert 7 6 7 6 E.G.M. Petrakis Trees in Main Memory 21
Examples (cont.) single rotation 7 7 6 7 9 9 delete 6 9 E.G.M. Petrakis Trees in Main Memory 22
Examples (cont.) 5 6 single rotation 6 6 9 7 9 7 9 7 delete 5 E.G.M. Petrakis Trees in Main Memory 23
Examples (cont.) double rotation 5 6 6 6 7 7 7 delete 5 E.G.M. Petrakis Trees in Main Memory 24
General Deletions 5 5 3 delete 4 2 delete 2 4 7 10 1 3 7 10 1 6 9 11 6 9 11 E.G.M. Petrakis Trees in Main Memory 25
General Deletions (cont.) 5 5 2 7 2 10 delete delete 6 delete 5 1 3 6 10 1 3 7 11 9 11 9 E.G.M. Petrakis Trees in Main Memory 26
General Deletions (cont.) 3 2 10 delete 5 1 7 11 9 E.G.M. Petrakis Trees in Main Memory 27
Self Organizing Search Splay Trees: adapt to query patterns move to root (front): whenever a node is accessed move it to the root using rotations equivalent to move-to-front in arrays current node insertions: inserted node deletions: father of deleted node or null if this is the root membership: the last accessed node E.G.M. Petrakis Trees in Main Memory 2
20 Search(10) 20 15 30 15 30 13 first rotation 10 second rotation 10 14 20 13 14 10 10 13 15 30 third rotation 13 20 15 14 30 14 E.G.M. Petrakis Trees in Main Memory 29
Splay Cases If the current node q has no grandfather but it has father p => only one single rotation two symmetric cases: L, R a p b q c E.G.M. Petrakis Trees in Main Memory 30 a p b q c
If p has also grandfather qp => 4 cases gp a p LL b q a c d gp a p q d b c RL E.G.M. Petrakis Trees in Main Memory 31 gp gp LL symmetric of RR RL symmetric of RL q p d c b q p a b c d
1 1 a q α q j j 2 i 7 RR 2 i 7 LL 6 h 6 h 3 g 5 g c 4 d 5 current node 3 4 e f e f c d E.G.M. Petrakis Trees in Main Memory 32
E.G.M. Petrakis Trees in Main Memory 33 1 a q j i 2 7 6 g 3 c 4 5 e f b LR 1 α q j i 2 7 6 h 3 c 4 5 e f d g RL a, b, c are sub-trees
5 a 2 1 q j b 3 4 f 6 7 i c d e g h E.G.M. Petrakis Trees in Main Memory 34
Splay Performance Splay trees adapt to unknown or changing probability distributions Splay trees do not guarantee logarithmic cost for each access AVL trees do! asymptotic cost close to the cost of the optimal BST for unknown probability distributions It can be shown that the cost of m operations on an initially empty splay tree, where n are insertions is O(mlogn) in the worst case E.G.M. Petrakis Trees in Main Memory 35
Optimal BST Static environment: no insertions or deletions Keys are accessed with various frequencies Have the most frequently accessed keys near the root Application: a symbol table in main memory E.G.M. Petrakis Trees in Main Memory 36
Searching Given symbols a 1 < a 2 <.< a n and their probabilities: p 1, p 2, p n minimize cost Successful search cost = Transform unsuccessful to successful consider new symbols E 1, E 2, E n - α1 α2 αi αi+1. αn αn+1 n i= 1 p level( i a i ) E0 E1 E2 Ei En Ei= (αi, αi+1 ) E0= (-, α1 ) En= (αn, ) E.G.M. Petrakis Trees in Main Memory 37
Unsuccessful Search a n a n-2 a n- 1 E i failure node unsuccessful search for all values in E i terminates on the same failure node (in fact, one node higher) a n-3 E.G.M. Petrakis Trees in Main Memory 3
(a1, a2, a3) = (do, if, read) p i = q i = 1/7 Example read if if do if do read if do cost = 15/7 cost = 13/7 Optimal BST E.G.M. Petrakis Trees in Main Memory 39 read cost = 15/7
read do do read if if cost = 15/7 cost = 15/7 E.G.M. Petrakis Trees in Main Memory 40
Search Cost If p i is the probability to search for a i and q i is the probability to search in E i then n n pi + qi = 1 i = 1 i = 1 n cost = pi level(ai) + qi {level (Ei) 1} i = 1 i 1 successful search E.G.M. Petrakis Trees in Main Memory 41 n unsuccessful search
Observation 1 In a BST, a subtree has nodes that are consecutive in a sorted sequence of keys (e.g. [5,26]) 20 5 10 25 13 24 26 12 14 E.G.M. Petrakis Trees in Main Memory 42
Observation 2 If T ij is a sub-tree of an optimal BST holding keys from i to j then T ij must be optimal among all possible BSTs that store the same keys optimality lemma: all sub-trees of an optimal BST are also optimal E.G.M. Petrakis Trees in Main Memory 43
Optimal BST Construction 1) Construct all possible trees: 1 2n NP-hard, there are trees!! n + 1 n 2) Dynamic programming solves the problem in polynomial time O(n 3 ) at each step, the algorithm finds and stores the optimal tree in each range of key values increases the size of the range at each step until the whole range is obtained E.G.M. Petrakis Trees in Main Memory 44
Example (successful search only): 1) BSTs with 1 node optimal keys 1 10 20 40 probabilities 0,3 0,2 0,1 0,4 range 1 10 20 40 cost= 0.3 0.2 0.1 0.4 2) BSTs with 2 nodes k=1-10 k=10-20 k=20-20 range 1-10 1 optimal range 10-20 10 20 cost=0.3 1+0.2 2=0.7 cost=0.2+0.1 2=0.4 10 20 1 10 cost=0.2 1+0,3.2=0. cost=0.1+0.2 2=0.5 10 optimal range 20-40 E.G.M. Petrakis Trees in Main Memory 45 20 40 cost=0.1+0.=0.9 40 20 cost=0.4+0.2=0.6
3) BSTs with 3 nodes k=1-20 range 1-20 20 1 10 cost=0.1+2 0.3+3 0.2=1.3 10 1 20 cost=0.2+2(0.3+0.1)=1 1 10 20 cost=0.3+2 0.2+3 0.1=1 optimal k=10-40 range 10-40 10 40 20 cost=0.2+2 0.4+3 0.1=1.3 20 10 40 cost=0.1+2(0.2+0.4)=1.3 10 40 20 cost=0.4+2 0.2+30.1=1.1 E.G.M. Petrakis Trees in Main Memory 46
4) BSTs with 4 nodes range 1-40 1 10 40 20 cost=0.3+2 0.4+3 0.2+4 0.1=2.1 10 1 40 cost=0.2+2(0.3+0.4)+3 0.1=1.9 1 20 20 10 40 OPTIMAL BST cost=0.1+2(0.3+0.4)+3 0.2=2.1 1 40 10 E.G.M. Petrakis Trees in Main Memory 47 20 cost=0.4+2 0.3+3 0.2+4 0.1=2
Complexity Compute all optimal BSTs for all C ij, i,j=1,2..n Let m=j-i: number of keys in range C ij n-m-1 C ij s must be computed The one with the minimum cost must be found, this takes O(m(n-m-1)) time For all C ij s it takes 2 3 (nm m ) = O(n ) 1 m There is a better O(n 2 n ) algorithm by Knuth There is also a O(n) greedy algorithm E.G.M. Petrakis Trees in Main Memory 4
Optimal BSTs High probability keys should be near the root But, the value of a key is also a factor It may not be desirable to put the smallest or largest key close to the root => this may result in skinny trees (e.g., lists) E.G.M. Petrakis Trees in Main Memory 49