6.897: Advanced Data Structures (Spring 2007)
Prof. Erik Demaine
Lecture 3, February 20, 2007
Scribe: Hui Tang

1 Overview

In the last lecture we discussed binary search trees and the many bounds that achieve o(lg n) per operation for natural classes of access sequences. This motivates us to search for a BST that is optimal (or close to optimal) on any sequence of operations. In this lecture we discuss lower bounds that hold for any binary search tree (the Wilber bounds). Based on such a bound, we develop the Tango tree, which is an O(lg lg n)-competitive BST structure. The Tango upper bound and the Wilber lower bounds represent the tightest bounds on the offline optimal BST known to date.

2 Wilber Bounds

The two Wilber bounds are functions of the access sequence alone, and they are lower bounds for all BST structures. They imply, for instance, that there exists a fixed sequence of accesses that takes Omega(lg n) per operation on any BST.

2.1 Wilber's 2nd Lower Bound

Let x_1, x_2, ..., x_m be the access sequence. For each access x_j we compute the following Wilber number. We look at where x_j fits among x_i, x_{i+1}, ..., x_{j-1} for i = j-1, j-2, ..., counting backwards from j-1 until the previous access to the key x_j. At each step we maintain a_i < x_j < b_i, where a_i and b_i are the tightest bounds on x_j discovered so far. Each time i is decremented, either a_i increases or b_i decreases (fitting x_j more tightly), or nothing happens (the access is uninteresting for x_j). The Wilber number of x_j is the number of alternations between a_i increasing and b_i decreasing.
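To make the definition concrete, here is a minimal sketch that computes the sum of the Wilber numbers by direct transcription of the definition. It is quadratic in m (the definition scans backwards for each access); all names are illustrative, and counting switches between bound-tightening directions is our reading of "alternations".

```python
def wilber2(accesses):
    """Sum of Wilber numbers over the access sequence: for each x_j, scan
    backwards to the previous access of x_j, tracking the tightest bounds
    a_i < x_j < b_i seen so far and counting alternations between
    lower-bound and upper-bound improvements."""
    total = 0
    for j, x in enumerate(accesses):
        lo, hi = float('-inf'), float('inf')
        last = None            # which bound the last interesting access tightened
        alternations = 0
        for i in range(j - 1, -1, -1):
            y = accesses[i]
            if y == x:         # reached the previous access to key x_j: stop
                break
            if lo < y < x:     # y tightens the lower bound a_i
                if last == 'hi':
                    alternations += 1
                lo, last = y, 'lo'
            elif x < y < hi:   # y tightens the upper bound b_i
                if last == 'lo':
                    alternations += 1
                hi, last = y, 'hi'
            # otherwise y lies outside (lo, hi): uninteresting for x_j
        total += alternations
    return total

print(wilber2([3, 1, 4, 1, 5, 9, 2, 6]))  # small sanity check
```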

Wilber's 2nd lower bound [Wil89] states that the sum of the Wilber numbers of all the x_j is a lower bound on the total cost of the access sequence, for any BST. Note that this only holds in an amortized sense (for the total cost): the actual cost to access some x_j may be lower than its Wilber number on some BSTs. We omit the proof; it is similar to the proof of Wilber's 1st lower bound, which is given below. An interesting open problem is whether Wilber's 2nd lower bound is tight.

We now proceed to an interesting consequence of Wilber's 2nd lower bound: key-independent optimality. Consider a situation in which the keys are just unique identifiers: even though they are comparable, the order relation is not particularly meaningful. In that case, we might as well randomly permute the keys. In other words, we are interested in E_pi[OPT(pi(x_1), pi(x_2), ..., pi(x_m))], where the expectation is taken over a random permutation pi of the set of keys. Key-independent optimality is defined as usual with respect to the optimal offline BST.

Theorem 1 (key-independent optimality [Iac02]). A BST has the key-independent optimality property iff it has the working-set property. In particular, splay trees are key-independently optimal.

Proof. (sketch) We must show that

    E_pi[dynamic OPT(pi(x_1), pi(x_2), ..., pi(x_m))] = Theta( sum_{i=1}^{m} lg t_i(x_i) ),

where the right-hand side is the working-set bound. The O(.) direction is easy: the working-set bound does not depend on the ordering of the keys, so it applies under any permutation. We must now show that a random permutation makes the problem as hard as the working-set bound:

- wilber2(x_j) only considers the working set of x_j; define W = {elements accessed since the last access to x_j}.
- The permutation pi rearranges W, and x_j falls roughly in the middle of W with constant probability.
- E[# times a_i increases] = Theta(lg |W|) and E[# times b_i decreases] = Theta(lg |W|).
- Claim: we expect a constant fraction of the increases in a_i and the decreases in b_i to interleave, so E[wilber2(x_j)] = Theta(lg |W|).
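Since the theorem is phrased in terms of the working-set bound, a quick sketch of that quantity may help. Here t_i(x_i) counts the distinct keys accessed since the previous access to x_i; charging first accesses lg n and flooring each term at 1 are our conventions (the asymptotics are insensitive to them), and the implementation is a naive quadratic-time one.

```python
import math

def working_set_bound(accesses):
    """Computes sum over i of lg t_i(x_i), where t_i(x_i) is the number of
    distinct keys accessed since the previous access to x_i, counting x_i
    itself. First accesses are charged lg n (a convention we assume)."""
    n = len(set(accesses))
    last = {}                  # key -> index of its most recent access
    total = 0.0
    for i, x in enumerate(accesses):
        if x in last:
            t = len(set(accesses[last[x] + 1:i])) + 1
        else:
            t = n
        total += math.log2(max(t, 2))   # floor so every access costs >= 1
        last[x] = i
    return total

print(working_set_bound([3, 1, 4, 1, 5, 9, 2, 6]))
```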

2.2 Wilber's 1st Lower Bound

Fix an arbitrary static lower-bound tree P, with no relation to the actual BST T but over the same keys. In the application we consider, P is a perfect binary tree. For each node y in P, we label each access x_i with L if the key x_i is in y's left subtree in P, with R if it is in y's right subtree in P, and leave it blank otherwise. For each y, we count the number of interleaves, i.e. alternations, between accesses to the left and right subtrees:

    interleave(y) = # of alternations L <-> R.

Wilber's 1st lower bound [Wil89] states that the total number of interleaves, summed over all y, is a lower bound for all BST data structures serving the access sequence x. It is important to note that the lower-bound tree P must remain static.

Proof. (sketch) We define the transition point of y in P to be the highest node z in the BST T such that the root-to-z path in T includes a node from both the left and right subtrees of y in P. Observe that the transition point is well defined and does not change until we touch z; in addition, the transition points are distinct for distinct nodes y. Between any two consecutive alternations at y, we must touch the transition point of y, so serving the sequence touches it at least interleave(y)/2 times. Summing over all y, the price we pay for serving the access sequence x is at least half the total number of interleaves: cost(x) >= interleave(x)/2.
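The interleave bound is also easy to compute directly. The sketch below builds P implicitly by recursive halving of the sorted key set and counts L/R alternations at every node. The halving rule is our choice of lower-bound tree (any fixed P over the keys yields a valid bound), and each node rescans the whole sequence, so this runs in roughly O(nm) time.

```python
def wilber1(accesses):
    """Wilber's first bound: total interleaves over a fixed lower-bound tree P,
    built here by recursive halving of the sorted key set (an illustrative
    choice; the bound holds for any fixed P over the same keys)."""
    keys = sorted(set(accesses))

    def interleaves(lo, hi):
        # Node y of P holds keys[mid]; its left subtree spans keys[lo:mid]
        # and its right subtree spans keys[mid+1:hi].
        if hi - lo <= 1:
            return 0
        mid = (lo + hi) // 2
        count, last = 0, None
        for x in accesses:
            if keys[lo] <= x < keys[mid]:
                side = 'L'
            elif keys[mid] < x <= keys[hi - 1]:
                side = 'R'
            else:
                continue          # access to y itself or outside y's subtree: blank
            if last is not None and side != last:
                count += 1        # one L <-> R alternation at y
            last = side
        return count + interleaves(lo, mid) + interleaves(mid + 1, hi)

    return interleaves(0, len(keys))

print(wilber1([3, 1, 4, 1, 5, 9, 2, 6]))
```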

3 Tango Trees

Tango trees [DHIP04] are an O(lg lg n)-competitive BST. They represent an important step forward from the previous best competitive ratio of O(lg n), which is achieved by standard balanced trees. The running time of Tango trees is within an O(lg lg n) factor of Wilber's first bound, so we also obtain a bound on how close Wilber's bound is to OPT. It is easy to see that if the lower-bound tree is fixed without knowing the sequence (as any online algorithm must do), Wilber's first bound can be Omega(lg lg n) away from OPT, so one cannot achieve a better competitive ratio using this technique.

To achieve this improved performance, we divide the BST into smaller auxiliary trees, which are balanced trees of size O(lg n). If an operation touches k auxiliary trees, it takes O(k lg lg n) time. We will achieve k = 1 + (the increase in the Wilber bound caused by the current access), from which the competitiveness follows.

Let us again take a perfect binary tree P and select a node y in P. We define the preferred child of y to be the root of the subtree with the most recent access (i.e., the preferred child is the left one iff the last access under y was to the left subtree). If y has no children, or no key under its children has been accessed, then y has no preferred child. An interleave at y is equivalent to changing the preferred child of y, which means the Wilber bound equals the number of changes to preferred children.

Now we define a preferred path as a chain of preferred-child pointers. We store each preferred path from P in a balanced auxiliary tree that is conceptually separate from T, such that the leaves of an auxiliary tree link to the roots of the auxiliary trees of child paths. Because the height of P is lg n, each auxiliary tree stores at most lg n nodes. A search in one auxiliary tree therefore takes O(lg lg n) time, so the search cost over k auxiliary trees is O(k lg lg n). Note that a preferred path is not stored by depth (that would be impossible in the BST model) but in the sorted order of its keys.
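As a concreteness check on the path decomposition, the sketch below lists the preferred paths of a perfect tree stored in an implicit heap-like array (our representation choice, purely for illustration): each path follows preferred-child pointers, and every non-preferred child starts a new path hanging off its parent's path.

```python
def preferred_paths(n, preferred):
    """Decompose a perfect BST on node indices 0..n-1 (implicit array layout:
    children of y are 2y+1 and 2y+2) into preferred paths. `preferred` maps a
    node to 'L' or 'R', and omits nodes with no preferred child."""
    paths, stack = [], [0]            # stack of path heads, starting at the root
    while stack:
        y, path = stack.pop(), []
        while y < n:
            path.append(y)
            left, right = 2 * y + 1, 2 * y + 2
            nxt = {'L': left, 'R': right}.get(preferred.get(y), n)
            for child in (left, right):
                if child < n and child != nxt:
                    stack.append(child)   # non-preferred child heads a new path
            y = nxt                       # follow the preferred-child pointer
        paths.append(path)
    return paths

# Example: 7 nodes; the root prefers its left child, node 1 prefers its right.
print(preferred_paths(7, {0: 'L', 1: 'R'}))
# -> [[0, 1, 4], [3], [2], [6], [5]]  (one list per preferred path)
```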

3.1 Searching Tango Trees

To search this data structure for a key x, we start at the root of the topmost auxiliary tree (the one containing the root of P) and traverse downward looking for x. Along the way we may jump between several auxiliary trees; say we visit k of them. We search each auxiliary tree in O(lg lg n) time, so the entire search takes O(k lg lg n) time. This assumes we can update the data structure as fast as we can search, because the search forces us to change k - 1 preferred children (except for startup costs when a node has no preferred child yet).

3.2 Balancing Auxiliary Trees

The auxiliary trees must be updated whenever preferred paths change. When a preferred path changes, we must cut the path from a certain point down and insert another preferred path there. Cutting and joining resemble the split and concatenate operations on balanced BSTs; however, because auxiliary trees are ordered by key rather than by depth, the operations are slightly more complex than the typical split and concatenate. Luckily, the nodes of a path at depth greater than d correspond to an interval of keys, so cutting from a point down is equivalent to cutting out a segment of the sorted order of the keys. In this way, we can change preferred paths by cutting a subtree out of an auxiliary tree using two split operations and adding a subtree using a concatenate operation, as sketched below.
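The key-interval view makes the update mechanical. In the stand-in sketch below, sorted Python lists play the role of balanced auxiliary trees, so "split" is a binary search and "concatenate" is list joining; in the real structure both are O(lg lg n) balanced-BST operations on trees of size O(lg n). All names are illustrative.

```python
import bisect

def cut(aux, lo, hi):
    """Cut the keys in [lo, hi] (the deep part of the path) out of one
    auxiliary tree: two splits isolate the segment, one concatenate
    rejoins the remainder."""
    i = bisect.bisect_left(aux, lo)       # first split
    j = bisect.bisect_right(aux, hi)      # second split
    return aux[:i] + aux[j:], aux[i:j]    # (remaining path, cut-out segment)

def join(aux, segment):
    """Reattach a path segment; its keys fall into a single gap of aux, so
    one split finds the gap and concatenates glue everything together."""
    i = bisect.bisect_left(aux, segment[0])
    return aux[:i] + segment + aux[i:]

# Example: cut the keys in [40, 60] out of one path and attach them elsewhere.
path, seg = cut([10, 20, 40, 50, 60, 80], 40, 60)
print(path, seg)                 # [10, 20, 80] [40, 50, 60]
print(join([30, 70, 90], seg))   # [30, 40, 50, 60, 70, 90]
```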

To perform these cut and join operations, we label the roots of the auxiliary trees hanging off the path subtree with a special color and pretend they are not there. This means we only need to perform the split and concatenate on a tree of size O(lg n), rather than worrying about all the auxiliary subtrees hanging off the path subtree we are interested in. Balanced BSTs support split and concatenate in O(lg size) time, so all of these operations run in O(lg lg n) time, and we remain O(lg lg n)-competitive.

References

[DHIP04] Erik D. Demaine, Dion Harmon, John Iacono, and Mihai Patrascu. Dynamic optimality almost. In FOCS '04: Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science, pages 484–490. IEEE Computer Society, 2004.

[Iac02] John Iacono. Key independent optimality. In ISAAC '02: Proceedings of the 13th International Symposium on Algorithms and Computation, pages 25–31. Springer-Verlag, 2002.

[Wil89] R. Wilber. Lower bounds for accessing binary search trees with rotations. SIAM Journal on Computing, 18(1):56–67, 1989.