CSE341T/CSE549T 11/05/2014
Lecture 19: Treaps

1 Binary Search Trees (BSTs)

Search trees are tree-based data structures that can be used to store and search for items that satisfy a total order. There are many types of search trees designed for a wide variety of purposes. Probably the most common use is to implement sets and tables (dictionaries, mappings). What is common among search trees is that they store a collection of elements in a tree structure and use values (most often called keys) in the internal nodes to navigate in search of these elements. A binary search tree (BST) is a search tree in which every node in the tree has at most two children.

If search trees are kept balanced in some way, then they can usually be used to get good bounds on the work and span for accessing and updating them. We refer to such trees as balanced search trees. If trees are never updated but only used for searching, then balancing is easy: it needs only be done once. What makes balanced trees interesting is their ability to efficiently maintain balance even when updated. To allow for efficient updates, balanced search trees do not require that the trees be strictly balanced, but rather that they are approximately balanced in some way. You should convince yourself that it would be impossible to maintain a perfectly balanced tree while allowing efficient (e.g., O(log n)) updates.

There are dozens of balanced search trees that have been suggested over the years, dating back at least to AVL trees in 1962. The trees mostly differ in how they maintain approximate balance. Most trees either try to maintain height balance (the two children of a node are about the same height) or weight balance (the two children of a node are about the same size, i.e., contain about the same number of elements). Here we list a few balanced trees:

1. AVL trees. Binary search trees in which the two children of each node differ in height by at most 1.

2. Red-black trees. Binary search trees with a somewhat looser height balance criterion: no root-to-leaf path is more than twice as long as any other.

3. 2-3 and 2-3-4 trees. Trees with perfect height balance, but whose nodes can have different numbers of children, so they might not be weight balanced. These are isomorphic to red-black trees by grouping each black node with its red children, if any.

4. B-trees. A generalization of 2-3-4 trees that allows for a large branching factor, sometimes up to 1000s of children. Due to their large branching factor, they are well suited for storing data on disks.

5. Splay trees. Binary search trees that are only balanced in the amortized sense (i.e., on average across multiple operations).

6. Weight-balanced trees. Trees in which the two children of each node have about the same size. These are most typically binary, but can also have other branching factors.

7. Treaps. Binary search trees that use random priorities associated with every element to keep balance.

8. Random search trees. A variant of treaps in which priorities are not used, but random decisions are made with probabilities based on tree sizes.

9. Skip trees. Randomized search trees in which nodes are promoted to higher levels based on flipping coins. These are related to skip lists, which are not technically trees but are also used as a search structure.

Traditional treatments of binary search trees concentrate on three operations: search, insert, and delete. Out of these, search is naturally parallel, since any number of searches can proceed in parallel with no conflicts. (In splay trees and other self-adjusting trees this is not true, since a search can modify the tree.) However, insert and delete are inherently sequential, as normally described. For this reason, in later lectures we will discuss union and difference, which are more general operations that are useful for parallel updates and of which insert and delete are just special cases.

2 Treaps

A treap (tree + heap) is a randomized BST that maintains balance in a probabilistic way. In the next couple of lectures, we will show that with high probability a treap with n keys has depth O(log n). In a treap, a random priority is assigned to every key. In practice, this is often done by hashing the key, or by assigning a distinct random number to each key. We will assume that each priority is unique, although it is possible to remove this assumption. The nodes in a treap must satisfy two properties:

BST Property. The keys satisfy the BST property (i.e., keys are stored in order in the tree).

Heap Property. The associated priorities satisfy the (max) heap property: the priority at each node is greater than the priorities of its two children.

Consider the following key-priority pairs:

    (a,3), (b,9), (c,2), (e,6), (f,5)

These elements would be placed in the following treap:

              (b,9)
             /     \
         (a,3)     (e,6)
                  /     \
              (c,2)     (f,5)

Theorem 1 For any set S of key-priority pairs, there is exactly one treap T containing the key-priority pairs in S which satisfies the treap properties.

Proof. The key k with the highest priority in S must be at the root, since otherwise the tree would not be in heap order. Then, to satisfy the property that the treap is ordered with respect to the nodes' keys, all keys in S less than k must be in the left subtree, and all keys greater than k must be in the right subtree. Inductively, the two subtrees of k must be constructed in the same manner.

Now, if you are given a set of elements with keys and priorities, how would you build a treap out of them? Take the element with the largest priority and make it the root. Then partition the remaining elements around the root based on their keys to find the elements in the left and right subtrees, and recurse. What does this remind you of? This is exactly the procedure quicksort follows, and there is a straightforward relationship between the analysis of quicksort and the height of a treap. A small sketch of this construction follows.
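The following is a minimal Python sketch of this construction (not from the notes; the Node class and build_treap are our own names). It builds the unique treap of Theorem 1 by picking the highest-priority pair as the root and partitioning the rest by key, exactly as described above.

```python
class Node:
    def __init__(self, key, priority, left=None, right=None):
        self.key, self.priority = key, priority
        self.left, self.right = left, right

def build_treap(pairs):
    """Build the unique treap over a list of (key, priority) pairs."""
    if not pairs:
        return None
    root_key, root_pri = max(pairs, key=lambda kp: kp[1])  # highest priority is the root
    left = [kp for kp in pairs if kp[0] < root_key]        # keys less than the root's key
    right = [kp for kp in pairs if kp[0] > root_key]       # keys greater than the root's key
    return Node(root_key, root_pri, build_treap(left), build_treap(right))

# The example above: (b,9) is the root, with (a,3) and (e,6) as its
# children and (c,2), (f,5) below (e,6).
t = build_treap([('a', 3), ('b', 9), ('c', 2), ('e', 6), ('f', 5)])
assert t.key == 'b' and t.left.key == 'a' and t.right.key == 'e'
assert t.right.left.key == 'c' and t.right.right.key == 'f'
```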

3 A Different Analysis of Randomized Quicksort

We now turn to analyzing quicksort. As mentioned earlier, the motivation for analyzing it now is that this analysis is a nice segue into the treap analysis: the argument here is essentially the same as the one we will use to show that the expected depth of a node in a treap is O(log n).

For the analysis, we'll consider a completely equivalent algorithm which will be slightly easier to analyze. Before the start of the algorithm, we pick for each element a priority uniformly at random from the real interval [0,1]. At each step, instead of picking a pivot randomly, we instead pick the key with the highest priority (sound familiar?). Notice that once the priorities are decided, the algorithm is completely deterministic; you should convince yourself that the two presentations of the algorithm are fully equivalent (modulo the technical details about how we might store the priority values). A short code sketch of this equivalent algorithm follows below.

We're interested in counting how many comparisons QUICKSORT makes. This immediately bounds the work for the algorithm, because this is where the bulk of the work is done. That is, if we let

    X_n = number of comparisons QUICKSORT makes on input of size n,

our goal is to find an upper bound on E[X_n] for any input sequence S. For this, we'll consider the final sorted order of the keys, T = sort(S). (Formally, there is a permutation π : {1, ..., n} → {1, ..., n} between the positions of S and T.) In this terminology, we also denote by p_i the priority we chose for the element T_i.

We'll derive an expression for X_n by breaking it up into a collection of random variables and bounding them. Consider two positions i, j ∈ {1, ..., n} in the sequence T. We use the random indicator variable A_{i,j} to indicate whether we compare the elements T_i and T_j during the algorithm, i.e., A_{i,j} takes the value 1 if they are compared and 0 otherwise.

Looking closely at the algorithm, we see that if two elements are compared, one of them has to be the pivot in that call. The other element then goes to either the left partition or the right partition, but the pivot is not part of any partition. Therefore, once an element has been a pivot, we never compare it to anything ever again. This gives the following observation:

Observation 1 In the quicksort algorithm, if two elements are compared in a QUICKSORT call, they will never be compared again in any other call.

Therefore, with these random variables, we can express the total comparison count X_n as follows:

    X_n = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} A_{i,j}.

By linearity of expectation, we have E[X_n] = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} E[A_{i,j}]. Furthermore, since each A_{i,j} is an indicator random variable, E[A_{i,j}] = Pr{A_{i,j} = 1}. Our task therefore comes down to computing the probability that T_i and T_j are compared (i.e., Pr{A_{i,j} = 1}) and working out the sum.
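Here is a hedged Python sketch (names are ours, not from the notes) of the equivalent algorithm: priorities are fixed up front, every recursive call pivots on its highest-priority element, and a counter tallies exactly the comparisons that X_n counts (each non-pivot element is compared with the pivot once per call).

```python
import random

def quicksort_by_priority(items, priority, counter):
    """Quicksort that always pivots on the highest-priority element.

    Once `priority` is fixed, the run is deterministic; counter[0]
    accumulates the number of element-vs-pivot comparisons (X_n).
    """
    if len(items) <= 1:
        return list(items)
    pivot = max(items, key=lambda x: priority[x])    # highest-priority element
    smaller, larger = [], []
    for x in items:
        if x == pivot:
            continue                                 # the pivot joins no partition
        counter[0] += 1                              # one comparison with the pivot
        (smaller if x < pivot else larger).append(x)
    return (quicksort_by_priority(smaller, priority, counter)
            + [pivot]
            + quicksort_by_priority(larger, priority, counter))

keys = list(range(200))
random.shuffle(keys)
priority = {k: random.random() for k in keys}        # random priorities from [0, 1)
counter = [0]
assert quicksort_by_priority(keys, priority, counter) == sorted(keys)
print("comparisons for n = 200:", counter[0])
```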

Computing the probability Pr{A_{i,j} = 1}. The crux of the matter is describing the event A_{i,j} = 1 in terms of a simple event that we have a handle on. Before we prove any concrete result, let's take a closer look at the quicksort algorithm to gather some intuition. Notice that the top level takes as its pivot p the element with the highest priority. Then it splits the sequence into two parts, one with keys larger than p and the other with keys smaller than p. For each of these parts, we run QUICKSORT recursively; therefore, inside each part, the algorithm again picks the highest-priority element as the pivot, which is then used to split the sequence further. With this view, the following claim is not hard to see:

Claim 1 For i < j, T_i and T_j are compared if and only if p_i or p_j is the highest priority among {p_i, p_{i+1}, ..., p_j}.

Proof. We'll show this by contradiction. Assume there is a key T_k, i < k < j, with a higher priority than both. In any collection of keys that includes T_i and T_j, T_k will become a pivot before either of them. Since T_k sits between T_i and T_j (i.e., T_i < T_k < T_j), it will separate T_i and T_j into different buckets, so they are never compared. Therefore, for T_i and T_j to be compared, p_i or p_j has to be bigger than all the priorities in between.

Since there are j − i + 1 keys in the range (including both i and j) and each has equal probability of having the highest priority, the probability that either i or j has the highest is 2/(j − i + 1). Therefore,

    E[A_{i,j}] = Pr{A_{i,j} = 1} = Pr{p_i or p_j is the maximum among {p_i, ..., p_j}} = 2/(j − i + 1).

Hence, the expected number of comparisons made is

    E[X_n] = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} E[A_{i,j}]
           = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} 2/(j − i + 1)
           = Σ_{i=1}^{n−1} Σ_{k=2}^{n−i+1} 2/k        (substituting k = j − i + 1)
           ≤ Σ_{i=1}^{n−1} Σ_{k=1}^{n} 2/k
           < 2n H_n
           ∈ O(n lg n).
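As a sanity check on this derivation (ours, not part of the notes), the snippet below evaluates the exact pairwise sum Σ 2/(j − i + 1) and compares it against the 2n H_n upper bound:

```python
from math import log

def expected_comparisons_exact(n):
    """Exact E[X_n]: sum of Pr{A_{i,j} = 1} = 2/(j - i + 1) over all pairs i < j."""
    return sum(2.0 / (j - i + 1)
               for i in range(1, n)
               for j in range(i + 1, n + 1))

n = 200
H_n = sum(1.0 / k for k in range(1, n + 1))
print("exact E[X_n]       :", round(expected_comparisons_exact(n), 1))
print("upper bound 2*n*H_n:", round(2 * n * H_n, 1))
print("for scale, n*lg n  :", round(n * log(n, 2), 1))
```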

4 Back to Treaps

How does this help with our analysis of treaps? Suppose you are given a set of elements with randomly generated priorities and you create a treap out of them. What is the expected search cost? The expected search cost is the expected depth of a node in the tree, so let us analyze that.

Consider a set of keys K and associated priorities p : key → int. We assume the priorities are unique. Consider the keys laid out in order (i.e., by taking an in-order traversal of the treap), and as with the analysis of quicksort, use T_i and T_j to refer to two positions in the sorted order. If we calculate depth starting with zero at the root, the expected depth of a key equals the expected number of its ancestors in the tree. So we want to count ancestors. We use the random indicator variable B_{i,j} to indicate that T_j is an ancestor of T_i. Now the expected depth can be written as

    E[depth of i in T] = Σ_{j ≠ i} E[B_{i,j}].

To analyze B_{i,j}, let's just consider the j − i + 1 keys and associated priorities from T_i to T_j, inclusive of both ends (here i < j; the case j < i is symmetric). Note that keys outside this range are irrelevant, because they don't affect the probability of B_{i,j}. We consider three cases:

1. The element T_i has the highest priority.

2. One of the elements T_k in the middle has the highest priority (i.e., neither T_i nor T_j).

3. The element T_j has the highest priority.

What happens in each case? In the first case, T_j cannot be an ancestor of T_i, since T_i has a higher priority. In the second case, T_k will be picked as the root of the subtree containing both T_i and T_j before either of them is picked, and by the BST property it separates T_i and T_j into two different subtrees; so T_j cannot possibly be T_i's ancestor in this case either. Finally, in the third case, T_j is an ancestor of T_i, since there is no key between them with a higher priority to separate them into two subtrees.

We therefore have that T_j is an ancestor of T_i exactly when it has a priority greater than all elements from T_i to T_j (inclusive on both sides). Since priorities are selected randomly, this happens with probability 1/(j − i + 1), and we have

    E[B_{i,j}] = 1/(j − i + 1).
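This ancestor characterization is easy to test empirically. Below is a small simulation sketch (ours; the notes contain no code) that builds a random treap by the quicksort-like recursion, records every key's ancestors, and checks that T_j is an ancestor of T_i exactly when p_j is the maximum priority between them:

```python
import random

def record_ancestors(pairs, anc, current=frozenset()):
    """Walk the unique treap over `pairs` (sorted by key), storing each
    key's set of ancestor keys in `anc`."""
    if not pairs:
        return
    idx = max(range(len(pairs)), key=lambda i: pairs[i][1])  # root = highest priority
    key = pairs[idx][0]
    anc[key] = current
    record_ancestors(pairs[:idx], anc, current | {key})      # left subtree
    record_ancestors(pairs[idx + 1:], anc, current | {key})  # right subtree

n = 50
pri = [random.random() for _ in range(n)]   # pri[i] is p_i; assumed unique (collisions
anc = {}                                    # have probability ~0 for random floats)
record_ancestors(sorted(zip(range(n), pri)), anc)

for i in range(n):
    for j in range(n):
        if i == j:
            continue
        lo, hi = min(i, j), max(i, j)
        j_is_ancestor = j in anc[i]
        j_has_max_priority = pri[j] == max(pri[lo:hi + 1])
        assert j_is_ancestor == j_has_max_priority
print("ancestor characterization holds for all pairs")
```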

Now we have

    E[depth of i in T] = Σ_{j ≠ i} 1/(|j − i| + 1)
                       = Σ_{j=1}^{i−1} 1/(i − j + 1) + Σ_{j=i+1}^{n} 1/(j − i + 1)
                       = (1/2 + 1/3 + ... + 1/i) + (1/2 + 1/3 + ... + 1/(n − i + 1))
                       < H_i + H_{n−i+1}
                       ∈ O(log n).

Exercise 1 Including constant factors, how does the expected depth of the first key compare to the expected depth of the middle key (i = n/2)?

Is this similar to the bound you got for red-black trees? What is the red-black tree guarantee? It is that the height of the tree is O(lg n). So the answer is no: the fact that the expected depth of a key (and hence its expected search cost) is O(lg n) does not imply that the expected height of the tree is O(lg n). To see why, think about a family of trees where every key has depth O(lg n) except for one key which has depth n. For this family of trees, the expected depth of a key is O(lg n), but the height of a tree is n. Put differently, the height of the tree is the depth of the key with maximum depth, so knowing the expected depth of each key does not tell us the expected height of the tree. Recall that E[max depth across all keys] is not the same as max{E[depth of key 1], E[depth of key 2], ...}. It turns out that in order to argue that the height of a treap is O(lg n), we need a high-probability bound, which we will study in the next lecture.
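A quick experiment (ours, not from the notes) makes this distinction concrete: it builds random treaps and reports the average depth of a key versus the height (the maximum depth). The average depth matches the O(log n) bound above, while the height is noticeably larger, which is why a separate high-probability argument is needed.

```python
import random
from math import log2

def depths(pairs, d=0):
    """Yield the depth of every key in the unique treap over `pairs`
    (a list of (key, priority) tuples sorted by key)."""
    if not pairs:
        return
    idx = max(range(len(pairs)), key=lambda i: pairs[i][1])  # root = highest priority
    yield d                                                   # depth of the root key
    yield from depths(pairs[:idx], d + 1)                     # left subtree
    yield from depths(pairs[idx + 1:], d + 1)                 # right subtree

n, trials = 1024, 200
avg_depth = avg_height = 0.0
for _ in range(trials):
    pairs = [(k, random.random()) for k in range(n)]          # already sorted by key
    ds = list(depths(pairs))
    avg_depth += sum(ds) / n
    avg_height += max(ds)
print(f"n = {n}, lg n = {log2(n):.1f}")
print(f"mean depth per key ~ {avg_depth / trials:.1f}; mean height ~ {avg_height / trials:.1f}")
```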