Problem Set 5 Solutions

Similar documents
Introduction to Algorithms Massachusetts Institute of Technology Professors Erik D. Demaine and Charles E. Leiserson.

Problem Set 6 Solutions

Computational Optimization ISE 407. Lecture 16. Dr. Ted Ralphs

Augmenting Data Structures

Algorithms in Systems Engineering ISE 172. Lecture 16. Dr. Ted Ralphs

Treaps. 1 Binary Search Trees (BSTs) CSE341T/CSE549T 11/05/2014. Lecture 19

Problem Set 5. MIT students: Each problem should be done on a separate sheet (or sheets) of three-hole punched paper.

Problem Set 5 Solutions

We assume uniform hashing (UH):

CMSC 754 Computational Geometry 1

Solutions to Midterm 2 - Monday, July 11th, 2009

V Advanced Data Structures

Balanced Trees Part Two

3 Competitive Dynamic BSTs (January 31 and February 2)

Data Structures and Algorithms

Introduction to Algorithms February 24, 2004 Massachusetts Institute of Technology Professors Erik Demaine and Shafi Goldwasser Handout 8

COMP3121/3821/9101/ s1 Assignment 1

Algorithms. Deleting from Red-Black Trees B-Trees

V Advanced Data Structures

Quiz 1 Solutions. (a) f(n) = n g(n) = log n Circle all that apply: f = O(g) f = Θ(g) f = Ω(g)

Solutions to Problem Set 1

Priority Queues. 1 Introduction. 2 Naïve Implementations. CSci 335 Software Design and Analysis III Chapter 6 Priority Queues. Prof.

CS60020: Foundations of Algorithm Design and Machine Learning. Sourangshu Bhattacharya

Week - 04 Lecture - 01 Merge Sort. (Refer Slide Time: 00:02)

CS60020: Foundations of Algorithm Design and Machine Learning. Sourangshu Bhattacharya

Outline. Definition. 2 Height-Balance. 3 Searches. 4 Rotations. 5 Insertion. 6 Deletions. 7 Reference. 1 Every node is either red or black.

Lecture 6: Analysis of Algorithms (CS )

Computational Geometry

2-3 Tree. Outline B-TREE. catch(...){ printf( "Assignment::SolveProblem() AAAA!"); } ADD SLIDES ON DISJOINT SETS

Lecture 3: Art Gallery Problems and Polygon Triangulation

Properties of red-black trees

Search Trees. Undirected graph Directed graph Tree Binary search tree

Algorithms. AVL Tree

Algorithms for Memory Hierarchies Lecture 2

COMP 250 Fall recurrences 2 Oct. 13, 2017

1. [1 pt] What is the solution to the recurrence T(n) = 2T(n-1) + 1, T(1) = 1

9/24/ Hash functions

Data Structures and Algorithms CMPSC 465

Advanced Set Representation Methods

16 Greedy Algorithms

Lecture 3 February 9, 2010

Trees. Eric McCreath

Lecture 3: B-Trees. October Lecture 3: B-Trees

We will show that the height of a RB tree on n vertices is approximately 2*log n. In class I presented a simple structural proof of this claim:

II (Sorting and) Order Statistics

Ensures that no such path is more than twice as long as any other, so that the tree is approximately balanced

Analysis of Algorithms

Module 4: Index Structures Lecture 13: Index structure. The Lecture Contains: Index structure. Binary search tree (BST) B-tree. B+-tree.

CPS 231 Exam 2 SOLUTIONS

would be included in is small: to be exact. Thus with probability1, the same partition n+1 n+1 would be produced regardless of whether p is in the inp

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Sorting lower bound and Linear-time sorting Date: 9/19/17

Lecture: Analysis of Algorithms (CS )

Figure 4.1: The evolution of a rooted tree.

CS2 Algorithms and Data Structures Note 6

18.3 Deleting a key from a B-tree

CS2223: Algorithms Sorting Algorithms, Heap Sort, Linear-time sort, Median and Order Statistics

Questions from the material presented in this lecture

Lecture 5. Treaps Find, insert, delete, split, and join in treaps Randomized search trees Randomized search tree time costs

1 The range query problem

FINALTERM EXAMINATION Fall 2009 CS301- Data Structures Question No: 1 ( Marks: 1 ) - Please choose one The data of the problem is of 2GB and the hard

Problem Set 6 Solutions

1. [1 pt] What is the solution to the recurrence T(n) = 2T(n-1) + 1, T(1) = 1

D. Θ nlogn ( ) D. Ο. ). Which of the following is not necessarily true? . Which of the following cannot be shown as an improvement? D.

CS F-11 B-Trees 1

Introduction to Algorithms April 21, 2004 Massachusetts Institute of Technology. Quiz 2 Solutions

6.856 Randomized Algorithms

Range Queries. Kuba Karpierz, Bruno Vacherot. March 4, 2016

1. Meshes. D7013E Lecture 14

CS521 \ Notes for the Final Exam

(2,4) Trees. 2/22/2006 (2,4) Trees 1

CS2 Algorithms and Data Structures Note 6

CISC 235: Topic 4. Balanced Binary Search Trees

Binary Trees

Fundamental mathematical techniques reviewed: Mathematical induction Recursion. Typically taught in courses such as Calculus and Discrete Mathematics.

COMP Analysis of Algorithms & Data Structures

CMPS 2200 Fall 2017 Red-black trees Carola Wenk

Problem Set 4 Solutions

Analysis of Algorithms

Operations on Heap Tree The major operations required to be performed on a heap tree are Insertion, Deletion, and Merging.

CSE 502 Class 16. Jeremy Buhler Steve Cole. March A while back, we introduced the idea of collections to put hash tables in context.

1 Leaffix Scan, Rootfix Scan, Tree Size, and Depth

Thus, it is reasonable to compare binary search trees and binary heaps as is shown in Table 1.

CS Fall 2010 B-trees Carola Wenk

Here is a recursive algorithm that solves this problem, given a pointer to the root of T : MaxWtSubtree [r]

15.4 Longest common subsequence

Scribe: Virginia Williams, Sam Kim (2016), Mary Wootters (2017) Date: May 22, 2017

Name: Lirong TAN 1. (15 pts) (a) Define what is a shortest s-t path in a weighted, connected graph G.

Data Structures and Algorithms Dr. Naveen Garg Department of Computer Science and Engineering Indian Institute of Technology, Delhi

Greedy Algorithms CHAPTER 16

Lecture Notes: External Interval Tree. 1 External Interval Tree The Static Version

Efficient Non-Sequential Access and More Ordering Choices in a Search Tree

Planar Point Location

Problem Set 1. Solution. CS4234: Optimization Algorithms. Solution Sketches

Red-Black-Trees and Heaps in Timestamp-Adjusting Sweepline Based Algorithms

Binary Search Trees. Contents. Steven J. Zeil. July 11, Definition: Binary Search Trees The Binary Search Tree ADT...

B-Trees. Version of October 2, B-Trees Version of October 2, / 22

Design and Analysis of Algorithms Lecture- 9: B- Trees

Red-Black Trees. Based on materials by Dennis Frey, Yun Peng, Jian Chen, and Daniel Hood

CSE 3101: Introduction to the Design and Analysis of Algorithms. Office hours: Wed 4-6 pm (CSEB 3043), or by appointment.

Data Structure. IBPS SO (IT- Officer) Exam 2017

Transcription:

Introduction to Algorithms November 4, 2005 Massachusetts Institute of Technology 6.046J/18.410J Professors Erik D. Demaine and Charles E. Leiserson Handout 21 Problem Set 5 Solutions Problem 5-1. Skip Lists and B-trees Intuitively, it is easier to find an element that is nearby an element you ve already seen. In a dynamic-set data structure, a finger search from to is the following query: given the node in the data structure that stores the element, and given another element, find the node in the data structure that stores. Skip lists support fast finger searches in the following sense. (a) Give an algorithm for finger searching from to in a skip list. Your algorithm should run in rank rank time with high probability, where rank denotes the current rank of element in the sorted order of the dynamic set. When we say with high probability we mean high probability with respect to rank rank. That is, your algorithm should run in time with probability ' ),forany+,. Assume that the finger-search operation is given the node in the bottommost list of the skip list that stores the element. Solution: For the purposes of this problem, we assume that each node following fields: in the skip list has the 1 1 1 1 1 1 1 key/ the key associated with node next/ the next element in the linked list containing level/ the level of the linked list containing up/ the element in the linked list of level level/ containing the same key as down/ the element in the linked list of level level/ containing the same key as We present code for the case where key/ 1 6 7. The case where k 6 key/ 1 is symmetric. We also assume that level/ 1 9, i.e., the search starts at the lowest level of the skip list. (If the search starts at level ;, then the finger search may take ; time.) The algorithm proceeds in two phases. In the first phase, we ascend levels as rapidly as possible, until we find a level that does not allow forward motion.

& T T % 2 Handout 21: Problem Set 5 Solutions FINGER-SEARCH k 1 while keyg nextg I I k 2 do while upg I NIL 3 do upg I 4 nextg I 5 while levelg I D W 6 do while keyg nextg I I T k 7 do nextg I 8 if levelg I W 9 then downg I 10 return Search for U in skip list containing node. (Notice that the search climbs higher than it needs to. A better solution is to examine the next point at each level, and only continue up if the next key is smaller than the target key. As we prove, though, the simpler algorithm above succeeds as well.) We first establish the highest level reached during the finger search. The proof is exactly the same as the proof that a skip list has at most Y levels. Lemma 1 While executing FINGER-SEARCH, levelg I T %, with high probability, where % & rank keyg I rank U. Proof. Notice that there are at most % 6 elements in the skip list in the (closed) interval & G keyg I U I. We calculate the probability that any of these % 6 elements exceeds height %. G any element in M is in more than % 6 levelsi T % 6 % 6 % 6 Since none of the elements in the interval are promoted past level %, with high probability, and keyg I is always in the interval, the result follows. Notice that in the case of this proof, the high probability analysis is with respect to %, not Y. We now consider the cost of the two phases of the algorithm: first the cost of ascending, and then the cost of descending. Recall the lemma proved in class: % % Lemma 2 The number of coin flips required to see heads is with high probability. Using this lemma, we can see that the first phase, ascending levels, completes within % steps as there are only % levels to ascend and each heads results in moving up a level. This lemma also shows that the second phase completes in % steps: as in the analysis of a SEARCH, we calculate the cost in reverse, ascending

Handout 21: Problem Set 5 Solutions 3 7 from the node containing to the level reached during the first phase; since there are only levels to ascend, and each heads results in moving up a level, the cost of the second phase is also. Hence the total running time is as desired. To support fast finger searches in B-trees, we need two ideas: B -trees and level linking. Throughout this problem, assume that. A B -tree is a B-tree in which all the keys are stored in the leaves, and internal nodes store copies of these keys. More precisely, an internal node with 7 children stores 7 keys: the maximum key in s subtree, the maximum key in ssubtree,...,themaximumkeyin s subtree. (b) Describe how to modify the B-tree SEARCH algorithm in order to find the leaf containing a given key in a B -tree in ; time. Solution: The only modification necessary is to always search to the leaves, rather than to return if the key is found in an internal node. 7 6 ; / 1 7 / 1 1 6 ; / 1 7 / 1 / 1 7 7 B -TREE-SEARCH 1 2 while and key 3 do 4 if leaf/ 5 then if and key 6 then return 7 else return NIL 8 else return B -TREE-SEARCH (c) Describe how to modify the B-tree INSERT and DELETE algorithms to work for B - trees in ; time. Solution: Normally, during an insert, B-TREE-SPLIT-CHILD cuts a node into three pieces: the elements less than the median, the median, and the elements greater than the median. It puts the median in the parent, and the other parts in new nodes. In B -trees, the split is unchanged at internal nodes, but not at leaves. The median has to be kept in one of the two leaf nodes the left one as well as being inserted into a parent. The following pseudocode implements the new split routine.

4 Handout 21: Problem Set 5 Solutions 1 1 ; / 1 / 1 / 1 / 1 / 1 / 1 / 1 ; / 1 ; / 1 / 1 / 1 / 1 / 1 / 1 / 1 / 1 ; / 1 ; / 1 B -TREE-SPLIT-CHILD 1 if leaf/ 2 then ALLOCATE-NODE 3 leaf/ FALSE 4 5 for to 6 do key key 7 8 9 10 for downto Unlike a B-tree, the old node keeps 11 do 12 key key 13 14 key key 15 16 else return B-TREE-SPLIT-CHILD The remaining insert routines must be modified to use the new split routine. AB -tree delete consists of two parts: deleting the key from the leaf, which may entail fixing up the tree, and ensuring that a copy of the key no longer appears in an internal node. Each part takes ; time. The delete begins by descending from the root to the leaf containing the key. If the key being deleted is discovered in an internal node, it is ignored. (Remember these nodes; we will come back to these nodes later, in the second part of the delete.) As in the case of a B-tree, we want to ensure that each node along the descent path has at least keys. Recall that a B-tree delete consisted of three cases. In particular, case (3) ensured that while descending the tree to perform a delete, a child contained at least keys. Apply case (3) to ensure that each node along the path has at least keys. The only modification is that when merging leaves, it is unnecessary to move a key down into the new merged node; the key can simply be removed from the parent. On finding the key in the leaf, remove it from the leaf. In the second part of the delete, we want to ensure that a copy of the deleted key does not appear in an internal node. First, search for the predecessor to the deleted key. Recall the nodes discovered during the first part that contained the key being deleted. Wherever the deleted key appeared along the path, replace it with the predecessor. (Copies of the deleted key can only appear on the root-to-leaf path traversed during the deletion.) keys. A level-linked B -tree is a B -tree in which each node has an additional pointer to the node immediately to its left among nodes at the same depth, as well as an additional pointer to the node immediately to its right among nodes at the same depth.

Handout 21: Problem Set 5 Solutions 5 (d) Describe how your B -tree INSERT and DELETE algorithms from part (c) can be modified to maintain level links in Y time per operation. Solution: Each node now contains a pointer linkg I pointing to its right sibling. The split routine is modified to include the following two lines: I 1 linkg 2 linkg I linkg I This copies the link of the node being split, to the new node, and updates the link of to point to. Otherwise, the insert operation remains unchanged. During a delete operation, when two nodes are merged, the link pointers must be appropriately updated: the link from the left node (being deleted) must be copied to the link point of the remaining merged node. Otherwise, the delete operation remains unchanged. (e) Give an algorithm for finger searching from to in a level-linked B -tree. Your algorithm should run in rank rank time. Solution: The algorithm is quite similar to the solution to part (a). The search proceeds up the tree until the next key is greater than the key being searched for. At this point, the search can continue down the tree in the usual fashion, using links when necessary. As before, the code below assumes that initially U D key G I,where represents the pointer to a key in the tree. The opposite case is symmetric. In order to perform the finger-search efficiently, we assume that each node has a parent pointer G I and a parent index G I.

6 Handout 21: Problem Set 5 Solutions 7 7 / 1 1 7 / 1 1 6 ; / 1 7 / 1 ; / 1 7 / 1 1 / 1 / 1 1 7, / 1 1 1 / 1 6 ; / 1 7 / 1 7 / 1 B -TREE-FINGER-SEARCH 1 while key and link/ NIL and 2 do while and key key link/ 3 do 4 if and 5 then key link/ 6 7 Begin downwards search. 8 while not leaf/ 9 do if key link/ 10 then link/ 11 12 else 13 14 while and key 15 do 16 if key 17 then return 18 else return NIL The key to showing the search efficient is provingthat the first phase ascends no more than levels. In order to prove this fact, notice that is only set to / 1 in line 5when7 key / link/ 1 1. This implies that 7 is larger than every leaf in the subtree rooted at / link/ 1 1. Moreover, every leaf in the subtree rooted at / link/ 1 1 must be larger than the initial key / 1 when the search began (since the search moves only in the direction of increasing keys). Hence, if ascends from level to level (assuming that the leaves are at level 0), then,, implying that. That is, the search never ascends more than levels. On the other hand, while searching down the tree, only traverses at most one lateral link. Therefore we can conclude that the total search cost is. These ideas suggest a connection between skip lists and level-linked 2-3-4 trees. In fact, a skip list is essentially a randomized version of level-linked B -tree. (f) Describe how to implement a deterministic skip list. That is, your data structure should have the same general pointer structure as a skip list: a sequence of one or more linked lists with pointers between nodes in adjacent lists that store the same key. The SEARCH algorithm should be identical to that of a skip list. You will need to modify the INSERT operation to avoid the use of randomization to determine whether a key should be promoted. You may ignore DELETE for this problem part. Solution:

Handout 21: Problem Set 5 Solutions 7 The resulting deterministic skip-list is exactly the level-linked B -tree that was just developed in the prior parts. We repeat the same data structure here in a slightly different form, primarily to help see the connection between the two data structures. The data structure consists of a set of linked lists, just as a typical skip list. The lists all begin with. The SEARCH algorithm remains exactly as in the regular skip-list data structure. Each node in a linked list is augmented to include a count of consecutive non-promoted nodes that follow it in the list. For example, if the next node in the linked list is promoted, the count is 0; if the next node is not promoted, but the following node is promoted, the count is 1. The goal is to ensure that the count of a promoted node is at least andnomorethan. (The slightly different values as compared to a B-tree result from the slightly different definition of the count.) We now describe how the INSERT proceeds. The INSERT operation begins at the top level of the skip list at, and proceeds down the skip list following the usual skip list search. On reaching the lowest level, the new element is inserted into the linked list, and the counts are updated. (Notice that no more than counts must be updated.) If updating the counts causes the previous promoted node to have a count larger than, then the node with count is promoted. This continues recursively up the tree, adding a new level at the highest level if needed. In this way, we ensure that the skip list is always close to a perfect skip list, without using any randomization. Problem 5-2. Fun with Points in the Plane It is 3 a.m. and you are attempting to watch 6.046 lectures on video, looking for hints for Problem Set 5. For some odd reason, possibly because you are fading in and out of consciousness, you start to notice a strange cloud of black dots on an otherwise white wall in your room. Thus, instead of watching the lecture, your subconscious mind starts trying to solve the following problem. Let be a set of ; points in the plane. Each point has coordinates and has a weight (a real number representing the size of the dot). Let be an arbitrary function mapping a point with coordinates and weight to a real number, computable in time. For a subset of, define the function to be the sum of over all points in, i.e., For example, if,then is the sum of the weights of all ; points. This case is depicted in Figure 1. Our goal is to compute the function for certain subsets of the points. We call each subset a query, and for each query, we want to calculate. Because there may be a large number of queries, we want to design a data structure that will allow us to efficiently answer each query. First we consider queries that restrict the whose coordinate. In particular, consider the set of points coordinates are at least. Formally, let be the set of points,

. 8 Handout 21: Problem Set 5 Solutions & + # & + # & & & ) * & & & ) * / 1 2 Figure 1: In this example of eight points, if,then. Figure 2: When, 4 5 6 7 8 9 and 9. We want to answer queries of the following form: given any value as input, calculate the value. Figure 2 is an example of such a query. In this case,, and the points of interest are those with coordinate at least. (a) Show how to modify a balanced binary search tree to support such a query in ; time. More specifically, the computation of can be performed using only a single walk down the tree. You do not need to support updates (insertions and deletions) for this problem part. Solution: For this part of the problem, we present two different solutions using binary tree. In the first version, we store keys only at the leaves of the binary tree, while in the second version we store keys at every node. The solutions are nearly identical. Let : be the sum of for all points represented by the subtree rooted at. If is a leaf, then : is simply the value of for the point stored in. For both solutions, we augment each node in the tree to store :. The idea behind a query is to search for in the binary tree, computing the desired sum along the way. The first solution applies when the keys are stored only at the leaves. If we reach a node in our search and key/ 1, then recurse on the right subtree. Otherwise, we add the value : right/ 1 to the answer from recursion on the left subtree.

1 Handout 21: Problem Set 5 Solutions 9 FWITHXMIN-V1 Keys stored at leaves. 1 if NIL 2 then return 9 3 if leaf/ 4 then if key/ 1, return : 5 else return 0 6 if key/ 1 7 then return FWITHXMIN-V1(right/ 1 ) 8 else return : right/ 1 FWITHXMIN-V1(left/ 1 ) : 1 : : 1 9 1 1 : 1 1 Note that in the last step, we could also avoid visiting s right child by noticing that right/ left/. When the keys can be stored at internal nodes in the tree, the only modification to the query is that we have to look at the value of for the node we are currently visiting (in a slight abuse of notation, call this ). FWITHXMIN-V2 Keys stored at every node. 1 if NIL 2 then return 3 if key/ 4 then return FWITHXMIN-V2(right/ ) 5 else return right/ FWITHXMIN-V2(left/ ) As before, in the last recursive call, we can avoid visiting s right child by noticing that : right/ 1 : : left/ 1. Finally, one could prove the correctness of FWITHXMIN by induction on the height of the tree. (b) Consider the static problem, where all ; points are known ahead of time. How long does it take to build your data structure from part (a)? Solution: Constructing the data structure requires ; ; because we are effectively sorting the points. One solution is to sort the points by coordinate, and then construct a balanced tree by picking a median element as the root and recursively constructing the left and right subtrees. This construction takes ; ; time to sort the points and ; to construct the tree. (c) In total, given ; points, how long does it take to build your data structure and answer 7 different queries? On the other hand, how long would it take to answer 7 different queries without using any data structure and using the naïve algorithm of computing from scratch for every query? For what values of 7 is it asymptotically more efficient to use the data structure?

10 Handout 21: Problem Set 5 Solutions Solution: Using parts (a) and (b), we require ; ; 7 ; time to build the data structure and then answer 7 queries. The naïve algorithm to compute from scratch is for each query, to scan through all ; points and sum those with - coordinate at least. This second algorithm requires 7 ; time. Intuitively, from these expressions, we can conclude that if 7 ;, then building the data structure will be more efficient than computing from scratch. We were not looking for a formal proof for this part, but one way to give such an argument is as follows. Let ; 7 and ; 7 be the times required to answer 7 queries on ; points, for thefirst and second algorithms, respectively. Then we want to give sufficient conditions on 7 such that ; 7 6 ; 7 asymptotically. Lemma 3 If 7 ;, then there exists a constant ; such that ; 7 for all ;, ;. ; 7 Proof. Let and ; be the constants for the ; ; 7 ;, for the first algorithm, and let and ; be the constants for 7 ; for the second algorithm. Then we want to give sufficient conditions on 7 and ; that give us If ;, ; ; ; 7 6, then we know ; ; 7 ; 7 ; 7 ; 7 ; ; ; ; Let ; 4 be a constant such that for all ;, ; 4, ; ; ; 4, we know that ; 6 ; '. Then, for all ;, 7 ; If 7 ;, then by definition, choosing the constant to be, we can find a value ; 5 such that for all ;, ; 5, 7 is large enough to satisfy the above condition. Therefore, if ; ; ; ; 4 ; 5, ; 7 ; 7 for all ; ;. (d) We can make this data structure dynamic by using a red-black tree. Argue that the augmentation for your solution in part (a) can be efficiently supported in a red-black tree, i.e., that points can be inserted or deleted in ; time. Solution: When keys are stored at every node, the augmentation of maintaining at every node in a red-black tree is almost exactly the same as the augmentation to maintain the sizes of subtrees for the dynamic order-statistic tree. '

Handout 21: Problem Set 5 Solutions 11 When we insert a point to the tree, we increment : by for the appropriate nodes as we walk down the tree. For a delete, when a point is spliced out of the tree, we walk up the path from to the root and decrement : by. In the cases where we delete a node by replacing it with its predecessor or successor,then we walk up the path from and increment : by, and then recursively delete from the tree rooted at. Maintaining : during rotations is also similar. For example, after performing a leftrotation on a tree that originally had as the root and as its right child (see Figure 14.2 on p.306), we update : and : by : : and : : left/ 1 : right/ 1. Note that for the first type of solution, when keys are stored only at the leaves in a red-black, we also need to maintain the maximum value of the left subtree of.we should also argue that this property can be maintained. Similar arguments for normal tree inserts and deletes and for rotations will work. / 1 6 Next we consider queries that take an interval (with ) as input instead of a single number. Let be the set of points whose coordinates fall in that interval, i.e., / 1 See Figure 3 for an example of this sort of query. We claim that we can use the same dynamic data structure from part (d) to compute. (e) Show how to modify your algorithm from part (a) to compute in ; time. Hint: Find the shallowest node in the tree whose coordinate lies between and. Solution: First, we find a split node for and. The split node satisfies the property that it is the node of greatest depth whose subtree contains all the nodes whose keys are in the interval / 1. We find a split node by following the normal search algorithm for and until their paths diverge.

1... 12 Handout 21: Problem Set 5 Solutions & + # & + # / & & & ) * & & & ) * / / 1 2 / / 1 2 / 1 2 & Figure 3: When / 4 5 7 8 9 and 1,. Figure 4: For / 1 and / 1, 4 5 9 and. 6 1 6 6 1 1 1 1 FINDSPLITNODE-V1 Keys stored at leaves. 1 if NIL 2 then return NIL 3 if leaf/ 4 if ( key/ ) return 5 else return NIL 6 if key/ 7 then return FINDSPLITNODE-V1(left/ ) 8 if key/ 9 then return FINDSPLITNODE-V1(right/ ) 10 return z FINDSPLITNODE-V2 Keys stored at every node. 1 if NIL 2 then return NIL 3 if key/ 1 4 then return FINDSPLITNODE-V2(left/ 1 ) 5 if key/ 1 6 then return FINDSPLITNODE-V2(right/ 1 ) 7 return If we get here, 6 key/ 1 6. If FINDSPLITNODE returns NIL, then 9 because there are no points in the interval. Otherwise, to compute, we can use the function FWITHXMIN

1 Handout 21: Problem Set 5 Solutions 13 from part (a) and a corresponding function FWITHXMAX (this function computes / 1 ). FWITHXMAX-V1 Keys stored at leaves. 1 if NIL 2 then return 9 3 if leaf/ 4 then if key/ 5 else return 0 1 6 return : 6 if key/ 1 7 else return : left/ 1 FWITHXMAX-V1(right/ 1 ) 8 then return FWITHXMAX-V1(left/ 1 ) FWITHXMAX-V2 Keys stored at every node. 1 if NIL 2 then return 9 3 if key/ 1 4 then return FWITHXMAX-V2(left/ 1 ) 5 else return : left/ 1 FWITHXMAX-V2(right/ 1 ) If we have a split node, we knowall nodes in the left subtree of have keys at most, and all nodes in the right subtree have keys at least. If all keys are stored at the leaves, we have FWITHXMIN-V1 left/ 1 FWITHXMAX-V1 right/ 1 When keys are stored at both leaves and internal nodes, we have FWITHXMIN-V2 left/ 1 FWITHXMAX-V2 right/ 1 1 FWITHX-V2 1 if NIL 2 then return 0 3 FINDSPLITNODE-V2( ) 4 if NIL 5 then return 0 6 else return FWITHXMIN-V2 left/ FWITHXMAX-V2 right/ 1 / 1 / 1 Finally, we generalize the static problem to two dimensions. Suppose that we are given two intervals, and.let be the set of all points in this rectangle, i.e., and

14 Handout 21: Problem Set 5 Solutions See Figure 4 for an example of a two-dimensional query. Y (f) Describe a data structure that efficiently supports a query to compute for arbitrary intervals and. A query should run in time. Hint: Augment a range tree. Solution: We use a 2-d range tree with primary tree keyed on the -coordinate. Each node in this -tree contains a pointer to a 1-d range tree, keyed on the coordinate. We augment each of the nodes in the -trees with the same : as in part (a). To perform a query, we first search in the primary -tree for all nodes/subtrees whose points have -coordinates that fall in the interval G I. Then, for each of these nodes/subtrees, we query the -tree to compute for all the points in that -tree whose -coordinate falls in the interval G I. We can use the exact same pseudo code from parts (a) and (e) if we replace every occurrence of : with FWITHY ytreeg I,whereFWITHY is the query on a 1-d range tree. The algorithmfor queries is the sameas in part (a), except replacing the constant-time : function calls with the calls to query ytreeg I. Since each query in a -tree takes Y time, the total runtime for a query is Y. (g) How long does it take to build your data structure? How much space does it use? Y Y Y Y Y Y Y Y Y Y & Y 9 Y Y Y & Y Y Y Y I I I Y & Y 9 Y Y Y Solution: Since we are only adding a constant amount of information at every node in a -tree or -tree, the space used is asymptotically the same as a -d range tree, i.e.,. For a 2-d range tree, the space used is because every point appears in -trees. A simplealgorithmfor constructinga -d range tree takes time. First, sort the points by -coordinate and construct the -tree in time as in part (c). Then, for every node in the -tree, construct the corresponding -tree, using the algorithm from (c) by sorting on the -coordinates. The recurrence for the time to construct all -trees is,or. It is possible to construct a 2-d range tree in with a more clever solution. Notice that in the 2-d range tree, for a node in the -tree, the -trees for the left and right subtrees of are disjoint. This observation suggests the following algorithm. First, sort all the points by their -coordinate and construct the primary range tree as before. Then, sort the points by -coordinate, and use that ordering to construct the largest -tree for the root node.togetthe -trees for leftg and rightg, perform a stable partition on the -coordinates of the array of points about x-keyg,thekeyof in the -tree. This partition allows us to construct the -trees for the left and right subtrees of without having to sort the points by -coordinate again. Therefore, the recurrence for the runtime is,or.

Handout 21: Problem Set 5 Solutions 15 Unfortunately, there are problems with making this data structure dynamic. (h) Explain whether your argument in part (d) can be generalized to the two-dimensional case. What is the worst-case time required to insert a new point into the data structure in part (f)? ; + + ; ; ; Solution: We can t generalize the argument in part (d) to a 2-d range tree because we can not easily update the -tree. Performing a normal tree insert on a single or tree can be done in time. Performing a rotation on a node in the -tree, however, requires rebuilding an entire -tree. For example, consider the diagram of left-rotation in CLRS, Figure 13.2 on pg. 278. The -tree for after the rotation contains exactly the same nodes as -tree for before the rotation. To fix the -tree rooted at after the rotation, however, we must remove all the points in from s original -tree and add all the points in. In the worst case, if is at the root, then the subtrees and will have nodes. Thus, performing a rotation in -tree might require time. Since an update to a red-black tree requires only a constant number of rotations, the time is for an update. (i) Suppose that, once we construct the data structure with ; initial points, we will perform at most ; updates. How can we modify the data structure to support both queries and updates efficiently in this case? Solution: If we do not have to worry about maintaining balance in any of the -trees or -trees, then we can use a normal tree insert to add each new point into the -tree and the corresponding -trees. This algorithm requires ; time per update because we have to add a point to ; -trees. As a side note, when we are performing a small number of updates, it is actually possible to perform queries in ; time and handle updates in time! In this scheme, we augment the range tree with an extra buffer of size ;. Insert and delete operations are lazy: they only get queued in the buffer. When a query happens, it searches the original 2-d range tree and finds an old answer in ; time. It then updates its answer by scanning through each point in the buffer. Thus, we can actually support ; updates without changing the asymptotic running time for queries. Completely Optional Parts The remainder of this problem presents an example of a function that is useful in an actual application and that can be computed efficiently using the data structures you described in the previous parts. Parts (j) through (l) outline the derivation of the corresponding function. The remainder of this problem is completely optional. Please do not turn these parts in!

16 Handout 21: Problem Set 5 Solutions As before, consider a set of ; points in the plane, with each point having coordinates and a weight. We want to compute the axis that minimizes the moment of inertia of the points in the set. Formally, we want to compute a line in the plane that minimizes the quantity where is the distance from point to the line.if for all, we can think of this axis as the orientation of the set. (j) One parameterization of a line in the plane is to describe it using a pair, where is the distance from the origin to the line and is the angle the line makes with the axis. It can be shown that the distance between a point and a line parameterized by is We defined the orientation of the set of points as the line the function Show that setting where 9 gives us the constraint 9 that minimizes Solution: % % ( * ( * ( * Set % ' % equal to 9 to find / and /,wehave / / 9

Handout 21: Problem Set 5 Solutions 17 (k) Show that setting where 9 and using the constraint from part (j) leads to the equation Solution: This question is also doable, assuming the solution given above is actually correct. You may % want to double-check the math for this solution. % We compute ' and set it equal to 9. % % To find the extrema / / where both partial derivates are 0, we plug in the expression from the previous part to eliminate /. / / / / / / / / / 9 / / / / / / 9 ( * / / /

18 Handout 21: Problem Set 5 Solutions (l) Give the function that makes the orientation problem a special case of the problem we just solved. Hint: The function is a vector-valued function. Solution: Use.