Fractional Cascading

Similar documents
Point Enclosure and the Interval Tree

Planar Point Location

Lecture 3 February 9, 2010

1 The range query problem

Trapezoid and Chain Methods

8. Binary Search Tree

Orthogonal range searching. Range Trees. Orthogonal range searching. 1D range searching. CS Spring 2009

Advanced Algorithm Design and Analysis (Lecture 12) SW5 fall 2005 Simonas Šaltenis E1-215b

Problem Set 5 Solutions

Chapter 12: Indexing and Hashing. Basic Concepts

Chapter 12: Indexing and Hashing

COMP Data Structures

Range Searching and Windowing

Computational Geometry

Dr. Amotz Bar-Noy s Compendium of Algorithms Problems. Problems, Hints, and Solutions

Computing intersections in a set of line segments: the Bentley-Ottmann algorithm

CS 372: Computational Geometry Lecture 3 Line Segment Intersection

Computational Geometry

CS Sorting Terms & Definitions. Comparing Sorting Algorithms. Bubble Sort. Bubble Sort: Graphical Trace

Chapter 12: Indexing and Hashing

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Sorting lower bound and Linear-time sorting Date: 9/19/17

Introduction to Algorithms I

Range-Aggregate Queries Involving Geometric Aggregation Operations

Principles of Algorithm Design

ITEC2620 Introduction to Data Structures

1 Probability Review. CS 124 Section #8 Hashing, Skip Lists 3/20/17. Expectation (weighted average): the expectation of a random quantity X is:

Some Search Structures. Balanced Search Trees. Binary Search Trees. A Binary Search Tree. Review Binary Search Trees

Analysis of Algorithms

CSCI 104 Log Structured Merge Trees. Mark Redekopp

CSE 332, Spring 2010, Midterm Examination 30 April 2010

CS2223: Algorithms Sorting Algorithms, Heap Sort, Linear-time sort, Median and Order Statistics

CS 561, Lecture 2 : Randomization in Data Structures. Jared Saia University of New Mexico

Parallel and Sequential Data Structures and Algorithms Lecture (Spring 2012) Lecture 16 Treaps; Augmented BSTs

1 (15 points) LexicoSort

Hash Tables and Hash Functions

Lecture 8 13 March, 2012

Chapter 11: Indexing and Hashing

Balanced Trees Part Two

Review implementation of Stable Matching Survey of common running times. Turn in completed problem sets. Jan 18, 2019 Sprenkle - CSCI211

Range Reporting. Range Reporting. Range Reporting Problem. Applications

Formal Model. Figure 1: The target concept T is a subset of the concept S = [0, 1]. The search agent needs to search S for a point in T.

Lecture 3 February 23, 2012

Binary Search Trees. Contents. Steven J. Zeil. July 11, Definition: Binary Search Trees The Binary Search Tree ADT...

[ DATA STRUCTURES] to Data Structures

Linear Feature Engineering 22

Advanced Operations Research Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology, Madras

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Priority Queues / Heaps Date: 9/27/17

CMSC 754 Computational Geometry 1

Section 5.3: Event List Management

1. [1 pt] What is the solution to the recurrence T(n) = 2T(n-1) + 1, T(1) = 1

Chapter 13: Query Processing

CPSC 311 Lecture Notes. Sorting and Order Statistics (Chapters 6-9)

Introduction to Algorithms Massachusetts Institute of Technology Professors Erik D. Demaine and Charles E. Leiserson.

CS 561, Lecture 2 : Hash Tables, Skip Lists, Bloom Filters, Count-Min sketch. Jared Saia University of New Mexico

DIVIDE AND CONQUER ALGORITHMS ANALYSIS WITH RECURRENCE EQUATIONS

Jana Kosecka. Linear Time Sorting, Median, Order Statistics. Many slides here are based on E. Demaine, D. Luebke slides

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for

Chapter 13: Query Processing Basic Steps in Query Processing

Orthogonal range searching. Orthogonal range search

Database System Concepts, 6 th Ed. Silberschatz, Korth and Sudarshan See for conditions on re-use

Trees. Chapter 6. strings. 3 Both position and Enumerator are similar in concept to C++ iterators, although the details are quite different.

Week - 04 Lecture - 01 Merge Sort. (Refer Slide Time: 00:02)

Data Structure and Algorithm Midterm Reference Solution TA

FINALTERM EXAMINATION Fall 2009 CS301- Data Structures Question No: 1 ( Marks: 1 ) - Please choose one The data of the problem is of 2GB and the hard

Lecture 6: Hashing Steven Skiena

1 Static-to-Dynamic Transformations

Persistent Data Structures (Version Control)

4 Greedy Approximation for Set Cover

would be included in is small: to be exact. Thus with probability1, the same partition n+1 n+1 would be produced regardless of whether p is in the inp

CMSC 341 Lecture 15 Leftist Heaps

Notes and Answers to Homework Exam 1, Geometric Algorithms, 2017

UNIT-1. Chapter 1(Introduction and overview) 1. Asymptotic Notations 2. One Dimensional array 3. Multi Dimensional array 4. Pointer arrays.

! Tree: set of nodes and directed edges. ! Parent: source node of directed edge. ! Child: terminal node of directed edge

Range Reporting. Range reporting problem 1D range reporting Range trees 2D range reporting Range trees Predecessor in nested sets kd trees

CHENNAI MATHEMATICAL INSTITUTE M.Sc. / Ph.D. Programme in Computer Science

ICS 691: Advanced Data Structures Spring Lecture 8

Problem Set 5 Solutions

FORTH SEMESTER DIPLOMA EXAMINATION IN ENGINEERING/ TECHNOLIGY- OCTOBER, 2012 DATA STRUCTURE

COMP3121/3821/9101/ s1 Assignment 1

1 Tree Sort LECTURE 4. OHSU/OGI (Winter 2009) ANALYSIS AND DESIGN OF ALGORITHMS

Lecture 9 March 15, 2012

Lecture 22 November 19, 2015

BBM 201 DATA STRUCTURES

Range Queries. Kuba Karpierz, Bruno Vacherot. March 4, 2016

Balanced Search Trees. CS 3110 Fall 2010

CMSC 341 Lecture 15 Leftist Heaps

An array is a collection of data that holds fixed number of values of same type. It is also known as a set. An array is a data type.

CSE100. Advanced Data Structures. Lecture 12. (Based on Paul Kube course materials)

Symbol Tables. Computing 2 COMP x1

Data Warehousing & Data Mining

Range Minimum Queries Part Two

Data Structures. Alice E. Fischer. Lecture 4, Fall Alice E. Fischer Data Structures L4... 1/19 Lecture 4, Fall / 19

Balanced Binary Search Trees. Victor Gao

1. O(log n), because at worst you will only need to downheap the height of the heap.

Unit-2 Divide and conquer 2016

CSE373 Fall 2013, Midterm Examination October 18, 2013

Data Structures Question Bank Multiple Choice

Solved by: Syed Zain Ali Bukhari (BSCS)

ECE 122. Engineering Problem Solving Using Java

403: Algorithms and Data Structures. Heaps. Fall 2016 UAlbany Computer Science. Some slides borrowed by David Luebke

Transcription:

C.S. 252 Prof. Roberto Tamassia Computational Geometry Sem. II, 1992 1993 Lecture 11 Scribe: Darren Erik Vengroff Date: March 15, 1993 Fractional Cascading 1 Introduction Fractional cascading is a data structuring technique designed to support iterative searching in a set of k catalogs. It was first developed by Chazelle and Guibas [. Chazelle Guibas Data, Chazelle Guibas Applications.]. Fractional cascading is able to beat this method by using information obtained in a search of one catalog to to guide searches of subsequent catalogs. 2 The Iterative Search Problem We should begin by formally defining the iterative search problem. We are given a set of k catalogs C 1, C 2,..., C k over an ordered universe U of keys. We can think of a catalog as a finite ordered subset of U. The number of elements in catalog i is C i = n i. The total number of elements in all the catalogs is n = i n i. We are allowed to preprocess the catalogs, and then asked to answer queries. A query is of the following form: given a query value x return the largest value less than or equal to x in each of the k catalogs. This is illustrated in Figure 1. 3 Three Approaches to Iterative Searching There are at least three ways that we might consider approaching the problem of iterative searching in k catalogs. We could construct a search tree for each of the catalogs individually, we could try to construct a single search tree over the union of all the catalogs, or we could use fractional cascading. As we shall see, each of the first two techniques give us either optimal space or optimal space, but not both, whereas fractional cascading gives us both. A lower bound on space usage for a data structure supporting iterative search is clearly Ω(n) since we must store the values of the n keys in the catalogs. A lower bound on query time is O(log n + k). The log n term follows from the decision tree model, while the k term follows from the need to report k separate results. As we shall see, fractional cascading will achieve both of these bounds. The most naive way of performing an iterative search would be to construct a balanced search tree for each of the catalogs and then answer a query by searching each of the trees individually. Let us consider the complexity of this approach. As far as space is concerned, 1

+...... x C 1 C 2 C 3 C i C k 1 C k Figure 1: A simple iterative search query. The tick marks represent the values stored in the catalogs, including dummy values of ±. The query value x is shown as a horizontal dotted line. Each value returned by the query is marked with a. Note that in each catalog the largest value less than or equal to x is returned. the balanced tree on the ith catalog will occupy O(n i ) space, thus the total space usage of all k such trees will be Θ(n). For query time, this approach will require a separate search in each of the k search trees. The size of each tree is bounded by O(n), so the total search time is bounded by O(k log n). We can easily construct a worst case example where this bound is tight. Consider the case where k = n and each catalog contains n elements. The the total query time will be k log n = k log n 2 = Θ(k log n). The second approach is to construct a balanced search tree of the elements of the set C = i C i. At each leaf of the tree, we store a k-tuple listing the largest element from each original catalog that is less than or equal to the value stored at the leaf. The space usage of this tree is O(n) for the tree and O(kn) for the k-tuples stored at the leaves. To perform a query in this data structure, we simply search the tree for the largest valued leaf less than or equal to x and then read of the answer stored there. The time to do this is O(log n) for the tree search and O(k) for the reporting of the k-tuple, for a total of O(log n + k). The third approach is fractional cascading, which achieves optimal query time using optimal space. The details of how this is done will comprise the remainder of this paper. The space and time usage of the three methods just described is summarized in Table 1. 4 Bridges The central idea of fractional cascading is that of bridges. Intuitively, a bridge is a pointer from an element c i of C i to an element c i+1 of C i+1 where c i c i+1 is small. This is illustrated 2

Method Space Time k Search Trees O(n) O(k log n) Combined Search Tree O(kn) O(log n + k) Fractional Cascading O(n) O(log n + k) Table 1: Comparison of the time and space usages of the three iterative search methods described in Section 3. Bridges + C i C i+1 Figure 2: A series of bridges between two consecutive catalogs C i and C i+1. in Figure 2. The idea is that once we have located the answer to a query in one catalog we should be able to follow a bridge to a key that is close to the answer in the next catalog. Ideally, we will be able to follow a bridge from the answer to the query in C i to a key in C i+1 and then from there locate the answer in C i+1, all in constant time. If we are able to do this, then once we have the answer in C 1 we can find the answer in the remaining k 1 catalogs in O(k) time. To find the answer in C 1, we can just use a balanced binary search tree, thus making the overall time complexity of our algorithm optimal O(log n + k). By cleverly choosing where and how to construct bridges, this is exactly what fractional cascading will do. Let us pause for a moment and consider a naive bridge building strategy and why it will not work. Suppose that we simply build a bridge from each element of each catalog to the element of the next catalog that is closest in value to it. Clearly this will not take any more than optimal space, since it will add only O(n) bridges to the catalogs, whose total space is O(n). Now, let us consider the time complexity of answering a query by looking at the worst case example illustrated in Figure 3. Here the query time is at least Ω(k log n) even if the additional step of building balanced search trees over ranges of keys that fall between incoming bridges is taken. 3

+... x... C 1 C 2 C 3 C k 1 C k Figure 3: A worst case for the naive bridge building strategy. The query value is x. Even if each time we find an answer in catalog C i we follow the bridge pointer from it as well as from the key above it we are still left with the entire contents of catalog C i+1 to search. This continues through the entire set of k catalogs. If we search each catalog by doing a linear scan from the point at which the bridge told us to begin, the total search time will be Ω(nk). Even if we build a balanced search tree over all elements in C i+1 that appear between two consecutive bridge pointers from C i the query time will still be Ω(k log n). 5 Bridges and Pointers for Fractional Cascading As is now clear, we must be clever about how we choose bridges in order for fractional cascading to perform queries in optimal time. The way will do this is by augmenting each catalog C i to produce an augmented catalog A i. The augmentation is performed by by introducing synthetic keys. Synthetic keys are keys that were not originally part of a catalog C i. Keys that were originally part of the catalog will be called real keys. We build the augmented catalogs by beginning with A k and proceeding backwards down to A 1. A k is simple; it is exactly C k. Now, for A i where 1 i < k, we take every other key of A i+1 and add it to C i as a synthetic key, then build a bridge from each such synthetic key to the element of A i+1 on which it was based. Additionally, we add bridges between the pseudo-elements ± in consecutive catalogs. This process is best understood through an example, such as that appearing in Figure 4. In addition to the bridges, each key in an augmented catalog A i will have two pointers. If the key is a real key, then the first pointer points to the smallest synthetic key above it in A i and the second pointer points to the largest key below it in A i. It the key is a synthetic key, the pointers are similar, except that they point to the smallest real key above it and largest real key below it. If there is no key of the appropriate type above or below the key, the pointer points to the pseudo-key at ±, whichever is appropriate. Bridges and pointers for a set of catalogs are illustrated in Figure 5. Note that in the figure only half the pointers (those pointing downward) are shown. This is done for clarity; the interested reader should 4

+ A i A i+1 Figure 4: The construction of A i. The real keys of A i (elements of C i ) are shown as tick marks, while the synthetic keys (elements of A i \ C i ) are shown as small squares. Note that the synthetic keys are simply copies of every other element of A i+1. be able to easily construct a similar set of upward pointing pointers. An example of a C++ implementation of the data structure representing a key in an augmented catalog appears in Figure 6. Compare this representation of a key to the pictorial representation in Figure 5 to get a better idea of exactly what is going on. This structure will be used in the code accompanying the discussion of the optimal iterative search algorithm in Section 6. 6 Fractional Cascading Search Algorithm Given the data structure described in Section 5 it is relatively easy to construct an algorithm for iterative search that runs in optimal time. An fragment of am implementation of the algorithm is shown in Figure 7; it is based on the data structure from Figure 6. 7 Bridge and Pointer Space Complexity 8 Graph Structured Collections of Catalogs References.[] 5

+ Pointers from real to synthetic keys Search Tree Pointers from synthetic to real keys 6 Bridges A 1 A 2 A 3 A 4 A 5 Figure 5: An example of a set of catalogs augmented for fractional cascading. As in Figure 4, tick marks represent elements of the original catalogs and squares represent synthetic keys added by bridging. The pointers that are shown illustrate how each synthetic key points to the largest valued key below it and similarly each real key points to the largest valued synthetic key below it. In addition to the pointers shown, each key has a pointer to the smallest valued key of the other type that is above it. These pointers are omitted for clarity. A balanced search tree whose leaves are the keys in A 1 is used to locate the key in A i from which bridge traversal begins.

// Types of keys (real, synthetic, or special +-infinity key): enum { KEY_TYPE_REAL, KEY_TYPE_SYNTHETIC, KEY_TYPE_PLUS_INFTY, KEY_TYPE_MINUS_INFTY ; // The key structure: struct key { int type; // KEY_TYPE_* keyvalue keyval; // The value stored at this key. key *prev; // The key immediately before this one // in the catalog. It can be of any type. // If this->type is KEY_TYPE_MINUS_INFTY // then this->prev == this. key *jump_next; // The key immediately after this one // in the catalog if all keys having the // same type as this are skipped. The only // exception to this rule is that if // If this->type is KEY_TYPE_PLUS_INFTY // then this->jump_next == this. ; key *bridge; // A bridge to the next catalog. If // type == KEY_TYPE_REAL then this is // NULL. Figure 6: A fragment of a C++ implementation of the data structure used to implement a key in an augmented catalog. This structure will be used in the code accompanying the discussion of the optimal iterative search algorithm in Section 6. 7

// Search the collection of catalogs for a given key value. int fc_catalog_collection::search(keyvalue x) { int ii; key *current_key; // Locate the key to be reported from the first catalog. current_key = search_balanced_tree(x); // For every catalog report the key found and then locate a key in // the next catalog. for (ii = k-1; ii--; ) { my_assert((current_key->type == KEY_TYPE_REAL) (current_key->type == KEY_TYPE_MINUS_INFTY), "Bad key type to report."); current_key->report_me(); // Follow the bridge below the current key to the next catalog. current_key = current_key->jump_below->bridge; // Now check the two keys above the current key to see if either // of them falls below the query value. If so, take the highest // one. if (current_key->prev->keyval > x) { if (current_key->prev->prev->keyval > x) { current_key = current_key->prev->prev; else { current_key = current_key->prev; // If the key we located is synthetic, then just jump to the // first real or -infinity key below it. if (current_key->type == KEY_TYPE_SYNTHETIC) { current_key = current_key->jump_next; // Now we just have to report the key from the final catalog // and we are done. my_assert((current_key->type == KEY_TYPE_REAL) (current_key->type == KEY_TYPE_MINUS_INFTY), "Bad key type to report."); current_key->report_me(); return 0; Figure 7: A fragment of a C++ implementation of iterative search using fractional cascading. 8