Multidimensional Divide and Conquer 2 Spatial Joins

Similar documents
Merge Sort. Yufei Tao. Department of Computer Science and Engineering Chinese University of Hong Kong

Recursion: The Beginning

Binary Heaps in Dynamic Arrays

Lecture Notes: External Interval Tree. 1 External Interval Tree The Static Version

Computing intersections in a set of line segments: the Bentley-Ottmann algorithm

The B-Tree. Yufei Tao. ITEE University of Queensland. INFS4205/7205, Uni of Queensland

Binary Search and Worst-Case Analysis

Finding Strongly Connected Components

Approximation Algorithms for Geometric Intersection Graphs

Dynamic Arrays and Amortized Analysis

Comparison Based Sorting Algorithms. Algorithms and Data Structures: Lower Bounds for Sorting. Comparison Based Sorting Algorithms

Lecture Notes: Range Searching with Linear Space

Lecture 3 February 9, 2010

CMSC 754 Computational Geometry 1

Algorithms and Data Structures: Lower Bounds for Sorting. ADS: lect 7 slide 1

Solutions to Problem Set 1

High Dimensional Indexing by Clustering

Algorithms. Lecture Notes 5

26 The closest pair problem

Computational Geometry

(Refer Slide Time: 01.26)

Range Searching II: Windowing Queries

Greedy Algorithms 1. For large values of d, brute force search is not feasible because there are 2 d

Design and Analysis of Algorithms Prof. Madhavan Mukund Chennai Mathematical Institute

COMP Online Algorithms. k-server Problem & Advice. Shahin Kamali. Lecture 13 - Oct. 24, 2017 University of Manitoba

Theorem 2.9: nearest addition algorithm

1 The range query problem

Unit 6 Chapter 15 EXAMPLES OF COMPLEXITY CALCULATION

Planar Point Location

Computational Geometry

Lecture 19. Lecturer: Aleksander Mądry Scribes: Chidambaram Annamalai and Carsten Moldenhauer

Principles of Algorithm Design

COMP 352 FALL Tutorial 10

Computational Geometry

Problem Set 6 Solutions

Introduction to Algorithms

INDIAN STATISTICAL INSTITUTE

Computational Geometry

II (Sorting and) Order Statistics

Point Enclosure and the Interval Tree

Introduction to Indexing R-trees. Hong Kong University of Science and Technology

External Memory Algorithms for Geometric Problems. Piotr Indyk (slides partially by Lars Arge and Jeff Vitter)

Algorithms Exam TIN093/DIT600

Orthogonal Range Search and its Relatives

Goal of the course: The goal is to learn to design and analyze an algorithm. More specifically, you will learn:

Design and Analysis of Algorithms Prof. Madhavan Mukund Chennai Mathematical Institute. Week 02 Module 06 Lecture - 14 Merge Sort: Analysis

Review implementation of Stable Matching Survey of common running times. Turn in completed problem sets. Jan 18, 2019 Sprenkle - CSCI211

Solutions. Suppose we insert all elements of U into the table, and let n(b) be the number of elements of U that hash to bucket b. Then.

Evaluating Queries. Query Processing: Overview. Query Processing: Example 6/2/2009. Query Processing

EECS 2011M: Fundamentals of Data Structures

CS 372: Computational Geometry Lecture 3 Line Segment Intersection

CSE373: Data Structures & Algorithms Lecture 11: Implementing Union-Find. Lauren Milne Spring 2015

Greedy Algorithms 1 {K(S) K(S) C} For large values of d, brute force search is not feasible because there are 2 d {1,..., d}.

Lecture 25 Spanning Trees

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Sorting lower bound and Linear-time sorting Date: 9/19/17

COMP3121/3821/9101/ s1 Assignment 1

Solution for Homework set 3

EDAA40 At home exercises 1

Recurrences and Divide-and-Conquer

Lecture 6: External Interval Tree (Part II) 3 Making the external interval tree dynamic. 3.1 Dynamizing an underflow structure

DIVIDE AND CONQUER ALGORITHMS ANALYSIS WITH RECURRENCE EQUATIONS

Scan and its Uses. 1 Scan. 1.1 Contraction CSE341T/CSE549T 09/17/2014. Lecture 8

2. Lecture notes on non-bipartite matching

Divide and Conquer Algorithms

January 10-12, NIT Surathkal Introduction to Graph and Geometric Algorithms

SOFTWARE ENGINEERING Prof.N.L.Sarda Computer Science & Engineering IIT Bombay. Lecture #10 Process Modelling DFD, Function Decomp (Part 2)

EXERCISES SHORTEST PATHS: APPLICATIONS, OPTIMIZATION, VARIATIONS, AND SOLVING THE CONSTRAINED SHORTEST PATH PROBLEM. 1 Applications and Modelling

CMSC 754 Computational Geometry 1

CS 532: 3D Computer Vision 11 th Set of Notes

(Refer Slide Time: 00:50)

CS4311 Design and Analysis of Algorithms. Lecture 1: Getting Started

Computational Geometry

On k-dimensional Balanced Binary Trees*

Chapter 4. Greedy Algorithms. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.

Algorithms: Lecture 7. Chalmers University of Technology

T consists of finding an efficient implementation of access,

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17

CSE 3101: Introduction to the Design and Analysis of Algorithms. Office hours: Wed 4-6 pm (CSEB 3043), or by appointment.

Sorting and Selection

Week - 04 Lecture - 01 Merge Sort. (Refer Slide Time: 00:02)

COMP Data Structures

Exam Datenstrukturen und Algorithmen D-INFK

The divide and conquer strategy has three basic parts. For a given problem of size n,

6.854J / J Advanced Algorithms Fall 2008

CS2351 Data Structures. Lecture 1: Getting Started

CS 532: 3D Computer Vision 14 th Set of Notes

Suffix Trees and Arrays

Matching Algorithms. Proof. If a bipartite graph has a perfect matching, then it is easy to see that the right hand side is a necessary condition.

CSE 373 MAY 10 TH SPANNING TREES AND UNION FIND

Geometric Algorithms. 1. Objects

2 A Plane Sweep Algorithm for Line Segment Intersection

Priority Queues and Binary Heaps

Hashing. Yufei Tao. Department of Computer Science and Engineering Chinese University of Hong Kong

Lecture 8: Orthogonal Range Searching

Improved Bounds for Intersecting Triangles and Halving Planes

COMP 355 Advanced Algorithms

RAM with Randomization and Quick Sort

Subset sum problem and dynamic programming

1 Closest Pair Problem

The angle measure at for example the vertex A is denoted by m A, or m BAC.

Transcription:

Multidimensional Divide and Conque Spatial Joins Yufei Tao ITEE University of Queensland

Today we will continue our discussion of the divide and conquer method in computational geometry. This lecture will discuss the spatial join, which is another fundamental problem on multidimensional rectangles. Central to our discussion is an output-sensitive technique, which aims to design an algorithm whose cost depends on the size of the result (i.e., how much do we need to output).

Spatial Join Let R and S be sets of axis-parallel rectangles in R. The objective of the spatial join problem is to output all pairs of rectangles (r, s) R S such that r intersects s. 1 r 1 s s 1 s r s 3 1 1 1 The result is {(, s ), (, s 3 ), (, s ), (, s ), (, s 3 ), (, s ), (r, s 3 )}.

Applications For each hotel, find all the restaurants that are within 5km. Find all the intersection points between railways and roads. Find all pairs of airplanes within a distance (in the 3D space) of km at 1pm of 1 Jan 017. Consider a dating website where each man registers his age, height, and salary, and each woman specifies ranges on the age, height, and salary of her ideal significant other. Find all pairs of (man, woman) such that the man satisfies the requirements of the woman.... Think: In the first 3 applications, where are the rectangles? Hint: Filter refinement.

Naive Solution Already Worst-Case Optimal One simple algorithm solving the spatial join is simply to inspect all the pairs. Set n = S + T. The running time is O(n ). In the worst case, however, every algorithm must incur Ω(n ) time because there can be n / pairs to report! This happens when every rectangle in S intersects every rectangle in T. In other words, O(n ) time is already asymptotically optimal.

Remedy: Output-Sensitive Algorithms It would be really disappointing if we had to accept O(n ) as the best we could do this complexity is horrible in practice. Fortunately, we do not have to. Notice that the optimality proof on the previous slide (although correct) is rather weak: it requires the result to have Ω(n ) pairs, which seldom happens in practice. In other words, if we denote by k the number of result pairs, we often have k n. Can we achieve good efficiency in those scenarios? The answer is yes. We will learn an algorithm that solves the problem in O(n log n + k) time. Such a time complexity is output-sensitive by being elastic to the output size. It is clearly better than a non-elastic running time of O(n ) the two are equivalent only when k = Ω(n ). Note that Ω(n + k) is a trivial lower bound for any algorithm (why?). Therefore, O(n log n + k) is nearly optimal, up to only a factor of O(log n).

Next, we will explain how to apply divide and conquer to achieve the aforementioned performance guarantees. Our goal is (again) to reduce the dimensionality of the problem. Attention should be paid to two aspects: How to divide a problem recursively into two sub-problems as we will see, this is done in a more sophisticated manner than in the skyline problem; The analysis, in particular, how the output-sensitive bound is established.

Rectangles-Join-Points Let R be a set of axis-parallel rectangles, and P be a set of points, all in R. The objective of the rectangle-join-point problem is to output all pairs of (r, p) R P such that r intersects p. 1 r 1 p p 1 p p p 3 p 5 r p p 7 The result is {(, p ), (, p 5 ), (, p 7 ), (, p 3 ), (, p ), (, p 5 ), (, p ), (, p 7 ), (r, p )}. 1 1 1 This problem cannot be harder than spatial join. An algorithm solving the latter problem efficiently must be able to do so on the former (why)?

1D Rectangles-Join-Points Let us first consider the rectangles-join-points problem in 1D space. Here, R is a set of intervals, and P a set of points, both in R. r 1 p 1 p p 3 p p 5 r 5 r By resorting to the binary search tree, we can easily settle the problem in O(n log n + k) time (think: how). But this will not be enough for us to solve the spatial join problem fast enough. It turns out that we can do better if the input sets have been sorted, as explained next.

1D Sorted Rectangles-Join-Points Again, R is a set of intervals, and P a set of points, both in R. The points of P have been sorted. Likewise, the intervals of R have been sorted by left endpoint. r 1 p 1 p p 3 p p 5 r 5 r Sorted list on P: p 1, p, p 3, p, p 5. Sorted list on R: r 1,,, r, r 5. Combined sorted list: r 1,,, p 1, p, r, r 5, p 3, p, p 5. Can be obtained in O(n) time from the sorted lists of P and R.

1D Sorted Rectangles-Join-Points: Algorithm We will process the combined sorted list in ascending order. Whenever we encounter an interval, it is added to a linked list L. r 1 p 1 p p 3 p p 5 r 5 r In the above example, after processing r 1,,, we have L = (r 1,, ). Let us say that these intervals are alive.

1D Sorted Rectangles-Join-Points: Algorithm Whenever a point p is encountered, we go through L to check whether the intervals therein contain p. For every such interval r, report (r, p). r 1 p 1 p p 3 p p 5 r 5 r Continuing our example, next the algorithm processes point p 1. Since all the intervals in L cover p 1, we report (r 1, p 1 ), (, p 1 ), and (, p 1 ).

1D Sorted Rectangles-Join-Points: Algorithm If we find an interval r L such that r does not cover p, we delete r from L. r 1 p 1 p p 3 p p 5 r 5 r From the previous slide, the algorithm turns to point p. Since r 1, cover p, we report (r 1, p ) and (, p ). However, is removed from L, after which L = (r 1, ). In other words, is dead. Think: Can cover any more points?

1D Sorted Rectangles-Join-Points: Algorithm r 1 p 1 p p 3 p p 5 r 5 r After processing r and r 5, we have L = (r 1,, r, r 5 ). The processing of p 3 outputs (, p 3 ), (r, p 3 ), and (r 5, p 3 ), but removes r 1 from L, which then becomes (, r, r 5 ). The rest of the algorithm proceeds in the same manner.

1D Sorted Rectangles-Join-Points: Analysis Let us now analyze the running time of the algorithm. First, apparently it takes only O(1) time process each interval all we need to do is to insert it into L. Hence, the total cost of processing all the intervals is O( R ) = O(n).

1D Sorted Rectangles-Join-Points: Analysis What is the time of processing the i-th point p i (1 i P )? Clearly it equals O( L ), namely, the number of alive intervals. The crux of the analysis is to break L into two terms: where L = k i + n del i k i is the number of pairs reported when processing p i. n del i is the number of intervals removed from L (i.e., the dead intervals).

1D Sorted Rectangles-Join-Points: Analysis Therefore, the total cost of processing all the points is P O i=1 ( ki + n del i ) P P = O k i + i=1 = O(k + R ) = O(n + k). Note: P i=1 k i = k because each pair is reported exactly once. P i=1 ndel i R because each interval can be deleted at most once. i=1 n del i

D Rectangles-Join-Points Having concluded that 1D sorted rectangles-join-points can be solved in O(n + k) time, we will now apply divide and conquer to attack the D problem. 1 r 1 p p 5 p 1 p r p 7 p p 3 p 1 1 1

D Rectangles-Join-Points: Algorithm Let X be the set of all x-coordinates in the input sets (i.e., the x-coordinate of a left/right boundary of a rectangle, or the x-coordinate of a point). Call the values in X the raw x-coordinates. In the above example, X = {1,, 3,, 5,, 7,, 9,, 11, 1, 13, 1, 15, 1}. 1 r 1 p p 5 p 1 p r p 7 p p 3 p 1 1 1 To facilitate our discussion, let us assume that all the raw x-coordinates are distinct. Removing the assumption is easy and is left to you (hint: by some tie-breaking).

D Rectangles-Join-Points: Algorithm Divide the data space by a vertical line l, such that there are one half of the raw x-coordinates on each side. In the example below, we place l at x =.5. 1 l r 1 p p 5 p 1 p r p 7 p p 3 p 1 1 1 On the left hand side of l, the set X 1 of raw x-coordinates is {1,, 3,, 5,, 7, }. On the right of l, the set X is {9,, 11, 1, 13, 1, 15, 1}.

D Rectangles-Join-Points: Algorithm The line divides the problem into two sub-problems: 1 l 1 l r 1 p p 5 r p 1 p p 7 p p 3 p 1 1 1 1 1 1 Note that some rectangles appear in both sub-problems: and. We will then solve (i.e., conquer) each sub-problem. No result merging is needed (why?).

D Rectangles-Join-Points: Algorithm 1 l 1 l r 1 p p 5 r p 1 p p 7 p p 3 p 1 1 1 1 1 1 Consider first the left sub-problem, whose raw x-coordinate set is X 1 = {1,, 3,, 5,, 7, }. Notice that, x =.5, is not counted as a raw x-coordinate. Indeed, as far as the left sub-problem is concerned, the right boundaries of and can be regarded of being at x =. In this way, we guarantee that we never create new raw x-coordinates in recursively dividing the problem.

D Rectangles-Join-Points: Algorithm Consider dividing the right sub-problem into two: 1 p 5 r l p p 7 p 1 1 1 The left sub-sub-problem has raw x-coordinate set X 1 = {9,, 11, 1}, while the right sub-sub-problem has raw x-coordinate set X = {13, 1, 15, 1}. See the next slide for a clearer illustration of the two sub-sub-problems.

D Rectangles-Join-Points: Algorithm From the previous slide, X 1 = {9,, 11, 1} and X = {13, 1, 15, 1}. l l 1 1 p p 5 r p 7 p 1 1 1 1 1 1 Something interesting happens in the left sub-sub-problem: and do not contribute any raw x-coordinates. Notice that their x-ranges span the x-range of this sub-sub-problem. This is great news for us! We can immediately get rid of and by finding their result pairs via a reduction to a 1D problem.

D Rectangles-Join-Points: Algorithm From the previous slide, X 1 = {9,, 11, 1} and X = {13, 1, 15, 1}. l l 1 1 p p 5 r p 7 p 1 1 1 1 1 1 Something interesting happens in the left sub-sub-problem: and do not contribute any raw x-coordinates. Notice that their x-ranges span the x-range of this sub-sub-problem. This is great news for us! We can immediately get rid of and by finding their result pairs via a reduction to a 1D problem.

D Rectangles-Join-Points: Algorithm 1 l. p 5 r p 1 1 1 p p 5. From the left sub-sub-problem, we construct a 1D rectangles-join-points instance with an interval set {, } and a point set {p 5, p }. and are then excluded from the left sub-sub-problem.

D Rectangles-Join-Points: Algorithm We can now summarize a principle behind our divide-and-conquer: A sub-problem with a raw x-coordinate set X includes only the rectangles and points that contribute at least one coordinate to X. Rectangles that do not contribute to X are dealt with directly with a 1D instance. They will not be passed further down into sub-sub-problems.

D Rectangles-Join-Points: Algorithm The previous discussion points to the following divide-and-conquer algorithm for solving the D rectangles-join-points problem (inputs: R and P, with raw x-coordinate set X ): 1. Let R span be the set of rectangles that do not contribute to X.. Construct a 1D rectangles-join-points instance with R and P where R is the set of intervals obtained by projecting R span onto the y-axis, and P the set of points obtained by projecting P onto the y-axis. Solve the 1D instance. 3. Divide X into two disjoint subsets X 1 and X with the same size (by respecting the ordering).. Let R 1 (or P 1 ) be the set of rectangles in R (or P, resp.) that contribute to X 1. Similarly, define R and P with respect to X. 5. Solve the left sub-problem with inputs R 1, P 1 and X 1, and the right sub-problem with inputs R, P, and X.

D Rectangles-Join-Points: Analysis Let f (m) be the running time of our algorithm when the raw X-coordinate set has a size of m. It holds that: f (m) f (m/) + g(m) where g(m) is the cost of solving a 1D instance of size m. When m, obviously f (m) = O(1). Plugging g(m) = O(m) plus the linear output cost, we obtain f (m) = O(m log m + k). Remark 1: Every result pair is reported only once (think: why?). Remark : To ensure g(m) = O(m) plus linear output time, we must ensure that the 1D instances are sorted. How to do so without increasing the time complexity? The above also implies that our algorithm finishes in O(n log n + k) time, because m n.

Segment Join Let V be a set of vertical segments, and H be a set of horizontal segments, all in R. The objective of the segment join problem is to output all pairs of (v, h) V H such that v intersects h. 1 v 1 v h 1 v 3 h h 3 h The result is {(v 1, h 1 ), (v, h ), (v, h 3 ), (v, h 3 )}. v 1 1 This problem can also be solved in O(n log n + k) time, using essentially the same algorithm. The details are left as an exercise.

Spatial Join Let R and S be sets of axis-parallel rectangles in R. The objective of the spatial join problem is to output all pairs of rectangles (r, s) R S such that r intersects s. 1 r 1 s s 1 s r s 3 1 1 1 This problem can be reduced to a rectangles-join-points problem and a segment-join problem. The overall time complexity is O(n log n + k). The details are left as an exercise.

Spatial Join in d-dimensional Space Let R and S be sets of axis-parallel rectangles in R d. The objective of the spatial join problem is to output all pairs of rectangles (r, s) R S such that r intersects s. Using divide and conquer, we can solve the problem in O(n log d 1 n + k) time. We will explore the details in an exercise.