Orthogonal Range Queries

Similar documents
Range Searching. Data structure for a set of objects (points, rectangles, polygons) for efficient range queries.

Introduction to Algorithms

Computational Geometry

Computational Geometry

Advanced Algorithm Design and Analysis (Lecture 12) SW5 fall 2005 Simonas Šaltenis E1-215b

Balanced Box-Decomposition trees for Approximate nearest-neighbor. Manos Thanos (MPLA) Ioannis Emiris (Dept Informatics) Computational Geometry

Geometric data structures:

Computational Geometry

External Memory Algorithms for Geometric Problems. Piotr Indyk (slides partially by Lars Arge and Jeff Vitter)

1D Range Searching (cont) 5.1: 1D Range Searching. Database queries

CMSC 754 Computational Geometry 1

kd-trees Idea: Each level of the tree compares against 1 dimension. Let s us have only two children at each node (instead of 2 d )

Computational Geometry

Data Structures for Approximate Proximity and Range Searching

Computational Geometry

Algorithms. Algorithms GEOMETRIC APPLICATIONS OF BSTS. 1d range search line segment intersection kd trees interval search trees rectangle intersection

Computation Geometry Exercise Range Searching II

Algorithms. Algorithms GEOMETRIC APPLICATIONS OF BSTS. 1d range search line segment intersection kd trees interval search trees rectangle intersection

Nearest neighbors. Focus on tree-based methods. Clément Jamin, GUDHI project, Inria March 2017

Big Data Analytics. Special Topics for Computer Science CSE CSE April 14

Range Reporting. Range Reporting. Range Reporting Problem. Applications

Orthogonal Range Search and its Relatives

Warped parallel nearest neighbor searches using kd-trees

Algorithms. Algorithms GEOMETRIC APPLICATIONS OF BSTS. 1d range search line segment intersection kd trees interval search trees rectangle intersection

Range Reporting. Range reporting problem 1D range reporting Range trees 2D range reporting Range trees Predecessor in nested sets kd trees

Testing Bipartiteness of Geometric Intersection Graphs David Eppstein

Range Searching II: Windowing Queries

VQ Encoding is Nearest Neighbor Search

CMPS 3130/6130 Computational Geometry Spring Voronoi Diagrams. Carola Wenk. Based on: Computational Geometry: Algorithms and Applications

Geometric Optimization

Locality- Sensitive Hashing Random Projections for NN Search

Approximation Algorithms for Geometric Intersection Graphs

Task Description: Finding Similar Documents. Document Retrieval. Case Study 2: Document Retrieval

Geometric Algorithms. Geometric search: overview. 1D Range Search. 1D Range Search Implementations

Module 8: Range-Searching in Dictionaries for Points

Orthogonal range searching. Orthogonal range search

Multidimensional Indexes [14]

Lecture 3 February 9, 2010

Computational Geometry

Data Mining and Machine Learning: Techniques and Algorithms

Algorithms. Algorithms GEOMETRIC APPLICATIONS OF BSTS. 1d range search kd trees line segment intersection interval search trees rectangle intersection

Introduction to Spatial Database Systems

1 The range query problem

Orthogonal range searching. Range Trees. Orthogonal range searching. 1D range searching. CS Spring 2009

Average case analysis of dynamic geometric optimization

I/O-Algorithms Lars Arge Aarhus University

Computational Geometry

January 10-12, NIT Surathkal Introduction to Graph and Geometric Algorithms

Clustering Billions of Images with Large Scale Nearest Neighbor Search

Algorithms for GIS:! Quadtrees

Polygon Triangulation. (slides partially by Daniel Vlasic )

Geometric Computation: Introduction. Piotr Indyk

Lecture Notes: Range Searching with Linear Space

Point Enclosure and the Interval Tree

{ K 0 (P ); ; K k 1 (P )): k keys of P ( COORD(P) ) { DISC(P): discriminator of P { HISON(P), LOSON(P) Kd-Tree Multidimensional binary search tree Tak

Algorithms GEOMETRIC APPLICATIONS OF BSTS. 1d range search line segment intersection kd trees interval search trees rectangle intersection

Multidimensional Indexing The R Tree

External-Memory Algorithms with Applications in GIS - (L. Arge) Enylton Machado Roberto Beauclair

3 Competitive Dynamic BSTs (January 31 and February 2)

Approximate Nearest Neighbor via Point- Location among Balls

Parallel Physically Based Path-tracing and Shading Part 3 of 2. CIS565 Fall 2012 University of Pennsylvania by Yining Karl Li

WINDOWING PETR FELKEL. FEL CTU PRAGUE Based on [Berg], [Mount]

doc. RNDr. Tomáš Skopal, Ph.D. Department of Software Engineering, Faculty of Information Technology, Czech Technical University in Prague

On the Efficiency of Nearest Neighbor Searching with Data Clustered in Lower Dimensions

26 The closest pair problem

be a polytope. has such a representation iff it contains the origin in its interior. For a generic, sort the inequalities so that

Computational Geometry

Prof. Gill Barequet. Center for Graphics and Geometric Computing, Technion. Dept. of Computer Science The Technion Haifa, Israel

Outline. Other Use of Triangle Inequality Algorithms for Nearest Neighbor Search: Lecture 2. Orchard s Algorithm. Chapter VI

Range Searching and Windowing

The PMBR Procedure. Overview Procedure Syntax PROC PMBR Statement VAR Statement TARGET Statement CLASS Statement. The PMBR Procedure

arxiv: v1 [cs.cg] 8 Jan 2018

Lecture 24: Image Retrieval: Part II. Visual Computing Systems CMU , Fall 2013

Machine Learning and Pervasive Computing

Spatial data structures in 2D

Problem 1: Complexity of Update Rules for Logistic Regression

Data Mining and Machine Learning. Instance-Based Learning. Rote Learning k Nearest-Neighbor Classification. IBL and Rule Learning

! Good algorithms also extend to higher dimensions. ! Curse of dimensionality. ! Range searching. ! Nearest neighbor.

Ray Tracing Acceleration Data Structures

CS 532: 3D Computer Vision 14 th Set of Notes

Approximate Line Nearest Neighbor in High Dimensions

Spatial Data Structures for Computer Graphics

arxiv:physics/ v2 [physics.data-an] 16 Aug 2004

Nearest Neighbor with KD Trees

What we have covered?

Trapezoid and Chain Methods

Collision and Proximity Queries

Scalable Nearest Neighbor Algorithms for High Dimensional Data Marius Muja (UBC), David G. Lowe (Google) IEEE 2014

Computational Geometry. Lecture 17

Approximation Algorithms for the Unit Disk Cover Problem in 2D and 3D

Spatial Data Management

Bounding Volume Hierarchies. CS 6965 Fall 2011

Acceleration Data Structures

Approximate range searching

Randomized incremental construction. Trapezoidal decomposition: Special sampling idea: Sample all except one item

Spatial Data Management

COT 5407: Introduction. to Algorithms. Giri NARASIMHAN. 1/29/19 CAP 5510 / CGS 5166

c 2009 Society for Industrial and Applied Mathematics

Algorithms for Nearest Neighbors

Geometric Data Structures Multi-dimensional queries Nearest neighbour problem References. Notes on Computational Geometry and Data Structures

Transcription:

Orthogonal Range Piotr Indyk

Range Searching in 2D Given a set of n points, build a data structure that for any query rectangle R, reports all points in R

Kd-trees [Bentley] Not the most efficient solution in theory Everyone uses it in practice Algorithm: Choose x or y coordinate (alternate) Choose the median of the coordinate; this defines a horizontal or vertical line Recurse on both sides We get a binary tree: Size: O(N) Depth: O(log N) Construction time: O(N log N)

Kd-tree: Example Each tree node v corresponds to a region Reg(v).

Kd-tree: Range 1. Recursive procedure, starting from v=root 2. Search (v,r): a) If v is a leaf, then report the point stored in v if it lies in R b) Otherwise, if Reg(v) is contained in R, report all points in the subtree of v c) Otherwise: If Reg(left(v)) intersects R, then Search(left(v),R) If Reg(right(v)) intersects R, then Search(right(v),R)

Query demo

Query Time Analysis We will show that Search takes at most O(n 1/2 +P) time, where P is the number of reported points The total time needed to report all points in all sub-trees (i.e., taken by step b) is O(P) We just need to bound the number of nodes v such that Reg(v) intersects R but is not contained in R. In other words, the boundary of R intersects the boundary of Reg(v) Will make a gross overestimation: will bound the number of Reg(v) which are crossed by any of the 4 horizontal/vertical lines

Query Time Continued What is the max number Q(n) of regions in an n-point kd-tree intersecting (say, vertical) line? If we split on x, Q(n)=1+Q(n/2) If we split on y, Q(n)=2*Q(n/2)+2 Since we alternate, we can write Q(n)=3+2Q(n/4) This solves to O(n 1/2 )

Analysis demo

A Faster Solution Query time: O(log 2 n+p) Space: O(n log n)

Idea I: Ranks Sort x and y coordinates of input points For a rectangle R=[x 1,x 2 ] [y 1,y 2 ], we have point (u,v) R iff succ x (x 1 ) rank x (u) pred x (x 2 ) succ y (y 1 ) rank y (v) pred y (y 2 ) Thus we can replace Point coordinates by their rank Query boundaries by succ/pred; this adds O(log n) to the query time

Dyadic intervals Assume n is a power of 2. Dyadic intervals are: [1,1], [2,2] [n,n] [1,2], [3,4] [n-1,n] [1,4], [5,8] [n-3,n]. [1 n] Any interval {a b} can be decomposed into O(log n) dyadic intervals: Imagine a full binary tree over {1 n} Each node corresponds to a dyadic interval Any interval {a b} can be covered using O(log n) sub-trees

Range Trees For each level l=1 log n, partition x-ranks using level-l dyadic intervals This induces vertical strips Within each strip, construct a BST on y- coordinates

Range Trees

Range Trees

Analysis Each point occurs in log n different levels Space: O(n log n) How do we implement the query?

Query procedure Consider query R=X Y Partition X into dyadic intervals For each interval, query the corresponding strip BST using Y

Query procedure

Query procedure

Analysis ctd. Query time: O(log n + output) time per strip O(log n) strips Total: O(log 2 n+p) Faster than kd-tree, but space O(n log n) Recursive application of the idea gives O(log d n) query time O(n log d-1 n) space for the d-dimensional problem

Approximate Nearest Neighbor (ANN) Given: a set of points P in the plane Goal: given a query point q, and ε>0, find a point p whose distance to q is at most (1+ε) times the distance from q to its nearest neighbor

Our solution We will solve the problem using kd-trees under the assumption that all leaf cells of the kd-tree for P have bounded aspect ratio Assumption somewhat strict, but satisfied in practice for most of the leaf cells We will show O(log n/ε 2 ) query time O(n) space (inherited from kd-tree)

ANN Query Procedure Locate the leaf cell containing q Enumerate all leaf cells C in the increasing order of distance from q (denote it by r) Update p so that it is the closest point seen so far Note: r increases, dist(q,p ) decreases Stop if dist(q,p )<(1+ε)*r

Analysis Running time: All cells C seen so far (except maybe for the last one) have diameter > ε*r Because if not, then p(c) would have been a (1+ε)-approximate nearest neighbor, and we would have stopped The number of cells with diameter ε*r, bounded aspect ratio, and touching a ball of radius r is at most O(1/ε 2 )