CG Lecture
Orthogonal range searching

Orthogonal range search
1. Problem definition and motivation
2. Space decomposition: techniques and trade-offs
3. Space decomposition schemes:
   Grids: uniform, non-hierarchical decomposition
   Quad trees: uniform, hierarchical decomposition
   Range trees and kd-trees: non-uniform, hierarchical
4. Higher dimensions, variations on the theme

Problem
Given a set of n points in $\mathbb{R}^d$, preprocess them so that reporting or counting the k points inside a d-dimensional axis-parallel box (range) is efficient.
Sample application: report all cities within a given radius (in km) of Tel Aviv.
[Figure: points in the plane with axes X and Y]

Space decomposition techniques
Different schemes suit different types of data, with various trade-offs.
Types of space decomposition:
   Grids: uniform, non-hierarchical decomposition
   Quad trees: uniform, hierarchical decomposition
   Range trees and kd-trees: non-uniform, hierarchical
Key efficiency parameters:
   Preprocessing time: $f(n) = \Omega(n)$
   Preprocessing storage: $f(n) = \Omega(n)$
   Average query time: $f(n, k)$
   Worst-case query time: $f(n, k)$
   Dynamic update (insertions/deletions): $f(n)$

Grids: uniform, non-hierarchical
Store the points in a uniform array $[1:n] \times [1:n]$.
A query is answered by reporting the points in the subarray $[i:j] \times [k:l]$.
Complexity: preprocessing in $O(n^2)$ time and $O(n^2)$ storage. Query time $O(k)$. Update in $O(1)$ when the points lie on the grid.
Alternatives: list, sparse matrix, hashing.

Quad trees: uniform, hierarchical
Recursively partition the space into four quadrants until each leaf quadrant contains a single point or no point.
A query is answered by recursively intersecting the query rectangle R with the quadrants and reporting the results according to the intersection.
Complexity: depends on the nature of the data set, not just on the number of points!

Quad trees: construction
Start with the smallest rectangle containing all points.
Recursively classify the point set into four quadrants.
Stop when a quadrant has one or no points.
The height h of the tree is related to the size s of the initial rectangle and the smallest distance c between two points: $h \le \log(s/c) + 3/2$.
A quad tree with n points and height h has $O((h+1)n)$ nodes and can be constructed in $O((h+1)n)$ time.
[Figure: quadrants at depth i have side length $s/2^i$]
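The construction steps above could be sketched roughly as follows. This is a minimal illustration, not the lecture's own code: the names `QuadNode`, `build_quadtree`, `height` and `leaf_points` are hypothetical, the initial region is assumed to be a square given by its lower-left corner and side length, and the points are assumed to be distinct (two identical points would never be separated).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]

@dataclass
class QuadNode:
    x0: float                      # lower-left corner of the square
    y0: float
    size: float                    # side length of the square
    points: List[Point] = field(default_factory=list)   # filled in leaves only
    children: Optional[List["QuadNode"]] = None          # order: SW, SE, NW, NE

def build_quadtree(points, x0, y0, size):
    """Assumes distinct points, all inside [x0, x0+size) x [y0, y0+size)."""
    node = QuadNode(x0, y0, size)
    if len(points) <= 1:           # stop: quadrant has one or no points
        node.points = list(points)
        return node
    half = size / 2
    xm, ym = x0 + half, y0 + half
    buckets = [[], [], [], []]     # SW, SE, NW, NE
    for x, y in points:
        buckets[(x >= xm) + 2 * (y >= ym)].append((x, y))
    corners = [(x0, y0), (xm, y0), (x0, ym), (xm, ym)]
    node.children = [build_quadtree(b, cx, cy, half)
                     for b, (cx, cy) in zip(buckets, corners)]
    return node

def height(node):
    """Number of edges on the longest root-to-leaf path."""
    if node.children is None:
        return 0
    return 1 + max(height(c) for c in node.children)

def leaf_points(node):
    """All points stored in the leaves below node."""
    if node.children is None:
        return list(node.points)
    return [p for c in node.children for p in leaf_points(c)]

pts = [(0.5, 0.5), (1.5, 0.5), (3.0, 3.0)]
root = build_quadtree(pts, 0.0, 0.0, 4.0)
print(height(root), sorted(leaf_points(root)))
```

In this small example s = 4 and c = 1, so the bound allows a height of up to log2(4) + 3/2 = 3.5; the two close points force one extra level of subdivision compared with well-spread data, which is the data dependence the slides point out.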
Quad trees: query answer
At each level, check how the query rectangle R intersects each quadrant:
   If it does not intersect, stop.
   If the quadrant is included in R, report all points below it.
   Else, recursively check each quadrant.
Complexity: the test takes $O(1)$ at each level.
Worst case: must descend to the deepest level and test all four quadrants at every level.

Non-uniform, hierarchical decompositions
Range tree
Construct a binary search tree for the x coordinate, and associate with each node in that tree a binary search tree for the y coordinate.
Complexity: $O(n \log n)$ to build and store the trees, $O(\log^2 n + k)$ to report k points.
Kd-tree
Construct a binary search tree by recursively splitting the points along a median line, alternating between the x and y coordinates.
Complexity: $O(n \log n)$ to build, $O(n)$ to store the tree, $O(\sqrt{n} + k)$ to report k points.

1D range searching
Points are real numbers; ranges are defined by two numbers u and v.
Algorithm:
   Sort the points in $O(n \log n)$ time.
   Store the points in a balanced binary tree whose leaves are the points. Each tree node stores the largest value of its left subtree.
   Do a binary search for u and v in the list in $O(\log n)$ time.
   List all values in between.
Search complexity: $O(\log n + k)$
[Figure: example 1D range tree over the sorted points]

Range searching in a 1D tree
Find the two boundaries of the range, u and v, in the leaves.
Report all leaves in the maximal subtrees between u and v.
Mark the vertex at which the two search paths diverge as v-split.
Proceed to find the two boundaries, reporting values in the subtrees:
   When searching for the left (right) endpoint of the range: if going left (right), report the entire right (left) subtree.
   When a leaf is reached, check its value.
[Figure: example query on a 1D range tree, showing the input range and v-split]
[Figure: 1D range tree, split-node finding algorithm]
[Figure: 1D range query algorithm]
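As a hedged illustration of the 1D case, the sketch below uses a sorted Python list with the standard `bisect` module in place of the balanced binary tree; this is a swapped-in simplification that gives the same $O(\log n + k)$ query bound for static data. The function names `build_1d`, `range_query_1d` and `range_count_1d` are hypothetical.

```python
from bisect import bisect_left, bisect_right

def build_1d(points):
    """Preprocessing: sort once, O(n log n)."""
    return sorted(points)

def range_query_1d(sorted_pts, u, v):
    """Report all values p with u <= p <= v in O(log n + k)."""
    lo = bisect_left(sorted_pts, u)    # first index with value >= u
    hi = bisect_right(sorted_pts, v)   # first index with value > v
    return sorted_pts[lo:hi]

def range_count_1d(sorted_pts, u, v):
    """Count the values in [u, v] in O(log n), without listing them."""
    return bisect_right(sorted_pts, v) - bisect_left(sorted_pts, u)

data = build_1d([37, 3, 62, 19, 10, 49, 23, 59, 30, 70])
print(range_query_1d(data, 18, 60))   # [19, 23, 30, 37, 49, 59]
print(range_count_1d(data, 18, 60))   # 6
```

The two binary searches play the role of the two search paths to u and v; the slice between them corresponds to the leaves of the maximal subtrees between the paths.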
1D range query: run-time analysis
k: output size
   Leaves: $O(k)$ time
   Internal nodes: $O(k)$ time (since this is a binary tree)
   Paths: $O(\log n)$ time
Total: $O(\log n + k)$ time
Worst case: k = n, giving $\Theta(n)$ time

2D range search: idea
Generalize 1D range searching.
Construction:
   Construct a tree T ordered by x coordinates.
   Each inner vertex v contains a pointer to a secondary tree $T_{assoc}(v)$ containing all the points of the primary subtree rooted at v, ordered by y coordinate.
   Points are stored only in the secondary trees.
[Figure: primary tree T sorted by x; secondary tree $T_{assoc}(v)$ sorted by y]

2D range tree: idea
Searching:
   Given a 2D range, we simulate a 1D search and find the subtrees sorted by x.
   Instead of reporting the entire subtrees, invoke a search in the secondary trees sorted by y, and report only the points in the query range.
[Figure: T and $T_{assoc}(v)$ during a query]
[Figure: 2D range tree construction algorithm]

2D range tree construction complexity
Same as a 1D tree, except that at each level the secondary trees are built as well.
Theorem: The space complexity is $\Theta(n \log n)$.
Proof: The size of the primary tree is $\Theta(n)$. Each of its $\Theta(\log n)$ levels corresponds to a collection of secondary trees that together contain all n points.
Time complexity (naive analysis):
\[
T(n) = \begin{cases} O(1) & n = 1 \\ O(n \log n) + 2T(n/2) & \text{else} \end{cases} \;=\; O(n \log^2 n)
\]

2D range tree construction complexity: improvement
Source of inefficiency: repeated sorting by y coordinate!
Instead, sort by y only once, and copy the data in the recursive calls in linear time.
The resulting recursive equation is:
\[
T(n) = \begin{cases} O(1) & n = 1 \\ O(n) + 2T(n/2) & \text{else} \end{cases} \;=\; O(n \log n)
\]
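A possible sketch of a 2D range tree with linear work per recursive call is given below. It rests on several assumptions that are not in the slides: each secondary structure is a y-sorted Python list rather than a balanced tree, the y-order of a node is obtained by merging the already sorted lists of its two children (a bottom-up variant of "sort by y once and copy in linear time"), and the names `RangeNode`, `build_range_tree` and `query_range_tree` are hypothetical. The input is assumed to be non-empty.

```python
from bisect import bisect_left, bisect_right
from heapq import merge

class RangeNode:
    def __init__(self, xmin, xmax, by_y, left=None, right=None):
        self.xmin, self.xmax = xmin, xmax   # x-extent of the primary subtree
        self.by_y = by_y                    # secondary structure: points sorted by y
        self.ys = [p[1] for p in by_y]      # y keys, for binary search
        self.left, self.right = left, right

def _build(pts_by_x):
    if len(pts_by_x) == 1:
        x = pts_by_x[0][0]
        return RangeNode(x, x, list(pts_by_x))
    mid = len(pts_by_x) // 2
    left, right = _build(pts_by_x[:mid]), _build(pts_by_x[mid:])
    # Linear-time merge of two y-sorted lists: no re-sorting at this node.
    by_y = list(merge(left.by_y, right.by_y, key=lambda p: (p[1], p[0])))
    return RangeNode(left.xmin, right.xmax, by_y, left, right)

def build_range_tree(points):
    """Sort by x once, then build recursively (assumes a non-empty input)."""
    return _build(sorted(points))

def query_range_tree(node, x1, x2, y1, y2):
    """Report the points with x1 <= x <= x2 and y1 <= y <= y2."""
    if node.xmax < x1 or node.xmin > x2:          # subtree outside the x-range
        return []
    if x1 <= node.xmin and node.xmax <= x2:       # subtree inside the x-range:
        lo = bisect_left(node.ys, y1)             # search the secondary structure
        hi = bisect_right(node.ys, y2)
        return node.by_y[lo:hi]
    return (query_range_tree(node.left, x1, x2, y1, y2)
            + query_range_tree(node.right, x1, x2, y1, y2))

pts = [(2, 19), (7, 10), (12, 3), (17, 62), (21, 49), (41, 95), (58, 59), (93, 70)]
tree = build_range_tree(pts)
print(sorted(query_range_tree(tree, 5, 60, 5, 60)))   # [(7, 10), (21, 49), (58, 59)]
```

Every point appears in one y-sorted list per level of the primary tree, which matches the $\Theta(n \log n)$ storage bound stated in the theorem.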
[Figure: 2D range tree query algorithm]

2D range search complexity
Recurrence equation, summing over the $O(\log n)$ vertices v whose secondary structure is queried, where $k_v$ is the number of points reported at v:
\[
T(n) = \underbrace{O(\log n)}_{\text{traversing primary structure}} + \sum_{v} \big( \underbrace{O(\log n)}_{\text{traversing secondary structure}} + \underbrace{k_v}_{\text{reporting}} \big) = O(\log^2 n + k)
\]
The running time can be reduced to $O(\log n + k)$ by using fractional cascading.

d-dimensional range trees
The idea generalizes directly to d dimensions: create a series of trees, one for each dimension, and search each as before.
Complexity:
   Construction (time and storage): $T_d(n) = O(n \log n) + T_{d-1}(n) \cdot O(\log n) = O(n \log^{d-1} n)$
   Query: $Q_d(n) = O(\log n) + Q_{d-1}(n) \cdot O(\log n) = O(\log^d n)$
[Figure: nested trees T sorted by $x_1$, by $x_2$ and by $x_3$]

2D kd trees: idea
Bound the points by a rectangle.
Split the points into two equal-size subsets, using a horizontal or vertical line.
Continue recursively to partition the subsets, alternating the directions of the lines, until the point subsets are small enough (of constant size).
Canonical subsets are subtrees.
In higher (k) dimensions: split directions alternate between the k axes (hence "kd trees").
[Figure: example of a 2D kd tree]

2D kd tree construction
Partition the plane into axis-aligned rectangular regions.
Nodes represent partition lines, and leaves represent input points.
The bottleneck is finding the median, which requires only linear time!
Time complexity:
\[
T(n) = \begin{cases} O(1) & n = 1 \\ O(n) + 2T(n/2) & n > 1 \end{cases} \qquad T(n) = O(n \log n)
\]
[Figure: kd-tree subdivision of the plane by splitting lines, with the corresponding tree over the input points]
[Figure: 2D kd tree construction algorithm]

2D range query search
Each node in the tree defines an axis-parallel rectangle in the plane, bounded by the lines marked by this vertex's ancestors.
Label each node with the number of points in that rectangle.
[Figure: kd tree with a point count at each node]

2D range query search (cont.)
Given an axis-parallel range query R, search for this range in the tree.
Traverse only subtrees which represent regions overlapping R.
If a subtree is entirely contained in R:
   Counting: add up its count.
   Reporting: report the entire subtree.
[Figure: example of kd tree query answering with query rectangle R]
[Figure: 2D kd tree search algorithm]

Query time complexity analysis
k nodes are reported. How much time is spent on internal nodes?
The nodes visited are those that are stabbed by R but are not contained in R. How many such cells exist?
Theorem: Every side of R stabs $O(\sqrt{n})$ cells of the tree.
Proof: Extend the side to a full line (wlog, horizontal). In the first level it stabs two children, and in the next level it stabs two out of the four grandchildren. Thus, the recursive equation is:
\[
Q(n) = \begin{cases} 1 & n = 1 \\ 2 + 2Q(n/4) & \text{else} \end{cases} \;=\; O(\sqrt{n})
\]
Total query time: $O(\sqrt{n} + k)$.
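The kd-tree construction and the counting and reporting queries described above might look like the following sketch. The assumptions are mine, not the lecture's: the median is found by sorting at every level (simpler than linear-time selection, but it makes construction $O(n \log^2 n)$ rather than $O(n \log n)$), rectangles are tuples `(xlo, ylo, xhi, yhi)` with closed boundaries, the input is non-empty, and all names (`KdNode`, `build_kd`, `kd_report`, `kd_count`, `bounding_box`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class KdNode:
    count: int                              # number of points in this subtree
    point: Optional[Tuple[float, float]] = None   # set in leaves only
    axis: int = 0                           # 0: vertical split line, 1: horizontal
    split: float = 0.0                      # coordinate of the split line
    left: Optional["KdNode"] = None
    right: Optional["KdNode"] = None

def build_kd(points, depth=0):
    if len(points) == 1:
        return KdNode(count=1, point=points[0])
    axis = depth % 2
    # Ties on the split coordinate are resolved by the other coordinate.
    pts = sorted(points, key=lambda p: (p[axis], p[1 - axis]))
    mid = (len(pts) + 1) // 2
    return KdNode(count=len(pts), axis=axis, split=pts[mid - 1][axis],
                  left=build_kd(pts[:mid], depth + 1),
                  right=build_kd(pts[mid:], depth + 1))

def bounding_box(points):
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

def _inside(R, p):
    return R[0] <= p[0] <= R[2] and R[1] <= p[1] <= R[3]

def _contains(R, C):      # cell C lies entirely inside R
    return R[0] <= C[0] and R[1] <= C[1] and C[2] <= R[2] and C[3] <= R[3]

def _disjoint(R, C):
    return C[2] < R[0] or C[0] > R[2] or C[3] < R[1] or C[1] > R[3]

def _children_cells(node, cell):
    lo, hi = list(cell), list(cell)
    lo[2 + node.axis] = node.split   # left/bottom child: upper bound is the split line
    hi[node.axis] = node.split       # right/top child: lower bound is the split line
    return tuple(lo), tuple(hi)

def _collect(node):
    if node.point is not None:
        return [node.point]
    return _collect(node.left) + _collect(node.right)

def kd_report(node, R, cell):
    if node.point is not None:
        return [node.point] if _inside(R, node.point) else []
    if _disjoint(R, cell):
        return []
    if _contains(R, cell):
        return _collect(node)        # report the entire subtree
    lo, hi = _children_cells(node, cell)
    return kd_report(node.left, R, lo) + kd_report(node.right, R, hi)

def kd_count(node, R, cell):
    if node.point is not None:
        return int(_inside(R, node.point))
    if _disjoint(R, cell):
        return 0
    if _contains(R, cell):
        return node.count            # add up the stored count
    lo, hi = _children_cells(node, cell)
    return kd_count(node.left, R, lo) + kd_count(node.right, R, hi)

pts = [(2, 19), (7, 10), (12, 3), (17, 62), (21, 49), (41, 95), (58, 59), (93, 70)]
root, cell = build_kd(pts), bounding_box(pts)
print(sorted(kd_report(root, (5, 5, 60, 60), cell)))   # [(7, 10), (21, 49), (58, 59)]
print(kd_count(root, (5, 5, 60, 60), cell))            # 3
```

The cell of each node is passed down during the query instead of being stored, which keeps the tree at $O(n)$ space; the root cell is the bounding rectangle of the points.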
kd-trees: higher dimensions
For a d-dimensional space:
   Same algorithm.
   Construction time: $O(d\,n \log n)$. ($O(d)$ time is needed to handle a point.)
   Space complexity: $O(dn)$.
   Query time complexity: $O(d\,(n^{1-1/d} + k))$.
Note: For large d, a full scan is almost equally good.
Question: Are kd-trees useful for non-orthogonal range queries, e.g., disks, convex polygons?
Fact: Using interval trees, orthogonal range queries can be solved in $O(d \log^{d-1} n + k)$ time using $O(d\,n \log^{d-1} n)$ space.

Points in non-general position
Question: How can we handle sets of points which are not in general position, i.e., with multiple points with the same x coordinate?
Answer: By two-step order checks. When comparing according to x, resolve ties by y, and vice versa. This splits the points into two sides, having the same effect as infinitesimally rotating the plane.
Theorem: The modified order checks preserve the correctness of the algorithms.
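The two-step order check can be illustrated with Python tuples, which already compare lexicographically. This is only a sketch: the helper names `x_key`, `y_key`, `split_by_median` and `in_x_range` are hypothetical, and widening a query interval [x1, x2] to the composite bounds (x1, -inf) and (x2, +inf) is the usual companion convention, stated here as an assumption rather than something taken from the slides.

```python
import math

def x_key(p):
    """Compare by x, resolving ties by y."""
    return (p[0], p[1])

def y_key(p):
    """Compare by y, resolving ties by x."""
    return (p[1], p[0])

def split_by_median(points, axis):
    """Split distinct points into two halves even if coordinates coincide."""
    key = x_key if axis == 0 else y_key
    pts = sorted(points, key=key)
    mid = (len(pts) + 1) // 2
    return pts[:mid], pts[mid:]

def in_x_range(p, x1, x2):
    """x1 <= p.x <= x2, expressed with composite (tie-broken) keys."""
    return (x1, -math.inf) <= x_key(p) <= (x2, math.inf)

# Four points on the same vertical line are still split evenly.
left, right = split_by_median([(5, 3), (5, 1), (5, 4), (5, 2)], axis=0)
print(left, right)                 # [(5, 1), (5, 2)] [(5, 3), (5, 4)]
print(in_x_range((5, 3), 5, 5))    # True
```

Because distinct points always get distinct composite keys, every split separates the set cleanly, which is the effect the slides describe as an infinitesimal rotation of the plane.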