CRB-Tree: An Efficient Indexing Scheme for Range-Aggregate Queries


Sathish Govindarajan, Pankaj K. Agarwal, and Lars Arge
Department of Computer Science, Duke University, Durham, NC 27708
{gsat, pankaj,

Abstract. We propose a new indexing scheme, called the CRB-tree, for efficiently answering range-aggregate queries. The range-aggregate problem is defined as follows: given a set of weighted points in $\mathbb{R}^d$, compute the aggregate of the weights of the points that lie inside a $d$-dimensional query rectangle. In this paper we focus on the range-COUNT, SUM, and AVG aggregates. First, we develop an indexing scheme for answering two-dimensional range-COUNT queries that uses $O(N/B)$ disk blocks and answers a query in $O(\log_B N)$ I/Os, where $N$ is the number of input points and $B$ is the disk block size. This is the first optimal index structure for the 2D range-COUNT problem. The index can be extended to obtain a near-linear-size structure for answering range-SUM queries using $O(\log_B N)$ I/Os. We also obtain similar bounds for rectangle-intersection aggregate queries, in which the input is a set of weighted rectangles and a query asks for the aggregate of the weights of those input rectangles that overlap the query rectangle. This result immediately improves a recent result on temporal-aggregate queries. Our indexing scheme can be dynamized and extended to higher dimensions. Finally, we demonstrate the practical efficiency of our index by comparing its performance against the kdb-tree. For a dataset of around 1 million points, the CRB-tree query time is 8 1 times faster than the kdb-tree query time. Furthermore, unlike other indexing schemes, the query performance of the CRB-tree is oblivious to the distribution of the input points and to the placement, shape, and size of the query rectangle.

1 Introduction

In order to be successful, any data model in a large database requires efficient external memory (secondary storage) support for its language features.
Range searching and its variants are problems that often need to be solved efficiently [16]. In on-line analytical processing (OLAP), spatial databases such as geographic information systems (GIS), and several other applications, range-aggregate queries (e.g., range-COUNT, range-SUM, etc.) play an extremely important role, and a large number of algorithms and indexing structures have been proposed to answer such queries; see e.g. [1] and references therein. With the rapid increase in the use of data warehouses to collect historical information, temporal aggregate queries have also received much attention in the last few years [21, 22]. In general, answering temporal (or bi-temporal) aggregate queries is harder than answering traditional aggregate queries because each data point is associated with a time interval during which its attribute values are valid, and a query calls for computing an aggregate over only those points that are valid during the query time interval. In this paper we present an optimal indexing scheme for answering 2D range-COUNT queries. Our index can be extended to efficiently answer multidimensional range-aggregate queries. We can also efficiently answer several important temporal aggregate queries using our index.

The first two authors are supported by Army Research Office MURI grant DAAH , by NSF grants ITR , EIA , EIA , and CCR , and by a grant from the U.S.-Israeli Binational Science Foundation. The third author is supported by the National Science Foundation through ESS grant EIA , RI grant EIA , CAREER grant CCR , and ITR grant EIA .

D. Calvanese et al. (Eds.): ICDT 2003, LNCS 2572, 2003. © Springer-Verlag Berlin Heidelberg 2003

Model. Let $P$ be a set of $N$ points in $\mathbb{R}^d$, and let $w : P \to \mathbb{Z}$ be a weight function. We wish to build an index on $P$ so that we can efficiently answer range-aggregate queries, i.e., compute the aggregate of the weights of the points in $P$ that lie inside a given $d$-dimensional query rectangle $R$. In this paper, we focus on range-aggregate queries such as range-COUNT, SUM, and AVG. These three examples call for computing $|P \cap R|$, $\sum_{p \in P \cap R} w(p)$, and $\sum_{p \in P \cap R} w(p) / |P \cap R|$, respectively; see Figure 1(i). We also study a more general problem in which $P$ is a set of rectangles in $\mathbb{R}^d$. A query is also a $d$-dimensional rectangle and the goal is to compute an aggregation over the set of input rectangles that overlap the query rectangle; see Figure 1(ii). We refer to this problem as the rectangle-intersection-aggregate problem. This problem arises naturally when a massive set of points is replaced by a set of rectangles (the weight of a rectangle being the average weight of the data points inside that rectangle) or when the input is a set of complex objects and each object is replaced by its minimum bounding box.
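To make the three aggregates concrete, the following is a minimal brute-force sketch in Python for the two-dimensional case. It is only a reference definition that scans every point, not the index proposed in this paper; the function names, the `(x, y, weight)` tuple layout, and the closed-rectangle convention are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float, int]           # (x, y, integer weight); hypothetical layout
Rect = Tuple[float, float, float, float]   # (x1, x2, y1, y2), treated as a closed rectangle


def in_rect(x: float, y: float, rect: Rect) -> bool:
    """True if (x, y) lies inside the closed rectangle."""
    x1, x2, y1, y2 = rect
    return x1 <= x <= x2 and y1 <= y <= y2


def range_count(points: List[Point], rect: Rect) -> int:
    """|P ∩ R|: number of points inside the rectangle."""
    return sum(1 for x, y, _ in points if in_rect(x, y, rect))


def range_sum(points: List[Point], rect: Rect) -> int:
    """Sum of w(p) over the points p inside the rectangle."""
    return sum(w for x, y, w in points if in_rect(x, y, rect))


def range_avg(points: List[Point], rect: Rect) -> Optional[float]:
    """SUM / COUNT, or None when no point lies inside the rectangle."""
    count = range_count(points, rect)
    return range_sum(points, rect) / count if count else None
```

Each call costs a full scan of the input, i.e. $N/B$ I/Os in the external memory setting; the point of an index is to avoid exactly this scan.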
In the temporal range-aggregate problem, studied in [21,22], we are given a set $P$ of (time) intervals in $\mathbb{R}^d$ and the goal is to compute an aggregation over the set of intervals that intersect a query rectangle; see Figure 1(iii). The range-aggregate problem is a special case of the temporal range-aggregate problem, which in turn is a special case of the rectangle-intersection-aggregate problem.

Fig. 1. (i) Range-aggregate query, (ii) rectangle-intersection-aggregate query, (iii) temporal range-aggregate query.

As our main interest is minimizing the number of disk blocks used by the index and the number of disk accesses used to answer a query, we will consider the problem in the standard external memory model [2]. This model assumes that each disk access transmits a contiguous block of $B$ units (words) of data in a single input/output operation (or I/O). The efficiency of an index is measured in terms of the amount of disk space it uses (measured in units of disk blocks) and the number of I/Os required to answer a query, to bulk load (construct) the index, and to update the index. The minimum number of disk blocks we need to store $N$ points is $n = N/B$, which we refer to as linear size. We also assume that the size of internal memory $M$ is at least $B^2$ and that the integer $N$ can be stored in a single word, i.e., that a word can store $\log N$ bits and each bit can be accessed individually. The latter assumption is made in any reasonable computational model and is important for our space- and query-efficient index.

Related work. There has been a lot of work in the spatial database community on indexing a set of points for answering range queries. Indexing schemes such as numerous variants of kdb-trees and R-trees, external priority search trees, etc. have been proposed; see [3, 16] for recent surveys. A kdb-tree uses $O(n)$ disk blocks and can be used to answer a range-aggregate query in $O(\sqrt{n})$ I/Os. No worst-case bound on the query performance of the R-tree is known. Recently, Tao et al. [18] proposed the aP-tree, which uses $O(n \log_B n)$ disk blocks and answers a range-aggregate query using $O(\log_B n)$ I/Os. As temporal aggregation is being included in most temporal languages, there has been a flurry of activity on answering temporal aggregate queries. After several early results, including [14,13], Yang and Widom [21] proposed an indexing scheme called the SB-tree that stores a set of $N$ time intervals in $\mathbb{R}^2$ using $O(n \log_B n)$ disk blocks so that, for a time interval, the aggregate over all keys that are valid at some time in the interval can be computed using $O(\log_B n)$ I/Os. Recently, Zhang et al. [22] proposed a multiversion SB-tree (MVSB-tree) that uses $O(n \log_B n)$ disk blocks in the worst case and that can answer a temporal range-aggregate query in $O(\log_B n)$ I/Os. Their index can also answer range-aggregate queries such as range-COUNT and range-SUM in $O(\log_B n)$ I/Os.
The SB-tree and MVSB-tree also support updates in $O(\log_B n)$ I/Os.

There is also a vast literature on answering range-aggregate queries in OLAP systems, in which the data is typically modeled as a multidimensional cube and queries typically involve aggregation across various cube dimensions [6,11,12,15]. Although the data cube is a good data model for OLAP systems, it only works well when the data is dense in the sense that the data in a $d$-dimensional data cube with a total of $N$ cells has more than $N^{1/d}$ distinct keys (on average) in each dimension. In many applications such as spatial databases, data tends to be sparse and the data cube model does not work well.

Several range-searching data structures in the internal memory model have been developed in computational geometry; see [1] for a survey. The best-known data structure in $\mathbb{R}^2$ is the range tree, which uses $O(N \log_2 N)$ space and can answer a range-aggregate query such as range-SUM in $O(\log_2 N)$ time [1]. Chazelle [7] developed the compressed range-tree data structure that uses $O(N)$ space under the so-called bit model (in which a word can store $\log_2 N$ bits and each bit can be manipulated individually). This structure can be used to answer a range-COUNT query in $O(\log_2 N)$ time. Both of these structures use subtraction to answer queries. The compressed range tree can be extended to other range-aggregate queries by paying a polylogarithmic overhead in the query time. If we do not allow subtraction, then the lower-bound results by Chazelle [8] in the semigroup model (see e.g. [1,8] for a precise definition of this model) suggest that a structure that answers a 2D range-COUNT query in $O(\log_2 N)$ time has to use super-linear storage.

Our results. Our main result, described in Section 2, is a new indexing scheme, called the Compressed Range B-tree (or CRB-tree), for answering two-dimensional range-COUNT queries. This structure is an external version of the compressed range tree [7].

It uses $O(n)$ disk blocks, answers a query in $O(\log_B n)$ I/Os, and can be bulk loaded using $O(n \log_B n)$ I/Os. This is the first optimal indexing scheme for the 2D range-COUNT problem in the I/O model. Using a partial-rebuilding scheme [5], a point can be inserted or deleted in $O(\log_B^2 n)$ I/Os.

Section 3 presents several extensions of our basic structure. We first adapt the structure for answering 2D range-SUM queries in $O(\log_B n)$ I/Os using $O(n \log_B((W \log_2 W)/N))$ disk blocks, where $W = \sum_{p \in P} w(p)$ is the total weight of the input points. It can also answer other range-aggregate queries such as range-MAX and range-MIN in $O(\log_B^2 n)$ I/Os. Next we extend our index to higher dimensions. In $\mathbb{R}^d$, the structure uses $O(n \log_B^{d-2} n)$ disk blocks, answers a range-COUNT query in $O(\log_B^{d-1} n)$ I/Os, and can be bulk loaded in $O(n \log_B^{d-1} n)$ I/Os. Similar bounds can be derived for other range-aggregate queries. Using a result by Edelsbrunner and Overmars [9], our index can also be used to answer rectangle-intersection-COUNT (resp. SUM) queries without affecting the asymptotic bounds. Since temporal range-aggregate queries are a special case of rectangle-intersection aggregate queries, our structure improves upon the recent results by Zhang et al. [22] when the total weight of the points is $O(N)$. More precisely, we can answer a temporal range-aggregate query in $O(\log_B n)$ I/Os using an $O(n \log_B(\log_2 N))$-size index.

We have implemented the two-dimensional CRB-tree, and in Section 4 we report the results of an extensive experimental evaluation of its efficiency. Since we are mainly interested in linear-size indexes, we compare the performance of the CRB-tree with that of the kdb-tree. We have evaluated the performance of these index structures using synthetic and TIGER/Line data. Our first set of experiments studies the query and bulk-loading performance on datasets of size ranging from 2 to 14 million points. Our experiments show that the query performance of CRB-trees is significantly better than that of kdb-trees. For a data set with around 1 million points, the CRB-tree query time is 8 1 times faster than the kdb-tree query time. Our second set of experiments compares the query performance while varying the size and shape of the query rectangle. The query performance of CRB-trees is independent of the shape and size of the query rectangle, while the query time of the kdb-tree increases rapidly with the size of the query rectangle.

2 CRB-Tree

In this section, we describe the Compressed Range B-tree (CRB-tree), an indexing scheme for answering two-dimensional range-COUNT queries. The structure is an external version of an internal memory structure due to Chazelle [7]. We first describe the CRB-tree and how to bulk load it, and then present the query procedure.

CRB-tree. Let $P$ denote the set of $N$ points in the plane. A CRB-tree consists of two structures: a B+-tree $T$ constructed on the x-coordinates of $P$, with each internal node $v$ storing a secondary structure $\Sigma_v$, and a normal B+-tree $\Psi$ constructed on the y-coordinates of $P$. (Figure 2 shows an example of a CRB-tree $T$ on $N = 16$ points.) Let $P_v = \{p_1, p_2, \dots\}$ denote the sequence of points contained in the subtree of $T$ rooted at $v$, sorted in non-decreasing order of their y-coordinates. Set $N_v = |P_v|$ and $n_v = N_v/B$. For any y-value $y$, the secondary structure $\Sigma_v$ will be used to count the number of points of $P_v$ that belong to a given child of $v$ and whose y-coordinates are at most $y$. Intuitively, for each child $v_i$ of $v$, $\Sigma_v$ should maintain an array whose $j$th entry, for $j \le N_v$, stores how many points among the first $j$ points of $P_v$ belong to $P_{v_i}$. Using these prefix sums, we can determine in one I/O the number of points of $P_v$ that belong to $P_{v_i}$ and whose y-coordinates are less than $y$. However, storing these prefix sums requires $O(n_v)$ disk blocks, which would lead to an overall space of $O(n \log_B n)$ blocks. We therefore store the prefix sum array compactly, using only $O(n_v/\log_B n)$ blocks. Roughly speaking, we partition $P_v$ into consecutive chunks and compute the prefix sums only at the chunk level, i.e., for the $l$th chunk and for each child $v_i$ of $v$, we store the number of points in the first $l$ chunks that belong to $P_{v_i}$. Let the predecessor of $y$ in $P_v$ belong to the $k$th chunk of $P_v$. The desired count is the sum of the prefix sum up to the $(k-1)$th chunk and the number of points within the $k$th chunk that belong to $P_{v_i}$ and whose y-coordinates are less than $y$. We also preprocess each chunk separately so that we can compute the latter count using $O(1)$ I/Os.

More precisely, $\Sigma_v$ consists of two arrays: a child index array $\mathrm{CI}_v$ and a prefix count array $\mathrm{PC}_v$. Let $v_0, v_1, \dots, v_{B-1}$ be the children of $v$. We regard the child index array $\mathrm{CI}_v$ as a one-dimensional array with a $\log_2 B$-bit entry for each of the $N_v$ points in $P_v$. $\mathrm{CI}_v[i]$ simply stores the index $b_i$ of the subtree of $v$ storing the $i$th point $p_i$ of $P_v$, i.e., $p_i$ is stored at a leaf in the subtree rooted at $v_{b_i}$. (For example, in Figure 2(iv), $\mathrm{CI}_v[3] = 1$ since the third point in $P_v$, $(7, 3)$, belongs to the subtree of $v_1$.) Since $b_i < B$, the $\log_2 B$ bits of the $i$th entry are enough to represent $b_i$. $\mathrm{CI}_v$ requires $N_v \log_2 B$ bits and thus can be stored using $\frac{N_v}{B} \cdot \frac{\log_2 B}{\log_2 N} = n_v/\log_B N$ blocks. Next, let $\mu = B \log_B N$ and $r = N_v/\mu$. (Note that $\mu$ entries of $\mathrm{CI}_v$ fit in one block.)
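The choice of $\mu$ follows directly from the word-size assumption of Section 1. As a short worked derivation (a restatement of the counting above, not an additional result): a block holds $B$ words of $\log_2 N$ bits each, and every entry of the child index array occupies $\log_2 B$ bits, so

$$\frac{B \log_2 N \ \text{bits per block}}{\log_2 B \ \text{bits per entry}} = B \log_B N = \mu, \qquad r = \frac{N_v}{\mu} = \frac{N_v}{B \log_B N} = \frac{n_v}{\log_B N}.$$

For illustration only, with hypothetical values $B = 2^{10}$ and $N = 2^{30}$ we get $\log_B N = 3$, so one block stores $\mu = 3072$ child indices instead of $1024$ full words, and the child index array of a node occupies one third of the blocks that $P_v$ itself would need.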
We partition the $\mathrm{CI}_v$ array into chunks of length $\mu$ and store the prefix sums at the chunk level in $\mathrm{PC}_v$, the prefix count array. We regard $\mathrm{PC}_v$ as a two-dimensional array with $r$ rows and $B$ columns, with each entry storing one word (that is, $\log_2 N$ bits). For $1 \le i \le r$ and $0 \le j < B$, $\mathrm{PC}_v[i, j]$ stores the number of points among the first $i\mu$ points of $P_v$ that belong to $P_{v_j}$, i.e., $\mathrm{PC}_v[i, j] = |\{p_1, \dots, p_{i\mu}\} \cap P_{v_j}|$. (For example, in Figure 2(iv), $\mu = 8$ and $\mathrm{PC}[1, 0] = 2$ since among the first 8 points of $P_v$, only 2 points belong to $v_0$.) $\mathrm{PC}_v$ requires $rB = N_v/\log_B N$ words. Hence $\Sigma_v$ can be stored using $O(n_v/\log_B n)$ disk blocks. Since the height of $T$ is $O(\log_B n)$ and a point of $P$ is stored at only one leaf of $T$, we have that $\sum_{v \in T} n_v = O(n \log_B n)$. The total space used by $T$ is thus $\sum_{v \in T} n_v/\log_B n = O(n)$ disk blocks.

Bulk loading a CRB-tree. A CRB-tree can be bulk loaded efficiently bottom-up, level by level. We construct the leaves of $T$ using $O(n \log_{M/B} n)$ I/Os by sorting the points in $P$ in non-decreasing order of their x-coordinates [2]. We then sort the points stored at each leaf $v$ in non-decreasing order of their y-coordinates to get $P_v$. Below we describe how we construct all nodes and secondary structures at level $i$ of $T$, given that we have already constructed the nodes at level $i+1$. We construct a level $i$ node $v$ and its secondary structure $\Sigma_v$ as follows. We first compute $P_v$ (sorted by the y-coordinates) by merging $P_{v_0}, P_{v_1}, \dots, P_{v_u}$, where $v_0, \dots, v_u$ are the children of $v$. Since $M > B^2$, the internal memory can hold a block from each $P_{v_i}$ at the same time. This means that we can compute $P_v$ using a single scan through the $P_{v_i}$'s, i.e., in $O(n_v)$ I/Os. Recall that $\mathrm{CI}_v[i]$ contains the index of the child containing $p_i$. In order to construct $\mathrm{CI}_v[i]$ we record the index of the child of $v$ from which each point of $P_v$ came. This way we can construct $\mathrm{CI}_v$ using a single scan through $P_v$. Since $\mathrm{PC}_v[i, j]$ contains the number of points among the first $i\mu$ points in $P_v$ that belong to $P_{v_j}$, we can also construct $\mathrm{PC}_v$ in a single scan through $\mathrm{CI}_v$ as follows: we compute $\mathrm{PC}_v[i, j]$ while scanning the points $p_{(i-1)\mu+1}, \dots, p_{i\mu}$. Since $j \le B-1$, we maintain $\mathrm{PC}_v[i, 0], \mathrm{PC}_v[i, 1], \dots, \mathrm{PC}_v[i, B-1]$ in internal memory. If $i = 1$, we initialize $\mathrm{PC}_v[i, j] = 0$; otherwise we initially set $\mathrm{PC}_v[i, j] = \mathrm{PC}_v[i-1, j]$. If we are currently scanning $p_k$ and $\mathrm{CI}_v[k] = j$, we increment $\mathrm{PC}_v[i, j]$. After we have scanned $p_{i\mu}$, we store $\mathrm{PC}_v[i, 0], \mathrm{PC}_v[i, 1], \dots, \mathrm{PC}_v[i, B-1]$ to disk. Since the number of I/Os used to construct a level $i$ node $v$ and its secondary structure $\Sigma_v$ is $O(n_v)$, the total number of I/Os used to build level $i$ of $T$ is $O(n)$. Thus we can construct the CRB-tree using $O(n \log_B n)$ I/Os in total.

Fig. 2. CRB-tree for a set of $N = 16$ points. (i) $P$ = {(5, 1), (9, 2), (7, 3), (6, 4), (3, 5), (4, 6), (13, 7), (12, 8), (2, 9), (15, 10), (10, 11), (8, 12), (11, 13), (14, 14), (16, 15), (1, 16)}. (ii) B-tree $\Psi$ on the y-coordinates of $P$; $B = 4$ and each leaf stores 2 points of $P$. (iii) B-tree $T$ on the x-coordinates of $P$; $B = 4$ and each leaf stores 2 points of $P$. $P_v = P$, $P_{v_0}$ = {(3, 5), (4, 6), (2, 9), (1, 16)}, $P_{v_1}$ = {(5, 1), (7, 3), (6, 4), (8, 12)}, $P_{v_2}$ = {(9, 2), (12, 8), (10, 11), (11, 13)}, $P_{v_3}$ = {(13, 7), (15, 10), (14, 14), (16, 15)}. (iv) Secondary structure stored at each internal node of $T$. $\mu = 8$; since each child of the root stores 4 (< 8) points, no entries are stored in the PC array at those nodes. $\mathrm{PC}_v[1, j]$ (resp. $\mathrm{PC}_v[2, j]$) is $k$ if $k$ points among the first 8 (resp. 16) points of $P$ are stored at the child $v_j$ of the root.

Answering queries. Let $Q = [x_1, x_2] \times [y_1, y_2]$ be a query rectangle. We compute $|P \cap Q|$ by traversing $T$ in a top-down manner, visiting $O(\log_B n)$ nodes.
The query procedure traverses $T$ along two paths, namely the paths from the root to the leaves $w$ and $z$ containing $x_1$ and $x_2$, respectively. We use the secondary structures along the two paths to answer the query. To explain how, imagine associating an x-interval $I_v$ with each node $v$ of $T$. At the root $u$ of $T$, $I_u$ is the entire x-axis, and the keys stored at an internal node $v$ partition $I_v$ into $B$ x-intervals. We associate the $i$th interval with the $i$th child of $v$. Consider the topmost node $v$ such that $I_v$ contains $[x_1, x_2]$ but none of the x-intervals $I_{v_j}$ associated with a child of $v$ contains $[x_1, x_2]$; $v$ is the nearest common ancestor of $w$ and $z$. Let $v_\lambda$ and $v_\rho$ be the two children of $v$ such that $x_1$ and $x_2$ lie in the intervals $I_{v_\lambda}$ and $I_{v_\rho}$, respectively. (Figure 3(i) shows the intervals for the children of the root node of $T$. For the given $Q$, $\lambda = 0$ and $\rho = 3$ at the root.) Obviously $|P \cap Q| = \sum_{\lambda \le j \le \rho} |P_{v_j} \cap Q|$. To answer the query we compute the count $C = \sum_{\lambda < j < \rho} |P_{v_j} \cap Q|$ at $v$ using the secondary structure $\Sigma_v$ and recursively visit the children $v_\lambda$ and $v_\rho$ to compute $|P_{v_\lambda} \cap Q|$ and $|P_{v_\rho} \cap Q|$. (As shown in Figure 3(i), we compute the counts $|P_{v_1} \cap Q|$ and $|P_{v_2} \cap Q|$ at $v$, and recursively compute $|P_{v_0} \cap Q|$ and $|P_{v_3} \cap Q|$.) Below we show how we can compute the count $|P_{v_i} \cap Q|$ at one node in $O(1)$ I/Os. When we reach a leaf $w$, we compute $|P_w \cap Q|$ in $O(1)$ I/Os by scanning all the (at most $B$) points in $P_w$. Since we visit $O(\log_B n)$ nodes, we need $O(\log_B n)$ I/Os to answer a query.

Fig. 3. (i) The query rectangle $Q = [2.5, 13.5] \times [1.5, 9.5]$ along with the input points $P$ drawn in Figure 2. $I_j$ is the interval associated with the child $v_j$ of the root. $\lambda = 0$, $\rho = 3$. $|P_{v_0} \cap Q|$ and $|P_{v_3} \cap Q|$ are computed recursively, and $|P_{v_1} \cap Q|$ and $|P_{v_2} \cap Q|$ are computed at the root. (ii) $P_v$ and the secondary structure $\Sigma_v$ at the root; here $\alpha_v = 2$ and $\beta_v = 9$.

In order to compute the above count, we maintain two variables $\alpha_v$ and $\beta_v$ when visiting a node $v$ during the traversal of the tree: $\alpha_v$ is the rank of the first point in $P_v$ whose y-coordinate is at least $y_1$, and $\beta_v$ is the rank of the last point in $P_v$ whose y-coordinate is not larger than $y_2$. For $0 \le j \le B-1$ and $1 \le r \le N_v$, let $\varphi(j, r)$ denote the number of points in $P_v$ of rank at most $r$ that belong to $P_{v_j}$, i.e., $\varphi(j, r)$ is the number of entries among the first $r$ entries of $\mathrm{CI}_v$ that store the index $j$. For $\lambda < j < \rho$, since the x-coordinates of $P_{v_j}$ lie in the range $[x_1, x_2]$, $|P_{v_j} \cap Q| = \varphi(j, \beta_v) - \varphi(j, \alpha_v)$. (For example, from the $P_v$ array in Figure 3(ii), $\varphi(1, 9) = 3$, since three points among the first nine points of $P_v$ belong to $P_{v_1}$. Similarly, $\varphi(1, 2) = 1$. Thus we have $|P_{v_1} \cap Q| = 3 - 1 = 2$. From Figure 3(i), we see that $|P_{v_1} \cap Q| = 2$ since there are two points of $P_v$ in the shaded region of child $v_1$.) It thus suffices to describe how to maintain the variables $\alpha_v$ and $\beta_v$ and how to compute $\varphi(0, r), \dots, \varphi(B-1, r)$. We can compute $\alpha_v$ and $\beta_v$ at the root of $T$ in $O(\log_B n)$ I/Os by searching for $y_1$ and $y_2$ in the B-tree $\Psi$; $\alpha_v$ and $\beta_v$ are the ranks of the points at which the searches terminate.
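The rank computation at the root can be illustrated with a small Python sketch. It is only an in-memory stand-in: a sorted list of y-coordinates and the standard `bisect` module play the role of the search in $\Psi$, and the function name `rank_bounds` is hypothetical. Ranks are 1-based, as in the text.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def rank_bounds(ys_sorted: List[float], y1: float, y2: float) -> Tuple[int, int]:
    """Return (alpha, beta) for the y-range [y1, y2].

    alpha is the 1-based rank of the first y-coordinate >= y1 and
    beta is the 1-based rank of the last y-coordinate <= y2.
    If no point lies in the range, alpha == beta + 1.
    """
    alpha = bisect_left(ys_sorted, y1) + 1   # number of values < y1, plus one
    beta = bisect_right(ys_sorted, y2)       # number of values <= y2
    return alpha, beta


# y-coordinates of the 16 points of Figure 2 and the y-range of Q from Figure 3.
FIG2_YS = list(range(1, 17))
ALPHA, BETA = rank_bounds(FIG2_YS, 1.5, 9.5)
```

On the example of Figure 3 this yields the ranks 2 and 9 reported in the figure. With these conventions the number of points of $P_v$ in the y-range is `beta - alpha + 1`.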
Assuming we have computed $\alpha_v$ and $\beta_v$ at a node $v$ of $T$, we can compute $\alpha_{v_j}$ and $\beta_{v_j}$ for all children $v_j$ of $v$ as follows. Since $\alpha_{v_j}$ is the rank of the first point in $P_v$ that belongs to $P_{v_j}$ and whose y-coordinate is at least $y_1$, $\alpha_{v_j} = \varphi(j, \alpha_v)$ and $\beta_{v_j} = \varphi(j, \beta_v)$. Thus the problem of maintaining $\alpha_v, \beta_v$ reduces to computing $\varphi(j, \alpha_v)$ and $\varphi(j, \beta_v)$ for all $0 \le j \le B-1$. All that remains is to describe the procedure for computing $\varphi(0, r), \dots, \varphi(B-1, r)$ for a given $r$ in $O(1)$ I/Os. Suppose $r = \mu a + c$ for $a \ge 0$ and $0 \le c < \mu$. Then

$$\varphi(j, r) = |\{k \mid k \le r \text{ and } \mathrm{CI}_v[k] = j\}| = \mathrm{PC}_v[a, j] + |\{k \mid \mu a < k \le r \text{ and } \mathrm{CI}_v[k] = j\}|. \qquad (1)$$
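Equation (1) and the single-scan construction of the two arrays can be checked on the data of Figure 2 with the following Python sketch. It keeps both arrays as ordinary in-memory lists rather than packed disk blocks, and all function names are hypothetical. Two conventions are assumptions of the sketch rather than statements of the paper: $\mathrm{PC}_v[0, j]$ is taken to be 0, and the count for a child subtracts $\varphi(j, \alpha_v - 1)$ so that the point of rank $\alpha_v$ itself is included.

```python
from typing import List, Tuple


def build_secondary(child_of: List[int], fanout: int, mu: int) -> Tuple[List[int], List[List[int]]]:
    """Build CI_v and PC_v in one scan over P_v (given in y-order).

    child_of[k] is the child index of the point of rank k + 1.
    Row i - 1 of the returned PC list plays the role of PC_v[i, *].
    """
    ci = list(child_of)
    pc: List[List[int]] = []
    running = [0] * fanout                 # per-child counts seen so far
    for rank, j in enumerate(ci, start=1):
        running[j] += 1
        if rank % mu == 0:                 # end of a chunk: store one row of prefix counts
            pc.append(running.copy())
    return ci, pc


def phi(ci: List[int], pc: List[List[int]], mu: int, j: int, r: int) -> int:
    """Number of points of rank <= r that belong to child j, following (1)."""
    a = r // mu
    prefix = pc[a - 1][j] if a > 0 else 0                     # PC_v[a, j], 0 when a = 0
    within = sum(1 for k in range(mu * a, r) if ci[k] == j)   # ranks mu*a + 1 .. r
    return prefix + within


def count_in_child(ci, pc, mu, j, alpha, beta) -> int:
    """Points of child j with rank in [alpha, beta] (sketch convention: alpha - 1)."""
    return phi(ci, pc, mu, j, beta) - phi(ci, pc, mu, j, alpha - 1)


def child_ranks(ci, pc, mu, j, alpha, beta) -> Tuple[int, int]:
    """Ranks (alpha, beta) translated into the y-order of child j (same convention)."""
    return phi(ci, pc, mu, j, alpha - 1) + 1, phi(ci, pc, mu, j, beta)


# Points of Figure 2 in y-order; child j of the root holds x-coordinates 4j+1 .. 4j+4.
FIG2_POINTS = [(5, 1), (9, 2), (7, 3), (6, 4), (3, 5), (4, 6), (13, 7), (12, 8),
               (2, 9), (15, 10), (10, 11), (8, 12), (11, 13), (14, 14), (16, 15), (1, 16)]
FIG2_CHILD = [(x - 1) // 4 for x, _ in FIG2_POINTS]
CI, PC = build_secondary(FIG2_CHILD, fanout=4, mu=8)
```

With $\alpha_v = 2$ and $\beta_v = 9$ from Figure 3, `count_in_child` gives 2 for each of the middle children $v_1$ and $v_2$, and `child_ranks` gives the rank ranges to continue with in $v_0$ and $v_3$. In the real structure, `phi` touches only one row of $\mathrm{PC}_v$ and one chunk of $\mathrm{CI}_v$, which is what makes the $O(1)$ I/O bound possible.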

The procedure therefore simply reads the two disk blocks storing $\mathrm{PC}_v[a, 0], \dots, \mathrm{PC}_v[a, B-1]$ and $\mathrm{CI}_v[\mu a + 1], \dots, \mathrm{CI}_v[r]$, respectively, and then computes $\varphi(0, r), \dots, \varphi(B-1, r)$ using (1). For example, let us calculate $\varphi(2, 11)$. From the PC and CI arrays in Figure 3(ii), $\varphi(2, 11) = \mathrm{PC}[1, 2] + 1 = 3$. Indeed, from the $P_v$ array in Figure 3(ii), there are three points among the first eleven points that belong to $P_{v_2}$.

Theorem 1. A set of $N$ points in the plane can be stored in a linear-size index structure, constructed using $O(n \log_B n)$ I/Os, so that a range-COUNT query can be answered in $O(\log_B n)$ I/Os.

We can make the CRB-tree structure dynamic by slightly modifying the external-memory logarithmic method of Arge and Vahrenhold [5]. Omitting all the details from this abstract, we obtain the following.

Theorem 2. A set of $N$ points in the plane can be stored in a linear-size index structure so that a range-COUNT query can be answered in $O(\log_B^2 n)$ I/Os and a point can be inserted or deleted in $O(\log_B^2 n)$ amortized I/Os.

3 Extensions

In this section we discuss various extensions of CRB-trees. We present the main ideas in these extensions and omit the details.

Range-SUM queries. We first discuss how to extend CRB-trees to answer range-SUM queries in the plane. Let $P$ be a set of $N$ points in $\mathbb{R}^2$, and let $w : P \to \mathbb{Z}$ be the weight function. If the weight of each point can be stored using $O(1)$ bits, then we can easily modify the CRB-tree by storing the weights in an additional array similar to CI and storing the prefix sums of the weights in another array similar to PC. So we focus on the case in which the weights of the points vary considerably. Set $W = \sum_{p \in P} w(p)$ and $\omega = \log_B(\log_2 W) + \log_B(W/N)$. We extend the CRB-tree by storing four additional arrays $W_v$, $\mathrm{CW}_v$, $L_v$, and $\mathrm{CL}_v$ in the secondary structure $\Sigma_v$ of each internal node $v$ of the B-tree. Let $w_i = w(p_i)$ be the weight of the $i$th point $p_i \in P_v$. Set $s_i = \max\{\log_2 \log_2 W, \log_2 w_i\}$; note that $s_i \le \log_2 W$.
We store $w_i$ in the array $W_v$, using $s_i$ bits, as a continuous sequence of bits. $W_v$ requires at most $\mu_v = \sum_i s_i \le N_v \log_2 \log_2 W + N_v \log_2(W/N)$ bits and thus $O(n\omega/\log_B n)$ disk blocks. $W_v$ plays the role of $\mathrm{CI}_v$. Since $W_v$ is stored as a packed array, we need two additional arrays $L_v$ and $\mathrm{CL}_v$ to determine the index in $W_v$ that stores the leftmost bit of $w_i$, for any given $i \le N_v$. $L_v$ is an array of length $N_v$, each entry composed of $\log_2 \log_2 W$ bits. $L_v[i]$ stores the value of $s_i$. Since $s_i \le \log_2 W$, it needs at most $\log_2 \log_2 W$ bits. The size of $L_v$ is thus $O\!\left(\frac{N_v}{B} \cdot \frac{\log_2(\log_2 W)}{\log_2 N}\right) = O(n_v \log_B(\log_2 W)/\log_B n)$ disk blocks. $\mathrm{CL}_v$, an array of size $O(n\omega/\log_B n)$ blocks, stores the prefix sums of $L_v$, as $\mathrm{PC}_v$ does in Section 2, so that the position of the leftmost bit of $w_i$ in $W_v$, for any $i \le N_v$, can be computed using $O(1)$ I/Os. Finally, $\mathrm{CW}_v$ stores the prefix sums of the weights in $W_v$, as $\mathrm{PC}_v$ does in Section 2, so that for any $i \le N_v$, one can compute the sum of the weights of the points in $\{p_1, \dots, p_i\} \cap P_w$ for all children $w$ of $v$. Since the details are similar to Section 2, we omit them from this abstract and only state the resulting bounds in Theorem 3.
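The interplay between the packed weight array and the length array can be illustrated with a short Python sketch. It is a simplification under stated assumptions: weights are positive integers, a Python string of '0'/'1' characters stands in for the packed bit array $W_v$, `int.bit_length()` is used in place of $\log_2 w_i$, and the offsets are full prefix sums of $L_v$, whereas $\mathrm{CL}_v$ would keep them only at the chunk level. The function names are hypothetical.

```python
from itertools import accumulate
from math import ceil, log2
from typing import List, Tuple


def pack_weights(weights: List[int]) -> Tuple[str, List[int], List[int]]:
    """Pack positive integer weights using s_i = max(~log2 log2 W, bits of w_i) bits each.

    Returns (bits, lengths, offsets): the packed bit string (role of W_v),
    the per-weight bit lengths (role of L_v), and prefix sums of the lengths.
    """
    total = sum(weights)                                        # W
    floor_bits = max(1, ceil(log2(log2(total)))) if total >= 4 else 1
    lengths = [max(floor_bits, w.bit_length()) for w in weights]
    bits = "".join(format(w, "0{}b".format(s)) for w, s in zip(weights, lengths))
    offsets = [0] + list(accumulate(lengths))                   # offsets[i] = start of weight i
    return bits, lengths, offsets


def read_weight(bits: str, lengths: List[int], offsets: List[int], i: int) -> int:
    """Recover the i-th weight (0-based) from the packed representation."""
    start = offsets[i]
    return int(bits[start:start + lengths[i]], 2)
```

For the weights 5, 1, 300, 2 the total is $W = 308$, the minimum length is 4 bits, and the packed array uses 4 + 4 + 9 + 4 = 21 bits instead of four full words. The lower bound on $s_i$ keeps every length representable in the fixed-width entries of $L_v$, which is why small weights are padded.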

Theorem 3. Let $P$ be a set of $N$ points in $\mathbb{R}^2$, let $w : P \to \mathbb{Z}$ be a weight function, and let $W = \sum_{i=1}^{N} w_i$. We can bulk load an index in $O(n \log_B((W \log_2 W)/N) \log_B n)$ I/Os that uses $O(n \log_B((W \log_2 W)/N))$ disk blocks so that a range-SUM query can be answered using $O(\log_B n)$ I/Os. If the weight of each point requires $O(1)$ bits, then the size and bulk-loading bounds are $O(n)$ and $O(n \log_B n)$, respectively.

Indexing in higher dimensions. The CRB-tree can be extended to answer range-aggregate queries in $\mathbb{R}^d$ by constructing a multi-level tree structure as follows. Again, we focus on range-COUNT queries. Let $P$ be a set of $N$ points in $\mathbb{R}^d$, and set $b = B^{1/(d-1)}$. A $d$-dimensional CRB-tree is a B-tree $T_d$, with fanout $b$, built on the $x_d$-coordinates of $P$. Each internal node $v$ of $T_d$ is associated with a subset $P_v$ of $P$ and stores a secondary structure, which is a $(d-1)$-dimensional CRB-tree $T_{d-1}$ on the projection of $P_v$ onto the hyperplane $x_d = 0$. The recursion stops when we have built the two-dimensional CRB-tree (with fanout $b$) in the $x_1 x_2$-plane. As in the 2D case, the secondary structure of each internal node $v$ of $T_2$ stores two arrays $\mathrm{CI}_v$ and $\mathrm{PC}_v$, though each entry now keeps more information. Let $T_2$ be a two-dimensional CRB-tree and let $v$ be an internal node of $T_2$. We associate a $(d-1)$-tuple $(w_d, w_{d-1}, \dots, w_2)$ with $v$, where $w_2 = v$ and $w_i$ is a node of the $i$-dimensional CRB-tree to which $T_2$ is attached. For each point $p \in P_v$, and for $2 \le i \le d$, let $w_{a_i}$ be the child of $w_i$ (in the $i$-dimensional tree) such that $p \in P_{w_{a_i}} \subseteq P_{w_i}$. We call $(a_d, \dots, a_2)$ the child-index sequence of $p$. $\mathrm{CI}_v$ is a two-dimensional array with $N_v$ rows and $d-1$ columns. The row corresponding to the point $p \in P_v$ stores the child-index sequence of $p$. $\mathrm{CI}_v$ requires $N_v (d-1) \log_2 b = O(N_v \log_2 B)$ bits. $\mathrm{PC}_v$ stores the prefix sums of $\mathrm{CI}_v$, as in Section 2.
The total size of $T_d$ is $O(n \log_B^{d-2} n)$ disk blocks, and it can be constructed using $O(n \log_B^{d-1} n)$ I/Os. Let the query rectangle $Q$ in $\mathbb{R}^d$ be $[\alpha_1, \beta_1] \times \cdots \times [\alpha_d, \beta_d]$. The query procedure in $T_d$ follows two paths to the leaves corresponding to the points $\alpha_d$ and $\beta_d$. For each node $v$ on these paths, the procedure recursively visits the $(d-1)$-dimensional CRB-tree stored at $v$. When we reach a node $v$ of $T_2$, for all children $w$ of $v$, we count $|P_w \cap ([\alpha_1, \beta_1] \times \cdots \times [\alpha_d, \beta_d])|$. This can be reduced to answering the following query: given a real value $\alpha$ and a $(d-1)$-tuple $\omega = (\omega_d, \dots, \omega_2)$, count all the points in $P_w$ whose $x_1$-coordinates are at most $\alpha$ and whose child-index sequence is $\omega$. We use the arrays $\mathrm{CI}_v$ and $\mathrm{PC}_v$ to answer the above query efficiently. Omitting all the details, we conclude the following.

Theorem 4. Let $P$ be a set of $N$ points in $\mathbb{R}^d$. We can bulk load an index on $P$ using $O(n \log_B^{d-1} n)$ I/Os that uses $O(n \log_B^{d-2} n)$ disk blocks and can answer a $d$-dimensional range-COUNT query in $O(\log_B^{d-1} n)$ I/Os.

Rectangle-intersection aggregate queries. CRB-trees can be extended to answer rectangle-intersection aggregate queries by using the reduction of Edelsbrunner and Overmars [9] to transform rectangle-intersection COUNT queries in $\mathbb{R}^d$ to range-COUNT queries in $\mathbb{R}^d$. Omitting all the details, we have the following result.

Theorem 5. Let $P$ be a set of $N$ rectangles in $\mathbb{R}^d$. We can bulk load an index on $P$ using $O(n \log_B^{d-1} n)$ I/Os that uses $O(n \log_B^{d-2} n)$ disk blocks and can answer a $d$-dimensional rectangle-intersection-COUNT query in $O(\log_B^{d-1} n)$ I/Os.
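To give a feel for why rectangle-intersection counting reduces to counting points in ranges, here is a two-dimensional Python sketch based on inclusion-exclusion over the rectangles that miss the query. It is one standard formulation and is not claimed to be the exact construction of [9]; the brute-force counts stand in for range-COUNT queries on corner coordinates, and all names are hypothetical.

```python
from typing import Callable, List, Tuple

Rect = Tuple[float, float, float, float]   # (x1, x2, y1, y2), closed, with x1 <= x2 and y1 <= y2


def count_if(rects: List[Rect], pred: Callable[[Rect], bool]) -> int:
    """Stand-in for a range-COUNT query on corner coordinates of the input rectangles."""
    return sum(1 for r in rects if pred(r))


def intersection_count(rects: List[Rect], q: Rect) -> int:
    """Number of input rectangles that intersect the query rectangle q."""
    qx1, qx2, qy1, qy2 = q
    left = lambda r: r[1] < qx1     # entirely to the left of q
    right = lambda r: r[0] > qx2    # entirely to the right of q
    below = lambda r: r[3] < qy1    # entirely below q
    above = lambda r: r[2] > qy2    # entirely above q
    # left/right and below/above are mutually exclusive, so only four pairs can overlap.
    singles = sum(count_if(rects, p) for p in (left, right, below, above))
    pairs = sum(count_if(rects, lambda r, a=a, b=b: a(r) and b(r))
                for a in (left, right) for b in (below, above))
    return len(rects) - (singles - pairs)


def brute_force_intersections(rects: List[Rect], q: Rect) -> int:
    """Direct overlap test, used only to cross-check the formula."""
    qx1, qx2, qy1, qy2 = q
    return sum(1 for x1, x2, y1, y2 in rects
               if x1 <= qx2 and x2 >= qx1 and y1 <= qy2 and y2 >= qy1)
```

Each of the eight terms constrains at most one x-corner and one y-corner of a rectangle, so each is a one- or two-sided range-COUNT query over a fixed point set derived from the input; this is why the asymptotic bounds of the point index carry over.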

4 Experimental Results

In this section we report the results of an extensive set of experiments with the CRB-tree. The emphasis of our experiments is on the size and query time of the index, and therefore we provide experimental results only for 2D range-COUNT queries. Since we are mainly interested in linear-space indexes, we chose to compare the performance of the CRB-tree with that of the kdb-tree [17], and not, e.g., with the MVSB-tree, which uses $O(n \log_B n)$ space [22]. In the full version of this paper, we will provide experimental results on SUM queries as well as on dynamization.

4.1 Implementation

We implemented the CRB-tree using the TPIE system developed at Duke. TPIE is designed to facilitate easy and portable implementations of I/O-efficient algorithms and indexing structures, and consists of a set of templated C++ classes and functions. The TPIE system consists of a stream-oriented part and a block-oriented part [2,4]. In the stream-oriented part, user programs are fed a continuous stream of elements in an I/O-efficient manner. In the block-oriented part, the external memory is viewed as a collection of blocks, and primitives for manipulating such blocks are provided. Both the CRB-tree and the kdb-tree implementations use both parts of TPIE. The nodes of the B-tree and kdb-tree are implemented using blocks. The stream-oriented part is used for efficiently implementing the bulk-loading algorithms of both indexes.

For the CRB-tree, the block size of 8K bytes allowed for a fanout of 5 and a maximum leaf size of 681. The number of blocks used for the CRB-tree can be roughly estimated as $4n$: $n$ blocks for each of the base tree $T$, $\Psi$, and the secondary structure arrays CI and PC. The arrays CI and PC were also implemented using blocks. We implemented the arrays such that the entries needed to compute the count $C$ at any node of $T$ can be loaded using only four I/Os (loading two blocks each of the CI and PC arrays).
Thus the query process uses 5 I/Os at each node of $T$ (4 I/Os to access the secondary structure and 1 I/O to access the node). The total number of nodes of $T$ accessed by the query procedure is at most $2 \log_B n - 1$, since the query search corresponds to 2 root-to-leaf paths in $T$. The same is true for the number of nodes of $\Psi$ accessed by the query. Thus the total number of I/Os performed by the query procedure is $5(2 \log_B n - 1) + (2 \log_B n - 1) = 6(2 \log_B n - 1)$.

Since each node $v$ of our kdb-tree stores a balanced binary tree of height 8 whose leaves are the children of $v$, the 8K block size allowed for a fanout of 255 and a maximum leaf size of 681. The number of blocks used for the kdb-tree can be roughly estimated to be $n$. We bulk load the kdb-tree using a top-down approach. At each node of the kdb-tree, we store the number of points contained in the subtree of each of its children. The query process traverses the kdb-tree, starting from the root. At each node $v$, it checks which of the regions corresponding to $v$'s children intersect the query region. The query process recurses on those children whose region is intersected, but not contained, by the query region and accumulates the stored counts for those children whose region is contained in the query region.
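The I/O estimate for a CRB-tree query is simple enough to evaluate directly. The small Python helper below restates the formula as a function of the tree height; the function name is hypothetical, and the assumption that $T$ and $\Psi$ have the same height follows the simplification made in the text.

```python
def crb_query_ios(height: int) -> int:
    """Estimated I/Os for one range-COUNT query, given the height of T (and of Psi)."""
    nodes_per_tree = 2 * height - 1       # two root-to-leaf paths sharing the root
    ios_in_t = 5 * nodes_per_tree         # 1 I/O for the node + 4 I/Os for CI and PC blocks
    ios_in_psi = 1 * nodes_per_tree       # 1 I/O per visited node of Psi
    return ios_in_t + ios_in_psi


ESTIMATES = {h: crb_query_ios(h) for h in (2, 3)}
```

For heights 2 and 3 the estimate is 18 and 30 I/Os, respectively, independent of where the query rectangle lies or how large it is.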

Fig. 4. Comparison of (i) running time (in minutes) and (ii) number of I/Os performed when bulk loading the CRB-tree and kdb-tree, as a function of the number of points (in millions).

4.2 Experiments

We evaluated the performance of the CRB-tree using both synthetic and TIGER/Line data. Below we report both the number of (TPIE) I/Os performed and the wall-clock running time of a set of bulk-loading and query experiments. Query bounds are averages over 1 queries with the buffer cache being flushed between queries.

All our experiments were performed on a Dell PowerEdge 24 workstation with a 5MHZ PIII processor and 128MB of main memory, running FreeBSD 4.3. Physically the machine had 1GB of main memory, but to simulate a real multi-user database environment we restricted the main memory usage to 128MB. Furthermore, TPIE was configured to use a maximum of 8MB, leaving the rest of the memory to the operating system. The external memory consisted of a RAID disk array of four 36GB SCSI disks (IBM DDYS T3695M).

Uniformly distributed points. Our first set of experiments was performed on uniformly distributed points in the range $[0, 10^9] \times [0, 10^9]$ (points were generated by independently choosing a random value for the x and y coordinates). The experiments were performed using data set sizes ranging from 2 to 14 million points. For each query, we chose a random square with an area equal to 1% of the area of the bounding box of the data set. Figure 4 shows the number of I/Os and time taken by the bulk-loading algorithm. The bulk-loading time for the CRB-tree is times slower than that of the kdb-tree and, as we can see, this is mainly because the number of blocks in the CRB-tree is 3-4 times larger than in the kdb-tree; hence the CRB-tree algorithm performs more I/Os than the kdb-tree algorithm. Figure 5 shows the number of I/Os and time taken by the query process.
The query time of the CRB-tree is almost independent of the dataset size and significantly lower than the query time of the kdb-tree. For the dataset sizes used in the experiments, the height of the CRB-tree (T and Ψ) is either 2 or 3. Thus the total number of I/Os performed by the query is at most $6(2 \cdot 3 - 1) = 30$. This explains the fact that the query time remains almost constant in these experiments. Since the number of nodes visited by the kdb-tree query algorithm increases with the data size (it varies as $\sqrt{n}$ in the worst case), the query time (I/Os performed) increases significantly as N varies from 2 to 14 million.
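The I/O bound from Section 4.1 can be evaluated directly for the tree heights reported above. The following is a small sketch of that arithmetic; the function name `crb_query_ios` is hypothetical, and the model assumes, as in Section 4.1, 5 I/Os per visited node of T, 1 I/O per visited node of Ψ, and 2h - 1 visited nodes in each tree of height h.

```python
def crb_query_ios(height: int) -> int:
    """Upper bound on the I/Os of one CRB-tree count query for a tree of the given height."""
    if height < 1:
        raise ValueError("height must be at least 1")
    # Two root-to-leaf paths that share the root.
    nodes_visited = 2 * height - 1
    ios_in_t = 5 * nodes_visited      # 1 I/O for the node + 4 I/Os for CI and PC
    ios_in_psi = 1 * nodes_visited    # 1 I/O per node of the second tree
    return ios_in_t + ios_in_psi      # equals 6 * (2 * height - 1)


for h in (2, 3):
    print(f"height {h}: at most {crb_query_ios(h)} I/Os")
# height 2: at most 18 I/Os
# height 3: at most 30 I/Os
```

Because the bound depends only on the height, it stays fixed for all dataset sizes that produce the same tree height.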

Fig. 5. Comparison of (i) running time (in seconds) and (ii) number of I/Os performed when querying the CRB-tree and kdb-tree, as a function of the number of points (in millions).

Fig. 6. Comparison of (i) percentage of CPU calculations (% of CPU utilization) and (ii) time to perform I/Os (I/O time, in seconds) of the query algorithm for the CRB-tree and kdb-tree, as a function of the number of points.

From Figure 5, we can see that the speedup of the CRB-tree over the kdb-tree is significantly higher for the number of I/Os (Figure 5(ii)) than for the execution time (Figure 5(i)). The reason is as follows. The total execution time is composed of three components: (1) user CPU time, (2) I/O time (time spent performing I/Os), and (3) kernel CPU time. The CRB-tree query process spends significantly more time in CPU calculations (because of the many bit operations) than the kdb-tree query process. Figure 6(i) shows the percentage of time spent in CPU calculations for both the CRB-tree and kdb-tree query processes. Figure 6(ii) compares the I/O time of the query processes. As we can see, the speedup in I/O time is close to the speedup in the number of I/Os in Figure 5(ii).

TIGER/Line data. We used the TIGER/Line data set from the US Bureau of the Census [19], which is one of the standard benchmark datasets used in spatial databases. The TIGER/Line 97 distribution we used consists of six CD-ROMs of data corresponding to six regions of the United States. We performed experiments with six point datasets, corresponding to the data on CD-ROM 1 through i, for $1 \le i \le 6$. The number of points in each of these data sets is shown in Figure 7. Figure 8 shows the results of the bulk-loading and query experiments with the TIGER/Line datasets.
Since the bulk loading time is independent of the characteristics of the data sets, the bulk loading results are similar to the results we obtained with uniformly distributed points. In the query experiments we again used a randomly placed query square with an

area equal to 1% of the area of the bounding box of the data set. The query performance of the CRB-tree is similar to that for uniformly distributed points (the base B-tree T is also of height 3 in these experiments). The query time of the kdb-tree, on the other hand, increases significantly with increasing dataset size.

Fig. 7. Point data sets extracted from TIGER/Line 97 (columns: CD1, CD1-2, CD1-3, CD1-4, CD1-5, CD1-6; rows: number of points in millions, size in MB).

Fig. 8. Comparison of the number of I/Os performed when (i) bulk loading the CRB-tree and kdb-tree and (ii) querying the CRB-tree and kdb-tree using TIGER/Line datasets (number of points in millions).

Next we investigated the effect of the query rectangle characteristics on query performance. The experiments were performed using the largest TIGER/Line data set. First we performed query experiments with query squares of different sizes. The results of these experiments are shown in Figure 9(i), where the size of the query square is characterized by the ratio of its area to the area of the bounding box of the input points. The size of the query square is varied from 1 8 % to 2%. As can be seen, the query time of the kdb-tree increases rapidly with increasing window size. The query time of the CRB-tree, on the other hand, is almost constant.

Next we performed query experiments with query rectangles instead of query squares. The results of our experiments with rectangles of varying aspect ratio (the ratio between the length and the breadth of the rectangle) are shown in Figure 9(ii). The area of the query rectangle is fixed at 1% of the area of the bounding box of the input dataset while the aspect ratio is varied from .1 to 1.
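One way to generate such query rectangles, with a fixed area fraction and a chosen aspect ratio, is sketched below. This is an illustrative assumption rather than the generator used in the experiments: the function name `random_query_rect` is hypothetical, and the sketch additionally assumes that the rectangle is placed uniformly at random so that it lies entirely inside the bounding box.

```python
import math
import random
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def random_query_rect(bbox: Rect, area_fraction: float,
                      aspect_ratio: float, rng: random.Random) -> Rect:
    """Random rectangle inside `bbox` with the given area fraction and length/breadth ratio."""
    xmin, ymin, xmax, ymax = bbox
    area = area_fraction * (xmax - xmin) * (ymax - ymin)
    # width * height = area and width / height = aspect_ratio.
    width = math.sqrt(area * aspect_ratio)
    height = math.sqrt(area / aspect_ratio)
    if width > xmax - xmin or height > ymax - ymin:
        raise ValueError("rectangle does not fit inside the bounding box")
    # Choose the lower-left corner so that the rectangle stays inside bbox.
    x0 = rng.uniform(xmin, xmax - width)
    y0 = rng.uniform(ymin, ymax - height)
    return (x0, y0, x0 + width, y0 + height)


rng = random.Random(42)  # fixed seed for reproducibility
bbox = (0.0, 0.0, 1000.0, 1000.0)
square = random_query_rect(bbox, 0.01, 1.0, rng)   # 100 x 100 square
skinny = random_query_rect(bbox, 0.01, 4.0, rng)   # 200 x 50 rectangle
```

An aspect ratio of 1 reproduces the query squares of the earlier experiments, so the same helper covers both workloads.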
As can be seen, the query time of the kdb-tree increases slightly as the query rectangle becomes skinny (high or low aspect ratio). The reason is that the kdb-tree consists of alternating splits along the x and y dimensions, and hence a skinny rectangle intersects more nodes of the kdb-tree than a square of the same area. As expected, the query time of the CRB-tree is almost constant.

Clustered data. Finally, in order to further investigate the influence of the input data distribution on the query performance of the two structures, we performed experiments

with artificial clustered datasets. The datasets consist of 15 million points distributed evenly among k clusters, where each cluster is generated by uniformly distributing the points on a randomly oriented ellipse of length and width 1 4 centered at (5 1 8, ). Figure 9(iii) shows the results of experiments when k is varied from 5 to 5. As before, the CRB-tree performance is almost constant. Note that the CRB-tree query performance does not depend on whether the input data is uniform or skewed, since the number of I/Os performed by the CRB-tree query procedure depends only on the height of the tree and not on the distribution of the input data.

Fig. 9. Comparison of the number of I/Os performed when querying the CRB-tree and kdb-tree using the largest TIGER/Line data set and (i) varying the size of the query square (window size) and (ii) varying the aspect ratio of the query rectangle. (iii) shows the comparison of the number of I/Os performed during queries on synthetic clustered datasets (as a function of the number of clusters).

Experimental conclusions. The overall conclusion of our experiments is that while the CRB-tree uses 3-4 times more space than the kdb-tree and takes times longer to bulk load than the kdb-tree, the query performance of the CRB-tree is much better than that of the kdb-tree. For a data set with around 1 million points, the CRB-tree query time is 8 1 times faster than the kdb-tree query time. Furthermore, the query time of the CRB-tree depends only on the height of the tree ($\log_B n$). Thus it is independent of the distribution of the input points and of the query characteristics, and almost constant for the range of data set sizes used in our experiments. The query time of the kdb-tree, on the other hand, depends significantly on the size of the input dataset and the size of the query rectangle.
To a lesser extent, the query time of the kdb-tree also depends on the aspect ratio of the query window and the input point distribution.

Acknowledgments. The authors thank Sariel Har-Peled for useful discussions and Octavian Procopiuc for answering numerous questions related to the TPIE system.

References

1. P. K. Agarwal and J. Erickson. Geometric range searching and its relatives. In B. Chazelle, J. E. Goodman, and R. Pollack, editors, Advances in Discrete and Computational Geometry, volume 223 of Contemporary Mathematics, pages American Mathematical Society, Providence, RI, 1999.

2. A. Aggarwal and J. S. Vitter. The input/output complexity of sorting and related problems. Commun. ACM, 31: ,
3. L. Arge. External memory data structures. In J. Abello, P. M. Pardalos, and M. G. C. Resende, editors, Handbook of Massive Data Sets, pages Kluwer Academic Publishers,
4. L. Arge, O. Procopiuc, and J. S. Vitter. Implementing I/O-efficient data structures using TPIE. In Proc. 1th Annual European Symposium on Algorithms, pages 88 1,
5. L. Arge and J. Vahrenhold. I/O efficient dynamic planar point location. In Proc. ACM Symp. on Computational Geometry, pages 191 2,
6. C. Y. Chan and Y. E. Ioannidis. Hierarchical cubes for range-sum queries. In Proc. of 25th International Conference on Very Large DataBases, pages ,
7. B. Chazelle. A functional approach to data structures and its use in multidimensional searching. SIAM J. Comput., 17(3): , June
8. B. Chazelle. Lower bounds for orthogonal range searching, II: The arithmetic model. J. ACM, 37: ,
9. H. Edelsbrunner and M. H. Overmars. On the equivalence of some rectangle problems. Information Processing Letters, 14(3): ,
10. V. Gaede and O. Günther. Multidimensional access methods. ACM Comput. Surv., 3:17 231,
11. S. Geffner, D. Agarwal, and A. E. Abbadi. The dynamic datacube. In Proc. of Intl. Conference on Extending Database Technology, pages ,
12. V. Harinarayan, A. Rajaraman, and J. D. Ullman. Implementing data cubes efficiently. In Proc. of ACM SIGMOD Intl. Conference on Management of Data, pages ,
13. J. Kim, S. Kang, and M. Kim. Effective temporal aggregation using point-based trees. In Database and Expert Systems Applications, pages ,
14. N. Kline and R. T. Snodgrass. Computing temporal aggregates. In Proc. of Intl. Conference on Data Engineering, pages ,
15. S. Lee, W. Ling, and H. Li. Hierarchical compact cubes for range-max queries. In Proc. of 26th International Conference on Very Large DataBases, pages ,
16. J. Nievergelt and P. Widmayer. Spatial data structures: Concepts and design choices. In J.-R. Sack and J. Urrutia, editors, Handbook of Computational Geometry, pages Elsevier Science Publishers B.V. North-Holland, Amsterdam,
17. J. Robinson. The k-d-b tree: A search structure for large multidimensional dynamic indices. In Proc. of SIGMOD Conference on Management of Data, pages 1 18,
18. Y. Tao, D. Papadias, and J. Zhang. Aggregate processing of planar points. In Extending Database Technology, pages 682 7,
19. TIGER/Line TM Files, 1997 Technical Documentation. Washington, DC, September
20. D. E. Vengroff. A transparent parallel I/O environment. In Proc. DAGS Symposium on Parallel Computation,
21. J. Yang and J. Widom. Incremental computation and maintenance of temporal aggregates. In Proceedings of the 17th International Conference on Data Engineering, pages 51 6,
22. D. Zhang, A. Markowetz, V. Tsotras, D. Gunopulos, and B. Seeger. Efficient computation of temporal aggregates with range predicates. In Proc. Principles Of Database Systems, pages , 21.


More information

Chapter 12: Indexing and Hashing

Chapter 12: Indexing and Hashing Chapter 12: Indexing and Hashing Database System Concepts, 5th Ed. See www.db-book.com for conditions on re-use Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree

More information

Temporal Range Exploration of Large Scale Multidimensional Time Series Data

Temporal Range Exploration of Large Scale Multidimensional Time Series Data Temporal Range Exploration of Large Scale Multidimensional Time Series Data Joseph JaJa Jusub Kim Institute for Advanced Computer Studies Department of Electrical and Computer Engineering University of

More information

CST-Trees: Cache Sensitive T-Trees

CST-Trees: Cache Sensitive T-Trees CST-Trees: Cache Sensitive T-Trees Ig-hoon Lee 1, Junho Shim 2, Sang-goo Lee 3, and Jonghoon Chun 4 1 Prompt Corp., Seoul, Korea ihlee@prompt.co.kr 2 Department of Computer Science, Sookmyung Women s University,

More information

Lecture 9 March 15, 2012

Lecture 9 March 15, 2012 6.851: Advanced Data Structures Spring 2012 Prof. Erik Demaine Lecture 9 March 15, 2012 1 Overview This is the last lecture on memory hierarchies. Today s lecture is a crossover between cache-oblivious

More information

Optimal Parallel Randomized Renaming

Optimal Parallel Randomized Renaming Optimal Parallel Randomized Renaming Martin Farach S. Muthukrishnan September 11, 1995 Abstract We consider the Renaming Problem, a basic processing step in string algorithms, for which we give a simultaneously

More information

PAPER Constructing the Suffix Tree of a Tree with a Large Alphabet

PAPER Constructing the Suffix Tree of a Tree with a Large Alphabet IEICE TRANS. FUNDAMENTALS, VOL.E8??, NO. JANUARY 999 PAPER Constructing the Suffix Tree of a Tree with a Large Alphabet Tetsuo SHIBUYA, SUMMARY The problem of constructing the suffix tree of a tree is

More information

Efficient Bundle Sorting

Efficient Bundle Sorting ' ) * Efficient Bundle Sorting Yossi Matias Eran Segal Jeffrey Scott Vitter Abstract Many data sets to be sorted consist of a limited number of distinct keys. Sorting such data sets can be thought of as

More information

Data Cubes in Dynamic Environments

Data Cubes in Dynamic Environments Data Cubes in Dynamic Environments Steven P. Geffner Mirek Riedewald Divyakant Agrawal Amr El Abbadi Department of Computer Science University of California, Santa Barbara, CA 9 Λ Abstract The data cube,

More information

Chapter 11: Indexing and Hashing

Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree

More information

Report Seminar Algorithm Engineering

Report Seminar Algorithm Engineering Report Seminar Algorithm Engineering G. S. Brodal, R. Fagerberg, K. Vinther: Engineering a Cache-Oblivious Sorting Algorithm Iftikhar Ahmad Chair of Algorithm and Complexity Department of Computer Science

More information

Threshold Interval Indexing for Complicated Uncertain Data

Threshold Interval Indexing for Complicated Uncertain Data Threshold Interval Indexing for Complicated Uncertain Data Andrew Knight Department of Computer Science Rochester Institute of Technology Rochester, New York, USA Email: alk1234@rit.edu Qi Yu Department

More information

Funnel Heap - A Cache Oblivious Priority Queue

Funnel Heap - A Cache Oblivious Priority Queue Alcom-FT Technical Report Series ALCOMFT-TR-02-136 Funnel Heap - A Cache Oblivious Priority Queue Gerth Stølting Brodal, Rolf Fagerberg Abstract The cache oblivious model of computation is a two-level

More information

Computational Geometry

Computational Geometry Windowing queries Windowing Windowing queries Zoom in; re-center and zoom in; select by outlining Windowing Windowing queries Windowing Windowing queries Given a set of n axis-parallel line segments, preprocess

More information

Trees. Reading: Weiss, Chapter 4. Cpt S 223, Fall 2007 Copyright: Washington State University

Trees. Reading: Weiss, Chapter 4. Cpt S 223, Fall 2007 Copyright: Washington State University Trees Reading: Weiss, Chapter 4 1 Generic Rooted Trees 2 Terms Node, Edge Internal node Root Leaf Child Sibling Descendant Ancestor 3 Tree Representations n-ary trees Each internal node can have at most

More information

COMP Data Structures

COMP Data Structures COMP 2140 - Data Structures Shahin Kamali Topic 5 - Sorting University of Manitoba Based on notes by S. Durocher. COMP 2140 - Data Structures 1 / 55 Overview Review: Insertion Sort Merge Sort Quicksort

More information

Database index structures

Database index structures Database index structures From: Database System Concepts, 6th edijon Avi Silberschatz, Henry Korth, S. Sudarshan McGraw- Hill Architectures for Massive DM D&K / UPSay 2015-2016 Ioana Manolescu 1 Chapter

More information

Lecture 3 February 9, 2010

Lecture 3 February 9, 2010 6.851: Advanced Data Structures Spring 2010 Dr. André Schulz Lecture 3 February 9, 2010 Scribe: Jacob Steinhardt and Greg Brockman 1 Overview In the last lecture we continued to study binary search trees

More information

Optimal Decision Trees Generation from OR-Decision Tables

Optimal Decision Trees Generation from OR-Decision Tables Optimal Decision Trees Generation from OR-Decision Tables Costantino Grana, Manuela Montangero, Daniele Borghesani, and Rita Cucchiara Dipartimento di Ingegneria dell Informazione Università degli Studi

More information

Striped Grid Files: An Alternative for Highdimensional

Striped Grid Files: An Alternative for Highdimensional Striped Grid Files: An Alternative for Highdimensional Indexing Thanet Praneenararat 1, Vorapong Suppakitpaisarn 2, Sunchai Pitakchonlasap 1, and Jaruloj Chongstitvatana 1 Department of Mathematics 1,

More information

Mining Data Streams. Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction. Summarization Methods. Clustering Data Streams

Mining Data Streams. Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction. Summarization Methods. Clustering Data Streams Mining Data Streams Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction Summarization Methods Clustering Data Streams Data Stream Classification Temporal Models CMPT 843, SFU, Martin Ester, 1-06

More information

3 Competitive Dynamic BSTs (January 31 and February 2)

3 Competitive Dynamic BSTs (January 31 and February 2) 3 Competitive Dynamic BSTs (January 31 and February ) In their original paper on splay trees [3], Danny Sleator and Bob Tarjan conjectured that the cost of sequence of searches in a splay tree is within

More information

Proceedings of the 5th WSEAS International Conference on Telecommunications and Informatics, Istanbul, Turkey, May 27-29, 2006 (pp )

Proceedings of the 5th WSEAS International Conference on Telecommunications and Informatics, Istanbul, Turkey, May 27-29, 2006 (pp ) A Rapid Algorithm for Topology Construction from a Set of Line Segments SEBASTIAN KRIVOGRAD, MLADEN TRLEP, BORUT ŽALIK Faculty of Electrical Engineering and Computer Science University of Maribor Smetanova

More information

Using Natural Clusters Information to Build Fuzzy Indexing Structure

Using Natural Clusters Information to Build Fuzzy Indexing Structure Using Natural Clusters Information to Build Fuzzy Indexing Structure H.Y. Yue, I. King and K.S. Leung Department of Computer Science and Engineering The Chinese University of Hong Kong Shatin, New Territories,

More information

Online algorithms for clustering problems

Online algorithms for clustering problems University of Szeged Department of Computer Algorithms and Artificial Intelligence Online algorithms for clustering problems Summary of the Ph.D. thesis by Gabriella Divéki Supervisor Dr. Csanád Imreh

More information

Planar Point Location

Planar Point Location C.S. 252 Prof. Roberto Tamassia Computational Geometry Sem. II, 1992 1993 Lecture 04 Date: February 15, 1993 Scribe: John Bazik Planar Point Location 1 Introduction In range searching, a set of values,

More information

Cache-Oblivious R-Trees

Cache-Oblivious R-Trees Cache-Oblivious R-Trees Lars Arge BRICS, Department of Computer Science, University of Aarhus IT-Parken, Aabogade 34, DK-8200 Aarhus N, Denmark. large@daimi.au.dk Mark de Berg Department of Computing Science,

More information

Computational Geometry

Computational Geometry Lecture 1: Introduction and convex hulls Geometry: points, lines,... Geometric objects Geometric relations Combinatorial complexity Computational geometry Plane (two-dimensional), R 2 Space (three-dimensional),

More information

6 Distributed data management I Hashing

6 Distributed data management I Hashing 6 Distributed data management I Hashing There are two major approaches for the management of data in distributed systems: hashing and caching. The hashing approach tries to minimize the use of communication

More information

Algorithms for GIS:! Quadtrees

Algorithms for GIS:! Quadtrees Algorithms for GIS: Quadtrees Quadtree A data structure that corresponds to a hierarchical subdivision of the plane Start with a square (containing inside input data) Divide into 4 equal squares (quadrants)

More information

Lecture 19 Apr 25, 2007

Lecture 19 Apr 25, 2007 6.851: Advanced Data Structures Spring 2007 Prof. Erik Demaine Lecture 19 Apr 25, 2007 Scribe: Aditya Rathnam 1 Overview Previously we worked in the RA or cell probe models, in which the cost of an algorithm

More information

Lecture Notes: Range Searching with Linear Space

Lecture Notes: Range Searching with Linear Space Lecture Notes: Range Searching with Linear Space Yufei Tao Department of Computer Science and Engineering Chinese University of Hong Kong taoyf@cse.cuhk.edu.hk In this lecture, we will continue our discussion

More information

January 10-12, NIT Surathkal Introduction to Graph and Geometric Algorithms

January 10-12, NIT Surathkal Introduction to Graph and Geometric Algorithms Geometric data structures Sudebkumar Prasant Pal Department of Computer Science and Engineering IIT Kharagpur, 721302. email: spp@cse.iitkgp.ernet.in January 10-12, 2012 - NIT Surathkal Introduction to

More information

Multidimensional Indexes [14]

Multidimensional Indexes [14] CMSC 661, Principles of Database Systems Multidimensional Indexes [14] Dr. Kalpakis http://www.csee.umbc.edu/~kalpakis/courses/661 Motivation Examined indexes when search keys are in 1-D space Many interesting

More information

Reduction of Periodic Broadcast Resource Requirements with Proxy Caching

Reduction of Periodic Broadcast Resource Requirements with Proxy Caching Reduction of Periodic Broadcast Resource Requirements with Proxy Caching Ewa Kusmierek and David H.C. Du Digital Technology Center and Department of Computer Science and Engineering University of Minnesota

More information

Interval Stabbing Problems in Small Integer Ranges

Interval Stabbing Problems in Small Integer Ranges Interval Stabbing Problems in Small Integer Ranges Jens M. Schmidt Freie Universität Berlin, Germany Enhanced version of August 2, 2010 Abstract Given a set I of n intervals, a stabbing query consists

More information

Rotated-Box Trees: A Lightweight c-oriented Bounding-Volume Hierarchy

Rotated-Box Trees: A Lightweight c-oriented Bounding-Volume Hierarchy Rotated-Box Trees: A Lightweight c-oriented Bounding-Volume Hierarchy Mark de Berg 1 and Peter Hachenberger 2 1 Department of Computing Science, TU Eindhoven, P.O. Box 513, 5600 MB Eindhoven, the Netherlands

More information

Bichromatic Line Segment Intersection Counting in O(n log n) Time

Bichromatic Line Segment Intersection Counting in O(n log n) Time Bichromatic Line Segment Intersection Counting in O(n log n) Time Timothy M. Chan Bryan T. Wilkinson Abstract We give an algorithm for bichromatic line segment intersection counting that runs in O(n log

More information

9 Distributed Data Management II Caching

9 Distributed Data Management II Caching 9 Distributed Data Management II Caching In this section we will study the approach of using caching for the management of data in distributed systems. Caching always tries to keep data at the place where

More information