An index structure for efficient reverse nearest neighbor queries

Congjun Yang, Division of Computer Science, Department of Mathematical Sciences, The University of Memphis, Memphis, TN 38152, USA
King-Ip Lin, Division of Computer Science, Department of Mathematical Sciences, The University of Memphis, Memphis, TN 38152, USA

Abstract
The Reverse Nearest Neighbor (RNN) problem is to find all points in a given data set whose nearest neighbor is a given query point. Just like Nearest Neighbor (NN) queries, RNN queries arise in many practical situations such as marketing and resource management, so efficient methods for answering RNN queries in databases are required. This paper introduces a new index structure, the Rdnn-tree, that answers both RNN and NN queries efficiently. A single index structure is employed for a dynamic database, in contrast to the use of multiple indexes in previous work; this enables significant savings in dynamically maintaining the index structure. The Rdnn-tree outperforms existing methods in various aspects. Experiments on both synthetic and real-world data show that our index structure outperforms the previous method by a significant margin (more than 90% reduction in the number of leaf nodes accessed) on RNN queries. It also shows improvement over standard techniques on NN queries. Furthermore, insertion and deletion performance is significantly enhanced by the ability to combine multiple queries (NN and RNN) into one traversal of the tree. These facts make our index structure preferable in both the static and the dynamic case.
1 Introduction
Indexing is an indispensable tool in database systems. Various kinds of indexes are used to speed up query execution. Moreover, new applications and queries continue to demand new and improved indexes and associated algorithms.
One type of query that has recently received attention is the Reverse Nearest Neighbor (RNN) query: given a data set S and a query point q, an RNN query finds all the points in S having q as their nearest neighbor. This problem corresponds to a class of problems we call influence problems. For instance, suppose a bank is to open a new branch at some location. It may want to know which existing branches will be affected by the new branch, assuming people choose the nearest branch to conduct business. Moreover, a rival bank may also want to assess the influence of putting a new branch at that location and the effect it would have on existing branches of other banks. Also, with the advance of the Internet and the Web, people expect systems to deliver (or push) interesting and relevant information to them. While users do not want to be inundated with a large volume of junk messages, it is crucial for them to receive the information most relevant to them. One way to achieve this balance is to push only the information most pertinent to the interests of each user. For instance, a company can send advertisements about a new product only to those customers who will find this product more relevant than any of the existing products. This allows users to receive the information they actually need, and at the same time spares them from sorting through the junk cluttering their mailboxes, making the advertisement more effective. Hence reverse nearest neighbor queries are a very practical and important class of queries; Korn and Muthukrishnan [8] provide more examples. A naive solution of the problem requires O(n^2) time with no preprocessing, as the nearest neighbors of all the points in S have to be found. Thus more efficient algorithms are required. One approach, described by Korn and Muthukrishnan [8], is to pre-compute the nearest neighbors of every point in S.
Then, given the query point q, one can compare it with the stored nearest-neighbor information of the points in S. For each point x in S, one can compute and store a spherical region with x as the center and the distance from x to its nearest neighbor as the radius. It can be seen that if a query point q falls into this region, then x is an RNN of q ([8] provides a proof).
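This precompute-and-query scheme can be sketched as follows. It is a minimal in-memory sketch of the idea (without the index structure that follows), and the function names are illustrative rather than taken from [8]:

```python
import math

def dist(p, q):
    return math.dist(p, q)

def precompute_regions(S):
    """For each point x in S, store the circle B(x, dnn_S(x)):
    x together with the distance to its nearest neighbor in S."""
    regions = []
    for x in S:
        dnn = min(dist(x, p) for p in S if p != x)
        regions.append((x, dnn))
    return regions

def rnn_query(regions, q):
    """x is a reverse nearest neighbor of q iff q falls inside B(x, dnn_S(x))."""
    return [x for (x, dnn) in regions if dist(q, x) <= dnn]
```

For the collinear points (0, 0), (1, 0) and (4, 0) with query point (2, 0), the query reports (1, 0) and (4, 0): each falls at least as close to the query as to its own nearest neighbor.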

All the regions can be organized into a multi-dimensional index structure (for instance, one of the R-tree family [1, 5, 13]) for effective storage and query performance. This method, while pioneering, has some drawbacks. For instance, it requires two indexes in the dynamic case, where insertions into and deletions from the data set occur. Moreover, the stored regions tend to overlap significantly, hampering performance. In this paper we present a new structure, called the Rdnn-tree (R-tree containing Distance of Nearest Neighbors), which is well suited for RNN queries in both the static and the dynamic case. The Rdnn-tree differs from the standard R-tree structure by storing, in each node, extra information about the nearest neighbors of the points below it. This piece of information yields significant improvements in all algorithms. The Rdnn-tree has many advantages, including:
- It significantly outperforms the index structures in [8], typically requiring only 1-2 leaf accesses to locate the RNNs.
- It can perform NN queries efficiently. As a result, only one tree is required in the dynamic case, for both NN and RNN queries.
- It enables one to execute multiple NN and RNN queries in one traversal of the tree, further enhancing performance in the dynamic case.
The rest of the paper is organized as follows: Section 2 outlines previous work on multi-dimensional indexes and queries. Section 3 describes the previous RNN algorithms in more detail and outlines the potential for improvement. The proposed Rdnn-tree is presented in Section 4. Section 5 provides experimental results, and Section 6 summarizes our work and discusses future directions.
2 Related work
There has been a large body of work on multidimensional index structures. For instance, the index structure we propose is based on the popular R-tree family [5, 13, 1], which generalizes the B-tree to multiple dimensions by storing minimum bounding regions (hyperrectangles) instead of numbers representing 1-D intervals.
Interested readers are referred to [12] and [4] for full surveys of multi-dimensional index structures. Early work on multi-dimensional index structures focused on range queries. Recently the nearest neighbor query problem has received substantial attention. In addition to the work in computational geometry (e.g., see [11]), many algorithms have been proposed to search for nearest neighbors using tree-based indexes like the R-trees. Many such algorithms take a branch-and-bound approach: the tree is traversed from the root, and at each step a certain heuristic is used to determine which branch to traverse next and which branches can be pruned from the search. The various algorithms differ in the order of the search. For instance, Roussopoulos et al. [10] use a depth-first approach, while Hjaltason and Samet [6] propose a distance-browsing algorithm, using a priority queue to order the branches to be traversed. Other approaches have been proposed as well. One is to modify the index structure to enhance the branch-and-bound algorithms; two examples are the SS-tree [15] and the SR-tree [7]. An alternative approach, proposed by Berchtold et al. [2], indexes an approximation of the Voronoi diagram associated with the data set.
3 Definitions and existing algorithms
This section presents existing algorithms for reverse nearest neighbor search and discusses potential improvements. We first provide formal definitions of the nearest neighbor and reverse nearest neighbor search problems. In what follows, we assume that S is a set of points in some d-dimensional space; d(p, q) is the distance between two points p and q; if T is a subset of S, d(p, T) denotes the minimum distance between p and any point in T; and B(p, r) is the circle centered at p with radius r.
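In code, this notation, together with the NN and RNN sets described informally in the introduction, reads as follows (a throwaway brute-force sketch; the function names are illustrative):

```python
import math

# d(p, q): the distance between two points p and q
def d(p, q):
    return math.dist(p, q)

# d(p, T): the minimum distance between p and any point in T
def d_set(p, T):
    return min(d(p, t) for t in T)

# NN_S(q): all points of S at minimum distance from the query point q
def nn_set(S, q):
    m = d_set(q, S)
    return [r for r in S if d(q, r) == m]

# RNN_S(q): all points of S that have q as their nearest neighbor,
# i.e. r is at least as close to q as to any other point of S
def rnn_set(S, q):
    return [r for r in S
            if d(r, q) <= d_set(r, [p for p in S if p != r])]
```

On the points (0, 0), (1, 0), (4, 0) with query (2, 0), nn_set returns only (1, 0), while rnn_set returns both (1, 0) and (4, 0), illustrating that the NN and RNN sets of a query point need not coincide.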
DEFINITION (Nearest Neighbor Search, NN search): Given a set S of points in some d-dimensional space and a query point q, the nearest neighbor search problem is to find the subset NN_S(q) of S defined as follows:
NN_S(q) = { r ∈ S : d(q, r) ≤ d(q, p) for all p ∈ S }.
DEFINITION (Reverse Nearest Neighbor Search, RNN search): Given a set S of points in some d-dimensional space and a query point q, the reverse nearest neighbor search problem is to find the subset RNN_S(q) of S defined as follows:
RNN_S(q) = { r ∈ S : d(q, r) ≤ d(r, p) for all p ∈ S with p ≠ r }.
Notice that d(p, NN_S(p)) is the distance between p and its nearest neighbors in S; for simplicity we denote it by dnn_S(p). S will be omitted from the above notations where the context is clear. In general, there is no natural relationship between NN_S(q) and RNN_S(q): r ∈ NN_S(q) does not imply r ∈ RNN_S(q), and vice versa. The general RNN search approach is presented by Korn and Muthukrishnan [8]. Let S be a given set of points and q a query point. For any point p in S, p takes q as its nearest
neighbor if and only if d(p, q) ≤ dnn_S(p), i.e., p is at least as close to q as to its nearest neighbors in S. Since S is known, we can pre-compute NN_S(p) for every point p in S and store it in a suitable way. Korn and Muthukrishnan used an RNN-tree, which is essentially an R*-tree. For every point p in the data set S, the RNN-tree stores the minimum bounding rectangle of the circle B(p, dnn_S(p)) in a leaf node. With such an index structure, the RNN search problem becomes a simple point query problem: for any given query point q, p is in RNN_S(q) only if q falls inside the circle, and hence inside the minimum bounding rectangle of the circle. Complications arise when points are inserted into or deleted from the tree; in such cases, the RNN-tree has to be updated. Consider first the case of insertion. When a point p' is inserted into S, we need to make two kinds of adjustments: for every point in RNN_S(p'), we need to update the region stored in the RNN-tree, since p' is its new nearest neighbor; also, the region corresponding to p' (i.e., B(p', dnn_S(p'))) needs to be computed and inserted into the RNN-tree. This implies that the insertion algorithm needs to find both NN_S(p') and RNN_S(p'). One would like to use the RNN-tree to find the nearest neighbors. However, the leaf nodes of an RNN-tree contain geometric objects (the regions) instead of the points themselves. This makes the higher-level bounding regions larger and the tree sub-optimal for standard nearest neighbor queries. Thus it is proposed that a second tree, the NN-tree (a plain R*-tree), be created to ensure efficient nearest neighbor search. However, this implies that the second tree also needs to be maintained during insertion. The insertion algorithm can be summarized as follows:
Algorithm 1 RNN-Insert(RNN-tree, NN-tree, p')
1) Perform an RNN search on the RNN-tree for p' to find RNN_S(p').
2) For each p in RNN_S(p'), shrink B(p, dnn_S(p)) to B(p, d(p, p')).
3) Call the standard NN search algorithm on the NN-tree to find NN_S(p').
4) Insert p' into the RNN-tree using the R*-tree insertion algorithm.
5) Insert p' into the NN-tree using the R*-tree insertion algorithm.
A similar situation arises when a point p'' is deleted. Again, we need to make two kinds of adjustments: deleting the region corresponding to p'' (i.e., B(p'', dnn_S(p''))), as well as finding the new nearest neighbors of all the points in RNN_S(p'') and adjusting their corresponding regions in the RNN-tree. Once again both NN and RNN queries are needed, and notice that we might have to perform multiple NN queries in the second step. The deletion algorithm is listed as follows.
Algorithm 2 RNN-Delete(RNN-tree, NN-tree, p'')
1) Delete p'' from the RNN-tree using the R*-tree deletion algorithm.
2) Delete p'' from the NN-tree using the R*-tree deletion algorithm.
3) Perform an RNN search on the RNN-tree for p'' to find RNN_S(p'').
4) For each p in RNN_S(p''), call the standard NN search algorithm on the NN-tree to find NN_S(p) in the updated set, and enlarge B(p, d(p, p'')) to B(p, dnn_S(p)).
Thus, in the dynamic case one needs to update two trees to maintain the index structures, leading to inefficiency in both time and space. While the technique above is a general approach, there are other techniques that work for lower-dimensional points. One such approach is to take advantage of the geometric properties of the problem. Stanoi, Agrawal and El Abbadi [14] introduced an algorithm that works directly on an R*-tree. It transforms the RNN problem into a set of constrained nearest neighbor queries. An interesting fact about RNN queries is that the maximum number of RNNs of a query point is bounded, and if multiple RNNs exist, they have to be distributed fairly evenly around the query point. Thus, upon receiving the query point q, the algorithm divides the entire space into a number of regions based on q, where the number of regions equals the maximum possible number of RNNs.
For each region, the algorithm finds the nearest neighbors of q within it. It can be shown that the true RNNs are among these points, and finding the correct solutions among them can be done easily. The main drawback of the algorithm is that the number of regions to be searched grows very fast as the dimensionality increases; for the L1 norm, for instance, the growth is exponential. This renders the algorithm ineffective in higher dimensions. Moreover, every region has to be searched whether an RNN resides in it or not, so there can be a lot of wasted effort during the search.
4 The Rdnn-tree
4.1 Motivation
We have discussed the limitations of the RNN-tree approach in the last section. While storing the spherical region B(p, dnn_S(p)) is necessary, the RNN-tree suffers from the following:
- Large overlap between the regions causes increased overlap in the parent nodes' MBRs (minimum bounding rectangles), hampering RNN search performance.

- Storing the spherical regions themselves renders the index structure ineffective for NN queries, so a second tree is needed in the dynamic case. This severely adds to the cost of maintaining the index.
Thus we want a structure that supports point-location and NN queries, while maintaining the dnn_S(p) information so that RNN queries are supported properly. We therefore propose the Rdnn-tree (R*-tree with Distance of Nearest Neighbors) to kill two birds with one stone: we use the R*-tree to store the data points themselves, but enhance the nodes with information about the nearest-neighbor distances of the points they contain.
4.2 The Rdnn-tree structure
In an Rdnn-tree, a leaf node contains entries of the form (pt, dnn), where pt refers to a d-dimensional point in the data set and dnn is the distance from the point to its nearest neighbor in the data set. A non-leaf node contains an array of branches of the form (ptr, Rect, max_dnn). ptr is the address of a child node in the tree. If ptr points to a leaf node, Rect is the minimum bounding rectangle of all points in the leaf node; if ptr points to a non-leaf node, Rect is the minimum bounding rectangle of all rectangles that are entries in the child node. In both cases max_dnn = max { dnn_S(p) : p is a point contained in the subtree rooted at this branch }.
4.3 Algorithms
We first present the NN and RNN search algorithms for the Rdnn-tree, as both are needed by the insertion and deletion algorithms.
RNN search
The reverse nearest neighbor search on the Rdnn-tree is similar to a point-location search. The only difference is the criterion used to decide which branch(es) to follow down the tree. Assume that q is the query point. For a leaf node, we examine each point p in the node: if d(q, p) ≤ dnn_S(p), i.e., p is at least as close to q as to its nearest neighbor, then p is one of the reverse nearest neighbors. For an internal node, we compare the query point q with each branch (ptr, Rect, max_dnn). Here max_dnn plays a crucial role.
By definition, all points in the subtree rooted at a branch B are contained in Rect, and the distance from each of them to its nearest neighbor is at most max_dnn (max_dnn is the largest such distance). Hence if d(q, Rect) > max_dnn, branch B need not be visited, because no point in B can be closer to q than to its nearest neighbor in S. Our experiments (cf. Section 5) show that this criterion is very efficient in pruning the search path. To summarize the above description, we have the following formal algorithm.
Algorithm 3 RNN-Search(Node n, Point q)
Input: a node n to start the search and a query point q
Output: the reverse nearest neighbors of q
If n is a leaf node, then for each entry (pt, dnn): if d(q, pt) ≤ dnn, output pt as one of the RNNs of q.
If n is an internal node, then for each branch (ptr, Rect, max_dnn): if d(q, Rect) ≤ max_dnn, call RNN-Search(ptr, q).
NN search
As the Rdnn-tree has all the properties of the R*-tree, we can apply a standard nearest neighbor search technique (e.g., [10]) for the NN search. Moreover, the dnn_S(p) information can help us prune extra branches during the branch-and-bound search, due to the following lemma:
LEMMA 4.1 Let q be a query point and p any point from the data set S. If d(p, q) ≤ dnn_S(p)/2, then p is a nearest neighbor of q in S.
The correctness of the lemma is easy to see. B(p, dnn_S(p)) is a circle that contains no other point of the data set S. If d(p, q) ≤ dnn_S(p)/2, then by the triangle inequality d(x, q) ≥ dnn_S(p)/2 for any point x outside B(p, dnn_S(p)). This means that the distance from the query point q to any point of S other than p is at least d(q, p); hence p is a nearest neighbor of q. When we search a leaf node for the nearest neighbor of a query point q, we can therefore stop the search as soon as a point p satisfying d(p, q) ≤ dnn_S(p)/2 is found.
Therefore we have the following improved NN search algorithm.
Algorithm 4 NN-Search(Node n, Point q)
Input: a node n to start the search and a query point q
Output: the nearest neighbor of q
1) Initialize the candidate nearest neighbor c.
2) If n is a leaf node, then for each data point p do: if d(p, q) ≤ dnn_S(p)/2, output p and stop the search; if d(p, q) < d(q, c), replace c by p.
3) If n = (B_1, ..., B_k) is a non-leaf node, where B_i = (ptr_i, Rect_i, max_dnn_i): let d_i = d(q, Rect_i) and sort the branches in increasing order of d_i. For each i in that order, if d_i < d(q, c), call NN-Search(ptr_i, q).
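Under the node layout of Section 4.2, Algorithms 3 and 4 can be sketched as follows. This is a minimal in-memory sketch: the Leaf/Internal classes and the mindist helper stand in for the R*-tree machinery and are illustrative assumptions, not the paper's implementation:

```python
import math

class Leaf:
    def __init__(self, entries):
        self.entries = entries            # list of (pt, dnn)

class Internal:
    def __init__(self, branches):
        self.branches = branches          # list of (child, rect, max_dnn)
                                          # rect = ((lo coords), (hi coords))

def d(p, q):
    return math.dist(p, q)

def mindist(q, rect):
    """Minimum distance from point q to rectangle rect."""
    lo, hi = rect
    return math.sqrt(sum(max(l - c, 0, c - h) ** 2
                         for c, l, h in zip(q, lo, hi)))

def rnn_search(n, q, out):
    """Algorithm 3: report every point pt with d(q, pt) <= dnn(pt)."""
    if isinstance(n, Leaf):
        for pt, dnn in n.entries:
            if d(q, pt) <= dnn:
                out.append(pt)
    else:
        for child, rect, max_dnn in n.branches:
            if mindist(q, rect) <= max_dnn:   # otherwise the branch is pruned
                rnn_search(child, q, out)

def nn_search(n, q, best):
    """Algorithm 4: branch-and-bound NN search with the Lemma 4.1 shortcut.
    best = [point, distance], initialized to [None, inf];
    returns True once the shortcut fires."""
    if isinstance(n, Leaf):
        for pt, dnn in n.entries:
            dq = d(pt, q)
            if dq <= dnn / 2:                 # Lemma 4.1: pt must be the NN
                best[:] = [pt, dq]
                return True
            if dq < best[1]:
                best[:] = [pt, dq]
        return False
    for child, rect, _ in sorted(n.branches, key=lambda b: mindist(q, b[1])):
        if mindist(q, rect) < best[1]:
            if nn_search(child, q, best):
                return True
    return False
```

A two-leaf tree over (0,0), (1,0), (4,0), (5,0), each with dnn = 1, prunes the right leaf entirely for the query (2, 0): its rectangle lies at distance 2 > max_dnn = 1, so no RNN can reside there. In nn_search, the Lemma 4.1 test lets the search stop as soon as a point lies within half its own nearest-neighbor distance of the query.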

Insertion and Deletion
Insertion and deletion are similar to those of the RNN-tree. The main difference is that we have only one tree, and in it we maintain a number, max_dnn, carrying nearest neighbor information, instead of a rectangle. We first look at insertion. When a point p' is to be inserted into an Rdnn-tree containing a data set S, we first perform an NN search and an RNN search to find NN_S(p') and RNN_S(p'), respectively. With NN_S(p') we can compute dnn(p') to create the entry for p'. RNN_S(p') gives us the points that are affected: the dnn fields of those points need to be recomputed, and the max_dnn fields of their ancestor nodes need to be adjusted accordingly. This can be done in a way very similar to the RNN-Search algorithm; the only difference is that we adjust the dnn field whenever we find a new RNN point of p' in a leaf node, and propagate the changes to the parent nodes on the way back up. Since we have one index structure for both NN and RNN search, we can combine the two steps into one: we search for the nearest neighbors of p' at the same time as we search for the affected points (RNN_S(p')) and adjust the max_dnn fields of the corresponding nodes. Our experiments show that this combined NN-RNN search has virtually the same cost as the RNN search alone, saving us one NN search while maintaining the correctness of the index. Calling this step the pre-insertion phase, we have the following formal Pre-insert algorithm.
Algorithm 5 Pre-insert(Node n, Point p')
Input: the root node n of the tree and a point p'
Output: the adjusted tree and RNN_S(p'), together with the candidate nearest neighbor c of p'
1) Initialize the candidate nearest neighbor c.
2) If n is a leaf node, then for each entry (pt, dnn) do: if d(p', pt) < d(p', c), let c = pt; if d(p', pt) ≤ dnn, replace dnn by d(p', pt) and output pt as one of the points of RNN_S(p').
3) If n is a non-leaf node, then for each branch (ptr, Rect, max_dnn) do: if d(p', Rect) ≤ max_dnn or d(p', Rect) < d(p', c), call Pre-insert(ptr, p'); if the subtree under ptr was adjusted, adjust max_dnn of the branch accordingly.
With the above preparation, we can present the insertion algorithm.
Algorithm 6 Insert(Node n, Point p')
Input: the root node n and the point p' to be inserted
Output: the tree with p' inserted
1) Pre-insert(n, p').
2) Call the R*-tree insertion algorithm to insert the entry (p', dnn_S(p')) into n.
Now we turn our attention to deletion. Just as in the RNN-tree, deleting a point from the Rdnn-tree affects the reverse nearest neighbors of the deleted point. In order to maintain the integrity of the Rdnn-tree while deleting a point p'', an NN search needs to be done for each point in RNN(p''). This is an expensive step. Observe, however, that the points in RNN(p'') should be physically close to each other in the data set, because they are all reverse nearest neighbors of the single point p''. Moreover, the number of points in RNN(p'') is upper-bounded (the bound depends on the dimensionality). Hence we can do a batch NN search, finding the nearest neighbors of multiple query points in one pass; let us call it Batch-NN-Search. To delete the point physically, the standard R*-tree deletion algorithm suffices.
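The Pre-insert traversal (Algorithm 5) can be sketched as follows. This is a minimal in-memory sketch under assumed node classes with mutable entries; the physical insertion of the new entry (step 2 of Algorithm 6) is left to the standard R*-tree routine and is not shown:

```python
import math

class Leaf:
    def __init__(self, entries):
        self.entries = entries            # list of [pt, dnn] (mutable)

class Internal:
    def __init__(self, branches):
        self.branches = branches          # list of [child, rect, max_dnn]

def d(p, q):
    return math.dist(p, q)

def mindist(q, rect):
    lo, hi = rect
    return math.sqrt(sum(max(l - c, 0, c - h) ** 2
                         for c, l, h in zip(q, lo, hi)))

def pre_insert(n, p, best, affected):
    """Algorithm 5: one traversal that tracks the NN candidate of p (best),
    shrinks the dnn of every reverse nearest neighbor of p (affected),
    and repairs max_dnn on the way back up. Returns the node's new max_dnn."""
    if isinstance(n, Leaf):
        for e in n.entries:
            pt = e[0]
            dp = d(p, pt)
            if dp < best[1]:
                best[:] = [pt, dp]        # better NN candidate for p
            if dp <= e[1]:
                e[1] = dp                 # p becomes pt's nearest neighbor
                affected.append(pt)
        return max(e[1] for e in n.entries)
    for b in n.branches:
        child, rect, max_dnn = b
        # descend if the branch may hold an RNN of p or a closer NN candidate
        if mindist(p, rect) <= max_dnn or mindist(p, rect) < best[1]:
            b[2] = pre_insert(child, p, best, affected)
    return max(b[2] for b in n.branches)
```

After the call, best holds the nearest neighbor candidate of p and its distance (which form the new entry to insert), affected lists the points whose dnn fields were shrunk, and every max_dnn on the visited paths has been repaired.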
Algorithm 7 Delete(Node n, Point p'')
Input: a tree rooted at n and the point p'' to be deleted
Output: the tree with p'' deleted
1) Call the R*-tree deletion algorithm to delete p'' from n.
2) Call RNN-Search(n, p'') to find RNN_S(p'').
3) Call Batch-NN-Search(n, RNN_S(p'')).
4) Adjust the dnn field of each point in RNN_S(p'') and propagate the changes up to the root.
The Batch-NN-Search procedure is a slight modification of the NN-Search algorithm. Formally, it looks as follows.
Algorithm 8 Batch-NN-Search(Node n, Points q_1, ..., q_k)
Input: a tree rooted at node n and an array of query points q_1, ..., q_k
Output: the nearest neighbors of q_1, ..., q_k
1) Initialize the candidate nearest neighbors c_1, ..., c_k for q_1, ..., q_k.
2) If n is a leaf node, update each c_j if the leaf contains a better candidate for q_j.
3) If n is a non-leaf node, let n = (B_1, ..., B_m), where B_i = (ptr_i, Rect_i, max_dnn_i). Let d_ij = d(q_j, Rect_i) and d_i = max_j d_ij, and sort the branches according to d_i. For each i, if d_ij < d(q_j, c_j) for any j in {1, ..., k}, call Batch-NN-Search(ptr_i, q_1, ..., q_k).
Comparing our insertion and deletion algorithms with those presented in Section 3, we need only a single index, as opposed to the combined NN-tree and RNN-tree approach. Considering the insertion algorithms, inserting one point into the Rdnn-tree and into the RNN-tree are almost equivalent: both have a pre-insertion phase followed by a call to the standard R*-tree insertion algorithm. However, employing one index makes it possible for us to perform a combined NN-RNN search in the pre-insertion phase. Our experiments show
that the combined search saves us one NN search. Better yet, we do not need to insert the point into a second index. Regarding the deletion algorithm, we have the same situation; in addition, we propose to do batch NN searches in the post-deletion phase, which provides further savings.
5 Experimental results
This section presents the results of our experiments. We compare the Rdnn-tree with the RNN-tree method of Korn and Muthukrishnan on reverse nearest neighbor (RNN) queries. We also measure the performance of the Rdnn-tree on nearest neighbor (NN) queries and compare it to standard NN algorithms. Furthermore, we look at two other kinds of queries, combined NN-RNN queries and batch NN queries; these results have a significant impact on performance in the dynamic case. We implemented both structures in C++ and ran our tests on a machine with two Pentium II processors and 512 MB of RAM under SCO UNIX. For the RNN-tree, we use the code provided by Korn and Muthukrishnan. We obtained a large real data set from the US National Mapping Information web site; it contains populated places in the USA, represented by latitude and longitude coordinates. We sample different numbers of items from this data set to create the various data sets to be indexed, and then sample items from the rest of the data set to form the query set. For higher-dimensional experiments we generate random points for both the data and the query sets.
Static performance: RNN search
The first set of experiments compares RNN search performance; Figure 1 shows the results. We measure both the number of leaf nodes and the total number of nodes accessed. The Rdnn-tree provides significantly better performance than the RNN-tree approach. For instance, in the 2-D case the RNN-tree approach takes more than 20 leaf accesses on average, while in our case fewer than 2 leaf accesses are required on average, an improvement of more than 90%.
Significant improvement can also be seen in the total number of disk accesses: the Rdnn-tree is consistently 4 to 5 times better than the RNN-tree in the 2-D case, and even better in the 4-D case. This establishes the effectiveness of the Rdnn-tree.
Dynamic performance: NN queries
One of the main advantages of the Rdnn-tree is the elimination of the second tree in the dynamic case, since the Rdnn-tree itself can perform NN queries effectively. To verify this, we implemented the standard NN search algorithm of Roussopoulos et al. [10] and compared it to the Rdnn-tree approach. Table 1 shows the results for the total number of pages accessed (the results for leaf accesses are similar). The Rdnn-tree performs slightly better than the standard R*-tree, because the nearest neighbor information in the Rdnn-tree increases the pruning power of the algorithm. More importantly, this demonstrates the feasibility of the Rdnn-tree for NN queries, enabling us to eliminate the extra index and significantly cut down the maintenance cost.
Dynamic performance: combined NN-RNN queries
Inserting a data point into the index requires the algorithm to locate both the NNs and the RNNs of the point for update purposes. If one can combine the NN and RNN queries for the point into one pass, there will be significant savings. We therefore ran experiments measuring the costs of NN queries, RNN queries, and combined NN-RNN queries. Figure 2 shows the results; we show only the 4-D results, as the 2-D results are similar. The cost of a combined NN-RNN query is essentially the same as that of an RNN query alone, and much less than the combined cost of a separate NN and RNN query. This shows that we get the NN of the query point nearly for free when we run the RNN query.
Dynamic performance: batch NN queries
Recall that batch NN queries can be used to speed up deletions. We ran experiments to test their effectiveness by measuring the cost of the NN queries involved in the deletions.
In the experiments, we simulate the delete procedure by picking sample points from the data set, finding their RNNs, and doing the NN queries for the RNNs of each point. Observe that for any point p we have |RNN(p)| ≥ 0, where |S| denotes the cardinality of a set S. If |RNN(p)| ≤ 1, batch NN and regular NN queries for RNN(p) are the same; only when |RNN(p)| ≥ 2 is a batch NN query necessary. For each such data point p, we compare the cost of running the NN queries separately for each point in RNN(p) with that of running one batch NN query for all points in RNN(p). Figure 3 shows the results averaged over the sampled points. Batch NN queries significantly reduce the number of disk accesses. Not shown in the figure is that the cost of a batch NN query is comparable to that of a single NN query; this means that if |RNN(p)| = k, the batch NN query for RNN(p) reduces the cost by a factor of about k. Our experiments show that k is usually in the range of 2 to 5. The importance of batch NN queries increases with the dimensionality: in 2-D only 2-3% of the deletions require a batch NN query (i.e., have |RNN| > 1), while in 4-D over 6% of the deletions require one.
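Algorithm 8 (Batch-NN-Search), which these deletion experiments exercise, can be sketched as follows. This is a minimal in-memory sketch under assumed node classes; sorting branches by their distance to the closest query point is just one reasonable reading of the sort step, and a branch is descended only if it can still improve some candidate:

```python
import math

class Leaf:
    def __init__(self, entries):
        self.entries = entries            # list of (pt, dnn)

class Internal:
    def __init__(self, branches):
        self.branches = branches          # list of (child, rect, max_dnn)

def d(p, q):
    return math.dist(p, q)

def mindist(q, rect):
    lo, hi = rect
    return math.sqrt(sum(max(l - c, 0, c - h) ** 2
                         for c, l, h in zip(q, lo, hi)))

def batch_nn_search(n, qs, best):
    """Algorithm 8: one traversal maintaining a candidate NN for every
    query point. best[j] = [point, distance] for query qs[j]."""
    if isinstance(n, Leaf):
        for pt, _ in n.entries:
            for j, q in enumerate(qs):
                if d(pt, q) < best[j][1]:
                    best[j][:] = [pt, d(pt, q)]
        return
    # visit branches closest to some query point first
    order = sorted(n.branches,
                   key=lambda b: min(mindist(q, b[1]) for q in qs))
    for child, rect, _ in order:
        # descend only if the branch can still beat some candidate
        if any(mindist(q, rect) < best[j][1] for j, q in enumerate(qs)):
            batch_nn_search(child, qs, best)
```

One traversal serves all query points at once; a subtree is read at most once even when several of the clustered query points need it, which is where the savings over separate NN searches come from.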

Figure 1. Comparison of performance for (static) RNN queries: leaf and total node accesses of the RNN-tree and the Rdnn-tree, for a 2-D (real) data set and a 4-D (uniform) data set.
Table 1. Comparison of NN query performance (total pages accessed) between the Rdnn-tree and the R*-tree, on 2-D and 4-D data sets of various sizes.
6 Conclusion and future work
In this paper we presented the Rdnn-tree, an R*-tree enhanced by storing nearest neighbor distance information. We demonstrated that this structure is much more efficient in answering RNN queries, eliminates the need for a second index, and provides superior performance in both the static and the dynamic case. Our focus in this paper is the monochromatic reverse nearest neighbor problem. A future direction for us is to adapt the Rdnn-tree to the bichromatic reverse nearest neighbor problem, in which the data are divided into two types: given a query point q of one type, the system is required to find all the points of the second type that have q as their nearest neighbor. It will be interesting to see what effect different constraints (such as a single index for both types, or a separate index for each type) have on the algorithms, and how well the Rdnn-tree adapts to the problem. Finally, the Rdnn-tree is based on the R*-tree; while it works well in lower dimensions, its performance degrades in high dimensions. We plan to explore how to adapt the Rdnn-tree techniques to high-dimensional indexing techniques such as the TV-tree [9] and the X-tree [3].
Acknowledgments
We would like to thank Flip Korn for providing the RNN-tree code, and Flip Korn and Ioana Stanoi for their comments. We would also like to thank Diane Mittelmeier for proofreading the manuscript.
References
[1] N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger. The R*-tree: an efficient and robust access method for points and rectangles.
In Proc. of the 1990 ACM SIGMOD International Conference on Management of Data, Atlantic City, NJ, May 1990.
[2] S. Berchtold, B. Ertl, D. A. Keim, H.-P. Kriegel, and T. Seidl. Fast nearest neighbor search in high-dimensional spaces. In Proc. of the 14th IEEE International Conference on Data Engineering, Feb. 1998.
[3] S. Berchtold, D. A. Keim, and H.-P. Kriegel. The X-tree: an index structure for high-dimensional data. In Proc. of the 22nd International Conference on Very Large Data Bases, pages 28-39, Sept. 1996.
[4] V. Gaede and O. Günther. Multidimensional access methods. ACM Computing Surveys, 30(2):170-231, June 1998.
[5] A. Guttman. R-trees: a dynamic index structure for spatial searching. In Proc. of the 1984 ACM SIGMOD International Conference on Management of Data, pages 47-57, Boston, MA, June 1984.

Figure 2. Performance of combined NN-RNN queries (RNN query, combined NN-RNN query, and separate RNN + NN queries) on the 4-D uniform data set: leaf nodes and total nodes accessed.
Figure 3. Comparison of batch NN and non-batch NN queries for 4-D data (uniform data set): leaf nodes and total nodes accessed.
[6] G. R. Hjaltason and H. Samet. Distance browsing in spatial databases. ACM Transactions on Database Systems, 24(2), June 1999.
[7] N. Katayama and S. Satoh. The SR-tree: an index structure for high-dimensional nearest neighbor queries. In Proc. of the 1997 ACM SIGMOD International Conference on Management of Data, June 1997.
[8] F. Korn and S. Muthukrishnan. Influence sets based on reverse nearest neighbor queries. In Proc. of the 2000 ACM SIGMOD International Conference on Management of Data, May 2000.
[9] K.-I. Lin, H. Jagadish, and C. Faloutsos. The TV-tree: an index structure for high-dimensional data. The VLDB Journal, 3, Oct. 1994.
[10] N. Roussopoulos, S. Kelley, and F. Vincent. Nearest neighbor queries. In Proc. of the 1995 ACM SIGMOD International Conference on Management of Data, pages 71-79, San Jose, CA, May 1995.
[11] J. Sack and J. Urrutia, editors. Handbook of Computational Geometry. North-Holland, 2000.
[12] H. Samet. The Design and Analysis of Spatial Data Structures. Addison-Wesley, 1990.
[13] T. Sellis, N. Roussopoulos, and C. Faloutsos. The R+-tree: a dynamic index for multi-dimensional objects. In Proc. of the 13th International Conference on Very Large Data Bases, England, Sept. 1987.
[14] I. Stanoi, D. Agrawal, and A. El Abbadi. Reverse nearest neighbor queries for dynamic databases. In Proc. of the 2000 ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pages 44-53, May 2000.
[15] D. A. White and R. Jain. Similarity indexing with the SS-tree. In Proc. of the 12th International Conference on Data Engineering, Feb. 1996.


Organizing Spatial Data Organizing Spatial Data Spatial data records include a sense of location as an attribute. Typically location is represented by coordinate data (in 2D or 3D). 1 If we are to search spatial data using the

More information

1 The range query problem

1 The range query problem CS268: Geometric Algorithms Handout #12 Design and Analysis Original Handout #12 Stanford University Thursday, 19 May 1994 Original Lecture #12: Thursday, May 19, 1994 Topics: Range Searching with Partition

More information