Deletion Techniques for the ND-tree in Non-ordered Discrete Data Spaces


Hyun Jeong Seok, Qiang Zhu
Department of Computer and Information Science
The University of Michigan - Dearborn
Dearborn, MI 48128, USA
{hseok, qzhu}@umich.edu

Gang Qian
Department of Computer Science
University of Central Oklahoma
Edmond, OK 73034, USA
gqian@uco.edu

Sakti Pramanik
Department of Computer Science and Engineering
Michigan State University
East Lansing, MI 48824, USA
pramanik@cse.msu.edu

Wen-Chi Hou
Department of Computer Science
Southern Illinois University
Carbondale, IL 62901, USA
hou@cs.siu.edu

Abstract

Similarity searches in contemporary applications such as bioinformatics, biometrics and E-commerce are becoming increasingly prevalent. Data in these applications often involve domains with non-ordered discrete values such as genomic nucleotide bases, gender and profession. Existing multidimensional index trees (e.g., the R*-tree and the K-D-B-tree) for continuous data spaces (CDS) cannot be directly applied to such a non-ordered discrete data space (NDDS) because essential geometric concepts (e.g., rectangles) are lacking in such spaces. An index tree, called the ND-tree, was recently proposed to support efficient similarity searches for NDDSs. However, this earlier work focused on solving the construction and query issues for the ND-tree. The issue of developing an efficient and effective deletion technique for the ND-tree remains open. In this paper, we present three deletion algorithms based on different underflow-handling strategies for the ND-tree in NDDSs. The performance of these deletion algorithms and the quality of the ND-trees they produce are evaluated and compared. The study shows that a tradeoff between the deletion efficiency and the tree quality needs to be carefully considered when choosing a deletion technique for the ND-tree.
Research supported by the US National Science Foundation (under grants # IIS and # IIS ) and the University of Michigan.

1 Introduction

A non-ordered discrete data space (NDDS) contains multidimensional vectors whose component values are discrete and have no natural ordering. Non-ordered discrete data domains such as gender, genomic nucleotide and profession are very common in database applications such as bioinformatics, biometrics, E-commerce and data mining. For example, in a genome sequence database, sequences with alphabet {a, g, t, c} are broken into substrings of some fixed length d (i.e., vectors in a d-dimensional NDDS) for similarity searches [7], where no ordering exists among letters a, g, t and c. There is an increasing demand for similarity searches in NDDSs. To support efficient evaluation of such queries in NDDSs, robust multidimensional indexes are required.

Most existing multidimensional index structures such as the R-trees [1, 3], the SS-tree [17], the X-tree [2], the K-D-B-tree [15] and the LSDh-tree [5] were designed for continuous data spaces (CDS). They cannot be directly applied to NDDSs because essential geometric concepts such as (hyper-)rectangle, edge length and region area are lacking in such spaces. The ND-tree, which utilizes special properties of an NDDS, was proposed recently to support efficient similarity queries in NDDSs [12, 14]. Studies showed that the ND-tree is quite promising in supporting efficient evaluation of such similarity queries. Although the ND-tree was mainly designed to support efficient queries, maintenance operations such as deletion and update are indispensable. Since an update can be realized by a deletion followed by an insertion, deletion techniques are crucial for index maintenance.
Deletion techniques are typically designed to achieve two goals: (1) to obtain an efficient deletion procedure in terms of the number of I/Os required (i.e., efficiency) and (2) to give the index tree that results from the deletions a query performance comparable to that of an index tree built from scratch for the same set of indexed vectors (i.e., effectiveness). All existing related work on deletion is for index trees in CDSs [4, 6, 8, 9, 10, 11]. No study has been reported on developing efficient and effective deletion techniques for index trees in NDDSs.

In this paper, we present three deletion techniques for the ND-tree in NDDSs and conduct empirical studies to evaluate and compare their efficiency and effectiveness. These three deletion techniques adopt different underflow-handling strategies, which are the key factor that distinguishes them. The method used to handle underflow nodes caused by the removal of node entries is critical to achieving an efficient deletion and a high query performance. It is therefore important to investigate the effects of different underflow-handling strategies on the efficiency and effectiveness of deletion for the ND-tree.

The rest of the paper is organized as follows. Section 2 introduces the relevant concepts and notation for NDDSs and the ND-tree structure. Section 3 presents three deletion algorithms. Section 4 reports our experimental results. Section 5 concludes the paper.

2 Preliminaries

To understand our deletion algorithms for the ND-tree in an NDDS, it is necessary to know the relevant concepts about an NDDS and the structure of the ND-tree, which were introduced in [12, 13, 14].

A d-dimensional NDDS $\Omega_d$ is defined as the Cartesian product of d alphabets: $\Omega_d = A_1 \times A_2 \times \cdots \times A_d$, where $A_i$ ($1 \le i \le d$) is the alphabet of the i-th dimension of $\Omega_d$, consisting of a finite set of letters. There is no natural ordering among the letters. $\alpha = (a_1, a_2, \ldots, a_d)$ (or $a_1 a_2 \ldots a_d$) is a vector in $\Omega_d$, where $a_i \in A_i$ ($1 \le i \le d$). A discrete rectangle R in $\Omega_d$ is defined as $R = S_1 \times S_2 \times \cdots \times S_d$, where $S_i \subseteq A_i$ ($1 \le i \le d$) is called the i-th component set of R. R is also called a subspace of $\Omega_d$. The area of R is defined as $|S_1| \times |S_2| \times \cdots \times |S_d|$. The overlap of two discrete rectangles $R$ and $R'$ is $R \cap R' = (S_1 \cap S'_1) \times (S_2 \cap S'_2) \times \cdots \times (S_d \cap S'_d)$.
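As an illustration of the area and overlap definitions above, the following minimal Python sketch represents a discrete rectangle as a tuple of per-dimension letter sets. The representation and the function names `area` and `overlap` are assumptions made for this example; they are not taken from the ND-tree implementation.

```python
from math import prod

# Assumed representation: a discrete rectangle is a tuple of component sets,
# one set of letters per dimension.

def area(rect):
    """Area of a discrete rectangle: |S_1| x |S_2| x ... x |S_d|."""
    return prod(len(component) for component in rect)

def overlap(r1, r2):
    """Overlap of two discrete rectangles: the per-dimension intersections."""
    return tuple(s1 & s2 for s1, s2 in zip(r1, r2))

r1 = ({"a", "g"}, {"a", "c", "g", "t"})
r2 = ({"g", "t"}, {"c", "g"})
common = overlap(r1, r2)                 # ({"g"}, {"c", "g"})
print(area(r1), area(r2), area(common))  # 8 4 2
```

If any component intersection is empty, the overlap has area 0, which matches the intuition that the two rectangles share no vector.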
For a given set SV of vectors, the discrete minimum bounding rectangle (DMBR) of SV is defined as the discrete rectangle whose i-th component set ($1 \le i \le d$) consists of all letters appearing on the i-th dimension of the given vectors.

As discussed in [12, 14], the Hamming distance is a suitable distance measure for NDDSs. The Hamming distance between two vectors gives the number of mismatching dimensions between them. Using the Hamming distance, a similarity (range) query is defined as follows: given a query vector $\alpha_q$ and a query range of Hamming distance $r_q$, find all the vectors whose Hamming distance to $\alpha_q$ is less than or equal to $r_q$.

The ND-tree based on the NDDS concepts was proposed in [12, 14] to support efficient similarity queries in NDDSs. Its structure is outlined as follows. The ND-tree is a disk-based balanced tree, whose structure has some similarities to that of the R-tree [3] in continuous data spaces. Let M and m ($2 \le m \le \lceil M/2 \rceil$) be the maximum number and the minimum number of entries allowed in each node of an ND-tree, respectively. An ND-tree satisfies the following two requirements: (1) every non-leaf node has between m and M children unless it is the root (which may have a minimum of two children in this case); (2) every leaf node contains between m and M entries unless it is the root (which may have a minimum of one entry/vector in this case). A leaf node in an ND-tree contains an array of entries of the form (op, key), where key is a vector in an NDDS $\Omega_d$ and op is a pointer to the object represented by key in the database. A non-leaf node N in an ND-tree contains an array of entries of the form (cp, DMBR), where cp is a pointer to a child node N' of N in the tree and DMBR is the discrete minimum bounding rectangle of N'. Figure 1 shows an example of the ND-tree for a genome sequence database with alphabet {a, g, t, c} [12].

[Figure 1: An example of the ND-tree. Three levels are shown: Level 1 (root), Level 2, and Level 3 (leaves). Non-leaf entries carry DMBRs such as {a,g}x{a,c,g,t}x... and {t,c}x{a,c,g,t}x...; leaf entries carry vectors such as "at...", "ga...", "tc..." and "tt...".]

3 Deletion Algorithms

In this section, we discuss the main deletion procedure, which invokes one of three different reinsertion functions in case of underflow.

3.1 Main-Deletion procedure

The main idea of a deletion algorithm is to locate the leaf node that contains the given vector and remove the vector from that node. After a sequence of deletions, a node may underflow. In case of underflow, some adjustments of the tree are required in order to maintain the tree properties. The adjustment procedure may be propagated from the underflow node up to the root. Different underflow-handling strategies are possible. We will discuss three underflow-handling strategies, which are described by functions VectorReinsertion, NodeReinsertion, and BorrowReinsertion in subsequent subsections. Whenever there is a change in a node, some adjustment may also be needed for the relevant DMBRs to maintain the structure of an ND-tree. Function ComputeDMBR is described in [12]. The main deletion procedure is described as follows. Integrating it with each underflow-handling technique yields a new deletion algorithm, named the same as the respective function:

Procedure 3.1 Main-Deletion
Input: (1) an ND-tree with root RN; (2) vector α that is to be deleted; (3) underflow-handling type: VR (vector reinsertion), NR (node reinsertion), or BR (borrow reinsertion).
Output: the root of the modified ND-tree (may be empty) with α deleted.
1. locate the leaf node N containing α by following a path from root RN
2. if N does not exist then
3.   return RN
4. end if
5. remove α from leaf node N
6. while N is underflow and is not the root do
7.   let P be the parent of N
8.   if underflow handling type is VR then
9.     invoke Function RN=VectorReinsertion(N, RN)
10.   else if underflow handling type is NR then
11.     invoke Function RN=NodeReinsertion(N, RN)
12.   else
13.     invoke Function RN=BorrowReinsertion(N, RN)
14.   end if
15.   N = P
16. end while
17. if N is the root then
18.   if N is empty then
19.     return RN=empty
20.   else if RN has only one child CN then
21.     return RN=CN
22.   else
23.     return RN=N
24.   end if
25. end if
26. invoke Function ComputeDMBR(N, P) to adjust the DMBR of node N that is affected by the deletion in its parent node P in a bottom-up fashion when needed
27. return RN

In procedure Main-Deletion, steps 2 through 4 return the original tree if the given vector is not in the tree. Steps 6 through 16 handle underflow nodes in a bottom-up fashion, propagating the processing from the current underflow node up to the root of the tree. The underflow-handling type is set, before the deletion procedure starts, according to the deletion strategy selected by the user. According to this type, the procedure invokes the VectorReinsertion, NodeReinsertion, or BorrowReinsertion underflow-handling technique, which will be described in the following subsections. Steps 17 through 25 handle the situation in which an entry is removed from the root. Step 26 adjusts the DMBRs for the nodes that are affected by the deletion in a bottom-up fashion when needed.
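Steps 1 through 5 of Main-Deletion, together with the underflow test of step 6, can be sketched in Python as follows. The in-memory `Node` class, the helper names, and the on-the-fly recomputation of DMBRs are assumptions made for illustration only; a disk-based ND-tree stores each DMBR in the parent entry and maintains it with ComputeDMBR [12]. The sketch only reports which leaf underflows and leaves the underflow handling itself to the functions described in the following subsections.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)            # eq=False: nodes are compared by identity
class Node:
    is_leaf: bool
    entries: list = field(default_factory=list)  # vectors (leaf) or child Nodes

def dmbr(node):
    """DMBR of a node: per dimension, the set of letters occurring below it."""
    if node.is_leaf:
        return tuple(set(column) for column in zip(*node.entries))
    child_boxes = [dmbr(child) for child in node.entries]
    return tuple(set().union(*dims) for dims in zip(*child_boxes))

def contains(box, vec):
    """True if every letter of vec lies in the matching component set."""
    return all(letter in component for letter, component in zip(vec, box))

def find_leaf(node, vec):
    """Step 1: descend only into children whose DMBR contains vec."""
    if node.is_leaf:
        return node if vec in node.entries else None
    for child in node.entries:
        if contains(dmbr(child), vec):
            hit = find_leaf(child, vec)
            if hit is not None:
                return hit
    return None

def delete_vector(root, vec, m):
    """Steps 1-5; returns (found, leaf that now underflows or None)."""
    leaf = find_leaf(root, vec)
    if leaf is None:
        return False, None                  # steps 2-4: vec is not indexed
    leaf.entries.remove(vec)                # step 5
    underflows = leaf is not root and len(leaf.entries) < m
    return True, (leaf if underflows else None)

left = Node(True, ["atc", "agc", "aga"])
right = Node(True, ["tcg", "ttg"])
root = Node(False, [left, right])
found, under = delete_vector(root, "ttg", m=2)
print(found, under is right, right.entries)  # True True ['tcg']
```

In this toy tree the right leaf drops below m = 2 entries after the deletion, so it is the node that one of the three underflow-handling functions would receive as UN.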
3.2 Deletion via Reinserting Vectors

With this vector reinsertion method, when a node underflows, we reinsert the vectors that belong to the subtree rooted at this underflow node back into the tree via the root, one by one. The insertion procedure itself is the same as the normal one-by-one insertion procedure for the ND-tree [12], except that the parent of the current underflow node may underflow during the processing. The underflow handling may propagate up to the root and, in the extreme case, the whole tree may need to be reinserted and reorganized. The following function reinserts the vectors under an underflow node back into the corresponding ND-tree.

Function 3.1 VectorReinsertion(UN, RN)
Input: (1) an underflow node UN; (2) the root RN of the corresponding ND-tree.
Output: the root of the modified ND-tree.
1. remove the entry for UN, including its DMBR, from its parent in the tree rooted at RN
2. if UN is a leaf node then
3.   for each vector α in UN do
4.     invoke the normal ND-tree insertion algorithm to insert α into tree RN
5.   end for
6. else
7.   for each child CN of UN do
8.     recursively invoke VectorReinsertion(CN, RN)
9.   end for
10. end if
11. return RN

In this function, step 1 first detaches the underflow node from its parent node. If the underflow node is a leaf, steps 2 through 5 reinsert the vectors in the node into the corresponding ND-tree by invoking the normal one-by-one insertion algorithm for the ND-tree [12]. If the underflow node is a non-leaf node, steps 6 through 10 traverse down to the leaf nodes of the subtree rooted at the given underflow node. The vectors in these leaf nodes are then reinserted into the tree at steps 2 through 5.

3.3 Deletion via Reinserting Nodes

The aforementioned VectorReinsertion method may suffer from poor deletion performance since it reinserts the individual vectors in the subtree rooted at the given underflow node.
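The vector-by-vector behaviour of Function 3.1 can be sketched in Python as follows. This is a simplified sketch under stated assumptions: the `Node` class is hypothetical, the recursion is flattened into one depth-first collection of the vectors below UN, and `insert` is a caller-supplied stand-in for the normal ND-tree insertion algorithm of [12], which is not reproduced here.

```python
class Node:
    """Hypothetical in-memory node: entries are vectors (leaf) or child Nodes."""
    def __init__(self, is_leaf, entries):
        self.is_leaf = is_leaf
        self.entries = entries

def collect_vectors(node):
    """Depth-first walk yielding every vector stored below `node`."""
    if node.is_leaf:
        yield from node.entries
    else:
        for child in node.entries:
            yield from collect_vectors(child)

def vector_reinsertion(un, parent, root, insert):
    """Detach UN from its parent, then reinsert its vectors one by one."""
    parent.entries.remove(un)               # step 1: detach the underflow node
    for vec in list(collect_vectors(un)):   # steps 2-10: visit all leaves below UN
        root = insert(root, vec)            # stand-in for the normal insertion [12]
    return root

# Toy stand-in for insertion: it only records the order of reinsertion.
reinserted = []
def toy_insert(root, vec):
    reinserted.append(vec)
    return root

un = Node(False, [Node(True, ["atc"]), Node(True, ["agc", "tga"])])
other = Node(False, [Node(True, ["ccc", "cca"]), Node(True, ["ggt", "gga"])])
root = Node(False, [un, other])
root = vector_reinsertion(un, root, root, toy_insert)
print(reinserted)  # ['atc', 'agc', 'tga']
```

Every vector below UN triggers one full insertion from the root, which is the cost that the node-based method of this subsection tries to avoid.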
To improve the performance, the NodeReinsertion method inserts the subtree/child nodes of a given underflow node directly back into the ND-tree. The idea is described in the following function.

Function 3.2 NodeReinsertion(UN, RN)
Input: (1) an underflow node UN; (2) the root RN of the corresponding ND-tree.
Output: the root of the modified ND-tree.
1. remove the entry for UN, including its DMBR, from its parent in the tree rooted at RN
2. traverse the ND-tree down from root RN to the level of UN
3. find a sibling node SN with the least overlap enlargement after accommodating UN subtrees
4. if there is a tie then
5.   choose a sibling node SN with the least area enlargement after accommodating UN subtrees among ties
6.   if there is a tie then
7.     choose a sibling node SN with the minimum area after accommodating UN subtrees among ties
8.     if there is a tie then
9.       choose a random sibling SN among the ties
10.     end if
11.   end if
12. end if
13. for each entry E in UN do
14.   insert E into SN
15. end for
16. if # of entries in SN > MAX_NODE_SIZE then
17.   invoke an overflow handling algorithm for SN
18. end if
19. return RN

In the NodeReinsertion function, step 1 first detaches the underflow node from its parent node. Steps 2 through 12 find a suitable sibling node of the given underflow node UN to accommodate the subtree/child nodes of UN. Which sibling node should we choose for this purpose? We applied the following three heuristics, which are consistent with the ones for building the ND-tree [12].

IH1: Choose a sibling node corresponding to the entry with the least enlargement of overlap with other entries after accommodating the subtree nodes of UN.
IH2: Choose a sibling node corresponding to the entry with the least enlargement of its area after accommodating the subtree nodes of UN.
IH3: Choose a sibling node corresponding to the entry with the minimum area.

These heuristics are applied in the above order to break ties. If all three together fail to break a tie, a sibling is chosen randomly, as shown at step 9. Steps 13 through 15 insert the subtree nodes of UN into the chosen sibling node. If the resulting node overflows, an overflow handling algorithm similar to the one suggested for the bulk-loading technique discussed in [16] is applied. The main difference between this overflow handling algorithm and the original overflow handling algorithm for the ND-tree in [12] is that the former may need to deal with multiple splits, since the node may exceed the maximum size by more than one entry.

3.4 Deletion via Borrow Reinsertion

To further improve the deletion performance, we employ another underflow-handling method. The main idea is to borrow a best-fit vector from another leaf node to resolve the underflow of the given leaf node.
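Both the sibling choice of Function 3.2 and the borrowing idea just introduced rank candidates by the same three criteria in the same order, which maps naturally onto a lexicographic comparison of tuples. The Python sketch below shows this under several assumptions: DMBRs are tuples of letter sets, all function names are hypothetical, the friend leaf for borrowing is taken as already chosen, the "overlap enlargement" of a borrowed vector is interpreted as the growth of UN's overlap with the other leaf DMBRs, and a full tie is resolved by taking the first candidate rather than a random one.

```python
from math import prod

def area(box):
    return prod(len(s) for s in box)

def union(b1, b2):
    return tuple(x | y for x, y in zip(b1, b2))

def overlap_area(b1, b2):
    return prod(len(x & y) for x, y in zip(b1, b2))

def dmbr(vectors):
    return tuple(set(column) for column in zip(*vectors))

def growth_key(box, extra, others):
    """(overlap enlargement, area enlargement, resulting area) if box absorbs extra."""
    grown = union(box, extra)
    before = sum(overlap_area(box, o) for o in others)
    after = sum(overlap_area(grown, o) for o in others)
    return (after - before, area(grown) - area(box), area(grown))

def choose_sibling(sibling_boxes, un_box):
    """Sibling choice in the spirit of IH1-IH3: index of the best sibling DMBR."""
    def key(i):
        others = sibling_boxes[:i] + sibling_boxes[i + 1:]
        return growth_key(sibling_boxes[i], un_box, others)
    return min(range(len(sibling_boxes)), key=key)  # first minimum on a full tie

def borrow(un, friend, other_boxes, min_size):
    """Move the best-fit vector from friend into un; False if friend cannot lend."""
    if len(friend) <= min_size:            # friend itself would underflow
        return False
    un_box = dmbr(un)
    best = min(friend, key=lambda v: growth_key(un_box, dmbr([v]), other_boxes))
    friend.remove(best)
    un.append(best)
    return True

siblings = [({"a"}, {"a", "c"}), ({"g", "t"}, {"g"}), ({"a", "g"}, {"t"})]
print(choose_sibling(siblings, ({"a"}, {"c", "g"})))  # 0

un, friend = ["atc"], ["ggc", "atg", "aag"]
print(borrow(un, friend, [dmbr(["tcg", "ttg"])], min_size=2), un, friend)
# True ['atc', 'atg'] ['ggc', 'aag']
```

Because Python compares tuples element by element, the first criterion decides unless it ties, then the second, then the third, which mirrors the nested tie-breaking of the pseudocode.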
If the underflow of the leaf node can be resolved in this way, no underflow node is produced at the higher levels, which eliminates the need to propagate the underflow processing upwards. However, this processing may not always be feasible. In such a case, the underflow processing falls back on the NodeReinsertion method.

To realize the BorrowReinsertion method, we need to solve two problems. The first one is how to choose a friend leaf node from which a vector can be borrowed. A constraint for this node is that it must not underflow itself after lending out a vector, i.e., it should have at least MIN_NODE_SIZE + 1 vectors. This node should also be very close to the given underflow node. We evaluate the closeness between two nodes by their overlap: the larger the overlap, the closer the nodes. The second problem is how to choose a best-fit vector to borrow. For this we employ three heuristics similar to the previous ones, i.e., minimizing the overlap enlargement, minimizing the area enlargement, and minimizing the area. The details of this method are given in the following function.

Function 3.3 BorrowReinsertion(UN, RN)
Input: (1) an underflow node UN; (2) the root RN of the corresponding ND-tree.
Output: the root of the modified ND-tree.
1. if UN is a leaf node then
2.   find a leaf sibling SN with maximum overlap with UN (randomly choose one if there is a tie)
3.   if # of vectors in SN > MIN_NODE_SIZE then
4.     find a vector α in SN with the least overlap enlargement after being inserted into UN
5.     if there is a tie then
6.       choose a vector α among the ties in SN with the least area enlargement after being inserted into UN
7.       if there is a tie then
8.         choose a vector α among the ties in SN with the minimum area after being inserted into UN
9.         if there is a tie then
10.           choose a random vector α among ties in SN
11.         end if
12.       end if
13.     end if
14.     remove α from SN
15.     insert α into UN
16.     return RN
17.   end if
18. end if
19. invoke NodeReinsertion(UN, RN)
20. return RN

In the BorrowReinsertion function, steps 1 through 18 implement the borrowing strategy for an underflow leaf node. Specifically, steps 2 and 3 find a friend leaf node for lending out a vector. Steps 4 through 13 find a best-fit vector to be borrowed. Steps 14 and 15 move the found vector from the lending leaf node to the underflow node. If no suitable friend leaf node exists, the underflow node is handled by the NodeReinsertion method.

4 Experiments

Extensive experiments were conducted to evaluate the efficiency and effectiveness of the three deletion algorithms. The algorithms were implemented in C++. The efficiency is measured in terms of the number of disk I/Os for performing deletions, while the effectiveness is evaluated by the performance of the resulting ND-trees for executing queries.

Our experiments were conducted on a PC with a Pentium D 3.40GHz CPU, 2GB of memory and a 400 GB hard disk. Performance evaluation was based on both the number of disk I/Os, with the disk block size set at 4 kilobytes, and the query performance in terms of the average number of I/Os for 100 random test queries. For each deletion algorithm, three experiments were conducted on three pre-built ND-trees whose sets of indexed vectors had sizes of 1 million, 2 million, and 4 million, respectively. Each set is made of 25-dimensional vectors with an alphabet of size 4, taken from real genome sequence data. For each tree, 50%, 70% and 90% deletions were performed. The minimum space utilization percentage for a disk block was set to 30%. The size of memory available for the algorithms was fixed at 4 megabytes.

[Table 1: Number of I/Os for Deletion (columns: Vector, Node, Borrow; rows: 50%, 70% and 90% deletions for trees of 1 M, 2 M and 4 M vectors)]

[Figure 2: Comparison of Deletion Performance (I/O versus number of vectors for Vector, Node and Borrow)]

Figure 2 and Table 1 compare the disk I/Os incurred by the three deletion algorithms when deleting 50%, 70% and 90% of the vectors from each ND-tree of size 1, 2 or 4 million. Since the VectorReinsertion algorithm reinserts all vectors contained in the relevant subtree(s) of the underflow node(s) from scratch, its deletion performance is the worst. Since the NodeReinsertion algorithm directly redistributes the subtrees of the underflow node(s) in the given tree, its deletion performance is better. Since the BorrowReinsertion algorithm only performs local adjustments among sibling leaf nodes, its deletion performance is the best. From the experimental data, we can see that the BorrowReinsertion algorithm significantly outperformed the VectorReinsertion algorithm. For example, for the 90% deletions on the ND-tree of size 4 million, the number of I/Os for the BorrowReinsertion algorithm was 56.2% lower than that for the VectorReinsertion algorithm.
The effectiveness of a deletion algorithm is evaluated by the quality of its resulting ND-tree. The quality of an ND-tree is measured in terms of (1) the average number of I/Os for performing 100 random test queries using the ND-tree and (2) the space utilization of the ND-tree. Table 2 compares the query performance (for query ranges R=1, 2, 3) of the ND-trees obtained from the three deletion algorithms. From the experimental results, we can see that the query performance of the ND-tree from the VectorReinsertion algorithm is the best. This is because the reinserted vectors can be redistributed over the entire tree without any restriction. The query performance of the ND-tree from the NodeReinsertion algorithm is worse than that. This is because this algorithm redistributes subtrees rather than individual vectors, which binds a group of vectors together when reinserting them. However, its query performance is better than that of the ND-tree obtained from the BorrowReinsertion algorithm, since the latter only borrows one entry from a local sibling.

[Table 2: Query performance comparison (query ranges R=1, R=2, R=3; columns: Vector, Node, Borrow for each range; rows: 50%, 70% and 90% deletions for trees of 1 M, 2 M and 4 M vectors)]

[Table 3: Space Utilization for ND-trees (rows: VectorReinsertion, NodeReinsertion, BorrowReinsertion; columns: DB size 1 M, 2 M, 4 M; values in %)]

Table 3 shows the space utilization of the ND-trees obtained from the three deletion algorithms after deleting 50% of the vectors. From the table, we can see that the space utilizations of these trees are comparable.

5 Conclusion

There is an increasing demand for similarity searches in NDDSs. The ND-tree is an index tree specially designed for supporting such queries in NDDSs. The earlier work on the ND-tree focused on its construction and query processing. This paper studies deletion techniques for the ND-tree and their performance, which is an important issue for maintaining an efficient ND-tree. We present and evaluate three deletion algorithms for the ND-tree. They differ in their underflow node handling strategies. Algorithm VectorReinsertion handles an underflow node by removing it from the tree and reinserting all vectors contained in the subtree rooted at the underflow node into the tree via its root. Algorithm NodeReinsertion processes an underflow node by removing it from the tree, moving its child subtrees to a best sibling node and handling the overflow when needed. Algorithm BorrowReinsertion resolves an underflow node by borrowing a vector from a best sibling node. Heuristics are employed in the last two algorithms to decide on a best sibling node.

Experiments were conducted to evaluate the efficiency and effectiveness of these deletion techniques. The efficiency is measured by the number of I/Os for performing deletions, while the effectiveness is measured by the query performance using the resulting tree after deletions. Experimental results show that BorrowReinsertion has the best efficiency, followed by NodeReinsertion and then by VectorReinsertion. As for the effectiveness, VectorReinsertion is the best, followed by NodeReinsertion and then by BorrowReinsertion. The space utilizations of the trees produced by all the above deletion techniques are comparable. From these observations, it is clear that a tradeoff between efficiency and effectiveness needs to be carefully considered when choosing a proper deletion technique for the ND-tree. Our future work includes investigating efficient direct update techniques for the ND-tree as well as maintenance techniques for the NSP-tree, which is another index technique for NDDSs.

Acknowledgment

The authors would like to thank Rachel Radziszewski for her help in implementing part of the experimental programs.

References

[1] Beckman, N., Kriegel, H., Schneider, R., Seeger, B.: The R*-tree: an efficient and robust access method for points and rectangles. In Proc. of SIGMOD, pp ,
[2] Berchtold, S., Keim, D. A., Kriegel, H.-P.: The X-tree: an index structure for high-dimensional data. In Proc. of VLDB, pp ,
[3] Guttman, A.: R-trees: a dynamic index structure for spatial searching. In Proc. of SIGMOD, pp ,
[4] Hanan, S.: Deletion in Two-Dimensional Quad Trees. In Commun. ACM, Vol. 23, No. 12, pp ,
[5] Henrich, A.: The LSDh-tree: an access structure for feature vectors. In Proc. of IEEE ICDE, pp ,
[6] Jannink, J.: Implementing Deletion in B+-Trees. In SIGMOD RECORD, Vol. 24, No. 1, pp ,
[7] Kent, W. J.: BLAT the BLAST-like alignment tool. In Genome Research, Vol. 12, pp ,
[8] Maelbrancke, R., Olivie, H.: Optimizing Jan Jannink's Implementation of B-tree Deletion. In Proc. of SIGMOD RECORD, Vol. 24, No. 3, pp. 5-7,
[9] Nanopoulos, A., Vassilakopoulos, M., Manolopoulos, Y.: Performance Evaluation of Lazy Deletion Methods in R-trees. In Proc. of GeoInformatica, pp ,
[10] Navarro, G., Reyes, N.: Improved Deletions in Dynamic Spatial Approximation Trees. In Proc. of Int'l Conf. of the Chilean Computer Science Society, pp ,
[11] Ostadzadeh, S. A., Moulavi, M. A., Zeinalpour, Z.: The R*-tree: an efficient and robust access method for points and rectangles. In Proc. of SIGMOD, pp ,
[12] Qian, G., Zhu, Q., Xue, Q., Pramanik, S.: The ND-Tree: a dynamic indexing technique for multidimensional non-ordered discrete data spaces. In Proc. of VLDB, pp ,
[13] Qian, G., Zhu, Q., Xue, Q., Pramanik, S.: A Space-Partitioning-Based Indexing Method for Multidimensional Non-ordered Discrete Data Spaces. In ACM TOIS, Vol. 23, pp ,
[14] Qian, G., Zhu, Q., Xue, Q., Pramanik, S.: Dynamic Indexing for Multidimensional Non-ordered Discrete Data Spaces Using a Data-Partitioning Approach. In ACM TODS, Vol. 31, pp ,
[15] Robinson, J. T.: The K-D-B-tree: a search structure for large multidimensional dynamic indexes. In Proc. of SIGMOD, pp ,
[16] Seok, H.J., Qian, G., Zhu, Q., Oswald, A. and Pramanik, S.: Bulk-Loading the ND-Tree in Non-ordered Discrete Data Spaces. In Proc. of DASFAA 08, pp ,
[17] White, D., Jain, J.: Similarity indexing with the SS-tree. In Proc. of IEEE ICDE, pp , 1996.


More information

An index structure for efficient reverse nearest neighbor queries

An index structure for efficient reverse nearest neighbor queries An index structure for efficient reverse nearest neighbor queries Congjun Yang Division of Computer Science, Department of Mathematical Sciences The University of Memphis, Memphis, TN 38152, USA yangc@msci.memphis.edu

More information

CST-Trees: Cache Sensitive T-Trees

CST-Trees: Cache Sensitive T-Trees CST-Trees: Cache Sensitive T-Trees Ig-hoon Lee 1, Junho Shim 2, Sang-goo Lee 3, and Jonghoon Chun 4 1 Prompt Corp., Seoul, Korea ihlee@prompt.co.kr 2 Department of Computer Science, Sookmyung Women s University,

More information

Experimental Evaluation of Spatial Indices with FESTIval

Experimental Evaluation of Spatial Indices with FESTIval Experimental Evaluation of Spatial Indices with FESTIval Anderson Chaves Carniel 1, Ricardo Rodrigues Ciferri 2, Cristina Dutra de Aguiar Ciferri 1 1 Department of Computer Science University of São Paulo

More information

M-ary Search Tree. B-Trees. Solution: B-Trees. B-Tree: Example. B-Tree Properties. B-Trees (4.7 in Weiss)

M-ary Search Tree. B-Trees. Solution: B-Trees. B-Tree: Example. B-Tree Properties. B-Trees (4.7 in Weiss) M-ary Search Tree B-Trees (4.7 in Weiss) Maximum branching factor of M Tree with N values has height = # disk accesses for find: Runtime of find: 1/21/2011 1 1/21/2011 2 Solution: B-Trees specialized M-ary

More information

More B-trees, Hash Tables, etc. CS157B Chris Pollett Feb 21, 2005.

More B-trees, Hash Tables, etc. CS157B Chris Pollett Feb 21, 2005. More B-trees, Hash Tables, etc. CS157B Chris Pollett Feb 21, 2005. Outline B-tree Domain of Application B-tree Operations Hash Tables on Disk Hash Table Operations Extensible Hash Tables Multidimensional

More information

Chapter 12: Indexing and Hashing. Basic Concepts

Chapter 12: Indexing and Hashing. Basic Concepts Chapter 12: Indexing and Hashing! Basic Concepts! Ordered Indices! B+-Tree Index Files! B-Tree Index Files! Static Hashing! Dynamic Hashing! Comparison of Ordered Indexing and Hashing! Index Definition

More information

Spatial Data Management

Spatial Data Management Spatial Data Management [R&G] Chapter 28 CS432 1 Types of Spatial Data Point Data Points in a multidimensional space E.g., Raster data such as satellite imagery, where each pixel stores a measured value

More information

(2,4) Trees Goodrich, Tamassia. (2,4) Trees 1

(2,4) Trees Goodrich, Tamassia. (2,4) Trees 1 (2,4) Trees 9 2 5 7 10 14 (2,4) Trees 1 Multi-Way Search Tree ( 9.4.1) A multi-way search tree is an ordered tree such that Each internal node has at least two children and stores d 1 key-element items

More information

Multi-way Search Trees. (Multi-way Search Trees) Data Structures and Programming Spring / 25

Multi-way Search Trees. (Multi-way Search Trees) Data Structures and Programming Spring / 25 Multi-way Search Trees (Multi-way Search Trees) Data Structures and Programming Spring 2017 1 / 25 Multi-way Search Trees Each internal node of a multi-way search tree T: has at least two children contains

More information

Using Natural Clusters Information to Build Fuzzy Indexing Structure

Using Natural Clusters Information to Build Fuzzy Indexing Structure Using Natural Clusters Information to Build Fuzzy Indexing Structure H.Y. Yue, I. King and K.S. Leung Department of Computer Science and Engineering The Chinese University of Hong Kong Shatin, New Territories,

More information

Spatial Data Management

Spatial Data Management Spatial Data Management Chapter 28 Database management Systems, 3ed, R. Ramakrishnan and J. Gehrke 1 Types of Spatial Data Point Data Points in a multidimensional space E.g., Raster data such as satellite

More information

Chapter 12: Indexing and Hashing

Chapter 12: Indexing and Hashing Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL

More information

Using the Holey Brick Tree for Spatial Data. in General Purpose DBMSs. Northeastern University

Using the Holey Brick Tree for Spatial Data. in General Purpose DBMSs. Northeastern University Using the Holey Brick Tree for Spatial Data in General Purpose DBMSs Georgios Evangelidis Betty Salzberg College of Computer Science Northeastern University Boston, MA 02115-5096 1 Introduction There is

More information

Material You Need to Know

Material You Need to Know Review Quiz 2 Material You Need to Know Normalization Storage and Disk File Layout Indexing B-trees and B+ Trees Extensible Hashing Linear Hashing Decomposition Goals: Lossless Joins, Dependency preservation

More information

Multidimensional Indexing The R Tree

Multidimensional Indexing The R Tree Multidimensional Indexing The R Tree Module 7, Lecture 1 Database Management Systems, R. Ramakrishnan 1 Single-Dimensional Indexes B+ trees are fundamentally single-dimensional indexes. When we create

More information

CSE 530A. B+ Trees. Washington University Fall 2013

CSE 530A. B+ Trees. Washington University Fall 2013 CSE 530A B+ Trees Washington University Fall 2013 B Trees A B tree is an ordered (non-binary) tree where the internal nodes can have a varying number of child nodes (within some range) B Trees When a key

More information

Multiway searching. In the worst case of searching a complete binary search tree, we can make log(n) page faults Everyone knows what a page fault is?

Multiway searching. In the worst case of searching a complete binary search tree, we can make log(n) page faults Everyone knows what a page fault is? Multiway searching What do we do if the volume of data to be searched is too large to fit into main memory Search tree is stored on disk pages, and the pages required as comparisons proceed may not be

More information

2. Dynamic Versions of R-trees

2. Dynamic Versions of R-trees 2. Dynamic Versions of R-trees The survey by Gaede and Guenther [69] annotates a vast list of citations related to multi-dimensional access methods and, in particular, refers to R-trees to a significant

More information

Datenbanksysteme II: Multidimensional Index Structures 2. Ulf Leser

Datenbanksysteme II: Multidimensional Index Structures 2. Ulf Leser Datenbanksysteme II: Multidimensional Index Structures 2 Ulf Leser Content of this Lecture Introduction Partitioned Hashing Grid Files kdb Trees kd Tree kdb Tree R Trees Example: Nearest neighbor image

More information

THE B+ TREE INDEX. CS 564- Spring ACKs: Jignesh Patel, AnHai Doan

THE B+ TREE INDEX. CS 564- Spring ACKs: Jignesh Patel, AnHai Doan THE B+ TREE INDEX CS 564- Spring 2018 ACKs: Jignesh Patel, AnHai Doan WHAT IS THIS LECTURE ABOUT? The B+ tree index Basics Search/Insertion/Deletion Design & Cost 2 INDEX RECAP We have the following query:

More information

Spatiotemporal Access to Moving Objects. Hao LIU, Xu GENG 17/04/2018

Spatiotemporal Access to Moving Objects. Hao LIU, Xu GENG 17/04/2018 Spatiotemporal Access to Moving Objects Hao LIU, Xu GENG 17/04/2018 Contents Overview & applications Spatiotemporal queries Movingobjects modeling Sampled locations Linear function of time Indexing structure

More information

Benchmarking the UB-tree

Benchmarking the UB-tree Benchmarking the UB-tree Michal Krátký, Tomáš Skopal Department of Computer Science, VŠB Technical University of Ostrava, tř. 17. listopadu 15, Ostrava, Czech Republic michal.kratky@vsb.cz, tomas.skopal@vsb.cz

More information

Data Warehousing & Data Mining

Data Warehousing & Data Mining Data Warehousing & Data Mining Wolf-Tilo Balke Kinda El Maarry Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de Summary Last week: Logical Model: Cubes,

More information

Database System Concepts, 6 th Ed. Silberschatz, Korth and Sudarshan See for conditions on re-use

Database System Concepts, 6 th Ed. Silberschatz, Korth and Sudarshan See  for conditions on re-use Chapter 11: Indexing and Hashing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files Static

More information

Motivation for B-Trees

Motivation for B-Trees 1 Motivation for Assume that we use an AVL tree to store about 20 million records We end up with a very deep binary tree with lots of different disk accesses; log2 20,000,000 is about 24, so this takes

More information

Organizing Spatial Data

Organizing Spatial Data Organizing Spatial Data Spatial data records include a sense of location as an attribute. Typically location is represented by coordinate data (in 2D or 3D). 1 If we are to search spatial data using the

More information

Chap4: Spatial Storage and Indexing. 4.1 Storage:Disk and Files 4.2 Spatial Indexing 4.3 Trends 4.4 Summary

Chap4: Spatial Storage and Indexing. 4.1 Storage:Disk and Files 4.2 Spatial Indexing 4.3 Trends 4.4 Summary Chap4: Spatial Storage and Indexing 4.1 Storage:Disk and Files 4.2 Spatial Indexing 4.3 Trends 4.4 Summary Learning Objectives Learning Objectives (LO) LO1: Understand concept of a physical data model

More information

Introduction to Indexing 2. Acknowledgements: Eamonn Keogh and Chotirat Ann Ratanamahatana

Introduction to Indexing 2. Acknowledgements: Eamonn Keogh and Chotirat Ann Ratanamahatana Introduction to Indexing 2 Acknowledgements: Eamonn Keogh and Chotirat Ann Ratanamahatana Indexed Sequential Access Method We have seen that too small or too large an index (in other words too few or too

More information

Balanced Search Trees

Balanced Search Trees Balanced Search Trees Computer Science E-22 Harvard Extension School David G. Sullivan, Ph.D. Review: Balanced Trees A tree is balanced if, for each node, the node s subtrees have the same height or have

More information

Chapter 12: Indexing and Hashing (Cnt(

Chapter 12: Indexing and Hashing (Cnt( Chapter 12: Indexing and Hashing (Cnt( Cnt.) Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition

More information

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15 Systems Infrastructure for Data Science Web Science Group Uni Freiburg WS 2014/15 Lecture II: Indexing Part I of this course Indexing 3 Database File Organization and Indexing Remember: Database tables

More information

Improving the Query Performance of High-Dimensional Index Structures by Bulk Load Operations

Improving the Query Performance of High-Dimensional Index Structures by Bulk Load Operations Improving the Query Performance of High-Dimensional Index Structures by Bulk Load Operations Stefan Berchtold, Christian Böhm 2, and Hans-Peter Kriegel 2 AT&T Labs Research, 8 Park Avenue, Florham Park,

More information

CS 350 : Data Structures B-Trees

CS 350 : Data Structures B-Trees CS 350 : Data Structures B-Trees David Babcock (courtesy of James Moscola) Department of Physical Sciences York College of Pennsylvania James Moscola Introduction All of the data structures that we ve

More information

Database index structures

Database index structures Database index structures From: Database System Concepts, 6th edijon Avi Silberschatz, Henry Korth, S. Sudarshan McGraw- Hill Architectures for Massive DM D&K / UPSay 2015-2016 Ioana Manolescu 1 Chapter

More information

CSE 326: Data Structures B-Trees and B+ Trees

CSE 326: Data Structures B-Trees and B+ Trees Announcements (2/4/09) CSE 26: Data Structures B-Trees and B+ Trees Midterm on Friday Special office hour: 4:00-5:00 Thursday in Jaech Gallery (6 th floor of CSE building) This is in addition to my usual

More information

Chapter 11: Indexing and Hashing

Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL

More information

CS350: Data Structures B-Trees

CS350: Data Structures B-Trees B-Trees James Moscola Department of Engineering & Computer Science York College of Pennsylvania James Moscola Introduction All of the data structures that we ve looked at thus far have been memory-based

More information

Summary. 4. Indexes. 4.0 Indexes. 4.1 Tree Based Indexes. 4.0 Indexes. 19-Nov-10. Last week: This week:

Summary. 4. Indexes. 4.0 Indexes. 4.1 Tree Based Indexes. 4.0 Indexes. 19-Nov-10. Last week: This week: Summary Data Warehousing & Data Mining Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de Last week: Logical Model: Cubes,

More information

B-Trees & its Variants

B-Trees & its Variants B-Trees & its Variants Advanced Data Structure Spring 2007 Zareen Alamgir Motivation Yet another Tree! Why do we need another Tree-Structure? Data Retrieval from External Storage In database programs,

More information

Remember. 376a. Database Design. Also. B + tree reminders. Algorithms for B + trees. Remember

Remember. 376a. Database Design. Also. B + tree reminders. Algorithms for B + trees. Remember 376a. Database Design Dept. of Computer Science Vassar College http://www.cs.vassar.edu/~cs376 Class 14 B + trees, multi-key indices, partitioned hashing and grid files B and B + -trees are used one implementation

More information

Chapter 12: Indexing and Hashing

Chapter 12: Indexing and Hashing Chapter 12: Indexing and Hashing Database System Concepts, 5th Ed. See www.db-book.com for conditions on re-use Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree

More information

Quadrant-Based MBR-Tree Indexing Technique for Range Query Over HBase

Quadrant-Based MBR-Tree Indexing Technique for Range Query Over HBase Quadrant-Based MBR-Tree Indexing Technique for Range Query Over HBase Bumjoon Jo and Sungwon Jung (&) Department of Computer Science and Engineering, Sogang University, 35 Baekbeom-ro, Mapo-gu, Seoul 04107,

More information

B-Trees. Based on materials by D. Frey and T. Anastasio

B-Trees. Based on materials by D. Frey and T. Anastasio B-Trees Based on materials by D. Frey and T. Anastasio 1 Large Trees n Tailored toward applications where tree doesn t fit in memory q operations much faster than disk accesses q want to limit levels of

More information

A Study on Reverse Top-K Queries Using Monochromatic and Bichromatic Methods

A Study on Reverse Top-K Queries Using Monochromatic and Bichromatic Methods A Study on Reverse Top-K Queries Using Monochromatic and Bichromatic Methods S.Anusuya 1, M.Balaganesh 2 P.G. Student, Department of Computer Science and Engineering, Sembodai Rukmani Varatharajan Engineering

More information

Pivoting M-tree: A Metric Access Method for Efficient Similarity Search

Pivoting M-tree: A Metric Access Method for Efficient Similarity Search Pivoting M-tree: A Metric Access Method for Efficient Similarity Search Tomáš Skopal Department of Computer Science, VŠB Technical University of Ostrava, tř. 17. listopadu 15, Ostrava, Czech Republic tomas.skopal@vsb.cz

More information

A Parallel Access Method for Spatial Data Using GPU

A Parallel Access Method for Spatial Data Using GPU A Parallel Access Method for Spatial Data Using GPU Byoung-Woo Oh Department of Computer Engineering Kumoh National Institute of Technology Gumi, Korea bwoh@kumoh.ac.kr Abstract Spatial access methods

More information

Problem. Indexing with B-trees. Indexing. Primary Key Indexing. B-trees: Example. B-trees. primary key indexing

Problem. Indexing with B-trees. Indexing. Primary Key Indexing. B-trees: Example. B-trees. primary key indexing 15-82 Advanced Topics in Database Systems Performance Problem Given a large collection of records, Indexing with B-trees find similar/interesting things, i.e., allow fast, approximate queries 2 Indexing

More information

amiri advanced databases '05

amiri advanced databases '05 More on indexing: B+ trees 1 Outline Motivation: Search example Cost of searching with and without indices B+ trees Definition and structure B+ tree operations Inserting Deleting 2 Dense ordered index

More information

Search Trees - 1 Venkatanatha Sarma Y

Search Trees - 1 Venkatanatha Sarma Y Search Trees - 1 Lecture delivered by: Venkatanatha Sarma Y Assistant Professor MSRSAS-Bangalore 11 Objectives To introduce, discuss and analyse the different ways to realise balanced Binary Search Trees

More information

A Pivot-based Index Structure for Combination of Feature Vectors

A Pivot-based Index Structure for Combination of Feature Vectors A Pivot-based Index Structure for Combination of Feature Vectors Benjamin Bustos Daniel Keim Tobias Schreck Department of Computer and Information Science, University of Konstanz Universitätstr. 10 Box

More information

Packet Classification Using Dynamically Generated Decision Trees

Packet Classification Using Dynamically Generated Decision Trees 1 Packet Classification Using Dynamically Generated Decision Trees Yu-Chieh Cheng, Pi-Chung Wang Abstract Binary Search on Levels (BSOL) is a decision-tree algorithm for packet classification with superior

More information

Indexing Biometric Databases using Pyramid Technique

Indexing Biometric Databases using Pyramid Technique Indexing Biometric Databases using Pyramid Technique Amit Mhatre, Sharat Chikkerur and Venu Govindaraju Center for Unified Biometrics and Sensors (CUBS), University at Buffalo, New York, U.S.A http://www.cubs.buffalo.edu

More information

Chapter 11: Indexing and Hashing

Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 11: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree

More information

STUDIA INFORMATICA Nr 1-2 (19) Systems and information technology 2015

STUDIA INFORMATICA Nr 1-2 (19) Systems and information technology 2015 STUDIA INFORMATICA Nr 1- (19) Systems and information technology 015 Adam Drozdek, Dušica Vujanović Duquesne University, Department of Mathematics and Computer Science, Pittsburgh, PA 158, USA Atomic-key

More information

Chapter 11: Indexing and Hashing

Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 11: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree

More information

CARNEGIE MELLON UNIVERSITY DEPT. OF COMPUTER SCIENCE DATABASE APPLICATIONS

CARNEGIE MELLON UNIVERSITY DEPT. OF COMPUTER SCIENCE DATABASE APPLICATIONS CARNEGIE MELLON UNIVERSITY DEPT. OF COMPUTER SCIENCE 15-415 DATABASE APPLICATIONS C. Faloutsos Indexing and Hashing 15-415 Database Applications http://www.cs.cmu.edu/~christos/courses/dbms.s00/ general

More information

Appropriate Item Partition for Improving the Mining Performance

Appropriate Item Partition for Improving the Mining Performance Appropriate Item Partition for Improving the Mining Performance Tzung-Pei Hong 1,2, Jheng-Nan Huang 1, Kawuu W. Lin 3 and Wen-Yang Lin 1 1 Department of Computer Science and Information Engineering National

More information

Implementation Techniques

Implementation Techniques V Implementation Techniques 34 Efficient Evaluation of the Valid-Time Natural Join 35 Efficient Differential Timeslice Computation 36 R-Tree Based Indexing of Now-Relative Bitemporal Data 37 Light-Weight

More information

Extra: B+ Trees. Motivations. Differences between BST and B+ 10/27/2017. CS1: Java Programming Colorado State University

Extra: B+ Trees. Motivations. Differences between BST and B+ 10/27/2017. CS1: Java Programming Colorado State University Extra: B+ Trees CS1: Java Programming Colorado State University Slides by Wim Bohm and Russ Wakefield 1 Motivations Many times you want to minimize the disk accesses while doing a search. A binary search

More information

Indexing Cached Multidimensional Objects in Large Main Memory Systems

Indexing Cached Multidimensional Objects in Large Main Memory Systems Indexing Cached Multidimensional Objects in Large Main Memory Systems Beomseok Nam and Alan Sussman UMIACS and Dept. of Computer Science University of Maryland College Park, MD 2742 bsnam,als @cs.umd.edu

More information

Chapter 11: Indexing and Hashing" Chapter 11: Indexing and Hashing"

Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing" Database System Concepts, 6 th Ed.! Silberschatz, Korth and Sudarshan See www.db-book.com for conditions on re-use " Chapter 11: Indexing and Hashing" Basic Concepts!

More information

Perfect Hashing Base R-tree for Multiple Queries

Perfect Hashing Base R-tree for Multiple Queries Perfect Hashing Base R-tree for Multiple Queries Thesis submitted in partial fulfillment of the requirements for the award of degree of Master of Engineering in Computer Science and Engineering Submitted

More information

Introduction. New latch modes

Introduction. New latch modes A B link Tree method and latch protocol for synchronous node deletion in a high concurrency environment Karl Malbrain malbrain@cal.berkeley.edu Introduction A new B link Tree latching method and protocol

More information

Merging R-trees. Abstract. 1. Introduction

Merging R-trees. Abstract. 1. Introduction R-trees Vasilis Vasaitis Alexandros Nanopoulos Panayiotis Bozanis Dept. of Informatics Dept. of Informatics Computer & Commun. Eng. Dept. Aristotle University Aristotle University University of Thessaly

More information

Chapter 17 Indexing Structures for Files and Physical Database Design

Chapter 17 Indexing Structures for Files and Physical Database Design Chapter 17 Indexing Structures for Files and Physical Database Design We assume that a file already exists with some primary organization unordered, ordered or hash. The index provides alternate ways to

More information

Chapter 2: The Game Core. (part 2)

Chapter 2: The Game Core. (part 2) Ludwig Maximilians Universität München Institut für Informatik Lehr- und Forschungseinheit für Datenbanksysteme Lecture Notes for Managing and Mining Multiplayer Online Games for the Summer Semester 2017

More information

Multi-Way Search Trees

Multi-Way Search Trees Multi-Way Search Trees Manolis Koubarakis 1 Multi-Way Search Trees Multi-way trees are trees such that each internal node can have many children. Let us assume that the entries we store in a search tree

More information

Search Space Reductions for Nearest-Neighbor Queries

Search Space Reductions for Nearest-Neighbor Queries Search Space Reductions for Nearest-Neighbor Queries Micah Adler 1 and Brent Heeringa 2 1 Department of Computer Science, University of Massachusetts, Amherst 140 Governors Drive Amherst, MA 01003 2 Department

More information

Multi-Way Search Trees

Multi-Way Search Trees Multi-Way Search Trees Manolis Koubarakis 1 Multi-Way Search Trees Multi-way trees are trees such that each internal node can have many children. Let us assume that the entries we store in a search tree

More information

Relaxed Space Bounding for Moving Objects: A Case for the Buddy Tree

Relaxed Space Bounding for Moving Objects: A Case for the Buddy Tree Relaxed Space Bounding for Moving Objects: A Case for the Buddy ree Shuqiao Guo Zhiyong Huang H. V. Jagadish Beng Chin Ooi Zhenjie Zhang Department of Computer Science National University of Singapore,

More information

Leveraging Set Relations in Exact Set Similarity Join

Leveraging Set Relations in Exact Set Similarity Join Leveraging Set Relations in Exact Set Similarity Join Xubo Wang, Lu Qin, Xuemin Lin, Ying Zhang, and Lijun Chang University of New South Wales, Australia University of Technology Sydney, Australia {xwang,lxue,ljchang}@cse.unsw.edu.au,

More information

Extendible Chained Bucket Hashing for Main Memory Databases. Abstract

Extendible Chained Bucket Hashing for Main Memory Databases. Abstract Extendible Chained Bucket Hashing for Main Memory Databases Pyung-Chul Kim *, Kee-Wook Rim, Jin-Pyo Hong Electronics and Telecommunications Research Institute (ETRI) P.O. Box 106, Yusong, Taejon, 305-600,

More information

The B-Tree. Yufei Tao. ITEE University of Queensland. INFS4205/7205, Uni of Queensland

The B-Tree. Yufei Tao. ITEE University of Queensland. INFS4205/7205, Uni of Queensland Yufei Tao ITEE University of Queensland Before ascending into d-dimensional space R d with d > 1, this lecture will focus on one-dimensional space, i.e., d = 1. We will review the B-tree, which is a fundamental

More information

B-Trees. Disk Storage. What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree. Deletion in a B-tree

B-Trees. Disk Storage. What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree. Deletion in a B-tree B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree Deletion in a B-tree Disk Storage Data is stored on disk (i.e., secondary memory) in blocks. A block is

More information

Principles of Data Management. Lecture #14 (Spatial Data Management)

Principles of Data Management. Lecture #14 (Spatial Data Management) Principles of Data Management Lecture #14 (Spatial Data Management) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Today s Notable News v Project

More information

Main Memory and the CPU Cache

Main Memory and the CPU Cache Main Memory and the CPU Cache CPU cache Unrolled linked lists B Trees Our model of main memory and the cost of CPU operations has been intentionally simplistic The major focus has been on determining

More information

Distributed k-nn Query Processing for Location Services

Distributed k-nn Query Processing for Location Services Distributed k-nn Query Processing for Location Services Jonghyeong Han 1, Joonwoo Lee 1, Seungyong Park 1, Jaeil Hwang 1, and Yunmook Nah 1 1 Department of Electronics and Computer Engineering, Dankook

More information

Adaptive k-nearest-neighbor Classification Using a Dynamic Number of Nearest Neighbors

Adaptive k-nearest-neighbor Classification Using a Dynamic Number of Nearest Neighbors Adaptive k-nearest-neighbor Classification Using a Dynamic Number of Nearest Neighbors Stefanos Ougiaroglou 1 Alexandros Nanopoulos 1 Apostolos N. Papadopoulos 1 Yannis Manolopoulos 1 Tatjana Welzer-Druzovec

More information

Indexing High-Dimensional Data for Content-Based Retrieval in Large Databases

Indexing High-Dimensional Data for Content-Based Retrieval in Large Databases Indexing High-Dimensional Data for Content-Based Retrieval in Large Databases Manuel J. Fonseca, Joaquim A. Jorge Department of Information Systems and Computer Science INESC-ID/IST/Technical University

More information