
Striped Grid Files: An Alternative for High-dimensional Indexing

Thanet Praneenararat 1, Vorapong Suppakitpaisarn 2, Sunchai Pitakchonlasap 1, and Jaruloj Chongstitvatana 1
Department of Mathematics 1, Department of Computer Engineering 2
Chulalongkorn University, Bangkok 133 THAILAND

Abstract-In this paper, we propose an index structure for high-dimensional data that applies the concept of striping to grid files. We call this structure a striped grid file. A striped grid file contains a number of grid files, each of which is indexed by a subset of attributes striped from the original set of attributes. Each entry in these grid files is the index of another grid file, which contains pointers to the actual disk pages storing data records. Experiments are performed to measure the performance of striped grid files in terms of the number of disk accesses and the storage utilization. Striped grid files give much better storage utilization than single grid files, at the cost of slightly more disk accesses. However, owing to the nature of grid files, the number of disk accesses for point queries in striped grid files is still constant. Furthermore, the lower the number of dimensions of the root grid, the better the storage utilization of the striped grid file.

I. INTRODUCTION

High-dimensional index structures are becoming increasingly necessary because of the use of multimedia databases, text databases, genomic databases, etc. Multidimensional index structures, e.g. kdb-trees [1], R-trees [2], grid files [3], etc., do not work well when the number of dimensions or attributes is high. This problem, called the curse of dimensionality [4], is caused by the exponential growth of the data space with respect to the number of dimensions.

Many index structures for high-dimensional databases have been proposed to alleviate this problem. The approaches can be classified into two types. One approach, e.g.
TV-trees [5], X-trees [6] and NSP-trees [7], is to use some heuristics to organize the index structures. This approach works well for specific applications and specific data types. The other approach, called tree striping [8], is to reduce the data dimension. The data space is divided into disjoint subspaces of lower dimensionality such that the cross-product of the subspaces is the original data space. The subspaces are organized using an arbitrary multidimensional index structure.

A grid file [3] is a multidimensional index structure which requires few disk accesses at the expense of storage utilization. A grid file is composed of data pages containing data records and directory pages containing a multidimensional array of pointers to data pages. It is not practical to apply grid files to high-dimensional data because the directory size grows exponentially with respect to the number of dimensions.

In this paper, we propose to apply striping to grid files in order to improve the storage utilization while maintaining the low disk access of grid files. A striped grid file is composed of many reduced-dimension grid files, called leaf grids, and another grid file, called the root grid, which is used to combine the results of queries from the leaf grids. The experiments show that striping can reduce the storage required for grid files while maintaining a reasonably low number of disk accesses. Furthermore, the lower the dimension of the root grid, the better the storage utilization.

This paper is organized as follows. The structure of striped grid files is defined in Section II. The algorithms for striped grid files are elaborated in Section III. In Section IV, the experimental set-up, experimental results and discussion are presented. The conclusion is given in Section V.

II. STRUCTURE OF STRIPED GRID FILES

A grid file partitions a set of data into smaller sets according to the range of each attribute. The partition for one dimension is independent of that for other dimensions.
The partitioned range for each attribute is stored in a one-dimensional array called a linear scale. The index for n-dimensional data is stored in an n-dimensional array, called a grid directory, in which each entry is a pointer to a disk page, called a data page, containing data within the specified range. As a result, the size of the grid directory is in the order of $c^n$, where $n$ is the dimension of the data. However, access to a multi-dimensional array is very efficient.

To apply the concept of striping to grid files, a reduced-dimension grid directory, called a leaf grid, is created to partition data into stripes according to a set of attributes. For example, given data records with key $\langle a_1, a_2, a_3, a_4 \rangle$, data can be striped according to $\langle a_1, a_2 \rangle$ and $\langle a_3, a_4 \rangle$. According to attributes $a_1$ and $a_2$, data are partitioned using a traditional grid file, and the index is stored in a leaf grid $g_1$. Similarly, for attributes $a_3$ and $a_4$, the index is stored in a leaf grid $g_2$. However, a pointer in each leaf grid does not point directly to a data page, but points to another grid directory, called a root grid. In the root grid, each dimension of the grid directory corresponds to the ranges of the attributes in one leaf grid. Continuing the previous example, the two dimensions of the root grid correspond to the sets of elements in the leaf grids $g_1$ and $g_2$.

A striped grid file SG can be denoted by $\langle GR, G_1, \ldots, G_k \rangle$, where GR is the root grid, and $G_1, G_2, \ldots, G_k$ are the leaf grids. For n-dimensional data, a striped grid file partitions data according to k sets of attributes, where k is neither 1 nor n (assuming n is divisible by k). The striped grid file illustrated in figure 1 has 2 leaf grids of 2 dimensions each. Each leaf grid partitions data according to d of the n attributes, where $d = n/k$. Data records are stored in data pages, each of which can contain a fixed number of data records. Next, each structure is described in more detail.

Figure 1. A structure of striped grid files

A. Leaf Grids

Each leaf grid $G_i$ ($1 \le i \le k$) is a single d-dimensional grid directory. Thus, each is composed of d linear scales $LS^i_j$ ($1 \le i \le k$, $1 \le j \le d$), one d-dimensional grid directory $D_i$, called a leaf grid directory, and one buddy tree $T_i$. Linear scales for leaf grids are exactly those of the original grid files. On the other hand, grid directories in leaf grids are different from those in original grid files. For a leaf grid, each entry in the grid directory contains an integer which is the number of its corresponding leaf node in its buddy tree. That is, $G_i(a^i_1, a^i_2, \ldots, a^i_d)$ contains a pointer bt to a leaf node L in the buddy tree $T_i$, where the leaf node L represents a range $R_L$ of values of the attributes $\langle a^i_1, \ldots, a^i_d \rangle$ specified in the linear scales.

A buddy tree is a binary tree whose nodes are associated with areas in its grid directory. The root node of a buddy tree is associated with the whole area of its corresponding grid directory. Furthermore, the union of the areas associated with two child nodes is the area in the grid associated with their parent. A number is assigned to each leaf node in the buddy tree T, and is used as an index into the root grid. This number, denoted by Inorder(T, x), is the sequence number of the leaf node visited in an inorder traversal. In other words, Inorder(T, x) = y if x is the y-th leaf node visited in the inorder traversal of T.

B. Root Grid

The root grid GR is a k-dimensional grid indexed by the k values obtained from the k leaf grids, one value per dimension. In the original grid files, two entries in the grid directory can point to the same data page only when the two entries are adjacent in the grid directory. Thus, the split algorithm and the merge algorithm must not violate this condition. This can reduce the number of disk accesses in range queries. However, it can lower the storage utilization.

In striped grid files, a page pointer list is included, as an intermediate structure, at the root grid to allow any entries in the root grid to be associated with the same data page. A page pointer list, denoted by PR, is a list of pointers to data pages. Figure 2 shows a data page p which is associated with three non-adjacent entries in the root grid. If a data page is associated with more than one entry in the grid directory, it is called a packed page. Otherwise, it is called an unpacked page. When data pages are packed together, the storage utilization can be increased, and the split and merge algorithms are simple. However, the number of disk accesses in range queries can be increased because data records in a data page are not necessarily adjacent in key values.

Figure 2. The association between page pointer list and data pages

III. ALGORITHMS

The algorithms for point query, range query, insertion and deletion are based on those of tree striping and traditional grid files, with some modifications as described below. The basic algorithms for grid files can be found in [3].

A. Point Queries

A point query is used to find a record with a specified search key.

Algorithm PointQuery
Given a striped grid file SG, find a record with a search key $\langle a_1, \ldots, a_n \rangle$.
(1) [Retrieve a data page] Invoke PageQuery to retrieve a data page pp containing data with the key $\langle a_1, \ldots, a_n \rangle$.
(2) [Find the data record in data page] Search for a record containing the key $\langle a_1, \ldots, a_n \rangle$ in the data page pp. If found, return the data record. If not, return null.
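The way the structures of Section II fit together for a lookup can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names, the dictionary-based directories, the contiguous assignment of attributes to stripes, and the example data are all assumptions made for illustration. Each leaf grid directory entry is taken to hold the inorder number of its buddy-tree leaf directly, so the buddy tree itself is not modelled.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class LeafGrid:
    scales: list     # d linear scales; each is a sorted list of split points
    directory: dict  # cell (tuple of d ints) -> inorder number of a buddy-tree leaf

    def lookup(self, subkey):
        # Locate the cell through the linear scales, then read the leaf number.
        cell = tuple(bisect_right(s, v) for s, v in zip(self.scales, subkey))
        return self.directory[cell]


@dataclass
class StripedGridFile:
    leaves: list         # k leaf grids G_1..G_k
    root: dict           # (O_1..O_k) -> position in the page pointer list
    page_pointers: list  # page pointer list PR: position -> data page id
    pages: dict          # data page id -> {key: record}

    def page_query(self, key):
        k = len(self.leaves)
        d = len(key) // k  # assumes n is divisible by k, as in the text
        subkeys = [key[i * d:(i + 1) * d] for i in range(k)]
        o = tuple(g.lookup(sk) for g, sk in zip(self.leaves, subkeys))
        return self.page_pointers[self.root[o]]

    def point_query(self, key):
        # Returns None when no record with this key is stored.
        return self.pages[self.page_query(key)].get(key)


# Hypothetical 2x2 example: G_1 splits a_1 at 50, G_2 is unsplit.
g1 = LeafGrid(scales=[[50], []], directory={(0, 0): 1, (1, 0): 2})
g2 = LeafGrid(scales=[[], []], directory={(0, 0): 1})
sg = StripedGridFile(
    leaves=[g1, g2],
    root={(1, 1): 0, (2, 1): 1},
    page_pointers=[0, 0],  # both root entries share page 0 (a packed page)
    pages={0: {(10, 20, 30, 40): "rec A", (70, 20, 30, 40): "rec B"}},
)
print(sg.point_query((70, 20, 30, 40)))
```

In this toy instance the two root entries reach the same data page through the page pointer list, which is the packed-page situation described in Section II.B.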

Algorithm PageQuery
Given a search key $\langle a_1, \ldots, a_n \rangle$, find the data page that contains a record with the specified search key.
(1) [Divide key] Divide the key $\langle a_1, \ldots, a_n \rangle$ into k d-dimensional keys $\langle a^i_1, \ldots, a^i_d \rangle$ for $1 \le i \le k$.
(2) [Query the leaf grids] For $1 \le i \le k$, query leaf grid $G_i$ with the d-dimensional key $\langle a^i_1, \ldots, a^i_d \rangle$. The result is $\langle NL_1, NL_2, \ldots, NL_k \rangle$, where $NL_i$ is a pointer to a leaf node in the buddy tree $T_i$.
(3) [Inorder traversal in the buddy trees] For $1 \le i \le k$, $O_i = \mathrm{Inorder}(T_i, NL_i)$.
(4) [Query the root grid] Query the root grid with the key $\langle O_1, \ldots, O_k \rangle$ obtained from (3). The result is a pointer pr in the page pointer list.
(5) [Look up the page pointer] Look up the pointer pr in the page pointer list. Return the pointer to a data page pp as the result.

B. Range Queries

A range query finds the records in a striped grid file whose search keys are in the specified range. The result is a set of data within the specified range.

Algorithm RangeQuery
Given a striped grid file SG, find records whose search key is in the range $R = \langle [a_1, b_1], \ldots, [a_n, b_n] \rangle$.
(1) [Retrieve data pages] Invoke PageRangeQuery to retrieve the set pp of data pages whose ranges intersect R.
(2) [Find the data records in data pages] Search all data pages in pp for records whose key lies in the range R. Return all records that are found.

Algorithm PageRangeQuery
Given an n-dimensional range $R = \langle [a_1, b_1], \ldots, [a_n, b_n] \rangle$, find the data pages containing records whose search key is in the range R.
(1) [Divide range] Divide the range $\langle [a_1, b_1], \ldots, [a_n, b_n] \rangle$ into k d-dimensional ranges $\langle [a^i_1, b^i_1], \ldots, [a^i_d, b^i_d] \rangle$ for $1 \le i \le k$.
(2) [Query the leaf grids] For $1 \le i \le k$, query leaf grid $G_i$ with the d-dimensional range $\langle [a^i_1, b^i_1], \ldots, [a^i_d, b^i_d] \rangle$. The result is a set of tuples $\langle NL_1, NL_2, \ldots, NL_k \rangle$, where $NL_i$ is a pointer to a leaf node in the buddy tree $T_i$.
(3) [Inorder traversal in the buddy trees] For each member $\langle NL_1, NL_2, \ldots, NL_k \rangle$ of the set from (2), compute $\langle O_1, \ldots, O_k \rangle$, where $O_i = \mathrm{Inorder}(T_i, NL_i)$ for $1 \le i \le k$.
(4) [Query the root grid] Query the root grid with each of the keys $\langle O_1, \ldots, O_k \rangle$ obtained from (3). The result is a set pr of pointers to elements in the page pointer list.
(5) [Look up the page pointers] Look up each pointer of the set pr in the page pointer list. Return the resulting set pp of pointers to data pages.

C. Insertion

The insertion algorithm inserts a data record into a striped grid file. If the data page overflows, the split algorithm is invoked.

Algorithm Insert
Insert a data record with a search key $\langle a_1, \ldots, a_n \rangle$ into a striped grid file SG.
(1) [Retrieve data page] Invoke PageQuery with the key $\langle a_1, \ldots, a_n \rangle$ to retrieve a data page pp.
(2) [Add record to data page] If the inserted record makes the data page pp overflow, invoke Split to split this data page into two data pages pp and pq. Add the new record to the data page that now covers its key.

D. Splitting

When the number of records in a data page exceeds the maximum size of the data page, the page needs to be split into two data pages so that new records can be inserted. Splitting occurs at two levels: at the page pointer list and at the leaf grid.

If an overflow occurs in a packed page, the split occurs at the page pointer list. That is, a new data page is allocated and the pointer to the overflowing page in the page pointer list is moved to the new data page, as shown in figure 3. Finally, the data within the range of the entry that caused the split must be reallocated to the new data page.

On the other hand, if an overflow occurs in an unpacked page, the split occurs at the leaf grid, and the root grid is split as a result. Splitting at this level is similar to splitting in the original grid file. The following is the splitting algorithm.
Algorithm Split
Given an entry q in the root grid such that q points to the element $pr_i$ in the page pointer list and $pr_i$ points to the data page pp, and the range R of data in q.
(1) [Determine the level of splitting] If pp is a packed page, go to (2) to split at the page pointer list; otherwise, go to (3) to split at the leaf grid.
(2) [Split at the page pointer list] Allocate a new data page pq, move data in the range R from pp to pq, and change the pointer $pr_i$ in the page pointer list to point to pq. Then, return pq.
(3) [Split at the leaf grid]
(3.1) [Choose the attribute for splitting and the splitting point] Randomly choose the attribute $S_n$ to be split, and choose the median, sp, of the data in pp with respect to the dimension $S_n$ as the splitting point.
(3.2) [Split the leaf grid directory] Find the leaf grid $G_{si}$ in which $S_n$ is an index in one of its dimensions. Split $G_{si}$ along the dimension $S_n$ at the value sp.
(3.3) [Update the buddy tree] Find the leaf node v in the buddy tree corresponding to the leaf grid $G_{si}$ such that v points to the entry q in the root grid. Create two children, $v_1$ and $v_2$, of v. Associate these two nodes with their two corresponding entries in $G_{si}$ created in (3.2).
(4) [Split the root grid] Find the dimension corresponding to the node v in (3.3) and split q in the root grid in that dimension to make two rows, for $v_1$ and $v_2$, instead of v. Then, let the two split entries point to a new data page pq and the old data page pp. Finally, partition data among pq and pp according to the splitting point sp.

Figure 3. The method to split a data page
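Step (2) of Algorithm Split, the split at the page pointer list, can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the paper's code: the page pointer list is modelled as a plain list with one element per root grid entry, a page counts as packed when more than one element points to it, ranges are half-open intervals per attribute, and the function names and the way a new page id is allocated are hypothetical.

```python
def in_range(key, rng):
    # rng holds one half-open interval (lo, hi) per attribute (an assumption).
    return all(lo <= v < hi for v, (lo, hi) in zip(key, rng))


def is_packed(page_pointers, page_id):
    # Packed: more than one pointer-list element refers to the same data page.
    return page_pointers.count(page_id) > 1


def split_at_pointer_list(page_pointers, pages, pr_i, rng):
    """Move the records in range `rng` from page pp to a new page pq."""
    pp = page_pointers[pr_i]
    if not is_packed(page_pointers, pp):
        raise ValueError("unpacked page: split at the leaf grid instead")
    pq = max(pages) + 1  # hypothetical allocation of a fresh page id
    moved = [k for k in pages[pp] if in_range(k, rng)]
    pages[pq] = {k: pages[pp].pop(k) for k in moved}
    page_pointers[pr_i] = pq  # redirect the pointer pr_i to the new page
    return pq


# Two root entries share page 0; the second entry covers a_1 in [50, 100).
page_pointers = [0, 0]
pages = {0: {(10, 20): "a", (70, 20): "b", (80, 5): "c"}}
pq = split_at_pointer_list(page_pointers, pages, 1, [(50, 100), (0, 100)])
print(pq, page_pointers, pages)
```

No grid directory is touched in this case, which is why a split of a packed page is cheap compared with a split at the leaf grid.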

E. Deletion

The deletion algorithm deletes a specified record from a striped grid file. If the data page that contained the deleted record underflows, the merging algorithm is invoked to reorganize the structure.

Algorithm Delete
Delete a record with a search key $\langle a_1, \ldots, a_n \rangle$ from a striped grid file SG.
(1) [Retrieve data page] Invoke PageQuery with the key $\langle a_1, \ldots, a_n \rangle$ to retrieve a data page pp.
(2) [Delete record from data page] Delete the record with the key $\langle a_1, \ldots, a_n \rangle$ from the data page pp. If the deletion makes the data page pp underflow, invoke Merge to reorganize the structure.

F. Merging

When the number of records in some data pages is below the specified threshold, or the storage utilization becomes low because of the split algorithm, the underflowing data pages need to be merged to maintain the storage utilization. Merging occurs only at the level of the page pointer list, although splitting also occurs at the level of the grid. Merging at the level of the grid can reduce both the size of the grid directory and the number of data pages. However, because of the restrictions of grid directories, such merging hardly ever occurs. On the other hand, merging at the level of the page pointer list is simple and can occur whenever there is an underflowing page. To merge two data pages $p_i$ and $p_j$, pointed to by $PR_i$ and $PR_j$ in the page pointer list, the data records in $p_j$ are added to $p_i$, $PR_j$ is set to $PR_i$, and $p_j$ is freed. As a result, merging is effective in maintaining the required storage utilization.

Algorithm Merge
Given a data page list $P = \langle P_1, P_2, \ldots, P_m \rangle$.
(1) [Sort the data page list] Sort the data pages according to their sizes.
(2) [Find a pair of data pages] Find a data page with the minimum number of data records among all the data pages and a data page with the maximum number of data records. If no such pair can be merged into a single data page, stop.
(3) [Merge the pair of data pages] Merge the pair of data pages selected in (2), store its data records in one of the data pages, and free the other data page.
(4) [Update the page pointer list] Update the page pointer to point to the merged data page.
(5) [Repeat until the condition cannot be satisfied] Go to (2).

IV. EXPERIMENTS AND RESULTS

To evaluate the performance of striped grid files, we conducted experiments on synthetic data with different dimensionalities. The simulation of the index structures was implemented in Java 5 with 2 GB memory, assuming a disk block of 2 KB, to measure the number of disk accesses and the storage utilization.

Synthetic data sets were generated with a uniform distribution. We experimented on 4-, 6-, and 8-dimensional data. For 4-dimensional data, a 4-dimensional grid file is compared to a 2x2 striped grid file (2 stripes of 2-dimensional grid files). For 6-dimensional data, a 6-dimensional grid file, a 3x2 striped grid file (3 stripes of 2-dimensional grid files), and a 2x3 striped grid file (2 stripes of 3-dimensional grid files) are compared. For 8-dimensional data, a 4x2 striped grid file (4 stripes of 2-dimensional grid files) and a 2x4 striped grid file (2 stripes of 4-dimensional grid files) are compared. An 8-dimensional grid file is not used in the experiment because of the massive storage it requires.

To create an index structure, data records are inserted and deleted alternately until the required number of data records is reached. The performance is measured when the number of data records in the index reaches 1K, 2K, ..., and 1K.

A. Storage Utilization

Figure 4 and figure 5 show the number of disk pages used for 4-dimensional and 6-dimensional data in single grid files and striped grid files. From these figures, it is clear that the storage required for striped grid files is lower than the storage required for a single grid file.
Furthermore, from figure 5, the storage required for a 2x3 striped grid is lower than that for a 3x2 striped grid. The difference is caused mainly by the size of the directories, especially the root grids. For each split at the leaf grid, the root grid is always split and grows, while only one among all the leaf grids is split. As a result, a root grid grows faster than a leaf grid. Furthermore, when the number of dimensions of the root grid is larger, the size of the root grid grows even faster.

As for the overall storage utilization shown in figure 6, striped grid files yield 5-9%, whereas single grid files yield lower than 5% utilization. This shows that striped grid files use storage more efficiently than single grid files. Moreover, in striped grid files for 6-dimensional data, the storage utilization is lower when the number of data records increases.

Figure 4. Number of disk pages used for 4-dimensional grid files (x-axis: Number of Data; y-axis: Number of Disk Pages; series: Directories, and Directories and Data Pages, for single grid 4d and striped 2x2d)

B. Number of Disk Accesses

In this section, the numbers of disk accesses for point queries, range queries, insertions, and deletions are examined.

Point queries

As in traditional grid files, the number of disk accesses for each point query in a striped grid file is constant. When k is the number of leaf grids, the number of disk accesses for each query is k+2 (k disk accesses to access the k leaf grids, one disk access for the root grid, and one for the data page containing the data record).
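As a worked instance of the k+2 count, the configurations used in the experiments give the values below. The symbol $A(k)$ for the number of disk accesses per point query is introduced here only for illustration, and the count assumes, as in the text, one disk access per leaf grid, one for the root grid, and one for the data page.

\[
A(k) = k + 2, \qquad
A(2) = 4 \ \text{(2x2, 2x3, 2x4)}, \qquad
A(3) = 5 \ \text{(3x2)}, \qquad
A(4) = 6 \ \text{(4x2)}
\]

The cost therefore depends only on the number of stripes k, not on the dimensionality d of each leaf grid or on the number of stored records.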

Figure 5. Number of disk pages used for 6-dimensional grid files (x-axis: Number of Data; y-axis: Number of Disk Pages; series: Directories, and Directories and Data Pages, for single grid 6d, striped 2x3d and striped 3x2d)

Figure 6. Storage utilization for 4- and 6-dimensional grid files (x-axis: Number of Data; y-axis: Storage Utilization (%); series: Single grid 4d, Striped 2x2d, Single grid 6d, Striped 2x3d, Striped 3x2d)

Range queries

From figure 7, the number of disk accesses for striped grid files is a little higher than that for single grid files. The reason is that, unlike in traditional grid files, the PageRangeQuery algorithm for striped grid files returns a set of data pages which may not be adjacent. As a result, several parts of the root grid might have to be accessed, and more disk pages are accessed. However, since the root grids are often much smaller than the traditional grid files, the difference is small.

Figure 7. Average disk accesses from range queries (x-axis: Query Area (x 1^12); y-axis: Average Disk Access; series: Single Grid 6d, Striped Grid 2x3d, Striped Grid 3x2d)

Insertion and Deletion

In our experiment, the average number of disk accesses used for splitting in striped 2x2d grid files is only about .2214 per insertion, computed from the insertion of 5 data records. Moreover, the average number of disk accesses for merging pages is only about ... per deletion. This is because most insertions and deletions do not need to invoke the split and merge algorithms, respectively. Hence, the disk accesses for both operations depend mainly on the point query which is used to choose the page in which to insert or delete data.

V. CONCLUSIONS

We propose to apply the idea of striping to the traditional grid files and call this structure a striped grid file.
This structure is composed of many reduced-dimension grid files, called leaf grids, and another grid file, called the root grid, which is used to combine the results of queries from the leaf grids. The experiments show that the storage utilization of striped grid files is better than that of traditional grid files, while the number of disk accesses in striped grid files is not much higher than that in traditional grid files. Also, striped grid files scale better than traditional grid files when the number of dimensions is increased. Owing to the inherent characteristics of grid files, the number of disk accesses for point queries in striped grid files is always a constant that depends on the structure of the striped grid file. Furthermore, we found that if the number of dimensions of the root grid in a striped grid file is low, the striped grid file yields better storage utilization.

ACKNOWLEDGMENT

We would like to thank the Scientific Parallel Computer Engineering Lab, Department of Computer Engineering, Chulalongkorn University, for granting access to a computer cluster for our experiments.

REFERENCES

[1] J.T. Robinson, The K-D-B-tree: a search structure for large multidimensional dynamic indexes, Proc. ACM SIGMOD Int. Conf. on Management of Data, Ann Arbor, MI, 1981.
[2] A. Guttman, R-trees: a dynamic index structure for spatial searching, Proc. ACM SIGMOD Int. Conf. on Management of Data, Boston, MA, 1984.
[3] J. Nievergelt, H. Hinterberger, and K.C. Sevcik, The grid file: an adaptable, symmetric multikey file structure, ACM Transactions on Database Systems (TODS), 9(1), 1984.
[4] C. Böhm, S. Berchtold, and D.A. Keim, Searching in high-dimensional spaces: index structures for improving the performance of multimedia databases, ACM Computing Surveys, 33(3), 2001.

[5] K. Lin, H.V. Jagadish, and C. Faloutsos, The TV-tree: an index structure for high-dimensional data, Very Large Databases Journal (VLDB), 3, 1995.
[6] S. Berchtold, D. Keim, and H.-P. Kriegel, The X-tree: an index structure for high-dimensional data, 22nd Conf. on Very Large Data Bases, Bombay, India.
[7] G. Qian, Q. Zhu, Q. Xue, and S. Pramanik, A space-partitioning-based indexing method for multidimensional non-ordered discrete data spaces, ACM Transactions on Information Systems (TOIS), 24(1), 2006.
[8] S. Berchtold, C. Böhm, D.A. Keim, H.-P. Kriegel and X. Xu, Optimal multidimensional query processing using tree striping, Proc. 2nd Int. Conf. on Data Warehousing and Knowledge Discovery (DaWaK), Greenwich, U.K., 2000.


More information

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15 Systems Infrastructure for Data Science Web Science Group Uni Freiburg WS 2014/15 Lecture II: Indexing Part I of this course Indexing 3 Database File Organization and Indexing Remember: Database tables

More information

A Scalable Index Mechanism for High-Dimensional Data in Cluster File Systems

A Scalable Index Mechanism for High-Dimensional Data in Cluster File Systems A Scalable Index Mechanism for High-Dimensional Data in Cluster File Systems Kyu-Woong Lee Hun-Soon Lee, Mi-Young Lee, Myung-Joon Kim Abstract We address the problem of designing index structures that

More information

Operations on Heap Tree The major operations required to be performed on a heap tree are Insertion, Deletion, and Merging.

Operations on Heap Tree The major operations required to be performed on a heap tree are Insertion, Deletion, and Merging. Priority Queue, Heap and Heap Sort In this time, we will study Priority queue, heap and heap sort. Heap is a data structure, which permits one to insert elements into a set and also to find the largest

More information

Bkd-tree: A Dynamic Scalable kd-tree

Bkd-tree: A Dynamic Scalable kd-tree Bkd-tree: A Dynamic Scalable kd-tree Octavian Procopiuc, Pankaj K. Agarwal, Lars Arge, and Jeffrey Scott Vitter Department of Computer Science, Duke University Durham, NC 2778, USA Department of Computer

More information

Multidimensional Indexes [14]

Multidimensional Indexes [14] CMSC 661, Principles of Database Systems Multidimensional Indexes [14] Dr. Kalpakis http://www.csee.umbc.edu/~kalpakis/courses/661 Motivation Examined indexes when search keys are in 1-D space Many interesting

More information

Mining Data Streams. Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction. Summarization Methods. Clustering Data Streams

Mining Data Streams. Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction. Summarization Methods. Clustering Data Streams Mining Data Streams Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction Summarization Methods Clustering Data Streams Data Stream Classification Temporal Models CMPT 843, SFU, Martin Ester, 1-06

More information

Bkd-tree: A Dynamic Scalable kd-tree

Bkd-tree: A Dynamic Scalable kd-tree Bkd-tree: A Dynamic Scalable kd-tree Octavian Procopiuc Pankaj K. Agarwal Lars Arge Jeffrey Scott Vitter July 1, 22 Abstract In this paper we propose a new index structure, called the Bkd-tree, for indexing

More information

Data Structure. IBPS SO (IT- Officer) Exam 2017

Data Structure. IBPS SO (IT- Officer) Exam 2017 Data Structure IBPS SO (IT- Officer) Exam 2017 Data Structure: In computer science, a data structure is a way of storing and organizing data in a computer s memory so that it can be used efficiently. Data

More information

Algorithms. Deleting from Red-Black Trees B-Trees

Algorithms. Deleting from Red-Black Trees B-Trees Algorithms Deleting from Red-Black Trees B-Trees Recall the rules for BST deletion 1. If vertex to be deleted is a leaf, just delete it. 2. If vertex to be deleted has just one child, replace it with that

More information

Notes on Binary Dumbbell Trees

Notes on Binary Dumbbell Trees Notes on Binary Dumbbell Trees Michiel Smid March 23, 2012 Abstract Dumbbell trees were introduced in [1]. A detailed description of non-binary dumbbell trees appears in Chapter 11 of [3]. These notes

More information

A Real Time GIS Approximation Approach for Multiphase Spatial Query Processing Using Hierarchical-Partitioned-Indexing Technique

A Real Time GIS Approximation Approach for Multiphase Spatial Query Processing Using Hierarchical-Partitioned-Indexing Technique International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2017 IJSRCSEIT Volume 2 Issue 6 ISSN : 2456-3307 A Real Time GIS Approximation Approach for Multiphase

More information

Multimedia Database Systems

Multimedia Database Systems Department of Informatics Aristotle University of Thessaloniki Fall 2016-2017 Multimedia Database Systems Indexing Part A Multidimensional Indexing Techniques Outline Motivation Multidimensional indexing

More information

Storage hierarchy. Textbook: chapters 11, 12, and 13

Storage hierarchy. Textbook: chapters 11, 12, and 13 Storage hierarchy Cache Main memory Disk Tape Very fast Fast Slower Slow Very small Small Bigger Very big (KB) (MB) (GB) (TB) Built-in Expensive Cheap Dirt cheap Disks: data is stored on concentric circular

More information

CSIT5300: Advanced Database Systems

CSIT5300: Advanced Database Systems CSIT5300: Advanced Database Systems L08: B + -trees and Dynamic Hashing Dr. Kenneth LEUNG Department of Computer Science and Engineering The Hong Kong University of Science and Technology Hong Kong SAR,

More information

Nearest Neighbor Search on Vertically Partitioned High-Dimensional Data

Nearest Neighbor Search on Vertically Partitioned High-Dimensional Data Nearest Neighbor Search on Vertically Partitioned High-Dimensional Data Evangelos Dellis, Bernhard Seeger, and Akrivi Vlachou Department of Mathematics and Computer Science, University of Marburg, Hans-Meerwein-Straße,

More information

Database System Concepts, 6 th Ed. Silberschatz, Korth and Sudarshan See for conditions on re-use

Database System Concepts, 6 th Ed. Silberschatz, Korth and Sudarshan See  for conditions on re-use Chapter 11: Indexing and Hashing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files Static

More information

Fast Similarity Search for High-Dimensional Dataset

Fast Similarity Search for High-Dimensional Dataset Fast Similarity Search for High-Dimensional Dataset Quan Wang and Suya You Computer Science Department University of Southern California {quanwang,suyay}@graphics.usc.edu Abstract This paper addresses

More information

Pivoting M-tree: A Metric Access Method for Efficient Similarity Search

Pivoting M-tree: A Metric Access Method for Efficient Similarity Search Pivoting M-tree: A Metric Access Method for Efficient Similarity Search Tomáš Skopal Department of Computer Science, VŠB Technical University of Ostrava, tř. 17. listopadu 15, Ostrava, Czech Republic tomas.skopal@vsb.cz

More information

More B-trees, Hash Tables, etc. CS157B Chris Pollett Feb 21, 2005.

More B-trees, Hash Tables, etc. CS157B Chris Pollett Feb 21, 2005. More B-trees, Hash Tables, etc. CS157B Chris Pollett Feb 21, 2005. Outline B-tree Domain of Application B-tree Operations Hash Tables on Disk Hash Table Operations Extensible Hash Tables Multidimensional

More information

Indexing Non-uniform Spatial Data

Indexing Non-uniform Spatial Data Indexing Non-uniform Spatial Data K. V. Ravi Kanth Divyakant Agrawal Amr El Abbadi Ambuj K. Singh Department of Computer Science University of California at Santa Barbara Santa Barbara, CA 93106 Abstract

More information

Chapter 12: Indexing and Hashing

Chapter 12: Indexing and Hashing Chapter 12: Indexing and Hashing Database System Concepts, 5th Ed. See www.db-book.com for conditions on re-use Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree

More information

Database Systems. File Organization-2. A.R. Hurson 323 CS Building

Database Systems. File Organization-2. A.R. Hurson 323 CS Building File Organization-2 A.R. Hurson 323 CS Building Indexing schemes for Files The indexing is a technique in an attempt to reduce the number of accesses to the secondary storage in an information retrieval

More information

Trees. Courtesy to Goodrich, Tamassia and Olga Veksler

Trees. Courtesy to Goodrich, Tamassia and Olga Veksler Lecture 12: BT Trees Courtesy to Goodrich, Tamassia and Olga Veksler Instructor: Yuzhen Xie Outline B-tree Special case of multiway search trees used when data must be stored on the disk, i.e. too large

More information

Evaluating XPath Queries

Evaluating XPath Queries Chapter 8 Evaluating XPath Queries Peter Wood (BBK) XML Data Management 201 / 353 Introduction When XML documents are small and can fit in memory, evaluating XPath expressions can be done efficiently But

More information

(2,4) Trees. 2/22/2006 (2,4) Trees 1

(2,4) Trees. 2/22/2006 (2,4) Trees 1 (2,4) Trees 9 2 5 7 10 14 2/22/2006 (2,4) Trees 1 Outline and Reading Multi-way search tree ( 10.4.1) Definition Search (2,4) tree ( 10.4.2) Definition Search Insertion Deletion Comparison of dictionary

More information

Multi-way Search Trees! M-Way Search! M-Way Search Trees Representation!

Multi-way Search Trees! M-Way Search! M-Way Search Trees Representation! Lecture 10: Multi-way Search Trees: intro to B-trees 2-3 trees 2-3-4 trees Multi-way Search Trees A node on an M-way search tree with M 1 distinct and ordered keys: k 1 < k 2 < k 3

More information

Optimal Dimension Order: A Generic Technique for the Similarity Join

Optimal Dimension Order: A Generic Technique for the Similarity Join 4th Int. Conf. on Data Warehousing and Knowledge Discovery (DaWaK) Aix-en-Provence, France, 2002. Optimal Dimension Order: A Generic Technique for the Similarity Join Christian Böhm 1, Florian Krebs 2,

More information

2, 3, 5, 7, 11, 17, 19, 23, 29, 31

2, 3, 5, 7, 11, 17, 19, 23, 29, 31 148 Chapter 12 Indexing and Hashing implementation may be by linking together fixed size buckets using overflow chains. Deletion is difficult with open hashing as all the buckets may have to inspected

More information

Some Practice Problems on Hardware, File Organization and Indexing

Some Practice Problems on Hardware, File Organization and Indexing Some Practice Problems on Hardware, File Organization and Indexing Multiple Choice State if the following statements are true or false. 1. On average, repeated random IO s are as efficient as repeated

More information

CSE 530A. B+ Trees. Washington University Fall 2013

CSE 530A. B+ Trees. Washington University Fall 2013 CSE 530A B+ Trees Washington University Fall 2013 B Trees A B tree is an ordered (non-binary) tree where the internal nodes can have a varying number of child nodes (within some range) B Trees When a key

More information

A HASHING TECHNIQUE USING SEPARATE BINARY TREE

A HASHING TECHNIQUE USING SEPARATE BINARY TREE Data Science Journal, Volume 5, 19 October 2006 143 A HASHING TECHNIQUE USING SEPARATE BINARY TREE Md. Mehedi Masud 1*, Gopal Chandra Das 3, Md. Anisur Rahman 2, and Arunashis Ghose 4 *1 School of Information

More information

The ND-Tree: A Dynamic Indexing Technique for Multidimensional Non-ordered Discrete Data Spaces

The ND-Tree: A Dynamic Indexing Technique for Multidimensional Non-ordered Discrete Data Spaces The ND-Tree: A Dynamic Indexing Technique for Multidimensional Non-ordered Discrete Data Spaces Gang Qian Qiang Zhu Qiang Xue Sakti Pramanik Department of Computer Science and Engineering Michigan State

More information

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 6 - Storage and Indexing

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 6 - Storage and Indexing CSE 544 Principles of Database Management Systems Magdalena Balazinska Winter 2009 Lecture 6 - Storage and Indexing References Generalized Search Trees for Database Systems. J. M. Hellerstein, J. F. Naughton

More information

Speeding up Queries in a Leaf Image Database

Speeding up Queries in a Leaf Image Database 1 Speeding up Queries in a Leaf Image Database Daozheng Chen May 10, 2007 Abstract We have an Electronic Field Guide which contains an image database with thousands of leaf images. We have a system which

More information

2-3 Tree. Outline B-TREE. catch(...){ printf( "Assignment::SolveProblem() AAAA!"); } ADD SLIDES ON DISJOINT SETS

2-3 Tree. Outline B-TREE. catch(...){ printf( Assignment::SolveProblem() AAAA!); } ADD SLIDES ON DISJOINT SETS Outline catch(...){ printf( "Assignment::SolveProblem() AAAA!"); } Balanced Search Trees 2-3 Trees 2-3-4 Trees Slide 4 Why care about advanced implementations? Same entries, different insertion sequence:

More information

Summary. 4. Indexes. 4.0 Indexes. 4.1 Tree Based Indexes. 4.0 Indexes. 19-Nov-10. Last week: This week:

Summary. 4. Indexes. 4.0 Indexes. 4.1 Tree Based Indexes. 4.0 Indexes. 19-Nov-10. Last week: This week: Summary Data Warehousing & Data Mining Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de Last week: Logical Model: Cubes,

More information

An index structure for efficient reverse nearest neighbor queries

An index structure for efficient reverse nearest neighbor queries An index structure for efficient reverse nearest neighbor queries Congjun Yang Division of Computer Science, Department of Mathematical Sciences The University of Memphis, Memphis, TN 38152, USA yangc@msci.memphis.edu

More information

Chapter 13: Query Processing

Chapter 13: Query Processing Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing

More information

Introduction to Indexing 2. Acknowledgements: Eamonn Keogh and Chotirat Ann Ratanamahatana

Introduction to Indexing 2. Acknowledgements: Eamonn Keogh and Chotirat Ann Ratanamahatana Introduction to Indexing 2 Acknowledgements: Eamonn Keogh and Chotirat Ann Ratanamahatana Indexed Sequential Access Method We have seen that too small or too large an index (in other words too few or too

More information

B-Trees and External Memory

B-Trees and External Memory Presentation for use with the textbook, Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015 and External Memory 1 1 (2, 4) Trees: Generalization of BSTs Each internal node

More information

ΗΥ360 Αρχεία και Βάσεις εδοµένων

ΗΥ360 Αρχεία και Βάσεις εδοµένων ΗΥ360 Αρχεία και Βάσεις εδοµένων ιδάσκων:. Πλεξουσάκης Φυσική Σχεδίαση ΒΔ και Ευρετήρια Μπαριτάκης Παύλος 2018-2019 Data Structures for Primary Indices Structures that determine the location of the records

More information

Chapter 11: Indexing and Hashing" Chapter 11: Indexing and Hashing"

Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing" Database System Concepts, 6 th Ed.! Silberschatz, Korth and Sudarshan See www.db-book.com for conditions on re-use " Chapter 11: Indexing and Hashing" Basic Concepts!

More information

UNIT IV -NON-LINEAR DATA STRUCTURES 4.1 Trees TREE: A tree is a finite set of one or more nodes such that there is a specially designated node called the Root, and zero or more non empty sub trees T1,

More information

Parallel Similarity Join with Data Partitioning for Prefix Filtering

Parallel Similarity Join with Data Partitioning for Prefix Filtering 22 ECTI TRANSACTIONS ON COMPUTER AND INFORMATION TECHNOLOGY VOL.9, NO.1 May 2015 Parallel Similarity Join with Data Partitioning for Prefix Filtering Jaruloj Chongstitvatana 1 and Methus Bhirakit 2, Non-members

More information

B-Trees and External Memory

B-Trees and External Memory Presentation for use with the textbook, Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015 B-Trees and External Memory 1 (2, 4) Trees: Generalization of BSTs Each internal

More information

The R+-Tree: A Dynamic Index for Multi- Dimensional Objects

The R+-Tree: A Dynamic Index for Multi- Dimensional Objects Carnegie Mellon University Research Showcase @ CMU Computer Science Department School of Computer Science 9-1987 The R+-Tree: A Dynamic Index for Multi- Dimensional Objects Timos Sellis University of Maryland

More information

Chapter 12: Query Processing

Chapter 12: Query Processing Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Overview Chapter 12: Query Processing Measures of Query Cost Selection Operation Sorting Join

More information

Announcements. Reading Material. Recap. Today 9/17/17. Storage (contd. from Lecture 6)

Announcements. Reading Material. Recap. Today 9/17/17. Storage (contd. from Lecture 6) CompSci 16 Intensive Computing Systems Lecture 7 Storage and Index Instructor: Sudeepa Roy Announcements HW1 deadline this week: Due on 09/21 (Thurs), 11: pm, no late days Project proposal deadline: Preliminary

More information

V Advanced Data Structures

V Advanced Data Structures V Advanced Data Structures B-Trees Fibonacci Heaps 18 B-Trees B-trees are similar to RBTs, but they are better at minimizing disk I/O operations Many database systems use B-trees, or variants of them,

More information

Data Structures and Algorithms

Data Structures and Algorithms Data Structures and Algorithms Trees Sidra Malik sidra.malik@ciitlahore.edu.pk Tree? In computer science, a tree is an abstract model of a hierarchical structure A tree is a finite set of one or more nodes

More information

Improving the Query Performance of High-Dimensional Index Structures by Bulk Load Operations

Improving the Query Performance of High-Dimensional Index Structures by Bulk Load Operations Improving the Query Performance of High-Dimensional Index Structures by Bulk Load Operations Stefan Berchtold, Christian Böhm 2, and Hans-Peter Kriegel 2 AT&T Labs Research, 8 Park Avenue, Florham Park,

More information

Indexing High-Dimensional Data for Content-Based Retrieval in Large Databases

Indexing High-Dimensional Data for Content-Based Retrieval in Large Databases Indexing High-Dimensional Data for Content-Based Retrieval in Large Databases Manuel J. Fonseca, Joaquim A. Jorge Department of Information Systems and Computer Science INESC-ID/IST/Technical University

More information

Experimental Evaluation of Spatial Indices with FESTIval

Experimental Evaluation of Spatial Indices with FESTIval Experimental Evaluation of Spatial Indices with FESTIval Anderson Chaves Carniel 1, Ricardo Rodrigues Ciferri 2, Cristina Dutra de Aguiar Ciferri 1 1 Department of Computer Science University of São Paulo

More information

(2,4) Trees Goodrich, Tamassia (2,4) Trees 1

(2,4) Trees Goodrich, Tamassia (2,4) Trees 1 (2,4) Trees 9 2 5 7 10 14 2004 Goodrich, Tamassia (2,4) Trees 1 Multi-Way Search Tree A multi-way search tree is an ordered tree such that Each internal node has at least two children and stores d -1 key-element

More information

Intro to DB CHAPTER 12 INDEXING & HASHING

Intro to DB CHAPTER 12 INDEXING & HASHING Intro to DB CHAPTER 12 INDEXING & HASHING Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing

More information

SDD Advanced-User Manual Version 1.1

SDD Advanced-User Manual Version 1.1 SDD Advanced-User Manual Version 1.1 Arthur Choi and Adnan Darwiche Automated Reasoning Group Computer Science Department University of California, Los Angeles Email: sdd@cs.ucla.edu Download: http://reasoning.cs.ucla.edu/sdd

More information

Lecture 13. Lecture 13: B+ Tree

Lecture 13. Lecture 13: B+ Tree Lecture 13 Lecture 13: B+ Tree Lecture 13 Announcements 1. Project Part 2 extension till Friday 2. Project Part 3: B+ Tree coming out Friday 3. Poll for Nov 22nd 4. Exam Pickup: If you have questions,

More information

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and

More information

Chapter 13: Query Processing Basic Steps in Query Processing

Chapter 13: Query Processing Basic Steps in Query Processing Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and

More information

B-Trees. Disk Storage. What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree. Deletion in a B-tree

B-Trees. Disk Storage. What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree. Deletion in a B-tree B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree Deletion in a B-tree Disk Storage Data is stored on disk (i.e., secondary memory) in blocks. A block is

More information

Search Space Reductions for Nearest-Neighbor Queries

Search Space Reductions for Nearest-Neighbor Queries Search Space Reductions for Nearest-Neighbor Queries Micah Adler 1 and Brent Heeringa 2 1 Department of Computer Science, University of Massachusetts, Amherst 140 Governors Drive Amherst, MA 01003 2 Department

More information

(2,4) Trees Goodrich, Tamassia. (2,4) Trees 1

(2,4) Trees Goodrich, Tamassia. (2,4) Trees 1 (2,4) Trees 9 2 5 7 10 14 (2,4) Trees 1 Multi-Way Search Tree ( 9.4.1) A multi-way search tree is an ordered tree such that Each internal node has at least two children and stores d 1 key-element items

More information

Data Warehousing & Data Mining

Data Warehousing & Data Mining Data Warehousing & Data Mining Wolf-Tilo Balke Kinda El Maarry Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de Summary Last week: Logical Model: Cubes,

More information

Indexing Methods. Lecture 9. Storage Requirements of Databases

Indexing Methods. Lecture 9. Storage Requirements of Databases Indexing Methods Lecture 9 Storage Requirements of Databases Need data to be stored permanently or persistently for long periods of time Usually too big to fit in main memory Low cost of storage per unit

More information

Improving Quality of Products in Hard Drive Manufacturing by Decision Tree Technique

Improving Quality of Products in Hard Drive Manufacturing by Decision Tree Technique www.ijcsi.org 29 Improving Quality of Products in Hard Drive Manufacturing by Decision Tree Technique Anotai Siltepavet 1, Sukree Sinthupinyo 2 and Prabhas Chongstitvatana 3 1 Computer Engineering, Chulalongkorn

More information

Query Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016

Query Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016 Query Processing Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016 Slides re-used with some modification from www.db-book.com Reference: Database System Concepts, 6 th Ed. By Silberschatz,

More information

V Advanced Data Structures

V Advanced Data Structures V Advanced Data Structures B-Trees Fibonacci Heaps 18 B-Trees B-trees are similar to RBTs, but they are better at minimizing disk I/O operations Many database systems use B-trees, or variants of them,

More information

Chapter 12: Query Processing. Chapter 12: Query Processing

Chapter 12: Query Processing. Chapter 12: Query Processing Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 12: Query Processing Overview Measures of Query Cost Selection Operation Sorting Join

More information

Appropriate Item Partition for Improving the Mining Performance

Appropriate Item Partition for Improving the Mining Performance Appropriate Item Partition for Improving the Mining Performance Tzung-Pei Hong 1,2, Jheng-Nan Huang 1, Kawuu W. Lin 3 and Wen-Yang Lin 1 1 Department of Computer Science and Information Engineering National

More information

Binary Tree Node Relationships. Binary Trees. Quick Application: Expression Trees. Traversals

Binary Tree Node Relationships. Binary Trees. Quick Application: Expression Trees. Traversals Binary Trees 1 Binary Tree Node Relationships 2 A binary tree is either empty, or it consists of a node called the root together with two binary trees called the left subtree and the right subtree of the

More information

Binary Trees. For example: Jargon: General Binary Trees. root node. level: internal node. edge. leaf node. Data Structures & File Management

Binary Trees. For example: Jargon: General Binary Trees. root node. level: internal node. edge. leaf node. Data Structures & File Management Binary Trees 1 A binary tree is either empty, or it consists of a node called the root together with two binary trees called the left subtree and the right subtree of the root, which are disjoint from

More information

Database Applications (15-415)

Database Applications (15-415) Database Applications (15-415) DBMS Internals- Part IV Lecture 14, March 10, 015 Mohammad Hammoud Today Last Two Sessions: DBMS Internals- Part III Tree-based indexes: ISAM and B+ trees Data Warehousing/

More information

What we have covered?

What we have covered? What we have covered? Indexing and Hashing Data warehouse and OLAP Data Mining Information Retrieval and Web Mining XML and XQuery Spatial Databases Transaction Management 1 Lecture 6: Spatial Data Management

More information

Data and File Structures Laboratory

Data and File Structures Laboratory Binary Trees Assistant Professor Machine Intelligence Unit Indian Statistical Institute, Kolkata September, 2018 1 Basics 2 Implementation 3 Traversal Basics of a tree A tree is recursively defined as

More information