Database performance optimization


by DALIA MOTZKIN
Western Michigan University
Kalamazoo, Michigan

ABSTRACT

A generalized model for the optimization of relational databases has been developed and implemented. The model, an extension of previous works, is more general and more complete than former models. It consists of a set of algorithms and cost equations, and its output is an optimal set of indices for all fields of all files in the database. It also determines which fields should not be indexed. It allows the user to indicate indices to be evaluated, and it takes into consideration periodic reorganization, a variety of transaction types, and the multifield/multifile effects of some transaction types. It distinguishes between dense and nondense attributes, between primary and secondary fields, and between sorted and unsorted files. The optimal physical database configuration produced by the model is generated so that it can work within reasonable system constraints of time and space.


INTRODUCTION

The design of a physical database is concerned with the optimization of access time and space requirements and with the prediction of database performance. These problems have been approached from two sides. One aspect is query optimization, and the other is the selection of an optimal set of indices and reorganization points. This paper concentrates on the second aspect.

The selection of the indices of a database is an important part of physical database design. Since index performances vary, there is a need to select the most suitable index for each field of each file. Whereas appropriate indexing improves performance considerably, excessive indexing can result in major performance degradation as well as in significant increases in storage requirements. The performance of some types of indices deteriorates over time due to overflow situations and other problems. Reorganization is then required.

Problems of modeling, optimization, and prediction of database performance have been studied by many researchers, and interesting results have been published. 1-3,4-8,1-12,16,17,19,21,23 Additional bibliography, related to earlier work, can be found in extensive surveys. 18, 2, 22 However, previous modeling and optimization techniques are not complete; they suffer from one or more of the following problems:

1. The work is file-oriented rather than database-oriented.

2. The list of evaluated indices is inadequate: Some models evaluate only a few indices, omitting the entire B-tree family and other indices. Some do not incorporate a variety of updating techniques. Others compare too many indices, rendering the models slow and inefficient.

3. Periodic reorganization is not addressed: Some file organizations and index structures require periodic reorganization due to overflow and other problems. Some models do not take reorganization into consideration in the selection process.

4. System constraints are not taken into consideration: Many computer systems, especially microcomputers, have a limited amount of space. Other systems have time constraints due to heavy workload. These constraints do not play a role in many current models.

5. Important transaction types and the effects of these transactions are not included: Some models are oriented toward retrieval only. Others take into consideration maintenance operations such as insertions, deletions, and modifications, but do not include the multi-index/multifield effects of transactions. The following example illustrates a problem of this kind: field i (say, the salary field) of record R has to be modified. To find the record R, the system uses the value of field j (say, the social security number). An index on field j will obviously improve the search performance, but an index on field i will not contribute to the speed of the search. On the other hand, the index on field i will have to be modified, thus decreasing the performance of this transaction.

6. The interaction and distinction between primary and secondary fields are lacking: Some models are oriented toward primary fields only, others toward secondary fields. Some models evaluate indices for all fields but do not distinguish between primary and secondary fields.

7. There is no discrimination between dense and nondense field attributes.

8. Performance prediction is not provided: Some models present an optimal database configuration and/or reorganization points, but they do not provide the user with the time and space requirements of the database.

This work extends previous work by addressing issues 1 to 8 above. The databases considered are assumed to be relational. All files are assumed to be in first normal form (possibly they are also in any or all other normal forms). No reference is made to modeling network or hierarchical databases.
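The salary and social security number example under item 5 can be made concrete with a toy cost function. The sketch below is only an illustration: the function name `update_cost`, the field names, and the access counts (3 accesses per index probe, 2 per index rewrite, half the file scanned when no index exists) are assumptions chosen for the example, not values or formulas from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class UpdateCost:
    search: float       # block accesses spent locating the record
    maintenance: float  # block accesses spent rewriting index entries

    @property
    def total(self) -> float:
        return self.search + self.maintenance


def update_cost(indexed: set[str], lookup_field: str, modified_field: str,
                n_blocks: int, index_probe: float = 3.0,
                index_rewrite: float = 2.0) -> UpdateCost:
    """Hypothetical cost, in block accesses, of modifying one field of one record."""
    # Only an index on the lookup field shortens the search; without it we
    # assume (for illustration) that half the file is scanned on average.
    search = index_probe if lookup_field in indexed else n_blocks / 2
    # An index on the modified field never helps the search, but it must be
    # rewritten because the field value changes.
    maintenance = index_rewrite if modified_field in indexed else 0.0
    return UpdateCost(search, maintenance)


both = update_cost({"ssn", "salary"}, "ssn", "salary", n_blocks=1000)
ssn_only = update_cost({"ssn"}, "ssn", "salary", n_blocks=1000)
no_index = update_cost(set(), "ssn", "salary", n_blocks=1000)
```

With these assumed numbers, `both` and `ssn_only` have the same search cost, but `both` pays an extra maintenance cost for the salary index. This is the multi-index effect that retrieval-only models miss.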
DESCRIPTION OF THE MODEL

The model is composed of four parts: input parameters, the algorithm, performance and cost equations, and the output. We will first describe the input so that the parameters affecting the model will be evident; we will then describe the output (i.e., the outcome of the model). This will be followed by a general description of the algorithm, the computations, and the assumptions made. The detailed formulas used can be found in Motzkin. 15

Input Parameters

The input to the system consists of four groups of parameters: system parameters, database parameters, user workload parameters, and index parameters. (For examples of parameters, see the section below, "Experimental Results.")

System parameters

These parameters are concerned with system constraints and costs. They include total number of available blocks, the

blocking factor (number of characters per block), average access time, cost per block (per day), and cost per access. Note that the total time available per unit of time, such as a day or a month, is not an input parameter. The time required to execute the user workload is provided as part of the output. The user can modify the space allocation and achieve better access time. This procedure is described below in the section "An Overview of the Algorithm, Computations, and Assumptions."

Database parameters

These parameters provide database information such as the number of files, the number and names of fields in each file, and needed information on each field. The parameters include the number of files; for each file, the name of the file, a flag indicating whether the file is sorted or unsorted, the name of the field on which the file is sorted, the name of the primary field, and the total number of records in the file; for each field, the name of the field, a flag indicating whether the field attribute is dense or nondense (this parameter is needed because different types of indices are suitable for dense attributes and nondense attributes), the number of distinct attribute values (this is meaningful only for nondense attributes), and the number of characters in the field.

User workload parameters

This group of parameters is concerned with various transactions such as retrievals (also referred to as searches), insertions, deletions, modifications (also referred to as updates), and frequency of reorganization. It is difficult to obtain the values for user workload parameters. The values may be estimated, or a program may count them over a period of time and come up with an average per unit of time such as a day or a month. This paper does not discuss methods by which user workload parameters may be obtained. It is assumed that such input is available for the model.
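The system and database parameters described above can be pictured as plain records. The following is a minimal sketch, not the layout used by the author's program: all class and attribute names are hypothetical, and the `file_blocks` helper assumes that records are packed into blocks with no per-block overhead, which the paper does not state.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from math import ceil


@dataclass
class SystemParams:
    available_blocks: int
    chars_per_block: int       # what the paper calls the blocking factor
    avg_access_time_s: float
    cost_per_block_day: float
    cost_per_access: float


@dataclass
class FieldParams:
    name: str
    dense: bool
    n_chars: int
    distinct_values: int | None = None  # meaningful only for nondense fields


@dataclass
class FileParams:
    name: str
    is_sorted: bool
    primary_field: str
    n_records: int
    fields: list[FieldParams] = field(default_factory=list)
    sort_field: str | None = None       # set only when the file is sorted


def file_blocks(f: FileParams, system: SystemParams) -> int:
    """Blocks occupied by the data file under the packing assumption above."""
    record_chars = sum(fl.n_chars for fl in f.fields)
    return ceil(f.n_records * record_chars / system.chars_per_block)


# Illustrative values only; they are not taken from the paper's simulation runs.
system = SystemParams(available_blocks=2000, chars_per_block=512,
                      avg_access_time_s=0.03, cost_per_block_day=0.15,
                      cost_per_access=0.0007)
employee = FileParams(
    name="employee", is_sorted=False, primary_field="ssn", n_records=10_000,
    fields=[FieldParams("ssn", dense=True, n_chars=9),
            FieldParams("name", dense=True, n_chars=30),
            FieldParams("dept", dense=False, n_chars=6, distinct_values=25)])

blocks = file_blocks(employee, system)
```

Keeping the dense/nondense flag on the field record mirrors the text: it is what later decides which family of indices is a candidate for that field.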
User workload parameters cover only processes that can use indices; thus accesses that result from operations such as PROJECT are not included. The user workload parameters include the total number of records inserted in each file per unit of time, the total number of records deleted from each file per unit of time, the total number of searches per field per unit of time, the total number of updates per field per unit of time, and the frequency of merge. We used the day as the unit of time.

Note that some transactions are required for each file, whereas other transactions are required for each field. Insertions and deletions are measured per file because when a record is inserted or deleted, all indices have to be modified. Searches are measured per field, since database users usually request records with given field values. Updates (modifications) also affect individual fields; for example, if a salary field is changed, only the salary index is modified. It is assumed that the searches for records to be deleted or modified are done using the primary field.

The frequency-of-merge parameter indicates how often reorganization is done. Reorganization involves merging overflow areas with main areas, removing empty areas that might have been created as a result of deletions, regenerating some indices, and other related operations. This input parameter is required per file, since each file is reorganized along with the file indices.

Index parameters

The system provides a few of the more widely used indices as a default option. The user may add any additional indices to be used in the evaluation.

Default option: The default option is used when the user does not specify her/his choice of indices. For dense fields the system evaluates the B-tree index and the sequential index. The B-tree index is chosen as a representative of the B-tree family, which includes the B-tree, the B+-tree, the B*-tree, and the multilevel sequential index with block-splitting techniques used for updating.
These four directories have similar performance; therefore, one representative is selected. The formulas for the B-tree are taken from Horowitz and Sahni. 9 The B-tree family has a very efficient access time, but space utilization may be as low as 50%. Therefore, the other directory chosen as a default option is a simple one-level sequential directory. The sequential directory is not as fast as a B-tree, but it is more economical than a B-tree in space requirements. In some database environments, especially on small computers, the space constraints may be stronger than the time constraints; thus the slower but more economical sequential directory may be more suitable.

For the nondense attribute an inverted file is selected as the default option. It was pointed out by Motzkin 14 and others that inverted files are superior to multilists in most situations. The uniform organization of inverted files 13 is assumed. The detailed formulas used can be found in Motzkin. 15

Additional user-selected indices to be evaluated: A user may wish to evaluate and compare indices other than the default ones. It is possible to enter additional indices and their characteristics. The system will incorporate all additional indices into the optimization process.

Output

The output includes the total time required by the database operations described above per unit of time (per month in our implementation); the total space required by the database; the related cost of the database; and a list of files, fields, and selected indices, as well as the fields for which not having an index was more cost effective. (For sample output see the section "Experimental Results.")

An Overview of the Algorithm, Computations, and Assumptions

For each field of each file the system first selects the best index (that with the lowest cost) out of all indices to be evaluated. In selecting the best index, the following cost considerations are included:

1. The cost of retrievals (searches) that use the index
2. The cost of index modifications due to insertions and deletions to the corresponding file
3. The cost of index modifications due to changes of corresponding field values in the corresponding file
4. The cost of space occupied by the index
5. The cost of index reorganization due to overflow and other deterioration factors

The searches for records to be deleted and modified are assumed to be done using the primary field. They are added to the cost of each index evaluated for each primary field of each file.

After the best index has been determined for a field, the cost of related processing of the field without an index is computed. The cost without an index includes the cost of a direct search in the file for records associated with certain field values. Obviously the cost of a direct search in the file will be significantly higher than that of a search that uses an index; however, there will be no cost of index space and index maintenance. For each field, the cost without an index is then compared with the cost with the best index, and it is determined whether the best index or no index will be selected for the field.

When indices (or no indices) have been selected for all fields, the total database space is computed, including the space occupied by the files and the indices. If the total database space is greater than the available space, then the least useful index is removed. The process of removing the least useful index continues until enough indices have been removed to yield a total database space that is less than or equal to the available space (see Figure 1, Outline of the algorithm).
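The two phases described above (per-field selection, then removal of the least useful indices until the database fits) can be sketched as follows. This is a minimal illustration under stated assumptions: the names `Candidate`, `Choice`, `choose`, `usefulness`, and `fit_to_space` are hypothetical, each candidate's `cost` is assumed to already combine cost items 1 to 5, and usefulness is taken as the cost without an index minus the cost with the best index. The cost and space figures are invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str
    cost: float   # assumed to combine cost items 1-5 for this field
    blocks: int   # space occupied by the index


@dataclass
class Choice:
    field_id: tuple[int, int]     # (file number, field number)
    index: Optional[Candidate]    # None means "no index"
    cost_without: float


def choose(field_id: tuple[int, int], candidates: list[Candidate],
           cost_without: float) -> Choice:
    """Pick the cheapest candidate, then compare it with having no index."""
    best = min(candidates, key=lambda c: c.cost, default=None)
    if best is not None and best.cost < cost_without:
        return Choice(field_id, best, cost_without)
    return Choice(field_id, None, cost_without)


def usefulness(choice: Choice) -> float:
    """Cost saved by keeping the index; only meaningful when an index is set."""
    return choice.cost_without - choice.index.cost


def fit_to_space(choices: list[Choice], file_blocks: int,
                 available_blocks: int) -> int:
    """Drop the least useful indices until the database fits; return space used."""
    used = file_blocks + sum(c.index.blocks for c in choices if c.index)
    for c in sorted((c for c in choices if c.index), key=usefulness):
        if used <= available_blocks:
            break
        used -= c.index.blocks
        c.index = None
    return used


choices = [
    choose((1, 1), [Candidate("B-tree", 40.0, 120),
                    Candidate("sequential", 55.0, 60)], cost_without=400.0),
    choose((1, 2), [Candidate("inverted", 90.0, 80)], cost_without=100.0),
    choose((1, 3), [Candidate("inverted", 70.0, 50)], cost_without=60.0),
]
used = fit_to_space(choices, file_blocks=800, available_blocks=950)
```

In this run, field 3 never gets an index because the index costs more than it saves, and the index on field 2 is removed in the second phase because it saves the least cost of the remaining indices.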
The usefulness of an index is determined by the difference between the cost associated with the corresponding field if an index is not used for the field and the cost associated with the field when an index (the best) is provided. An index is considered less useful if it does not reduce the cost of the field considerably. An index is normally more useful when the corresponding field has more searches and fewer modifications, and when the index does not occupy a very large amount of space. (The exact formulas used in the cost equations can be found in Motzkin. 15)

At the end of the computations the user is presented with the total space, the total time of accesses computed from the output, and the related cost of the database. It is possible that while the space is acceptable, the time figure is too large. The database designer may then allow more space for the database and run the optimization program again. The additional space allocation will allow the database to use more indices and thus improve the time figure. The user may also try to put less weight on the space by reducing the cost of a block; this reduction may also increase the number of indices used. The frequency-of-merge parameters can also be changed. This iterative procedure may continue until an acceptable configuration is achieved or until there is no further improvement.

Outline of the Algorithm

An outline of the algorithm appears in Figure 1.

    FOR i = 1 TO number of files DO
        FIND the best index for the primary field p; denote it by IND(i,p)
        FIND whether it is "better" to have IND(i,p) or no index for field p of file i
        FOR j = 1 TO number of fields in file i DO
            IF j <> p THEN
                FIND the "best" index for field j; denote it by IND(i,j)
                FIND whether it is better to have IND(i,j) or no index for field j of file i
                STORE the information on IND(i,j)
            END IF
        END FOR
    END FOR
    COMPUTE total database space (include space requirement for files and indices)
    IF total database space > total space available THEN
        FOR i = 1 TO number of files DO
            FOR j = 1 TO number of fields in file i DO
                USEFUL(i,j) = COST_OF_FIELD(i,j) without index - COST_OF_FIELD(i,j) with index
            END FOR
        END FOR
        SORT USEFUL(i,j); denote the sorted list USEFUL(k)
            (k = 1 for the USEFUL(i,j) with the smallest value and
             k = number of indices for the USEFUL(i,j) with the highest value)
        FOR k = 1 TO number of indices DO
            REMOVE IND(k) from database
            Database Space <- Database Space - Space of IND(k)
            IF Database Space <= Available Space THEN exit loop
        END FOR
    END IF
    COMPUTE database cost and time
    PRINT output reports

Figure 1. Outline of the algorithm

A note on the complexity of the algorithm: The separability assumptions have been used. 4, 19 Thus the computations are performed on each field of each file separately. Denote the total number of fields over all files of the database by NF, and denote the number of indices to be evaluated by NI. Then the time required for the optimization process is T = NF · (NI + 1). Each additional iteration will take another T time.

EXPERIMENTAL RESULTS

The optimization and prediction model has been implemented by a PASCAL program. Four different simulation runs are provided (Figures 2-5). Field 1 is assumed to be the primary field in all files. The input parameter FILE TYPE with values U or S means unsorted or sorted file. Sorted files in this implementation are assumed to be sorted on the primary field. The FIELD TYPE parameter with values D or N means dense or nondense attributes. The time and cost figures are related to the access and maintenance parameters that were included in the input. (Processes that do not use indices, such as PROJECT operations, are not part of this model.) The input parameters, such as costs and user workload, are given per day. The output summary is computed per month. The rest is self-explanatory.
CONCLUDING REMARKS

A model for prediction and optimization of the performance of relational databases has been developed. The model is concerned with the selection of an optimal set of indices and reorganization points. It provides the total cost, time, and space associated with the selected indices for the given input parameters. It is a natural extension of previous work. It takes into consideration the effects of transactions on different fields and the total system's capacity and constraints, and it allows the user to evaluate indices that the user is interested in. The model distinguishes between primary and secondary fields, between dense and nondense attributes, and between sorted and unsorted files. The model has been implemented by a PASCAL program, and sample simulation runs are provided. It is more complete than previous work, and it is easy to use. The complexity of the algorithm is (number of fields) · (number of evaluated indices + 1).

ACKNOWLEDGMENTS

The author wishes to thank Mr. Chung-Liang Lin for converting the algorithms into a PASCAL program and generating the tests. The author also wishes to thank Dr. Donna Kaminski for her valuable comments and suggestions.

REFERENCES

1. Batory, D. S. "B+ Trees and Indexed Sequential Files: A Performance Comparison." ACM Proceedings of SIGMOD. New York: ACM, 1981.
2. Batory, D. S. "Optimal File Designs and Reorganization Points." ACM Transactions on Database Systems, 7 (1982).
3. Batory, D. S., and C. C. Gotlieb. "A Unifying Model of Physical Databases." ACM Transactions on Database Systems, 7 (1982).
4. Bonfatti, F., D. Maio, and P. Tiberio. "A Separability-based Method for Secondary Index Selection in Physical Database Design." In Methodology and Tools for Database Design. Amsterdam: North-Holland, 1983.
5. Carlis, J. V., S. T. March, and G. W. Dickson. "Physical Database Design, a DSS Approach." Information and Management, 6 (1983).
6. Chen, P. P., and S. B. Yao. "Design and Performance Tools for Database Systems." IEEE Proceedings of the International Conference on Very Large Data Bases. New York: IEEE, 1977.
7. Christodoulakis, S. "Estimating Record Selectivities." Information Systems, 8 (1983).
8. Hoffer, J. A. "An Integer Programming Formulation of Computer Database Design Problems." Information Science, 11 (1976).
9. Horowitz, E., and S. Sahni. Fundamentals of Data Structures. Rockville, Md.: Computer Science Press.
10. Lum, V. Y., and H. Ling. "An Optimization Problem on the Selection of Secondary Keys." Proceedings of ACM Annual Conference. New York: ACM, 1971.
11. March, S. T., and D. G. Severance. "The Determination of Efficient Record Segmentation and Blocking Factors for Shared Data Files." ACM Transactions on Database Systems, 2 (1977) 3.
12. Mendelson, H. "Analysis of Extendible Hashing." IEEE Transactions on Software Engineering, SE-8 (1982) 6.
13. Motzkin, D., K. Williams, and K. Chang. "Uniform Organization of Inverted Files." AFIPS, Proceedings of 1984 National Computer Conference (Vol. 53), 1984.
14. Motzkin, D. "The Use of Normal Multiplication Tables For Information Storage and Retrieval." Communication of the Association for Computing Machinery (CACM), Vol. 22, (1979) 3.
15. Motzkin, D. "Computer Assisted Optimization and Prediction of Database Performance." Western Michigan University, Computer Science Department, Report 84-1, September.
16. Nicolas, G. S. "A Generalized Database Access Path Model." AFIPS Proceedings of the National Computer Conference, 1981.
17. Schkolnick, M. "The Optimal Selection of Secondary Indices for Files." Information Systems, 1 (1975).
18. Schkolnick, M. "A Survey of Physical Database Design Methodology and Techniques." Proceedings of the Fourth International Conference on Very Large Databases. New York: IEEE, 1978.
19. Whang, K. W., G. Wiederhold, and D. Segalowics. "Separability-An Approach to Physical Database Design." Proceedings of the Seventh International Conference on Very Large Databases. New York: IEEE, 1982.
20. Yao, S. B., and A. G. Mertin. "Selection of File Organization Using an Analytic Model." Proceedings of the International Conference on Very Large Databases. New York: IEEE, 1975.
21. Yao, S. B., K. S. Das, and T. J. Theorey. "A Dynamic Database Reorganization Algorithm." ACM Transactions on Database Systems, 1.
22. Yao, S. B. "Modelling and Performance Evaluation of Physical Database Structures." ACM Proceedings of ACM National Conference. New York: ACM, 1976.
23. Yao, S. B. "An Attribute Based Model for Database Access Cost Analysis." ACM Transactions on Database Systems, 2 (1977).

Figure 2. Simulation run: Database 1

[Program listing. Input panels: SYSTEM PARAMETERS (available blocks, characters per block, average access time, cost per block, cost per access); FILE INFORMATION (file name, file type, primary attribute, total # of records, # of insertions, # of deletions, merge frequency); FIELD INFORMATION (file name, field name, field type, number of characters, # of searches, # of updates, distinct values). Numeric values not recoverable.]

DIRECTORIES TESTED: 1. SEQUENTIAL  2. B TREE  3. INVERTED  4. MULTI-LEVEL SEQUENTIAL

RECOMMENDED DIRECTORIES FOR ALL FIELDS OF ALL FILES IN DATABASE # 1:

    FILE#  FILE-TYPE  FIELD#  FIELD-TYPE  ORGANIZATION
    1      U          1       D           M-LEVEL-INDEX
    1      U          2       N           INVERTED-FILE
    1      U          3       N           INVERTED-FILE
    2      S          1       D           M-LEVEL-INDEX
    2      S          2       N           INVERTED-FILE
    2      S          3       N           INVERTED-FILE
    3      S          1       D           M-LEVEL-INDEX
    3      S          2       N           SO-DENSE-NO-DIR
    3      S          3       N           SO-DENSE-NO-DIR
    3      S          4       N           SO-DENSE-NO-DIR

[Output columns # OF BLOCKS, DIR ACCESS TIME, and ACCESS COST PER MONTH, and the SUMMARY OF DATABASE # 1 panel (blocks per file, blocks for directories, total space, cost, and time per month): values not recoverable.]

Figure 3. Simulation run: Database 2

[Program listing. Input panels: SYSTEM PARAMETERS, FILE INFORMATION, FIELD INFORMATION. Numeric values not recoverable.]

DIRECTORIES TESTED: 1. SEQUENTIAL  2. B TREE  3. INVERTED  4. MULTI-LEVEL SEQUENTIAL

RECOMMENDED DIRECTORIES FOR ALL FIELDS OF ALL FILES IN DATABASE # 2:

    FILE#  FILE-TYPE  FIELD#  FIELD-TYPE  ORGANIZATION
    1      U          1       D           B-TREE
    1      U          2       N           INVERTED-FILE
    1      U          3       N           INVERTED-FILE
    2      S          1       D           B-TREE
    2      S          2       N           SO-DENSE-NO-DIR
    2      S          3       N           SO-DENSE-NO-DIR

[Output columns # OF BLOCKS, DIR ACCESS TIME, and ACCESS COST PER MONTH, and the SUMMARY OF DATABASE # 2 panel: values not recoverable.]

Figure 4. Simulation run: Database 3

[Program listing. Input panels: SYSTEM PARAMETERS, FILE INFORMATION, FIELD INFORMATION. Numeric values not recoverable.]

DIRECTORIES TESTED: 1. SEQUENTIAL  2. B TREE  3. INVERTED  4. MULTI-LEVEL SEQUENTIAL

RECOMMENDED DIRECTORIES FOR ALL FIELDS OF ALL FILES IN DATABASE # 3:

    FILE#  FILE-TYPE  FIELD#  FIELD-TYPE  ORGANIZATION
    1      U          1       D           M-LEVEL-INDEX
    1      U          2       D           M-LEVEL-INDEX
    1      U          3       N           RANDOM-NO-DIR
    2      S          1       D           M-LEVEL-INDEX
    2      S          2       D           M-LEVEL-INDEX
    2      S          3       N           INVERTED-FILE
    2      S          4       N           INVERTED-FILE

DATABASE EXCEEDS AVAILABLE SPACE, THE FOLLOWING ADJUSTMENTS HAVE BEEN MADE:

[Adjustment panel listing, per deleted directory, the file #, field #, space saved, and space needed for the new database: values not recoverable.]

AFTER ADJUSTMENT, THE RECOMMENDED DIRECTORIES FOR ALL FIELDS OF ALL FILES IN DATABASE # 3:

    FILE#  FILE-TYPE  FIELD#  FIELD-TYPE  ORGANIZATION
    1      U          1       D           M-LEVEL-INDEX
    1      U          2       D           RANDOM-NO-DIR
    1      U          3       N           RANDOM-NO-DIR
    2      S          1       D           SO-DENSE-NO-DIR
    2      S          2       D           SO-DENSE-NO-DIR
    2      S          3       N           INVERTED-FILE
    2      S          4       N           INVERTED-FILE

[Output columns # OF BLOCKS, DIR ACCESS TIME, and ACCESS COST PER MONTH, and the SUMMARY OF DATABASE # 3 panels before and after adjustment: values not recoverable.]

Figure 5. Simulation run: Database 4

[Program listing. Input panels: SYSTEM PARAMETERS, FILE INFORMATION, FIELD INFORMATION. Numeric values not recoverable.]

DIRECTORIES TESTED: 1. SEQUENTIAL  2. B TREE  3. INVERTED  4. MULTI-LEVEL SEQUENTIAL

RECOMMENDED DIRECTORIES FOR ALL FIELDS OF ALL FILES IN DATABASE # 4:

    FILE#  FILE-TYPE  FIELD#  FIELD-TYPE  ORGANIZATION
    1      U          1       D           M-LEVEL-INDEX
    1      U          2       N           INVERTED-FILE
    1      U          3       N           INVERTED-FILE
    2      S          1       D           M-LEVEL-INDEX
    2      S          2       N           INVERTED-FILE
    2      S          3       N           INVERTED-FILE
    2      S          4       N           INVERTED-FILE

DATABASE EXCEEDS AVAILABLE SPACE, THE FOLLOWING ADJUSTMENTS HAVE BEEN MADE:

[Adjustment panel listing, per deleted directory, the file #, field #, space saved, and space needed for the new database: values not recoverable.]

AFTER ADJUSTMENT, THE RECOMMENDED DIRECTORIES FOR ALL FIELDS OF ALL FILES IN DATABASE # 4:

    FILE#  FILE-TYPE  FIELD#  FIELD-TYPE  ORGANIZATION
    1      U          1       D           RANDOM-NO-DIR
    1      U          2       N           RANDOM-NO-DIR
    1      U          3       N           RANDOM-NO-DIR
    2      S          1       D           M-LEVEL-INDEX
    2      S          2       N           INVERTED-FILE
    2      S          3       N           INVERTED-FILE
    2      S          4       N           INVERTED-FILE

[Output columns # OF BLOCKS, DIR ACCESS TIME, and ACCESS COST PER MONTH, and the SUMMARY OF DATABASE # 4 panels before and after adjustment: values not recoverable.]


More information

Access Methods. Basic Concepts. Index Evaluation Metrics. search key pointer. record. value. Value

Access Methods. Basic Concepts. Index Evaluation Metrics. search key pointer. record. value. Value Access Methods This is a modified version of Prof. Hector Garcia Molina s slides. All copy rights belong to the original author. Basic Concepts search key pointer Value record? value Search Key - set of

More information

ITERATIVE MULTI-LEVEL MODELLING - A METHODOLOGY FOR COMPUTER SYSTEM DESIGN. F. W. Zurcher B. Randell

ITERATIVE MULTI-LEVEL MODELLING - A METHODOLOGY FOR COMPUTER SYSTEM DESIGN. F. W. Zurcher B. Randell ITERATIVE MULTI-LEVEL MODELLING - A METHODOLOGY FOR COMPUTER SYSTEM DESIGN F. W. Zurcher B. Randell Thomas J. Watson Research Center Yorktown Heights, New York Abstract: The paper presents a method of

More information

The Impact of Write Back on Cache Performance

The Impact of Write Back on Cache Performance The Impact of Write Back on Cache Performance Daniel Kroening and Silvia M. Mueller Computer Science Department Universitaet des Saarlandes, 66123 Saarbruecken, Germany email: kroening@handshake.de, smueller@cs.uni-sb.de,

More information

Chapter 1 Disk Storage, Basic File Structures, and Hashing.

Chapter 1 Disk Storage, Basic File Structures, and Hashing. Chapter 1 Disk Storage, Basic File Structures, and Hashing. Adapted from the slides of Fundamentals of Database Systems (Elmasri et al., 2003) 1 Chapter Outline Disk Storage Devices Files of Records Operations

More information

CLUSTERING BIG DATA USING NORMALIZATION BASED k-means ALGORITHM

CLUSTERING BIG DATA USING NORMALIZATION BASED k-means ALGORITHM Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,

More information

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks X. Yuan, R. Melhem and R. Gupta Department of Computer Science University of Pittsburgh Pittsburgh, PA 156 fxyuan,

More information

ESTIMATING DISK HEAD MOVEMENT. Y. P. MANOLOPOULOS and J. G. KOLL1AS Thessaloniki, Greece Athens, Greece

ESTIMATING DISK HEAD MOVEMENT. Y. P. MANOLOPOULOS and J. G. KOLL1AS Thessaloniki, Greece Athens, Greece BIT 28 (1988), 27~-36 ESTIMATING DISK HEAD MOVEMENT IN BATCHED SEARCHING Y. P. MANOLOPOULOS and J. G. KOLL1AS Division of Computer and Electronics Engineering Division of Computer Science, Department of

More information

Leveraging Set Relations in Exact Set Similarity Join

Leveraging Set Relations in Exact Set Similarity Join Leveraging Set Relations in Exact Set Similarity Join Xubo Wang, Lu Qin, Xuemin Lin, Ying Zhang, and Lijun Chang University of New South Wales, Australia University of Technology Sydney, Australia {xwang,lxue,ljchang}@cse.unsw.edu.au,

More information

Hash-Based Indexes. Chapter 11. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

Hash-Based Indexes. Chapter 11. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Hash-Based Indexes Chapter Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Introduction As for any index, 3 alternatives for data entries k*: Data record with key value k

More information

Some Practice Problems on Hardware, File Organization and Indexing

Some Practice Problems on Hardware, File Organization and Indexing Some Practice Problems on Hardware, File Organization and Indexing Multiple Choice State if the following statements are true or false. 1. On average, repeated random IO s are as efficient as repeated

More information

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 14-1

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 14-1 Slide 14-1 Chapter 14 Indexing Structures for Files Chapter Outline Types of Single-level Ordered Indexes Primary Indexes Clustering Indexes Secondary Indexes Multilevel Indexes Dynamic Multilevel Indexes

More information

2, 3, 5, 7, 11, 17, 19, 23, 29, 31

2, 3, 5, 7, 11, 17, 19, 23, 29, 31 148 Chapter 12 Indexing and Hashing implementation may be by linking together fixed size buckets using overflow chains. Deletion is difficult with open hashing as all the buckets may have to inspected

More information
