Indexing. Chapter 8, 10, 11. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1


Tree-Based Indexing
The data entries are arranged in sorted order by search key value.
A hierarchical search data structure (a tree) is maintained that directs searches to the correct page of data entries.
Tree-structured indexing techniques support both range searches and equality searches.

Trees
Tree: a structure with a unique starting node (the root), in which each node can have child nodes and a unique path exists from the root to every other node.
Root: the top node of a tree structure; a node with no parent.
Leaf node: a tree node that has no children.
Level: the distance of a node from the root.
Height: the maximum level of any node in the tree.

Time Complexity of Tree Operations
Searching, inserting, and deleting each take O(height of the tree) time.

Multi-way Tree
If we relax the restriction that each node can have only one key, we can reduce the height of the tree.
A multi-way search tree of order m is a tree in which each node holds between 1 and m-1 distinct keys.

Range Searches
Example: "Find all students with gpa > 3.0".
If the data is in a sorted file, do a binary search to find the first such student, then scan to find the others. The cost of the binary search can be quite high.
Simple idea: create an "index file" with one key per data page (k1, k2, ..., kN for Page 1, Page 2, ..., Page N of the data file). We can then do the binary search on the (smaller) index file!

Properties of a Multi-way Tree
The keys in each node are sorted.
A node with k values has k+1 sub-trees, where the sub-trees may be empty.
The i-th sub-tree of a node $[v_1, \ldots, v_k]$, $0 \le i \le k$, may hold only values v in the range $v_i < v < v_{i+1}$ ($v_0$ is taken to be $-\infty$ and $v_{k+1}$ to be $+\infty$).
An m-way tree of height h has between $h$ and $m^h - 1$ keys.
The height of a complete m-ary tree with n nodes is $\lceil \log_m n \rceil$.
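The index-file idea from the Range Searches slide can be sketched in a few lines. This is only an illustration under assumptions: the page contents and the helper name `range_search_gt` are made up, pages are modeled as in-memory lists, and the index holds the first key of each page.

```python
from bisect import bisect_right

# Hypothetical sorted data file: each inner list stands for one disk page of gpa values.
pages = [[1.5, 2.0, 2.4], [2.7, 2.9, 3.0], [3.1, 3.3, 3.6], [3.8, 3.9, 4.0]]

# The index file holds one (small) entry per page: the first key on that page.
index = [page[0] for page in pages]


def range_search_gt(pages, index, low):
    """Return all values > low, using binary search on the index file only."""
    # The last page whose first key is <= low may still contain values > low,
    # so the scan starts there (or at page 0 if every page starts above low).
    start = max(bisect_right(index, low) - 1, 0)
    result = []
    for page in pages[start:]:          # sequential scan from the start page
        result.extend(v for v in page if v > low)
    return result


print(range_search_gt(pages, index, 3.0))
```

The binary search touches only the index entries; the data pages are read sequentially from the first candidate page onward.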

Searching an m-way Tree
At each node we make an m-way branching decision, choosing among the node's children by comparing the search key with the keys stored in the node.
Searching is performed recursively.
Time complexity is O(h), where h is the height of the tree.
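A minimal sketch of this recursive search, assuming an in-memory node with a sorted key list and k+1 child pointers. The `Node` class and the sample tree are hypothetical and not taken from the slides.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    keys: List[int]                                   # sorted keys v_1 .. v_k
    children: List[Optional["Node"]] = field(default_factory=list)  # k+1 subtrees, or empty for a leaf


def search(node: Optional[Node], key: int) -> bool:
    """Return True if key is stored in the subtree rooted at node."""
    if node is None:                                  # empty subtree
        return False
    i = bisect_left(node.keys, key)                   # first position with keys[i] >= key
    if i < len(node.keys) and node.keys[i] == key:
        return True
    if not node.children:                             # leaf: nowhere left to look
        return False
    return search(node.children[i], key)              # subtree i holds values between v_i and v_{i+1}


# Hypothetical 3-way tree of height 2.
root = Node([20, 40], [Node([5, 10]), Node([25, 30]), Node([50, 60])])
print(search(root, 30), search(root, 45))
```

Each call descends one level, so the number of nodes visited is bounded by the height of the tree.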

Two Popular Trees
ISAM (Indexed Sequential Access Method): a static structure.
B+ tree: a dynamic structure that adjusts gracefully under inserts and deletes.
The leaf pages of both contain data entries.

ISAM
An index entry has the form <P0, K1, P1, K2, P2, ..., Km, Pm>.
The index file may still be quite large. But we can apply the idea repeatedly!
[Figure: non-leaf pages above the leaf pages; the leaf level consists of primary pages and overflow pages.]

Comments on ISAM
ISAM is a multi-way tree in which each tree node is a disk page; leaf pages contain data entries.
Alternative (1): leaf pages hold the data records with key value k; all leaf pages are allocated sequentially and sorted on the search key value.
Alternative (2) or (3): data records are created and sorted in a separate file, and <key, rid> entries are then stored in the leaf pages of the ISAM index.
Index pages are allocated next, followed by space for overflow pages.
Index entries have the form <search key value, page id>; they "direct" the search for data entries, which are in leaf pages.
[Figure: file layout with data pages, then index pages, then overflow pages.]

Comments on ISAM (contd.)
Search: start at the root and use key comparisons to go to a leaf. Cost is $\log_F N$, where F = number of entries per index page and N = number of leaf pages.
Insert: find the leaf the data entry belongs to, and put it there.
Delete: find and remove the entry from its leaf; if this empties an overflow page, de-allocate the page.
Static tree structure: inserts and deletes affect only leaf pages.

Example ISAM Tree
Each node can hold 2 entries; there is no need for "next-leaf-page" pointers. (Why? Because the structure is static.)
Example: search for a record with a given key value, starting at the root.
[Figure: ISAM tree; the key values in the root and index pages were lost in extraction. Primary leaf pages: 10* 15* | 20* 27* | 33* 37* | 40* 46* | 51* 55* | 63* 97*.]

After Inserting 23*, 48*, 41*, 42*
[Figure: index pages unchanged. Primary leaf pages: 10* 15* | 20* 27* | 33* 37* | 40* 46* | 51* 55* | 63* 97*. Overflow pages: 23* | 48* 41* | 42*.]

... Then Deleting 42*, 51*, 97*
The number of primary leaf pages is fixed at file creation time: the structure is STATIC.
[Figure: index pages unchanged. Remaining leaf entries: 10* 15* 20* 27* 33* 37* 40* 46* 55* 63*; overflow entries: 23* 48* 41*.]
Static tree structure: inserts and deletes affect only leaf pages.
Note that 51 still appears in the index levels, but 51* no longer appears in any leaf!

Comments on ISAM
The static design means that long overflow chains can develop. To alleviate this problem, the tree is initially created so that about 20% of each page is free.
The static design has the advantage that index-level pages never need to be locked, since they are never modified.
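The insert and delete behavior described above can be mimicked with a toy model. This is a sketch under assumptions, not the textbook's implementation: the index levels are flattened into one static list of separator keys (the values 20, 33, 40, 51, 63 are assumed from the textbook's running example, since the figure's index keys did not survive here), each overflow chain is a plain list, and the class name `ToyISAM` is hypothetical.

```python
from bisect import bisect_right


class ToyISAM:
    def __init__(self, separators, leaves, capacity=2):
        self.separators = separators                 # static index level: never modified
        self.primary = [list(p) for p in leaves]     # primary leaf pages, fixed in number
        self.overflow = [[] for _ in leaves]         # one overflow chain per primary page
        self.capacity = capacity

    def _leaf(self, key):
        # Leaf i holds keys k with separators[i-1] <= k < separators[i].
        return bisect_right(self.separators, key)

    def insert(self, key):
        i = self._leaf(key)
        if len(self.primary[i]) < self.capacity:
            self.primary[i].append(key)
            self.primary[i].sort()
        else:
            self.overflow[i].append(key)             # primary page full: chain grows

    def delete(self, key):
        i = self._leaf(key)
        for page in (self.primary[i], self.overflow[i]):
            if key in page:
                page.remove(key)
                return True
        return False

    def search(self, key):
        i = self._leaf(key)
        return key in self.primary[i] or key in self.overflow[i]


tree = ToyISAM([20, 33, 40, 51, 63],
               [[10, 15], [20, 27], [33, 37], [40, 46], [51, 55], [63, 97]])
for k in (23, 48, 41, 42):
    tree.insert(k)
for k in (42, 51, 97):
    tree.delete(k)
print(tree.overflow, 51 in tree.separators, tree.search(51))
```

After the deletions the key 51 is still a separator in the index but is no longer found in any leaf, which is the situation the slide points out.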

B+ Trees: A Dynamic Index Structure
The most widely used index.
A balanced tree in which the internal nodes direct the search and the leaf nodes contain the data entries.
The tree structure grows and shrinks dynamically, so leaf pages are not sequentially allocated.
Leaf pages are organized in a doubly linked list.

B+ Tree Indexes
Leaf pages contain data entries, sorted by search key, and are chained (prev and next pointers) in a doubly linked list.
Non-leaf pages contain index entries, which are used only to direct searches. An index entry has the form <P0, K1, P1, K2, P2, ..., Km, Pm>.

Fan-out of the Tree
Fan-out of the tree: the average number of children of a non-leaf node.
If every non-leaf node has n children, a tree of height h has $n^h$ leaf pages.
A good approximation of the number of leaf pages is $F^h$, where F is the average number of children (at least 100 in practice).
Example: with F = 100, a tree of height 4 contains $100^4$ = 100 million leaf pages. A binary search over that many pages would take $\log_2 100{,}000{,}000 > 25$ I/Os.

B+ Trees
Insert/delete at $\log_F N$ cost; keep the tree height-balanced (F = fanout, N = number of leaf pages).
Minimum 50% occupancy (except for the root): each node contains m entries, where d <= m <= 2d, except the root, which contains m entries, where 1 <= m <= 2d. The parameter d is called the order of the tree.
Supports equality and range searches efficiently.
[Figure: index entries (direct the search) above data entries (the "sequence set").]

Example B+ Tree
[Figure: B+ tree with root key 17; entries <= 17 are reached through the left subtree and entries > 17 through the right. Other index key values were lost in extraction. Leaf-level data entries, in sorted order: 2* 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39*.]
Note how the data entries in the leaf level are sorted.
Exercises: find 28*; find 29*; find all entries > 15* and < 30*.
Insert/delete: find the data entry in its leaf, then change it. The parent sometimes needs to be adjusted, and the change sometimes bubbles up the tree.

B+ Trees in Practice
Typical order: d = 100. Typical fill factor: 67%. Average fanout = 133.
Typical capacities:
Height 4: $133^4$ = 312,900,721 records
Height 3: $133^3$ = 2,352,637 records
The top levels can often be held in the buffer pool:
Level 1 = 1 page = 8 Kbytes
Level 2 = 133 pages = about 1 Mbyte
Level 3 = 17,689 pages = about 138 Mbytes
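The figures on the "B+ Trees in Practice" slide are simple powers of the fanout and can be recomputed directly. The sketch below assumes the 67% fill factor means two thirds of the 2d entries a node can hold, and 8-Kbyte pages as on the slide; the function names are illustrative.

```python
ORDER_D = 100          # typical order d, from the slide
PAGE_KBYTES = 8        # page size used on the slide

# A node holds at most 2d entries; at a fill factor of 2/3 that is 133 on average.
fanout = (2 * ORDER_D * 2) // 3


def capacity(height):
    """Approximate number of entries reachable in a tree of the given height."""
    return fanout ** height


def pages_at_level(level):
    """Number of pages at a level, counting the root as level 1."""
    return fanout ** (level - 1)


for height in (3, 4):
    print(f"height {height}: {capacity(height):,} records")

for level in (1, 2, 3):
    pages = pages_at_level(level)
    print(f"level {level}: {pages:,} pages = {pages * PAGE_KBYTES / 1024:.1f} Mbytes")
```

The third level alone needs well over 100 Mbytes, which is why only the top two or three levels are normally assumed to stay in the buffer pool.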

Inserting a Data Entry into a B+ Tree
Find the correct leaf node L.
Put the data entry onto L.
If L has enough space, done!
Else, must split L (into L and a new node L2): redistribute the entries evenly, copy up the middle key, and insert an index entry pointing to L2 into the parent of L.
This can happen recursively. To split an index node, redistribute the entries evenly, but push up the middle key. (Contrast with leaf splits.)
Splits "grow" the tree; a root split increases its height.
Tree growth: the tree gets wider, or one level taller at the top.

Inserting 8* into the Example B+ Tree
[Figure: tree before the insert; index key values were lost in extraction. Leaf-level data entries: 2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*.]
Split the full leaf node and copy up the middle key: the entries 2* 3* 5* 7* 8* are divided between two leaves, and 5 is the entry to be inserted in the parent node. (Note that 5 is copied up and continues to appear in the leaf.)

Inserting 8* into the Example B+ Tree (contd.)
To split an index node, redistribute the entries evenly, but push up the middle key: 17 is the entry to be inserted in the parent node. (Note that 17 is pushed up and appears only once in the index. Contrast this with a leaf split.)
Observe how minimum occupancy is guaranteed in both leaf and index page splits.
Note the difference between copy-up and push-up; be sure you understand the reasons for this.

Example B+ Tree After Inserting 8*
[Figure: tree after the insert; index key values were lost in extraction. Leaf-level data entries: 2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*.]
Notice that the root was split, leading to an increase in height.
In this example, we could avoid the split by redistributing entries; however, this is usually not done in practice.
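The difference between copy-up and push-up is easy to see when the two splits are written side by side. This is a sketch on plain Python lists, assuming an overfull node with 2d+1 entries; the index keys 5, 13, 17, 24, 30 are assumed from the textbook's running example (only 5 and 17 survive in this transcript), and the pointer labels P0..P5 are hypothetical.

```python
def split_leaf(entries):
    """Split an overfull leaf (2d+1 sorted data entries).

    Returns (left, right, separator). The separator is COPIED up:
    it stays in the right leaf because leaves must hold every data entry.
    """
    mid = len(entries) // 2
    left, right = entries[:mid], entries[mid:]
    return left, right, right[0]


def split_index(keys, children):
    """Split an overfull index node (2d+1 sorted keys, 2d+2 child pointers).

    Returns (left_keys, left_children, separator, right_keys, right_children).
    The separator is PUSHED up: it is removed from both halves because
    index keys only direct the search.
    """
    mid = len(keys) // 2
    return keys[:mid], children[:mid + 1], keys[mid], keys[mid + 1:], children[mid + 1:]


print(split_leaf([2, 3, 5, 7, 8]))
print(split_index([5, 13, 17, 24, 30], ["P0", "P1", "P2", "P3", "P4", "P5"]))
```

In the leaf split the key 5 appears both in the parent and in the right leaf; in the index split the key 17 appears only in the parent.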

Deleting a Data Entry from a B+ Tree
Start at the root and find the leaf node L where the entry belongs.
Remove the entry.
If L is still at least half-full, done!
If L has only d-1 entries, try to redistribute, borrowing from a sibling (an adjacent node with the same parent as L). If redistribution fails, merge L and the sibling.
If a merge occurred, must delete the entry (pointing to L or the sibling) from the parent of L.
A merge could propagate to the root, decreasing the height.

Example Tree After (Inserting 8*, Then) Deleting 19*
[Figure: tree before and after the deletion; index key values were lost in extraction. Leaf entries before: 2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*. Leaf entries after: 2* 3* 5* 7* 8* 14* 16* 20* 22* 24* 27* 29* 33* 34* 38* 39*.]
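The leaf-level decision (done, redistribute, or merge) can be sketched as one function. Assumptions: leaves are plain sorted lists, the sibling is the right neighbor, d = 2 as in the running example, and the function name `delete_entry` is made up; the parent update is reported as a return value instead of being applied to a real tree.

```python
def delete_entry(leaf, right_sibling, key, d):
    """Delete key from leaf and restore minimum occupancy d if needed.

    Returns (action, leaf, sibling, new_separator), where new_separator is the
    key to copy up into the parent after a redistribution, else None.
    """
    leaf = [e for e in leaf if e != key]
    sibling = list(right_sibling)
    if len(leaf) >= d:                        # still at least half-full
        return "done", leaf, sibling, None
    if len(sibling) > d:                      # sibling can spare an entry
        total = leaf + sibling
        mid = len(total) // 2
        leaf, sibling = total[:mid], total[mid:]
        return "redistributed", leaf, sibling, sibling[0]
    # Sibling is at minimum occupancy: merge, and the parent entry is tossed.
    return "merged", leaf + sibling, [], None


print(delete_entry([19, 20, 22], [24, 27, 29], 19, d=2))
print(delete_entry([20, 22], [24, 27, 29], 20, d=2))
print(delete_entry([22, 24], [27, 29], 24, d=2))
```

The three calls follow the deletions of 19*, 20*, and 24* from the example: the first needs no repair, the second borrows from the sibling and copies a new middle key up, and the third forces a merge.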

Example Tree After (Inserting 8*, Deleting 19*, then) Deleting 20*
[Figure: tree before and after the deletion; index key values were lost in extraction. Leaf entries before: 2* 3* 5* 7* 8* 14* 16* 20* 22* 24* 27* 29* 33* 34* 38* 39*. Leaf entries after: 2* 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39*.]
Deleting 20* is done with redistribution. Notice how the middle key is copied up.

... And Then Deleting 24*
Must merge. Observe the "toss" of the index entry.
[Figure: the subtree holding 27* 29* 33* 34* 38* 39* and the full tree before the deletion, with leaf entries 2* 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39*; index key values were lost in extraction.]

... And Then Deleting 24* (contd.)
Merge recursively. Observe the "pull down" of the index entry.
[Figure: tree before and after the recursive merge; index key values were lost in extraction. Leaf entries in both: 2* 3* 5* 7* 8* 14* 16* 22* 27* 29* 33* 34* 38* 39*.]

Example of Non-leaf Re-distribution
The tree is shown below during the deletion of 24*.
In contrast to the previous example, we can redistribute an entry from the left child of the root to the right child.
[Figure: index key values were lost in extraction. Leaf entries: 2* 3* 5* 7* 8* 14* 16* 17* 18* 20* 21* 22* 27* 29* 33* 34* 38* 39*.]

After Re-distribution
Intuitively, entries are redistributed by "pushing through" the splitting entry in the parent node.
It suffices to redistribute the index entry with key 20; we've redistributed 17 as well for illustration.
[Figure: index key values were lost in extraction. Leaf entries: 2* 3* 5* 7* 8* 14* 16* 17* 18* 20* 21* 22* 27* 29* 33* 34* 38* 39*.]

Bulk Loading of a B+ Tree
If we have a large collection of records and want to create a B+ tree on some field, doing so by repeatedly inserting records is very slow. Bulk loading can be done much more efficiently.
Initialization: sort all data entries, then insert a pointer to the first (leaf) page in a new (root) page.
[Figure: a new root pointing to the first of the sorted pages of data entries, which are not yet in the B+ tree: 3* 4* | 6* 9* | 10* 11* | 12* 13* | 20* 22* | 23* 31* | 35* 36* | 38* 41* | 44*.]

Bulk Loading (Contd.)
Index entries for leaf pages are always entered into the right-most index page just above the leaf level. When this page fills up, it splits. (The split may go up the right-most path to the root.)
Much faster than repeated inserts, especially when one considers locking!
[Figure: two snapshots of the partially built tree. In the first, the root holds the key 6, and the data entry pages to the right are not yet in the B+ tree. Pages of data entries: 3* 4* | 6* 9* | 10* 11* | 12* 13* | 20* 22* | 23* 31* | 35* 36* | 38* 41* | 44*.]

Summary of Bulk Loading
Option 1: multiple inserts. Slow, and does not give sequential storage of leaves.
Option 2: bulk loading. Has advantages for concurrency control, needs fewer I/Os during the build, stores leaves sequentially (and linked, of course), and lets us control the fill factor on pages.
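A simplified bottom-up build conveys why bulk loading is cheap: the entries are sorted once and every page is written once. Note that this sketch builds the index level by level, which is a simplification of the procedure on the slide (which feeds entries into the right-most index page and splits it when full); the dictionary-based node layout, the parameter names, and the `lookup` helper are all assumptions for illustration.

```python
from bisect import bisect_right


def bulk_load(entries, leaf_cap, fanout):
    """Build a B+-tree-like structure from unsorted entries; returns (root, height)."""
    if not entries:
        raise ValueError("nothing to load")
    entries = sorted(entries)
    # Pack sorted entries into leaf pages, filled left to right.
    leaves = [entries[i:i + leaf_cap] for i in range(0, len(entries), leaf_cap)]
    # Each level is a list of (smallest key in subtree, node).
    level = [(leaf[0], {"leaf": True, "entries": leaf}) for leaf in leaves]
    height = 0
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            node = {"leaf": False,
                    "keys": [low for low, _ in group[1:]],   # separators between children
                    "children": [child for _, child in group]}
            parents.append((group[0][0], node))
        level = parents
        height += 1
    return level[0][1], height


def lookup(node, key):
    """Follow separators down to a leaf and test membership."""
    while not node["leaf"]:
        node = node["children"][bisect_right(node["keys"], key)]
    return key in node["entries"]


data = [3, 4, 6, 9, 10, 11, 12, 13, 20, 22, 23, 31, 35, 36, 38, 41, 44]
root, height = bulk_load(data, leaf_cap=2, fanout=3)
print(height, root["keys"], lookup(root, 23), lookup(root, 24))
```

Unlike the real algorithm, this version does not enforce minimum occupancy on the last node of each level; it is meant only to show the sort-then-pack idea.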

A Note on "Order"
In practice, the order (d) concept is replaced by a physical space criterion ("at least half-full").
Index pages can typically hold many more entries than leaf pages.
Variable-sized records and search keys mean different nodes will contain different numbers of entries.
Even with fixed-length fields, multiple records with the same search key value (duplicates) can lead to variable-sized data entries (if we use Alternative (3)).

Summary
Tree-structured indexes are ideal for range searches, and also good for equality searches.
ISAM is a static structure. Only leaf pages are modified; overflow pages are needed. Overflow chains can degrade performance unless the size of the data set and the data distribution stay constant.
The B+ tree is a dynamic structure. Inserts and deletes leave the tree height-balanced, at $\log_F N$ cost. High fanout (F) means the depth is rarely more than 3 or 4. Almost always better than maintaining a sorted file.

Summary (Contd.)
B+ trees typically have 67% occupancy on average.
Usually preferable to ISAM, modulo locking considerations; adjusts to growth gracefully.
If data entries are data records, splits can change rids!
Key compression increases fanout and reduces height.
Bulk loading can be much faster than repeated inserts for creating a B+ tree on a large data set.
The B+ tree is the most widely used index in database management systems because of its versatility, and one of the most optimized components of a DBMS.

Comparing File Organizations
Consider a collection of employee records with composite search key <age, sal>, stored in one of the following ways:
Heap file (random order; insert at end of file)
Sorted file, sorted on <age, sal>
Clustered B+ tree file, Alternative (1), search key <age, sal>
Heap file with an unclustered B+ tree index on search key <age, sal>
Heap file with an unclustered hash index on search key <age, sal>

Operations to Compare
Scan: fetch all records from disk
Search with equality selection
Search with range selection
Insert a record
Delete a record

Cost Model for Our Analysis
We ignore CPU costs, for simplicity, in the final comparison. The parameters are:
B: the number of data pages
R: the number of records per page
D: (average) time to read or write a disk page; 15 milliseconds
C: (average) time to process a record (e.g., comparing a field value to a selection constant); 100 nanoseconds
H: time to apply the hash function to a record; 100 nanoseconds
F: fan-out of an index tree; 100

Cost Model
We expect the cost of I/O to dominate: disk speeds are not increasing at a pace similar to CPU speeds.
Measuring the number of page I/Os ignores the gains of pre-fetching a sequence of pages; thus, even the I/O cost is only approximated.
This is an average-case analysis, based on several simplistic assumptions.

Assumptions in Our Analysis
Heap files: equality selection on key; exactly one match.
Sorted files: files compacted after deletions.
Indexes, Alternatives (2) and (3): data entry size = 10% of the size of a record.
Hash index: no overflow buckets; 80% page occupancy, so file size = 1.25 x data size.
Tree index: 67% occupancy (this is typical), so file size = 1.5 x data size.

Assumptions (contd.)
Scans: leaf levels of a tree index are chained; for unclustered indexes, the index data entries plus the actual file are scanned.
Range searches: we use tree indexes to restrict the set of data records fetched, but ignore hash indexes.

Heap Files
Scan: B(D+RC)
Search with equality selection: 0.5B(D+RC), the average case of a linear search
Search with range selection: B(D+RC)
Insert: 2D+C, inserting at the end of the file
Delete: search cost + (D+C). If the record is located through its rid <page #, slot #>, the cost is 2D+C.

Sorted Files
Scan: B(D+RC)
Search with equality selection: $D \log_2 B + C \log_2 R$
Search with range selection: the cost of the search plus the cost of retrieving the set of qualifying records. The cost of the search includes fetching the first page.
Insert: search cost + 2 x 0.5B(D+RC)
Delete: search cost + 2 x 0.5B(D+RC)

Clustered Files with B+ Tree
Scan: 1.5B(D+RC)
Search with equality selection: $D \log_F 1.5B + C \log_2 R$
Search with range selection: similar to an equality search with many qualifying records
Insert/Delete: $D \log_F 1.5B + C \log_2 R + D$

Heap File with Un-clustered Tree Index
The number of leaf pages is 0.1 x 1.5B = 0.15B.
The number of data entries on a page is 10 x 0.67R = 6.7R.
Scan: 0.15B(D+6.7RC) + BR(D+C), or about 4B I/Os if the file is sorted instead
  Reading the data entries: 0.15B(D+6.7RC)
  Fetching the records (one I/O per record): BR(D+C); or just sort the records: 4B I/Os
Search with equality selection: $D \log_F 0.15B + C \log_2 6.7R + D$

Heap File with Un-clustered Tree Index (contd.)
Search with range selection: similar to an equality search with many qualifying records, at one I/O per qualifying record. If 10% of the data records qualify, it is better to sort the data file.
Insert: insert the record in the heap file, 2D+C, then insert the data entry in the index, $D \log_F 0.15B + C \log_2 6.7R + D$
Delete: $D \log_F 0.15B + C \log_2 6.7R + 2D$

Heap File with Un-clustered Hash Index
The number of data entry pages is 1.25 x 0.1B = 0.125B.
The number of data entries on a page is 10 x 0.8R = 8R.
Scan: 0.125B(D+8RC) + BR(D+C)
  Reading the data entries: 0.125B(D+8RC)
  Fetching each record: BR(D+C)
Search with equality selection: H+2D+4RC
  Hashing cost: H
  Fetching the data entry page: D
  Searching the data entry page: 0.5 x 8RC = 4RC
  Fetching the data record page: D

Heap File with Un-clustered Hash Index (contd.)
Search with range selection: B(D+RC), scanning the whole heap file
Insert: (2D+C) + (H+2D+C)
  Inserting the data record in the heap file: 2D+C
  Updating the hash index: H+2D+C
Delete: (H+2D+4RC) + 2D
  Locating the record, as in an equality search: H+2D+4RC
  Updating both the data entry page and the data record page: 2D

Cost of Operations
File organization | (a) Scan | (b) Equality | (c) Range | (d) Insert | (e) Delete
(1) Heap | BD | 0.5BD | BD | 2D | Search + D
(2) Sorted | BD | $D \log_2 B$ | $D(\log_2 B$ + # pages with matching records) | Search + BD | Search + BD
(3) Clustered | 1.5BD | $D \log_F 1.5B$ | $D(\log_F 1.5B$ + # pages with matching records) | Search + D | Search + D
(4) Unclustered tree index | BD(R+0.15) | $D(1 + \log_F 0.15B)$ | $D(\log_F 0.15B$ + # pages with matching records) | Search + 2D | Search + 2D
(5) Unclustered hash index | BD(R+0.125) | 2D | BD | Search + 2D | Search + 2D
Several assumptions underlie these (rough) estimates!

Summary
Heap file: good storage efficiency and fast scan; slow search and deletion.
Sorted file: good storage efficiency; slow insertion and deletion; searching is faster than in a heap file.
Clustered file: as good as a sorted file, plus efficient insertion and deletion, at the price of some space overhead.
Un-clustered index on a heap file: fast search, insertion, and deletion; slow scan and range search.
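To get a feel for the magnitudes, the I/O-only formulas for scan, equality search, and insert can be evaluated numerically. D = 15 ms and F = 100 come from the cost model slide; the file size B = 10,000 pages and R = 100 records per page are assumed values chosen only for illustration, and the function name `io_costs` is hypothetical.

```python
import math


def io_costs(B, R=100, D=0.015, F=100):
    """Approximate I/O time in seconds per file organization (CPU costs ignored)."""
    def log_f(x):
        return math.log(x, F)

    equality = {
        "heap": 0.5 * B * D,
        "sorted": D * math.log2(B),
        "clustered": D * log_f(1.5 * B),
        "unclustered tree": D * (1 + log_f(0.15 * B)),
        "unclustered hash": 2 * D,
    }
    scan = {
        "heap": B * D,
        "sorted": B * D,
        "clustered": 1.5 * B * D,
        "unclustered tree": B * D * (R + 0.15),
        "unclustered hash": B * D * (R + 0.125),
    }
    insert = {
        "heap": 2 * D,
        "sorted": equality["sorted"] + B * D,                      # Search + BD
        "clustered": equality["clustered"] + D,                    # Search + D
        "unclustered tree": equality["unclustered tree"] + 2 * D,  # Search + 2D
        "unclustered hash": equality["unclustered hash"] + 2 * D,  # Search + 2D
    }
    return {name: {"scan": scan[name], "equality": equality[name], "insert": insert[name]}
            for name in equality}


costs = io_costs(B=10_000)
for name, row in costs.items():
    print(f"{name:17s} scan={row['scan']:9.1f}s  "
          f"equality={row['equality']:8.4f}s  insert={row['insert']:8.4f}s")
```

Under these assumed sizes, an equality search drops from over a minute on a heap file to a few hundredths of a second with any index, while a full scan through an unclustered index is far slower than scanning the heap file directly.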

Impact of the Workload
An index supports efficient retrieval of data entries that satisfy a given selection condition.
Hash-based indexes are optimized for equality selections and perform poorly on range selections.
Tree-based indexes support both efficiently.
Both kinds support inserts, deletes, and updates quite efficiently.

Impact of the Workload (contd.)
A B+ tree has two important advantages over a sorted file:
It handles inserts and deletes of data entries efficiently.
Finding the correct leaf page when searching for a record by search key value is much faster than binary search in a sorted file.

Understanding the Workload
For each query in the workload:
Which relations does it access?
Which attributes are retrieved?
Which attributes are involved in selection/join conditions? How selective are these conditions likely to be?
For each update in the workload:
Which attributes are involved in selection/join conditions? How selective are these conditions likely to be?
What is the type of update (INSERT/DELETE/UPDATE), and which attributes are affected?

Choice of Indexes
What indexes should we create? Which relations should have indexes? What field(s) should be the search key? Should we build several indexes?
For each index, what kind of index should it be? Clustered? Hash or tree?

Choice of Indexes (contd.)
Before creating an index, we must also consider the impact on updates in the workload!
Trade-off: indexes can make queries go faster and updates slower. They require disk space, too.
Indexes on tables that are frequently updated can result in poor performance. Indexes are best used on tables whose data does not change frequently but is used a lot in queries.

Choice of Indexes (contd.)
One approach: consider the most important queries in turn. Consider the best plan using the current indexes, and see if a better plan is possible with an additional index. If so, create it.
Try to choose indexes that benefit as many queries as possible. Since only one index can be clustered per relation, choose it based on the important queries that would benefit the most from clustering.

Index Selection Guidelines
Attributes in the WHERE clause are candidates for index keys. An exact-match condition suggests a hash index; a range query suggests a tree index.
Clustering is especially useful for range queries; it can also help on equality queries if there are many duplicates.

Examples of Clustered Indexes
SELECT E.dno FROM Emp E WHERE E.age > 40;
A B+ tree index on E.age can be used to get the qualifying tuples. An alternative is a sorted file on E.age.
Things to consider: How selective is the condition? Is the index clustered?

Examples of Clustered Indexes (contd.)
SELECT E.dno, COUNT(*) FROM Emp E WHERE E.age > 10 GROUP BY E.dno;
Consider this GROUP BY query. If many tuples have E.age > 10, using an E.age index and sorting the retrieved tuples may be costly. A clustered E.dno index may be better, since sorting is expensive.

Examples of Clustered Indexes (contd.)
SELECT E.dno FROM Emp E WHERE E.hobby = 'Stamps';
If an index is on a search key that does not include a candidate key, clustering is important.
Equality queries and duplicates: clustering on E.hobby helps!

Index Selection Guidelines (contd.)
Composite search keys should be considered when a WHERE clause contains several conditions.
A hash index works only for equality conditions on every field of the search key.
A tree index works for equality or range conditions on a prefix of the composite search key, so the order of attributes is important for range queries.
Such indexes can sometimes enable index-only strategies for important queries. For index-only strategies, clustering is not important!

Indexes with Composite Search Keys
Examples of composite key indexes using lexicographic order.
Equality query on a <sal, age> index: age = 20 and sal = 75.
Data entries in the index are sorted by search key to support range queries, in lexicographic order (or spatial order).
[Figure: data records (name, age, sal) sorted by name, with four indexes. Data entries of the <age, sal> index: 11,80 12,10 12,20 13,75. Data entries of the <sal, age> index: 10,12 20,12 75,13 80,11. The entries of the <age> and <sal> indexes and most record values were lost in extraction.]

Composite Search Keys
To retrieve Emp records with age = 30 AND sal = 4000, an index on <age, sal> would be better than an index on age or an index on sal.
If the condition is 20 < age < 30 AND 3000 < sal < 5000, a clustered tree index on <age, sal> or <sal, age> is best.
If the condition is age = 30 AND 3000 < sal < 5000, a clustered <age, sal> index is much better than a <sal, age> index!
Composite indexes are larger and are updated more often.

Index-Only Evaluation
SELECT E.dno, COUNT(*) FROM Emp E GROUP BY E.dno;
Plan A: sort Employees on E.dno to compute the count.
Plan B: if an index on E.dno is available, the query can be answered by scanning only the index.
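Why attribute order matters for age = 30 AND 3000 < sal < 5000 can be shown with two sorted lists standing in for the two composite indexes. The employee values below are invented for illustration, and the helper names are hypothetical; tuples compare lexicographically in Python, which matches the lexicographic order of composite keys.

```python
from bisect import bisect_left, bisect_right

# Hypothetical (age, sal) pairs.
emps = [(30, 2500), (30, 3500), (30, 4500), (25, 4000), (40, 3200), (30, 6000), (35, 4800)]

by_age_sal = sorted(emps)                           # data entries of an <age, sal> index
by_sal_age = sorted((s, a) for a, s in emps)        # data entries of a <sal, age> index


def scan_age_sal(index, age, lo, hi):
    """age = const AND lo < sal < hi: the matches form one contiguous run."""
    start = bisect_right(index, (age, lo))
    end = bisect_left(index, (age, hi))
    return index[start:end], end - start            # (matches, entries scanned)


def scan_sal_age(index, age, lo, hi):
    """Same query on <sal, age>: scan the whole sal range, then filter on age."""
    start = bisect_right(index, (lo, float("inf")))
    end = bisect_left(index, (hi, float("-inf")))
    scanned = index[start:end]
    return [(s, a) for s, a in scanned if a == age], len(scanned)


print(scan_age_sal(by_age_sal, 30, 3000, 5000))
print(scan_sal_age(by_sal_age, 30, 3000, 5000))
```

Both scans return the same two employees, but the <age, sal> order reads exactly the matching entries, whereas the <sal, age> order has to read every entry in the salary range and discard those with the wrong age.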

Index-Only Plans
A composite B+ tree index on <age, sal> could answer this query with an index-only scan:
SELECT AVG(E.sal) FROM Emp E WHERE E.age = 25 AND E.sal BETWEEN 3000 AND 5000;
A composite B+ tree index on <dno, sal> could answer this query with an index-only scan:
SELECT E.dno, MIN(E.sal) FROM Emp E GROUP BY E.dno;

Create Index: Basic Syntax
Indexes can be defined in two ways: at the time of table creation, or after the table has been created.
Example schemas:
Sailors (sid: integer, sname: string, rating: integer, age: real)
Boats (bid: integer, bname: string, color: string)
Reserves (sid: integer, bid: integer, day: date)

Example
For the Sailors table, we expect lots of searches on the sailor id, which is the primary key.
CREATE TABLE Sailors (
  sid INTEGER NOT NULL AUTO_INCREMENT,
  sname CHAR(30) NOT NULL,
  rating INTEGER,
  age REAL,
  CONSTRAINT StudentsKey PRIMARY KEY (sid) USING BTREE,
  CHECK (rating >= 1 AND rating <= 10)
);
Note: MySQL indexes the primary key of a table automatically. The index is named PRIMARY, and with the InnoDB storage engine it is the clustered index.

Example
For the Sailors table, we could also create an index on sname when we create the table.
CREATE TABLE Sailors (
  sid INTEGER NOT NULL AUTO_INCREMENT,
  sname CHAR(30) NOT NULL,
  rating INTEGER,
  age REAL,
  CONSTRAINT StudentsKey PRIMARY KEY (sid) USING BTREE,
  CHECK (rating >= 1 AND rating <= 10),
  INDEX sname_index (sname) USING HASH
);

Example
Later, we find that people search a lot by sailors' names and ages, so we add an index after the table has been created.
CREATE TABLE Sailors (
  sid INTEGER NOT NULL AUTO_INCREMENT,
  sname CHAR(30) NOT NULL,
  rating INTEGER,
  age REAL,
  CONSTRAINT StudentsKey PRIMARY KEY (sid),
  CHECK (rating >= 1 AND rating <= 10)
);
CREATE INDEX sname_age_index ON Sailors (sname, age) USING BTREE;

Index Status
View the indexes defined on a particular table:
SHOW INDEX FROM Sailors;
Sometimes it helps to drop an index to improve performance:
DROP INDEX sname_index ON Sailors;
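The create, inspect, and drop cycle can be tried without a MySQL server by using Python's built-in sqlite3 module. This swaps in SQLite for MySQL, so the syntax differs from the slides: SQLite has no USING BTREE/HASH clause or SHOW INDEX statement, indexes are listed with PRAGMA index_list, and DROP INDEX takes no table name. The helper `index_names` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")                  # throwaway in-memory database
conn.execute(
    "CREATE TABLE Sailors ("
    " sid INTEGER PRIMARY KEY,"
    " sname TEXT NOT NULL,"
    " rating INTEGER CHECK (rating >= 1 AND rating <= 10),"
    " age REAL)"
)

# Add a composite index after the table has been created.
conn.execute("CREATE INDEX sname_age_index ON Sailors (sname, age)")


def index_names(connection, table):
    """List index names on a table; column 1 of PRAGMA index_list is the name."""
    return [row[1] for row in connection.execute(f"PRAGMA index_list('{table}')")]


before = index_names(conn, "Sailors")
conn.execute("DROP INDEX sname_age_index")          # SQLite form: no table name
after = index_names(conn, "Sailors")
print(before, after)
```

The same three steps map onto the MySQL statements on the slides: CREATE INDEX, SHOW INDEX FROM Sailors, and DROP INDEX ... ON Sailors.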

Summary
Many alternative file organizations exist, each appropriate in some situation.
If selection queries are frequent, sorting the file or building an index is important.
Hash-based indexes are good only for equality search.
Sorted files and tree-based indexes are best for range search, and also good for equality search. (Files are rarely kept sorted in practice; a B+ tree index is better.)
An index is a collection of data entries plus a way to quickly find entries with given key values.

Summary (Contd.)
Data entries can be actual data records, <key, rid> pairs, or <key, rid-list> pairs. This choice is orthogonal to the indexing technique used to locate data entries with a given key value.
There can be several indexes on a given file of data records, each with a different search key.
Indexes can be classified as clustered vs. unclustered, primary vs. secondary, and dense vs. sparse. The differences have important consequences for utility and performance.


More information

Tree-Structured Indexes

Tree-Structured Indexes Tree-Structured Indexes Chapter 9 Database Management Systems, R. Ramakrishnan and J. Gehrke 1 Introduction As for any index, 3 alternatives for data entries k*: ➀ Data record with key value k ➁

More information

Tree-Structured Indexes

Tree-Structured Indexes Introduction Tree-Structured Indexes Chapter 10 As for any index, 3 alternatives for data entries k*: Data record with key value k

More information

Principles of Data Management. Lecture #5 (Tree-Based Index Structures)

Principles of Data Management. Lecture #5 (Tree-Based Index Structures) Principles of Data Management Lecture #5 (Tree-Based Index Structures) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Today s Headlines v Project

More information

Tree-Structured Indexes

Tree-Structured Indexes Tree-Structured Indexes CS 186, Fall 2002, Lecture 17 R & G Chapter 9 If I had eight hours to chop down a tree, I'd spend six sharpening my ax. Abraham Lincoln Introduction Recall: 3 alternatives for data

More information

Overview of Storage and Indexing

Overview of Storage and Indexing Overview of Storage and Indexing Chapter 8 How index-learning turns no student pale Yet holds the eel of science by the tail. -- Alexander Pope (1688-1744) Database Management Systems 3ed, R. Ramakrishnan

More information

Review of Storage and Indexing

Review of Storage and Indexing Review of Storage and Indexing CMPSCI 591Q Sep 17, 2007 Slides adapted from those of R. Ramakrishnan and J. Gehrke 1 File organizations & access methods Many alternatives exist, each ideal for some situations,

More information

Tree-Structured Indexes. A Note of Caution. Range Searches ISAM. Example ISAM Tree. Introduction

Tree-Structured Indexes. A Note of Caution. Range Searches ISAM. Example ISAM Tree. Introduction Tree-Structured Indexes Lecture R & G Chapter 9 If I had eight hours to chop down a tree, I'd spend six sharpening my ax. Abraham Lincoln Introduction Recall: 3 alternatives for data entries k*: Data record

More information

Tree-Structured Indexes

Tree-Structured Indexes Tree-Structured Indexes Yanlei Diao UMass Amherst Slides Courtesy of R. Ramakrishnan and J. Gehrke Access Methods v File of records: Abstraction of disk storage for query processing (1) Sequential scan;

More information

Indexes. File Organizations and Indexing. First Question to Ask About Indexes. Index Breakdown. Alternatives for Data Entries (Contd.

Indexes. File Organizations and Indexing. First Question to Ask About Indexes. Index Breakdown. Alternatives for Data Entries (Contd. File Organizations and Indexing Lecture 4 R&G Chapter 8 "If you don't find it in the index, look very carefully through the entire catalogue." -- Sears, Roebuck, and Co., Consumer's Guide, 1897 Indexes

More information

Announcements. Reading Material. Recap. Today 9/17/17. Storage (contd. from Lecture 6)

Announcements. Reading Material. Recap. Today 9/17/17. Storage (contd. from Lecture 6) CompSci 16 Intensive Computing Systems Lecture 7 Storage and Index Instructor: Sudeepa Roy Announcements HW1 deadline this week: Due on 09/21 (Thurs), 11: pm, no late days Project proposal deadline: Preliminary

More information

Administrivia. Tree-Structured Indexes. Review. Today: B-Tree Indexes. A Note of Caution. Introduction

Administrivia. Tree-Structured Indexes. Review. Today: B-Tree Indexes. A Note of Caution. Introduction Administrivia Tree-Structured Indexes Lecture R & G Chapter 9 Homeworks re-arranged Midterm Exam Graded Scores on-line Key available on-line If I had eight hours to chop down a tree, I'd spend six sharpening

More information

Storage and Indexing

Storage and Indexing CompSci 516 Data Intensive Computing Systems Lecture 5 Storage and Indexing Instructor: Sudeepa Roy Duke CS, Spring 2016 CompSci 516: Data Intensive Computing Systems 1 Announcement Homework 1 Due on Feb

More information

Introduction to Data Management. Lecture 15 (More About Indexing)

Introduction to Data Management. Lecture 15 (More About Indexing) Introduction to Data Management Lecture 15 (More About Indexing) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Announcements v HW s and quizzes:

More information

Tree-Structured Indexes (Brass Tacks)

Tree-Structured Indexes (Brass Tacks) Tree-Structured Indexes (Brass Tacks) Chapter 10 Ramakrishnan and Gehrke (Sections 10.3-10.8) CPSC 404, Laks V.S. Lakshmanan 1 What will I learn from this set of lectures? How do B+trees work (for search)?

More information

INDEXES MICHAEL LIUT DEPARTMENT OF COMPUTING AND SOFTWARE MCMASTER UNIVERSITY

INDEXES MICHAEL LIUT DEPARTMENT OF COMPUTING AND SOFTWARE MCMASTER UNIVERSITY INDEXES MICHAEL LIUT (LIUTM@MCMASTER.CA) DEPARTMENT OF COMPUTING AND SOFTWARE MCMASTER UNIVERSITY SE 3DB3 (Slides adapted from Dr. Fei Chiang) Fall 2016 An Index 2 Data structure that organizes records

More information

Lecture 8 Index (B+-Tree and Hash)

Lecture 8 Index (B+-Tree and Hash) CompSci 516 Data Intensive Computing Systems Lecture 8 Index (B+-Tree and Hash) Instructor: Sudeepa Roy Duke CS, Fall 2017 CompSci 516: Database Systems 1 HW1 due tomorrow: Announcements Due on 09/21 (Thurs),

More information

Single Record and Range Search

Single Record and Range Search Database Indexing 8 Single Record and Range Search Single record retrieval: Find student name whose Age = 20 Range queries: Find all students with Grade > 8.50 Sequentially scanning of file is costly If

More information

CompSci 516: Database Systems

CompSci 516: Database Systems CompSci 516 Database Systems Lecture 9 Index Selection and External Sorting Instructor: Sudeepa Roy Duke CS, Fall 2017 CompSci 516: Database Systems 1 Announcements Private project threads created on piazza

More information

Introduction to Indexing 2. Acknowledgements: Eamonn Keogh and Chotirat Ann Ratanamahatana

Introduction to Indexing 2. Acknowledgements: Eamonn Keogh and Chotirat Ann Ratanamahatana Introduction to Indexing 2 Acknowledgements: Eamonn Keogh and Chotirat Ann Ratanamahatana Indexed Sequential Access Method We have seen that too small or too large an index (in other words too few or too

More information

Data on External Storage

Data on External Storage Advanced Topics in DBMS Ch-1: Overview of Storage and Indexing By Syed khutubddin Ahmed Assistant Professor Dept. of MCA Reva Institute of Technology & mgmt. Data on External Storage Prg1 Prg2 Prg3 DBMS

More information

Announcements. Reading Material. Today. Different File Organizations. Selection of Indexes 9/24/17. CompSci 516: Database Systems

Announcements. Reading Material. Today. Different File Organizations. Selection of Indexes 9/24/17. CompSci 516: Database Systems CompSci 516 Database Systems Lecture 9 Index Selection and External Sorting Announcements Private project threads created on piazza Please use these threads (and not emails) for all communications on your

More information

Spring 2017 B-TREES (LOOSELY BASED ON THE COW BOOK: CH. 10) 1/29/17 CS 564: Database Management Systems, Jignesh M. Patel 1

Spring 2017 B-TREES (LOOSELY BASED ON THE COW BOOK: CH. 10) 1/29/17 CS 564: Database Management Systems, Jignesh M. Patel 1 Spring 2017 B-TREES (LOOSELY BASED ON THE COW BOOK: CH. 10) 1/29/17 CS 564: Database Management Systems, Jignesh M. Patel 1 Consider the following table: Motivation CREATE TABLE Tweets ( uniquemsgid INTEGER,

More information

Kathleen Durant PhD Northeastern University CS Indexes

Kathleen Durant PhD Northeastern University CS Indexes Kathleen Durant PhD Northeastern University CS 3200 Indexes Outline for the day Index definition Types of indexes B+ trees ISAM Hash index Choosing indexed fields Indexes in InnoDB 2 Indexes A typical

More information

Overview of Indexing. Chapter 8 Part II. A glimpse at indices and workloads

Overview of Indexing. Chapter 8 Part II. A glimpse at indices and workloads Overview of Indexing Chapter 8 Part II. A glimpse at indices and workloads 1 Understanding the Workload For each query in workload: Which relations does it access? Which attributes are retrieved? Which

More information

Modern Database Systems Lecture 1

Modern Database Systems Lecture 1 Modern Database Systems Lecture 1 Aristides Gionis Michael Mathioudakis T.A.: Orestis Kostakis Spring 2016 logistics assignment will be up by Monday (you will receive email) due Feb 12 th if you re not

More information

CS 443 Database Management Systems. Professor: Sina Meraji

CS 443 Database Management Systems. Professor: Sina Meraji CS 443 Database Management Systems Professor: Sina Meraji jdu@cs.toronto.edu Logistics Instructor: Sina Meraji Email: sina.mrj@gmail.com Office hours: Mondays 17-18 pm(by appointment) TAs: Location: BA3219

More information

Lecture 13. Lecture 13: B+ Tree

Lecture 13. Lecture 13: B+ Tree Lecture 13 Lecture 13: B+ Tree Lecture 13 Announcements 1. Project Part 2 extension till Friday 2. Project Part 3: B+ Tree coming out Friday 3. Poll for Nov 22nd 4. Exam Pickup: If you have questions,

More information

Overview of Storage and Indexing

Overview of Storage and Indexing Overview of Storage and Indexing Chapter 8 Instructor: Vladimir Zadorozhny vladimir@sis.pitt.edu Information Science Program School of Information Sciences, University of Pittsburgh 1 Data on External

More information

Introduction to Data Management. Lecture 21 (Indexing, cont.)

Introduction to Data Management. Lecture 21 (Indexing, cont.) Introduction to Data Management Lecture 21 (Indexing, cont.) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Announcements v Midterm #2 grading

More information

Department of Computer Science University of Cyprus EPL446 Advanced Database Systems. Lecture 6. B+ Trees: Structure and Functions

Department of Computer Science University of Cyprus EPL446 Advanced Database Systems. Lecture 6. B+ Trees: Structure and Functions Department of Computer Science University of Cyprus EPL446 Advanced Database Systems Lecture 6 B+ Trees: Structure and Functions Chapt. 10.3-10.8: Ramakrishnan & Gehrke Demetris Zeinalipour http://www.cs.ucy.ac.cy/~dzeina/courses/epl446

More information

Lecture 34 11/30/15. CMPSC431W: Database Management Systems. Instructor: Yu- San Lin

Lecture 34 11/30/15. CMPSC431W: Database Management Systems. Instructor: Yu- San Lin CMPSC431W: Database Management Systems Lecture 34 11/30/15 Instructor: Yu- San Lin yusan@psu.edu Course Website: hcp://www.cse.psu.edu/~yul189/cmpsc431w Slides based on McGraw- Hill & Dr. Wang- Chien Lee

More information

Review: Memory, Disks, & Files. File Organizations and Indexing. Today: File Storage. Alternative File Organizations. Cost Model for Analysis

Review: Memory, Disks, & Files. File Organizations and Indexing. Today: File Storage. Alternative File Organizations. Cost Model for Analysis File Organizations and Indexing Review: Memory, Disks, & Files Lecture 4 R&G Chapter 8 "If you don't find it in the index, look very carefully through the entire catalogue." -- Sears, Roebuck, and Co.,

More information

Chapter 1: overview of Storage & Indexing, Disks & Files:

Chapter 1: overview of Storage & Indexing, Disks & Files: Chapter 1: overview of Storage & Indexing, Disks & Files: 1.1 Data on External Storage: DBMS stores vast quantities of data, and the data must persist across program executions. Therefore, data is stored

More information

ACCESS METHODS: FILE ORGANIZATIONS, B+TREE

ACCESS METHODS: FILE ORGANIZATIONS, B+TREE ACCESS METHODS: FILE ORGANIZATIONS, B+TREE File Storage How to keep blocks of records on disk files but must support operations: scan all records search for a record id ( RID ) insert new records delete

More information

Goals for Today. CS 133: Databases. Example: Indexes. I/O Operation Cost. Reason about tradeoffs between clustered vs. unclustered tree indexes

Goals for Today. CS 133: Databases. Example: Indexes. I/O Operation Cost. Reason about tradeoffs between clustered vs. unclustered tree indexes Goals for Today CS 3: Databases Fall 2018 Lec 09/18 Tree-based Indexes Prof. Beth Trushkowsky Reason about tradeoffs between clustered vs. unclustered tree indexes Understand the difference and tradeoffs

More information

CSIT5300: Advanced Database Systems

CSIT5300: Advanced Database Systems CSIT5300: Advanced Database Systems L08: B + -trees and Dynamic Hashing Dr. Kenneth LEUNG Department of Computer Science and Engineering The Hong Kong University of Science and Technology Hong Kong SAR,

More information

Context. File Organizations and Indexing. Cost Model for Analysis. Alternative File Organizations. Some Assumptions in the Analysis.

Context. File Organizations and Indexing. Cost Model for Analysis. Alternative File Organizations. Some Assumptions in the Analysis. File Organizations and Indexing Context R&G Chapter 8 "If you don't find it in the index, look very carefully through the entire catalogue." -- Sears, Roebuck, and Co., Consumer's Guide, 1897 Query Optimization

More information

Data Management for Data Science

Data Management for Data Science Data Management for Data Science Database Management Systems: Access file manager and query evaluation Maurizio Lenzerini, Riccardo Rosati Dipartimento di Ingegneria informatica automatica e gestionale

More information

Database System Concepts, 6 th Ed. Silberschatz, Korth and Sudarshan See for conditions on re-use

Database System Concepts, 6 th Ed. Silberschatz, Korth and Sudarshan See  for conditions on re-use Chapter 11: Indexing and Hashing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files Static

More information

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15 Systems Infrastructure for Data Science Web Science Group Uni Freiburg WS 2014/15 Lecture II: Indexing Part I of this course Indexing 3 Database File Organization and Indexing Remember: Database tables

More information

Step 4: Choose file organizations and indexes

Step 4: Choose file organizations and indexes Step 4: Choose file organizations and indexes Asst. Prof. Dr. Kanda Saikaew (krunapon@kku.ac.th) Dept of Computer Engineering Khon Kaen University Overview How to analyze users transactions to determine

More information

ECS 165B: Database System Implementa6on Lecture 7

ECS 165B: Database System Implementa6on Lecture 7 ECS 165B: Database System Implementa6on Lecture 7 UC Davis April 12, 2010 Acknowledgements: por6ons based on slides by Raghu Ramakrishnan and Johannes Gehrke. Class Agenda Last 6me: Dynamic aspects of

More information

Chapter 12: Indexing and Hashing (Cnt(

Chapter 12: Indexing and Hashing (Cnt( Chapter 12: Indexing and Hashing (Cnt( Cnt.) Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition

More information

Chapter 11: Indexing and Hashing

Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 11: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree

More information

Extra: B+ Trees. Motivations. Differences between BST and B+ 10/27/2017. CS1: Java Programming Colorado State University

Extra: B+ Trees. Motivations. Differences between BST and B+ 10/27/2017. CS1: Java Programming Colorado State University Extra: B+ Trees CS1: Java Programming Colorado State University Slides by Wim Bohm and Russ Wakefield 1 Motivations Many times you want to minimize the disk accesses while doing a search. A binary search

More information

Overview of Storage and Indexing

Overview of Storage and Indexing Overview of Storage and Indexing UVic C SC 370 Dr. Daniel M. German Department of Computer Science July 2, 2003 Version: 1.1.1 7 1 Overview of Storage and Indexing (1.1.1) CSC 370 dmgerman@uvic.ca Overview

More information

Chapter 11: Indexing and Hashing

Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 11: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree

More information

I think that I shall never see A billboard lovely as a tree. Perhaps unless the billboards fall I ll never see a tree at all.

I think that I shall never see A billboard lovely as a tree. Perhaps unless the billboards fall I ll never see a tree at all. 9 TREE-STRUCTURED INDEXING I think that I shall never see A billboard lovely as a tree. Perhaps unless the billboards fall I ll never see a tree at all. Ogden Nash, Song of the Open Road We now consider

More information

Lecture #16 (Physical DB Design)

Lecture #16 (Physical DB Design) Introduction to Data Management Lecture #16 (Physical DB Design) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Announcements v Homework info:

More information

Chapter 12: Indexing and Hashing. Basic Concepts

Chapter 12: Indexing and Hashing. Basic Concepts Chapter 12: Indexing and Hashing! Basic Concepts! Ordered Indices! B+-Tree Index Files! B-Tree Index Files! Static Hashing! Dynamic Hashing! Comparison of Ordered Indexing and Hashing! Index Definition

More information

Data Organization B trees

Data Organization B trees Data Organization B trees Data organization and retrieval File organization can improve data retrieval time SELECT * FROM depositors WHERE bname= Downtown 100 blocks 200 recs/block Query returns 150 records

More information

THE B+ TREE INDEX. CS 564- Spring ACKs: Jignesh Patel, AnHai Doan

THE B+ TREE INDEX. CS 564- Spring ACKs: Jignesh Patel, AnHai Doan THE B+ TREE INDEX CS 564- Spring 2018 ACKs: Jignesh Patel, AnHai Doan WHAT IS THIS LECTURE ABOUT? The B+ tree index Basics Search/Insertion/Deletion Design & Cost 2 INDEX RECAP We have the following query:

More information

Chapter 12: Indexing and Hashing

Chapter 12: Indexing and Hashing Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL

More information

Introduction to Data Management. Lecture #17 (Physical DB Design!)

Introduction to Data Management. Lecture #17 (Physical DB Design!) Introduction to Data Management Lecture #17 (Physical DB Design!) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Announcements v Homework info:

More information

CAS CS 460/660 Introduction to Database Systems. File Organization and Indexing

CAS CS 460/660 Introduction to Database Systems. File Organization and Indexing CAS CS 460/660 Introduction to Database Systems File Organization and Indexing Slides from UC Berkeley 1.1 Review: Files, Pages, Records Abstraction of stored data is files of records. Records live on

More information

Chapter 11: Indexing and Hashing" Chapter 11: Indexing and Hashing"

Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing" Database System Concepts, 6 th Ed.! Silberschatz, Korth and Sudarshan See www.db-book.com for conditions on re-use " Chapter 11: Indexing and Hashing" Basic Concepts!

More information

CS330. Some Logistics. Three Topics. Indexing, Query Processing, and Transactions. Next two homework assignments out today Extra lab session:

CS330. Some Logistics. Three Topics. Indexing, Query Processing, and Transactions. Next two homework assignments out today Extra lab session: CS330 Indexing, Query Processing, and Transactions 1 Some Logistics Next two homework assignments out today Extra lab session: This Thursday, after class, in this room Bring your laptop fully charged Extra

More information

Introduction to Data Management. Lecture 14 (Storage and Indexing)

Introduction to Data Management. Lecture 14 (Storage and Indexing) Introduction to Data Management Lecture 14 (Storage and Indexing) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Announcements v HW s and quizzes:

More information

CSIT5300: Advanced Database Systems

CSIT5300: Advanced Database Systems CSIT5300: Advanced Database Systems L11: Physical Database Design Dr. Kenneth LEUNG Department of Computer Science and Engineering The Hong Kong University of Science and Technology Hong Kong SAR, China

More information

Chapter 11: Indexing and Hashing

Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL

More information

Physical Level of Databases: B+-Trees

Physical Level of Databases: B+-Trees Physical Level of Databases: B+-Trees Adnan YAZICI Computer Engineering Department METU (Fall 2005) 1 B + -Tree Index Files l Disadvantage of indexed-sequential files: performance degrades as file grows,

More information

Final Exam Review. Kathleen Durant CS 3200 Northeastern University Lecture 22

Final Exam Review. Kathleen Durant CS 3200 Northeastern University Lecture 22 Final Exam Review Kathleen Durant CS 3200 Northeastern University Lecture 22 Outline for today Identify topics for the final exam Discuss format of the final exam What will be provided for you and what

More information

Chapter 12: Indexing and Hashing

Chapter 12: Indexing and Hashing Chapter 12: Indexing and Hashing Database System Concepts, 5th Ed. See www.db-book.com for conditions on re-use Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree

More information

Introduction to Data Management. Lecture #13 (Indexing)

Introduction to Data Management. Lecture #13 (Indexing) Introduction to Data Management Lecture #13 (Indexing) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Announcements v Homework info: HW #5 (SQL):

More information

Topics to Learn. Important concepts. Tree-based index. Hash-based index

Topics to Learn. Important concepts. Tree-based index. Hash-based index CS143: Index 1 Topics to Learn Important concepts Dense index vs. sparse index Primary index vs. secondary index (= clustering index vs. non-clustering index) Tree-based vs. hash-based index Tree-based

More information

RAID in Practice, Overview of Indexing

RAID in Practice, Overview of Indexing RAID in Practice, Overview of Indexing CS634 Lecture 4, Feb 04 2014 Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke 1 Disks and Files: RAID in practice For a big enterprise

More information

Friday Nights with Databases!

Friday Nights with Databases! Introduction to Data Management Lecture #22 (Physical DB Design) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 It s time again for... Friday

More information

Hash-Based Indexes. Chapter 11. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

Hash-Based Indexes. Chapter 11. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Hash-Based Indexes Chapter Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Introduction As for any index, 3 alternatives for data entries k*: Data record with key value k

More information

CSE 444: Database Internals. Lectures 5-6 Indexing

CSE 444: Database Internals. Lectures 5-6 Indexing CSE 444: Database Internals Lectures 5-6 Indexing 1 Announcements HW1 due tonight by 11pm Turn in an electronic copy (word/pdf) by 11pm, or Turn in a hard copy in my office by 4pm Lab1 is due Friday, 11pm

More information

Database System Concepts, 5th Ed. Silberschatz, Korth and Sudarshan See for conditions on re-use

Database System Concepts, 5th Ed. Silberschatz, Korth and Sudarshan See   for conditions on re-use Chapter 12: Indexing and Hashing Database System Concepts, 5th Ed. See www.db-book.com for conditions on re-use Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree

More information

CSC 261/461 Database Systems Lecture 17. Fall 2017

CSC 261/461 Database Systems Lecture 17. Fall 2017 CSC 261/461 Database Systems Lecture 17 Fall 2017 Announcement Quiz 6 Due: Tonight at 11:59 pm Project 1 Milepost 3 Due: Nov 10 Project 2 Part 2 (Optional) Due: Nov 15 The IO Model & External Sorting Today

More information

amiri advanced databases '05

amiri advanced databases '05 More on indexing: B+ trees 1 Outline Motivation: Search example Cost of searching with and without indices B+ trees Definition and structure B+ tree operations Inserting Deleting 2 Dense ordered index

More information

Midterm Review CS634. Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke

Midterm Review CS634. Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke Midterm Review CS634 Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke Coverage Text, chapters 8 through 15 (hw1 hw4) PKs, FKs, E-R to Relational: Text, Sec. 3.2-3.5, to pg.

More information

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 6 - Storage and Indexing

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 6 - Storage and Indexing CSE 544 Principles of Database Management Systems Magdalena Balazinska Winter 2009 Lecture 6 - Storage and Indexing References Generalized Search Trees for Database Systems. J. M. Hellerstein, J. F. Naughton

More information

Introduction to Indexing R-trees. Hong Kong University of Science and Technology

Introduction to Indexing R-trees. Hong Kong University of Science and Technology Introduction to Indexing R-trees Dimitris Papadias Hong Kong University of Science and Technology 1 Introduction to Indexing 1. Assume that you work in a government office, and you maintain the records

More information

CS143: Index. Book Chapters: (4 th ) , (5 th ) , , 12.10

CS143: Index. Book Chapters: (4 th ) , (5 th ) , , 12.10 CS143: Index Book Chapters: (4 th ) 12.1-3, 12.5-8 (5 th ) 12.1-3, 12.6-8, 12.10 1 Topics to Learn Important concepts Dense index vs. sparse index Primary index vs. secondary index (= clustering index

More information

Database Applications (15-415)

Database Applications (15-415) Database Applications (15-415) DBMS Internals- Part IV Lecture 14, March 10, 015 Mohammad Hammoud Today Last Two Sessions: DBMS Internals- Part III Tree-based indexes: ISAM and B+ trees Data Warehousing/

More information

External Sorting. Chapter 13. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

External Sorting. Chapter 13. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 External Sorting Chapter 13 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Why Sort? A classic problem in computer science! Data requested in sorted order e.g., find students in increasing

More information

Intro to DB CHAPTER 12 INDEXING & HASHING

Intro to DB CHAPTER 12 INDEXING & HASHING Intro to DB CHAPTER 12 INDEXING & HASHING Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing

More information

Final Exam Review. Kathleen Durant PhD CS 3200 Northeastern University

Final Exam Review. Kathleen Durant PhD CS 3200 Northeastern University Final Exam Review Kathleen Durant PhD CS 3200 Northeastern University 1 Outline for today Identify topics for the final exam Discuss format of the final exam What will be provided for you and what you

More information

Material You Need to Know

Material You Need to Know Review Quiz 2 Material You Need to Know Normalization Storage and Disk File Layout Indexing B-trees and B+ Trees Extensible Hashing Linear Hashing Decomposition Goals: Lossless Joins, Dependency preservation

More information

Chapter 11: Indexing and Hashing

Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree

More information

Advances in Data Management Principles of Database Systems - 2 A.Poulovassilis

Advances in Data Management Principles of Database Systems - 2 A.Poulovassilis 1 Advances in Data Management Principles of Database Systems - 2 A.Poulovassilis 1 Storing data on disk The traditional storage hierarchy for DBMSs is: 1. main memory (primary storage) for data currently

More information

Records in a file are grouped into buckets. Search key values are organized in a tree. The highest level is the root

Records in a file are grouped into buckets. Search key values are organized in a tree. The highest level is the root Hash- Based Indexing Records in a file are grouped into buckets Each bucket consists of a primary page and zero or more overflow pages A hash func7on h takes a search- key value k and returns the address

More information

File Structures and Indexing

File Structures and Indexing File Structures and Indexing CPS352: Database Systems Simon Miner Gordon College Last Revised: 10/11/12 Agenda Check-in Database File Structures Indexing Database Design Tips Check-in Database File Structures

More information

Evaluation of relational operations

Evaluation of relational operations Evaluation of relational operations Iztok Savnik, FAMNIT Slides & Textbook Textbook: Raghu Ramakrishnan, Johannes Gehrke, Database Management Systems, McGraw-Hill, 3 rd ed., 2007. Slides: From Cow Book

More information

External Sorting Implementing Relational Operators

External Sorting Implementing Relational Operators External Sorting Implementing Relational Operators 1 Readings [RG] Ch. 13 (sorting) 2 Where we are Working our way up from hardware Disks File abstraction that supports insert/delete/scan Indexing for

More information

Physical Database Design and Tuning. Review - Normal Forms. Review: Normal Forms. Introduction. Understanding the Workload. Creating an ISUD Chart

Physical Database Design and Tuning. Review - Normal Forms. Review: Normal Forms. Introduction. Understanding the Workload. Creating an ISUD Chart Physical Database Design and Tuning R&G - Chapter 20 Although the whole of this life were said to be nothing but a dream and the physical world nothing but a phantasm, I should call this dream or phantasm

More information