Chapter 13: Indexing (Slides by Hector Garcia-Molina, http://wwwdb.stanford.edu/~hector/cs245/notes.htm) Chapter 13 1 Chapter 13 Indexing & Hashing value record? value Chapter 13 2 Topics Conventional indexes B-trees Hashing schemes (self-study) Chapter 13 3 1
Sequential File 0 Chapter 13 4 Dense Index 0 1 1 Sequential File 0 Chapter 13 5 Sparse Index 1 1 1 1 1 2 2 Sequential File 0 Chapter 13 6 2
Sparse 2nd level 1 2 3 4 4 5 1 1 1 1 1 2 2 Sequential File 0 Chapter 13 7 Question: Can we build a dense, 2nd level index for a dense index? Chapter 13 8 Notes on pointers: (1) Block pointer (sparse index) can be smaller than record pointer BP RP Chapter 13 9 3
Notes on pointers: (2) If file is contiguous, then we can omit pointers (i.e., compute them) Chapter 13 R1 K1 K2 K3 K4 R2 R3 R4 say: 24 B per block if we want K3 block: get it at offset (3-1)24 = 48 bytes Chapter 13 11 Sparse vs. Dense Tradeoff Sparse: Less index space per record can keep more of index in memory Dense: Can tell if any record exists without accessing file (Also: sparse better for insertions dense needed for secondary indexes) Chapter 13 12 4
Terms Index sequential file Search key ( primary key) Primary index (on Sequencing field) Secondary index Dense index (all Search Key values in) Sparse index Multi-level index Chapter 13 13 Next: Duplicate keys Deletion/Insertion Secondary indexes Chapter 13 14 Duplicate keys 45 Chapter 13 15 5
Duplicate keys Dense index, one way to implement? 45 Chapter 13 16 Duplicate keys Dense index, better way? 45 Chapter 13 17 Duplicate keys Sparse index, one way? careful if looking for or! 45 Chapter 13 18 6
Duplicate keys Sparse index, another way? place first new key from block should this be? 45 Chapter 13 19 Summary Duplicate values, primary index Index may point to first instance of each value only File Index a a a.. b Chapter 13 Deletion from sparse index 1 1 1 Chapter 13 21 7
Deletion from sparse index delete record 1 1 1 Chapter 13 22 Deletion from sparse index delete record 1 1 1 Chapter 13 23 Deletion from sparse index delete records & 1 1 1 Chapter 13 24 8
Deletion from dense index Chapter 13 25 Deletion from dense index delete record Chapter 13 26 Insertion, sparse index case Chapter 13 27 9
Insertion, sparse index case insert record 34 our lucky day! we have free space where we need it! 34 Chapter 13 28 Insertion, sparse index case insert record 15 Illustrated: Immediate reorganization Variation: insert new block (chained file) update index Chapter 13 29 15 Insertion, sparse index case insert record 25 25 overflow blocks (reorganize later...) Chapter 13
Insertion, dense index case Similar Often more expensive... Chapter 13 31 Secondary indexes Sequence field 0 Chapter 13 32 Secondary indexes Sparse index 0... does not make sense! 0 Sequence field Chapter 13 33 11
Secondary indexes Dense index Sequence field... sparse high level... 0 Chapter 13 34 With secondary indexes: Lowest level is dense Other levels are sparse Also: Pointers are record pointers (not block pointers; not computed) Chapter 13 35 Summary so far Conventional index Basic Ideas: sparse, dense, multi-level Duplicate Keys Deletion/Insertion Secondary indexes Chapter 13 36 12
Conventional indexes Advantage: - Simple - Index is sequential file good for scans Disadvantage: - Inserts expensive, and/or - Lose sequentiality & balance Chapter 13 37 Outline: Conventional indexes B-Trees NEXT Hashing schemes (self-study) Chapter 13 38 NEXT: Another type of index Give up on sequentiality of index Try to get balance Chapter 13 39 13
B+Tree Example n=3 Root 3 5 11 35 0 1 1 1 1 1 156 179 1 0 1 1 1 0 Chapter 13 Sample non-leaf 57 81 95 to keys to keys to keys to keys < 57 57 k<81 81 k<95 95 Chapter 13 41 Sample leaf node: From non-leaf node 57 to next leaf in sequence To record with key 57 To record with key 81 To record with key 85 81 95 Chapter 13 42 14
Size of nodes: n+1 pointers (fixed) n keys Chapter 13 43 Don t want nodes to be too empty Use at least Non-leaf: Leaf: (n+1)/2 pointers (n+1)/2 pointers to data Chapter 13 44 n=3 Full node min. node Non-leaf 1 1 1 Leaf 3 5 11 35 counts even if null Chapter 13 45 15
B+tree rules: tree of order n (1) All leaves are at the same lowest level (balanced tree) (2) Pointers in leaves point to records, except for sequence pointer Chapter 13 46 (3) Number of pointers/keys for B+tree Max Max Min ptrs keys ptrs data Min keys Non-leaf (non-root) n+1 n (n+1)/2 (n+1)/2-1 Leaf (non-root) n+1 n (n+1)/2 (n+1)/2 Root n+1 n 1 1 Chapter 13 47 Insert into B+tree (a) simple case space available in leaf (b) leaf overflow (c) non-leaf overflow (d) new root Chapter 13 48 16
(a) Insert key = 32 n=3 3 5 11 31 32 0 Chapter 13 49 (a) Insert key = 7 n=3 3 5 3 5 11 7 31 7 0 Chapter 13 (c) Insert key = 1 n=3 1 156 179 1 179 1 0 1 1 1 1 0 1 Chapter 13 51 17
(d) New root, insert 45 n=3 new root 1 2 3 12 25 32 45 Chapter 13 52 Deletion from B+tree (a) Simple case - no example (b) Coalesce with neighbor (sibling) (c) Re-distribute keys (d) Cases (b) or (c) at non-leaf Chapter 13 53 (b) Coalesce with sibling Delete n=4 0 Chapter 13 54 18
(c) Redistribute keys Delete n=4 35 35 0 35 Chapter 13 55 (d) Non-leaf coalese Delete 37 n=4 new root 1 3 14 22 25 26 37 45 25 25 Chapter 13 56 B+tree deletions in practice Often, coalescing is not implemented Too hard and not worth it! Chapter 13 57 19
Variation on B+tree: B-tree (no +) Idea: Avoid duplicate keys Have record pointers in non-leaf nodes Chapter 13 58 K1 P1 K2 P2 K3 P3 to record to record to record with K1 with K2 with K3 to keys to keys to keys to keys < K1 K1<x<K2 K2<x<k3 >k3 Chapter 13 59 B-tree example n=2 sequence pointers not useful now! (but keep space for simplicity) 65 125 0 1 1 1 1 1 1 1 1 25 45 85 5 145 165 Chapter 13
Tradeoffs: B-trees have faster lookup than B+trees in B-tree, non-leaf & leaf different sizes in B-tree, deletion more complicated B+trees preferred! Chapter 13 61 But note: If blocks are fixed size (due to disk and buffering restrictions) Then lookup for B+tree is actually better!! Chapter 13 62 So... 8 records B+ B ooooooooooooo ooooooooo 156 records 8 records Total = 116 Conclusion: For fixed block size, B+ tree is better because it is bushier Chapter 13 63 21
Outline/summary Conventional Indexes Sparse vs. dense Primary vs. secondary B trees B+trees vs. B-trees B+trees vs. indexed sequential Hashing schemes (self-study) Chapter 13 64 22