Review Quiz 2
Material You Need to Know
- Normalization
- Storage and Disk
- File Layout
- Indexing
- B-Trees and B+Trees
- Extendible Hashing
- Linear Hashing
Decomposition
- Goals: lossless joins, dependency preservation, redundancy avoidance
- BCNF guarantees no redundancy (from functional dependencies) and lossless joins, but not dependency preservation
- Relation schema R, with FD set F, is in BCNF if, for every nontrivial X → Y in F+, X → R (that is, X is a superkey of R)
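The BCNF condition can be checked mechanically with attribute closures. The following is a minimal sketch, not a full normalization tool: the names `closure` and `bcnf_violations` are hypothetical, FDs are represented as pairs of frozensets, and only the FDs given in F are tested (for the original schema R this is sufficient; it is not sufficient for the relations produced by a decomposition).

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have all of lhs, we also get rhs.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the nontrivial FDs in fds whose left side is not a superkey of schema."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != set(schema)
    ]


# Example: R(A, B, C) with A -> B and B -> C.
R = "ABC"
F = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
violations = bcnf_violations(R, F)
```

Here A is a superkey (its closure is ABC), but B is not (its closure is BC), so B → C is the only violation and R is not in BCNF.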
Storage and Disk
- Know the memory hierarchy (you do not need to worry about redoing the calculations from the warm-up problems on the homework)
- What types of questions do you think might be asked for this section?
File Layout
- Know the differences between fixed-size and variable-size records
- Understand the tradeoffs of all the different layouts covered in the slides
Indexes
- Sparse index: one pointer per group of records (for example, one entry per block)
- Dense index: one pointer for every search-key value in the table
- Explain the difference between each of the following:
  1. Primary versus secondary indexes.
  2. Dense versus sparse indexes.
  3. If you were about to create an index on a relation, what considerations would guide your choice with respect to each pair of properties listed above?
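To make the dense/sparse distinction concrete, here is a small sketch under stated assumptions: the file is a hypothetical list of three blocks sorted on the search key, and the names `dense_lookup` and `sparse_lookup` are illustrative only. The sparse version works only because the file is sorted on the indexed key.

```python
from bisect import bisect_right

# Hypothetical sorted file: each inner list is one disk block of search keys.
blocks = [[2, 5, 8], [11, 14, 17], [21, 25, 29]]

# Dense index: one entry per key in the file.
dense = {key: b for b, block in enumerate(blocks) for key in block}

# Sparse index: one entry per block (the first key stored in that block).
sparse = [block[0] for block in blocks]


def dense_lookup(key):
    """Block number holding key, or None; the index alone answers the question."""
    return dense.get(key)


def sparse_lookup(key):
    """Find the last index entry <= key, then scan that block for the key."""
    b = bisect_right(sparse, key) - 1
    if b < 0 or key not in blocks[b]:
        return None
    return b
```

The dense index has nine entries and the sparse index three, which is the space-versus-lookup-work tradeoff the questions above ask about.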
B-Trees
- Balanced n-way tree (every path from the root to a leaf has the same length)
- Can be used for primary/secondary and clustering/non-clustering indexes
- O(log_B N) for all operations (search, insert, delete), where B is the number of pointers per node (branching factor) and N is the number of records
- At least 50% utilization: each non-leaf node must have between ceil(n/2) and n children, while leaves must hold at least ceil((n-1)/2) values (the root is the usual exception)
- Some overhead on each update, but far less than the overhead of file reorganization
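The occupancy bounds and the logarithmic cost are easy to tabulate. This is a rough sketch with hypothetical helper names; `levels_needed` treats every node as completely full and ignores the difference between keys and pointers, so it is only an estimate of tree height.

```python
import math


def node_bounds(n):
    """Occupancy limits for a B-tree whose nodes have at most n children."""
    return {
        "max_children": n,
        "min_children_non_leaf": math.ceil(n / 2),
        "max_keys": n - 1,
        "min_keys_leaf": math.ceil((n - 1) / 2),
    }


def levels_needed(num_records, fanout):
    """Smallest number of levels such that fanout ** levels >= num_records."""
    levels, reach = 1, fanout
    while reach < num_records:
        reach *= fanout
        levels += 1
    return levels
```

For example, with a fanout of 100 a million records fit within 3 levels; at the 50% minimum (fanout 50) the same data needs 4.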
B-Trees: Insert
- Insert in the leaf, if there is room
- If overflow:
  - Split the node (create a new node)
  - Redistribute keys
  - Recursively push the middle key up (height increases when the root overflows and splits)
B-Trees: Insertion with Overflow - Insert 3
1. Before:
     [4 7]
     [1 2] [5 6] [8 9]
2. 3 goes into leaf [1 2], which overflows and splits; the middle key 2 is pushed up and the root overflows:
     [2 4 7]
     [1] [3] [5 6] [8 9]
3. The root splits and its middle key 4 becomes the new root:
     [4]
     [2] [7]
     [1] [3] [5 6] [8 9]
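The insert procedure can be sketched in a few lines. This is an illustrative implementation, not the one from the course: the `Node` class, the `max_keys=2` default (chosen to match the Insert 3 example), and the choice of which key counts as the middle are all assumptions, and duplicate keys are not handled.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []  # empty list means this node is a leaf


def insert(root, key, max_keys=2):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key, max_keys)
    if split:  # the root overflowed: the tree grows by one level
        mid, right = split
        return Node([mid], [root, right])
    return root


def _insert(node, key, max_keys):
    i = bisect_left(node.keys, key)
    if not node.children:  # leaf: place the key here
        node.keys.insert(i, key)
    else:  # non-leaf: descend, then absorb a pushed-up key if the child split
        split = _insert(node.children[i], key, max_keys)
        if split:
            mid, right = split
            node.keys.insert(i, mid)
            node.children.insert(i + 1, right)
    if len(node.keys) <= max_keys:
        return None
    # Overflow: split around the middle key, which is pushed up to the parent.
    m = len(node.keys) // 2
    mid = node.keys[m]
    right = Node(node.keys[m + 1:], node.children[m + 1:])
    node.keys, node.children = node.keys[:m], node.children[:m + 1]
    return mid, right


def levels(root):
    """Keys of every node, level by level, for inspection."""
    out, level = [], [root]
    while level:
        out.append([n.keys for n in level])
        level = [c for n in level for c in n.children]
    return out


# The tree from the Insert 3 example above.
root = Node([4, 7], [Node([1, 2]), Node([5, 6]), Node([8, 9])])
root = insert(root, 3)
```

After the call, `levels(root)` is `[[[4]], [[2], [7]], [[1], [3], [5, 6], [8, 9]]]`: the leaf split pushed 2 up, the root overflowed, and 4 became the new root.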
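B-Trees: Delete
- Delete the key; if it was in a non-leaf node, promote a replacement (the largest key from the left subtree or the smallest from the right subtree)
- If underflow:
  - Rich sibling: borrow a key through the parent
  - Poor sibling: merge with the sibling by pulling a key down from the parent, recursively if the parent then underflows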
B-Trees: Deletion with Underflow & Rich Sibling - Delete 9
1. Before:
     [4]
     [2] [7]
     [1] [3] [5 6] [9]
2. 9 is removed and the leaf underflows; it borrows through the parent, so 7 moves down:
     [4]
     [2] [ ]
     [1] [3] [5 6] [7]
3. 6 moves up from the rich sibling to replace 7 in the parent:
     [4]
     [2] [6]
     [1] [3] [5] [7]
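B-Trees: Deletion with Underflow & Poor Sibling - Delete 7
1. Before:
     [4]
     [2] [6]
     [1] [3] [5] [7]
2. 7 is removed and the leaf underflows; its sibling [5] has no key to spare:
     [4]
     [2] [6]
     [1] [3] [5] [ ]
3. Merge: 6 is pulled down from the parent to form [5 6], leaving the parent empty:
     [4]
     [2] [ ]
     [1] [3] [5 6]
4. Recursive merge: 4 is pulled down from the root and merged with [2]; the tree loses a level:
     [2 4]
     [1] [3] [5 6]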
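The two underflow cases differ only in whether a neighbor can spare a key. The sketch below covers a single parent whose children are leaves, each represented as a plain list of keys; the function name `fix_underflow` and the `min_keys=1` default are assumptions matching the examples above, and the recursive step (repairing a parent that underflows after a merge) is deliberately left out.

```python
def fix_underflow(parent_keys, children, i, min_keys=1):
    """Repair leaf children[i] after a deletion left it with too few keys."""
    child = children[i]
    if i > 0 and len(children[i - 1]) > min_keys:
        # Rich left sibling: separator moves down, sibling's largest key moves up.
        child.insert(0, parent_keys[i - 1])
        parent_keys[i - 1] = children[i - 1].pop()
    elif i + 1 < len(children) and len(children[i + 1]) > min_keys:
        # Rich right sibling: separator moves down, sibling's smallest key moves up.
        child.append(parent_keys[i])
        parent_keys[i] = children[i + 1].pop(0)
    elif i > 0:
        # Poor siblings: merge into the left sibling, pulling the separator down.
        children[i - 1] += [parent_keys.pop(i - 1)] + child
        children.pop(i)
    else:
        # Poor siblings, leftmost child: merge the right sibling into this one.
        child += [parent_keys.pop(i)] + children[i + 1]
        children.pop(i + 1)
    return parent_keys, children


# Delete 9: parent [7] over leaves [5 6] and [9]; the second leaf is now empty.
rich = fix_underflow([7], [[5, 6], []], 1)

# Delete 7: parent [6] over leaves [5] and [7]; the second leaf is now empty.
poor = fix_underflow([6], [[5], []], 1)
```

In the rich case the result is parent `[6]` over `[5]` and `[7]`. In the poor case the leaves merge into `[5, 6]` and the parent is left with no keys, which is the point where the recursive merge in the example takes over.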
B+Trees
- All keys appear at the leaf level; keys are replicated at non-leaf levels
- A lookup must always traverse down to the leaf level
- Each leaf has a pointer to the next leaf, which facilitates sequential operations
B+Trees: Insert
- Insert the key into the leaf
- If overflow: split the leaf and copy the smallest value of the new node to the parent (copy key up)
- If the parent overflows in turn, do a recursive B-tree split (push key up)
B+Trees: Insertion with Overflow - Insert N
* adapted from last year's recitation
1. Before:
     [O]
     [C F J] [S]
     [A B] [D E] [G H] [K L M] [P Q] [T U]
2. N goes into leaf [K L M], which overflows and splits into [K L] and [M N]:
     [O]
     [C F J] [S]
     [A B] [D E] [G H] [K L] [M N] [P Q] [T U]
3. M is copied up into the parent, which overflows:
     [O]
     [C F J M] [S]
     [A B] [D E] [G H] [K L] [M N] [P Q] [T U]
4. The parent splits and its middle key J is pushed up into the root:
     [J O]
     [C F] [M] [S]
     [A B] [D E] [G H] [K L] [M N] [P Q] [T U]
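The difference between copying a key up (leaf split) and pushing a key up (non-leaf split) is the part of B+tree insertion that is easiest to get wrong. A minimal sketch of just the two split rules follows; the function names and the choice of split point (`len(keys) // 2`) are assumptions chosen to agree with the Insert N example.

```python
def split_leaf(keys):
    """Split an overfull leaf; the separator is COPIED up and stays in the right leaf."""
    m = len(keys) // 2
    left, right = keys[:m], keys[m:]
    return left, right, right[0]


def split_internal(keys):
    """Split an overfull non-leaf node; the middle key is PUSHED up and leaves this level."""
    m = len(keys) // 2
    return keys[:m], keys[m], keys[m + 1:]


leaf_result = split_leaf(list("KLMN"))
internal_result = split_internal(list("CFJM"))
```

The leaf [K L M N] becomes [K L] and [M N] with M copied to the parent, so M still appears at the leaf level. The non-leaf node [C F J M] becomes [C F] and [M] with J moved to the root, so J no longer appears at that level.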
Hashing
- Best for equality selections; cannot be used for range searches
- Static hashing: the number of bucket pages is fixed; overflow pages are used when buckets fill up
- Dynamic hashing: extendible and linear
Extendible Hashing
- Allocate values to buckets by the least significant bits of the hash value
- The directory grows by doubling when a bucket whose local depth equals the global depth overflows, and shrinks when enough values are removed
- Global depth: maximum number of bits needed to tell which bucket a value hashes to
- Local depth: number of bits needed to tell whether a value hashes to a particular bucket
Extendible Hashing: Insert
- If the bucket overflows, increment its local depth:
  - If local depth > global depth, double the directory (the global depth goes up by one), then split the bucket and redistribute its records
  - If local depth <= global depth, allocate a new page with the same local depth, redistribute records, and add the page to the directory
Extendible Hashing: Delete
- If a deletion makes a bucket empty, merge it with its split image
- The directory can be halved when every directory element points to the same bucket as its split image
Extendible Hashing
Given bucket size = 4 and the following catalog, hash the following values in order: 41, 38, 45, 40

catalog    buckets
00         4 16 28 32
01         9 13 37
10         2 10 22 30
11         7 23 27

After inserting 41 (global depth: 2):
catalog    buckets       local depth
00         4 16 28 32    2
01         9 13 37 41    2
10         2 10 22 30    2
11         7 23 27       2

After inserting 38 (global depth: 3):
catalog    buckets       local depth
000, 100   4 16 28 32    2
001, 101   9 13 37 41    2
010        2 10          3
011, 111   7 23 27       2
110        22 30 38      3

After inserting 45 (global depth: 3):
catalog    buckets       local depth
000, 100   4 16 28 32    2
001        9 41          3
010        2 10          3
011, 111   7 23 27       2
101        13 37 45      3
110        22 30 38      3

After inserting 40 (global depth: 3):
catalog    buckets       local depth
000        16 32 40      3
001        9 41          3
010        2 10          3
011, 111   7 23 27       2
100        4 28          3
101        13 37 45      3
110        22 30 38      3
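The worked example can be replayed with a short sketch. Assumptions, none of which come from the slides: the hash function is the identity (h(k) = k), the class and attribute names are hypothetical, deletion is omitted, and more than four identical keys would recurse without end. The test `bucket.depth == self.gd` before incrementing is the same condition as "local depth > global depth" after incrementing.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth  # local depth
        self.keys = []


class ExtendibleHash:
    def __init__(self, global_depth=2, capacity=4):
        self.gd = global_depth
        self.capacity = capacity
        self.dir = [Bucket(global_depth) for _ in range(1 << global_depth)]

    def _slot(self, key):
        # Directory index = least significant gd bits of h(key); h(k) = k assumed.
        return key & ((1 << self.gd) - 1)

    def insert(self, key):
        bucket = self.dir[self._slot(key)]
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(key)
            return
        # Overflow: the bucket must be split on one more bit.
        if bucket.depth == self.gd:
            # Double the directory; entries i and i + 2**gd share a bucket.
            self.dir = self.dir + self.dir
            self.gd += 1
        bucket.depth += 1
        image = Bucket(bucket.depth)  # the split image
        bit = 1 << (bucket.depth - 1)
        for i, b in enumerate(self.dir):
            if b is bucket and i & bit:
                self.dir[i] = image
        # Redistribute the old keys plus the new one (may trigger another split).
        pending, bucket.keys = bucket.keys + [key], []
        for k in pending:
            self.insert(k)


# Rebuild the catalog from the example, then hash 41, 38, 45, 40 in order.
t = ExtendibleHash(global_depth=2, capacity=4)
for k in [4, 16, 28, 32, 9, 13, 37, 2, 10, 22, 30, 7, 23, 27]:
    t.insert(k)
for k in [41, 38, 45, 40]:
    t.insert(k)
```

At the end `t.gd` is 3, entry 000 holds 16 32 40, entry 100 holds 4 28, and entries 011 and 111 still share the bucket 7 23 27 with local depth 2.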
Linear Hashing
- Uses temporary overflow pages and splits buckets in round-robin fashion
- The algorithm proceeds in rounds (levels); the next pointer determines the next bucket to be split
- A round is complete when all buckets that existed at the start of the round have been split
- Space utilization can be lower than with extendible hashing, since splits are not concentrated on data-dense areas
Linear Hashing: Insertion
- If the bucket is full:
  - Add an overflow page
  - Split the bucket that next points to
  - If next was pointing at the last bucket that existed at the beginning of the round, reset it to zero and increment the round number; otherwise increment next
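Linear Hashing
Given bucket size = 4 and the following catalog, hash the following values in order: 41, 38, 45, 40

catalog    buckets
00         4 16 28 32
01         9 13 37
10         2 10 22 30
11         7 23 27

After inserting 41 (no overflow):
h0     buckets
00     4 16 28 32    <- next
01     9 13 37 41
10     2 10 22 30
11     7 23 27

After inserting 38 (bucket 10 overflows; bucket 00 is split):
h1     h0     buckets       overflow
000    00     16 32
001    01     9 13 37 41                 <- next
010    10     2 10 22 30    38
011    11     7 23 27
100    00     4 28

After inserting 45 (bucket 01 overflows; bucket 01 is split):
h1     h0     buckets       overflow
000    00     16 32
001    01     9 41
010    10     2 10 22 30    38           <- next
011    11     7 23 27
100    00     4 28
101    01     13 37 45

After inserting 40 (no overflow; h0 gives 00, which is already split, so h1 is used):
h1     h0     buckets       overflow
000    00     16 32 40
001    01     9 41
010    10     2 10 22 30    38           <- next
011    11     7 23 27
100    00     4 28
101    01     13 37 45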
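The same four insertions can be replayed with a compact sketch of linear hashing. The assumptions here are mine rather than the slides': h(k) = k, so h_level(k) = k mod (n0 * 2^level); overflow pages are not modeled separately (any keys beyond `capacity` in a bucket's list stand for its overflow page); and a split is triggered exactly when an insert lands in a full bucket.

```python
class LinearHash:
    def __init__(self, n0=4, capacity=4):
        self.n0 = n0              # number of buckets at the start of round 0
        self.level = 0            # current round
        self.next = 0             # next bucket to be split
        self.capacity = capacity
        self.buckets = [[] for _ in range(n0)]

    def _addr(self, key):
        n = self.n0 << self.level
        b = key % n               # h_level
        if b < self.next:         # that bucket was already split this round
            b = key % (2 * n)     # so use h_(level+1)
        return b

    def insert(self, key):
        b = self._addr(key)
        overflow = len(self.buckets[b]) >= self.capacity
        self.buckets[b].append(key)   # past capacity = stored on an overflow page
        if overflow:
            self._split()

    def _split(self):
        n = self.n0 << self.level
        # Split the bucket that next points to, not necessarily the full one.
        old, self.buckets[self.next] = self.buckets[self.next], []
        self.buckets.append([])       # its split image, at index next + n
        for k in old:
            self.buckets[k % (2 * n)].append(k)
        self.next += 1
        if self.next == n:            # every original bucket has been split
            self.next, self.level = 0, self.level + 1


# Rebuild the catalog from the example, then hash 41, 38, 45, 40 in order.
lh = LinearHash(n0=4, capacity=4)
for k in [4, 16, 28, 32, 9, 13, 37, 2, 10, 22, 30, 7, 23, 27]:
    lh.insert(k)
for k in [41, 38, 45, 40]:
    lh.insert(k)
```

At the end there are six buckets, `lh.next` is 2, and bucket 010 still holds five keys (2 10 22 30 plus 38 on its overflow page) because it has not yet had its turn to split.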