Topics to Learn. Important concepts. Tree-based index. Hash-based index

CS143: Index

Topics to Learn
- Important concepts
  - Dense index vs. sparse index
  - Primary index vs. secondary index (= clustering index vs. non-clustering index)
  - Tree-based vs. hash-based index
- Tree-based index
  - Indexed sequential file
  - B+-tree
- Hash-based index
  - Static hashing
  - Extendible hashing

Basic Problem
SELECT * FROM Student WHERE sid = 40
  sid  name    GPA
  20   Elaine  3.2
  70   Peter   2.6
  40   Susan   3.7
How can we answer the query?

Random-Order File
How do we find sid=40?
  sid  name     GPA
  20   Susan    3.5
  60   James    1.7
  70   Peter    2.6
  40   Elaine   3.9
  30   Christy  2.9

Sequential File
Table sequenced by sid. Find sid=40?
  sid  name     GPA
  20   Susan    3.5
  30   James    1.7
  40   Peter    2.6
  50   Elaine   3.9
  60   Christy  2.9

Binary Search
- 100,000 records
- Q: How many blocks to read?
- Any better way? In a library, how do we find a book?

Basic Idea
- Build an index on the table
- An auxiliary structure to help us locate a record given a key
[Figure: small index tree with keys 40 / 20 60 / 10 40 80]

Dense, Primary Index
[Figure: dense index with entries 10, 20, ..., 120 pointing into a sequential file with keys 10, 20, ..., 100]
- Primary index (clustering index): index on the search key
- Dense index: a (key, pointer) pair for every record
- Find the key in the index and follow the pointer, possibly through binary search
- Q: Why a dense index? Isn't binary search on the file the same?

Why Dense Index?
- Example
  - 10,000,000 records (900 bytes/rec)
  - 4-byte search key, 4-byte pointer
  - 4096-byte block, unspanned tuples
- Q: How many blocks for the table (how big)?
- Q: How many blocks for the index (how big)?
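The two questions above come down to a little arithmetic. The sketch below works them out, assuming that index entries are unspanned just like tuples and that blocks carry no header overhead (neither assumption is stated on the slide). The helper name `blocks_needed` is made up for this illustration.

```python
def blocks_needed(num_items: int, item_bytes: int, block_bytes: int) -> int:
    """Blocks required when no item may cross a block boundary (unspanned)."""
    per_block = block_bytes // item_bytes      # whole items that fit in one block
    return -(-num_items // per_block)          # ceiling division

NUM_RECORDS = 10_000_000
BLOCK_BYTES = 4096

# Table: 900-byte records, so 4 records fit in a block.
table_blocks = blocks_needed(NUM_RECORDS, 900, BLOCK_BYTES)
# Dense index: one 4-byte key plus one 4-byte pointer per record, 512 entries per block.
index_blocks = blocks_needed(NUM_RECORDS, 4 + 4, BLOCK_BYTES)

print("table:", table_blocks, "blocks,", table_blocks * BLOCK_BYTES, "bytes")
print("index:", index_blocks, "blocks,", index_blocks * BLOCK_BYTES, "bytes")
```

Under these assumptions the table needs 2,500,000 blocks (about 10 GB) while the dense index needs 19,532 blocks (about 80 MB), which is why searching the index is so much cheaper than binary search over the file itself.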

Sparse, Primary Index
[Figure: sparse index with entries 10, 30, 50, 70, 90, 110, 130, 150 pointing to the blocks of a sequential file with keys 10, 20, ..., 100]
- Sparse index: one (key, pointer) pair per block
- Each (key, pointer) pair points to the first record in the block
- Q: How can we find 60?
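One way to answer the question about finding 60 is to look for the last index entry whose key is not greater than 60 and then scan that block. Here is a minimal sketch of this lookup, assuming two records per block as the figure suggests; the in-memory lists and the function name `sparse_lookup` stand in for disk blocks and are not part of the slides.

```python
from bisect import bisect_right

# Sequential file: each inner list is one block, sorted by key.
blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]
# Sparse index: the first key of every block; entry i points to blocks[i].
sparse_index = [block[0] for block in blocks]

def sparse_lookup(key):
    """Return (block_number, found) for the given key."""
    # Last index entry whose key is <= the search key.
    pos = bisect_right(sparse_index, key) - 1
    if pos < 0:
        return None, False            # key is smaller than every key in the file
    return pos, key in blocks[pos]    # scan inside the single candidate block

print(sparse_lookup(60))
print(sparse_lookup(65))
```

Only one data block is read per lookup, and the index holds one entry per block rather than one per record.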

Multi-level Index
[Figure: sparse 2nd-level index (10, 90, 170, 250, 330, 410, 490, 570) over a 1st-level index (10, 30, 50, ..., 230) over a sequential file (10, 20, ..., 100)]
- Q: Why a multi-level index?
- Q: Does a dense 2nd-level index make sense?

Secondary (non-clustering) Index
[Figure: file labeled "Sequence field"; values of the indexed field in file order: 30 50 20 70 80 40 100 10 90 60]
- Secondary (non-clustering) index: used when tuples in the table are not ordered by the index search key
  - Index on a non-search-key for a sequential file
  - Unordered file
- Q: What index? Does a sparse index make sense?

Sparse and secondary index?
[Figure: sparse index entries 30, 20, 80, 100, 90, ... over the unordered file 30 50 20 70 80 40 100 10 90 60]

Secondary Index
[Figure: sparse high-level index (10, 50, 90, ...) over a dense first-level index (10, 20, 30, 40, 50, 60, 70, ...) over the unordered file 30 50 20 70 80 40 100 10 90 60]
- First level is always dense
- Sparse from the second level

Important terms
- Dense index vs. sparse index
- Primary index vs. secondary index
  - Clustering index vs. non-clustering index
- Multi-level index
- Indexed sequential file
  - Sometimes called ISAM (indexed sequential access method)
- Search key (≠ primary key)

Insertion: Insert 35
[Figure: index entries 10 30 40 60; file after the insertion: 10 20 30 35 40 50 60]
Q: Do we need to update the higher-level index?

Insertion: Insert 15
[Figure: index entries 10 30 40 60; file after the insertion: 10 15 20 30 40 50 60]
Q: Do we need to update the higher-level index?

Insertion: Insert 15
[Figure, keys as extracted: 20; 10 30 40 60; 10 20 15 30 40 50 60]
Q: Do we need to update the higher-level index?

Potential performance problem
- After many insertions
[Figure: main index blocks 10 20 30 33 40 50 60 70 80 90; overflow pages (not sequential) holding 39 31 35 36 32 38 34]

Traditional Index (ISAM)
- Advantages: simple; sequential blocks
- Disadvantages: not suitable for updates; becomes ugly (loses sequentiality and balance) over time

B+Tree
- Most popular index structure in RDBMS
- Advantages: suitable for dynamic updates; balanced; minimum space usage guarantee
- Disadvantage: non-sequential index blocks

B+Tree Example (n=3)
[Figure: root [70]; non-leaf nodes [50] and [80]; leaf nodes [20 30] [50] [70] [80 90]; leaf entries point to the tuples (20, Susan, 2.7), (30, James, 3.6), (50, Peter, 1.8)]
- Balanced: all leaf nodes are at the same level

Sample Leaf Node (n=3)
[Figure: leaf [20 30], reached from a non-leaf node; each key's pointer points to its tuple, such as (20, Susan, 2.7) and (30, James, 3.6); the last pointer points to the next leaf node, which holds (50, Peter, 1.8)]
- n: max # of pointers in a node
- All pointers (except the last one) point to tuples
- At least half of the pointers are used (more precisely, ⌈(n+1)/2⌉ pointers)

Sample Non-leaf Node (n=3)
[Figure: non-leaf [23 56] with three pointers: to keys k < 23, to keys 23 ≤ k < 56, and to keys 56 ≤ k]
- Points to the nodes one level below; no direct pointers to tuples
- At least half of the ptrs are used (precisely, ⌈n/2⌉), except at the root, where at least 2 ptrs are used

Search on B+tree
- Find 30, 60, 70?
[Figure: root [70]; non-leaf nodes [50] [80]; leaf nodes [20 30] [50] [70] [80 90]]
- Find a greater key and follow the link on its left (Algorithm: Figure 12.10 in the textbook)
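The search rule above ("find a greater key and follow the link on its left") can be sketched in a few lines. This is an illustrative in-memory version rather than the textbook's Figure 12.10 algorithm: the `Node` class and `search` function are hypothetical names, and the tree is built by hand to match the n=3 example.

```python
from bisect import bisect_right

class Node:
    """B+tree node; `children` is None for a leaf."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children

def search(node, key):
    """Return True if key is stored in some leaf under node."""
    while node.children is not None:
        # bisect_right gives the position of the first key greater than `key`;
        # the child at that same position is the link on that key's left.
        node = node.children[bisect_right(node.keys, key)]
    return key in node.keys

# The example tree: root [70]; non-leaf nodes [50] and [80]; four leaves.
leaves = [Node([20, 30]), Node([50]), Node([70]), Node([80, 90])]
root = Node([70], [Node([50], leaves[0:2]), Node([80], leaves[2:4])])

for k in (30, 60, 70):
    print(k, search(root, k))
```

Every lookup touches exactly one node per level, so the cost is the height of the tree regardless of which key is requested.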

Nodes are never too empty
- Use at least: non-leaf ⌈n/2⌉ pointers; leaf ⌈(n+1)/2⌉ pointers
[Figure, n=4: non-leaf full node [5 8 10], min. node [5]; leaf full node [5 8 10], min. node [5 8]]

Number of Ptrs/Keys for B+tree
                       Max ptrs   Max keys   Min ptrs     Min keys
  Non-leaf (non-root)  n          n-1        ⌈n/2⌉        ⌈n/2⌉-1
  Leaf (non-root)      n          n-1        ⌈(n+1)/2⌉    ⌈(n-1)/2⌉
  Root                 n          n-1        2            1
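The occupancy rules in the table can be checked mechanically. The sketch below assumes that the rounding in the minimum columns is a ceiling, which agrees with the n = 4 deletion examples later in the deck (2 ptrs for a non-leaf, 3 ptrs for a leaf). The function names are invented for the illustration.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b."""
    return -(-a // b)

def occupancy(n: int) -> dict:
    """Pointer/key bounds per node type for a B+tree with at most n pointers per node."""
    return {
        "non-leaf": {"max_ptrs": n, "max_keys": n - 1,
                     "min_ptrs": ceil_div(n, 2), "min_keys": ceil_div(n, 2) - 1},
        "leaf":     {"max_ptrs": n, "max_keys": n - 1,
                     "min_ptrs": ceil_div(n + 1, 2), "min_keys": ceil_div(n - 1, 2)},
        "root":     {"max_ptrs": n, "max_keys": n - 1,
                     "min_ptrs": 2, "min_keys": 1},
    }

for kind, bounds in occupancy(4).items():
    print(kind, bounds)
```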

B+Tree Insertion
(a) Simple case (no overflow)
(b) Leaf overflow
(c) Non-leaf overflow
(d) New root

(a) Simple case (no overflow)

Insertion (Simple Case): Insert 60
[Before: root [70]; non-leaf nodes [50] [80]; leaf nodes [20 30] [50] [70] [80 90]]
[After: root [70]; non-leaf nodes [50] [80]; leaf nodes [20 30] [50 60] [70] [80 90]]

(b) Leaf overflow

Insertion (Leaf Overflow): Insert 55
[Figure: root [70]; non-leaf nodes [50] [80]; leaf nodes [20 30] [50 60] [70] [80 90]]
1. The target leaf [50 60] has no space to store 55: overflow!
2. Split the leaf into two and put the keys half and half: [50 55] and [60].
3. Copy the first key of the new node (60) to the parent, which becomes [50 60].
4. No overflow at the parent. Stop.
Q: After a split, are leaf nodes always half full?

(c) Non-leaf overflow

Insertion (Non-leaf Overflow): Insert 52
[Figure: root [70]; non-leaf node [50 60]; leaf nodes [20 30] [50 55] [60]]
1. Leaf overflow in [50 55]. Split it into [50 52] and [55], and copy the first key of the new node (55) to the parent.
2. The parent becomes [50 55 60]: overflow!
3. Split the node into two and move up the key in the middle: 55 moves to the root, leaving the non-leaf nodes [50] and [60].
4. The root becomes [55 70]. No overflow. Stop.
Q: After a split, is a non-leaf node at least half full?

(d) New root

Insertion (New Root Node): Insert 25
[Figure: root [50 60]; leaf nodes [20 30] [50 55] [60]]
1. Leaf overflow in [20 30]. Split it into [20 25] and [30], and copy 30 to the parent.
2. The root becomes [30 50 60]: overflow!
3. Split and move up the mid-key (50). Create a new root.
[After: new root [50]; non-leaf nodes [30] [60]; leaf nodes [20 25] [30] [50 55] [60]]
Q: At least 2 ptrs at the root?

B+Tree Insertion
- Leaf node overflow: the first key of the new node is copied to the parent
- Non-leaf node overflow: the middle key is moved to the parent
- Detailed algorithm: Figure 12.13
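The difference between "copied" and "moved" is easy to miss, so here is a small sketch of just the two split steps. It is not the full Figure 12.13 algorithm: the function names are hypothetical, nodes are plain Python lists, and which half receives the extra key on an odd split is an assumption chosen to match the slide examples.

```python
def split_leaf(keys):
    """Split an overfull leaf; return (left_keys, right_keys, key_for_parent)."""
    mid = (len(keys) + 1) // 2          # left half keeps the extra key if the count is odd
    left, right = keys[:mid], keys[mid:]
    # The first key of the new (right) node is COPIED up: it stays in the leaf too.
    return left, right, right[0]

def split_nonleaf(keys, children):
    """Split an overfull non-leaf; return ((lk, lc), (rk, rc), key_for_parent)."""
    mid = len(keys) // 2
    up = keys[mid]                      # the middle key is MOVED up: it stays in neither half
    left = (keys[:mid], children[:mid + 1])
    right = (keys[mid + 1:], children[mid + 1:])
    return left, right, up

# Leaf overflow from the example: inserting 55 into [50 60].
print(split_leaf([50, 55, 60]))
# Non-leaf overflow from the example: [50 55 60] with four child pointers.
print(split_nonleaf([50, 55, 60], ["c0", "c1", "c2", "c3"]))
```

Copying at the leaf level keeps every key reachable in the leaves, which is what makes sequential scans over the leaf chain possible.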

B+Tree Deletion
(a) Simple case (no underflow)
(b) Leaf node, coalesce with neighbor
(c) Leaf node, redistribute with neighbor
(d) Non-leaf node, coalesce with neighbor
(e) Non-leaf node, redistribute with neighbor
In the examples, n = 4:
- Underflow for a non-leaf node when it has fewer than ⌈n/2⌉ = 2 ptrs
- Underflow for a leaf node when it has fewer than ⌈(n+1)/2⌉ = 3 ptrs
- Nodes are labeled a, b, c, d, ...

(a) Simple case (no underflow)

(a) Simple case: Delete 25
[Figure: node a [20 40 60] with children b, c, d, e; leaf c [20 25 30]; leaf d [40 50]]
1. Remove 25 from leaf c, leaving [20 30].
2. Underflow? Min 3 ptrs, currently 3 ptrs. No underflow.

(b) Leaf node, coalesce with neighbor

(b) Coalesce with sibling (leaf): Delete 50
[Figure: node a [20 40 60] with children b, c, d, e; leaf c [20 30]; leaf d [40 50]; leaf e (key 60 shown)]
1. Remove 50 from leaf d, leaving [40]. Underflow? Min 3 ptrs, currently 2: underflow!
2. Try to merge with a sibling. Can it be merged? The keys of c and d fit in one leaf, so yes.
3. Merge c and d: move everything on the right to the left, so c becomes [20 30 40].
4. Once everything is moved, delete d.
5. After the leaf node merge, delete from the parent the pointer and key to the deleted node: a becomes [20 60].
6. Check underflow at a: min 2 ptrs, currently 3. Done.

(c) Leaf node, redistribute with neighbor

(c) Redistribute (leaf): Delete 50
[Figure: node a [20 40 60] with children b, c, d, e; leaf c [20 25 30]; leaf d [40 50]; leaf e (key 60 shown)]
1. Remove 50 from leaf d, leaving [40]. Underflow? Min 3 ptrs, currently 2: underflow!
2. Check whether d can be merged with its sibling c or e. If not, redistribute the keys in d with a sibling, say c.
3. Redistribute c and d so that both are roughly half full: move the key 30 and its tuple pointer to d, giving c [20 25] and d [30 40].
4. Update the key in the parent: a becomes [20 30 60].
5. No underflow at a. Done.

(d) Non-leaf node, coalesce with neighbor

(d) Coalesce (non-leaf): Delete 20
[Figure: node a [50 90]; non-leaf b [30] with leaves d [10 20] and e [30 40]; non-leaf c [70] with leaves f [50 60] and g (key 70 shown)]
1. Remove 20 from leaf d. Underflow! Merge d with e: move everything in the right node to the left, so d becomes [10 30 40].
2. From the parent node b, delete the pointer and key to the deleted node.
3. Underflow at b? Min 2 ptrs, currently 1. Try to merge with its sibling. Nodes b and c have 3 ptrs in total and the max is 4 ptrs, so merge b and c.
4. Merge b and c: pull down the mid-key 50 from the parent node and move everything in the right node to the left, so b becomes [50 70] with children d, f, g.
5. Delete the pointer to the merged node: a becomes [90].
6. Underflow at a? Min 2 ptrs, currently 2. Done.
Very important: when we merge non-leaf nodes, we always pull down the mid-key in the parent and place it in the merged node.

(e) Non-leaf node, redistribute with neighbor

(e) Redistribute (non-leaf): Delete 20
[Figure: node a [50 99]; non-leaf b [30] with leaves d [10 20] and e [30 40]; non-leaf c [70 90 97] whose first leaves are f [50 60] and g (key 70 shown)]
1. Remove 20 from leaf d. Underflow! Merge d with e, so d becomes [10 30 40].
2. After the merge, remove the key and ptr to the deleted node from the parent b.
3. Underflow at b? Min 2 ptrs, currently 1. Merge b with c? No: the max is 4 ptrs and they hold 5 ptrs in total. If nodes cannot be merged, redistribute the keys with a sibling: redistribute b and c.
4. Redistribution at a non-leaf node is done in two steps.
   Step 1: Temporarily make the left node b overflow by pulling down the mid-key 50 and moving everything to the left. Now a is [99] and b is [50 70 90 97] (temporary overflow).
   Step 2: Apply the overflow handling algorithm (the same algorithm used for B+tree insertion) to the overflowed node: pick the mid-key (say 90) in the node and move it to the parent, then move everything to the right of 90 to the empty node c. Now a is [90 99], b is [50 70], and c is [97].
5. Underflow at a? Min 2 ptrs, currently 3. Done.

Important Points
- Remember:
  - For leaf node merging, we delete the mid-key from the parent
  - For non-leaf node merging/redistribution, we pull down the mid-key from the parent
- Exact algorithm: Figure 12.17
- In practice, coalescing is often not implemented: too hard and not worth it

Where does n come from?
- n is determined by:
  - Size of a node
  - Size of search key
  - Size of an index pointer
- Q: 1024B node, 10B key, 8B ptr → n?
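The question above can be answered by noting that a node stores n pointers and n-1 keys, so n is the largest value with n * ptr + (n-1) * key ≤ node size. A quick sketch follows, assuming the node has no header or other per-node overhead (the slide does not say); `max_pointers` is a name invented here.

```python
def max_pointers(node_bytes: int, key_bytes: int, ptr_bytes: int) -> int:
    """Largest n such that n pointers and n-1 keys fit in one node."""
    # n*ptr + (n-1)*key <= node  is equivalent to  n <= (node + key) / (ptr + key)
    return (node_bytes + key_bytes) // (ptr_bytes + key_bytes)

n = max_pointers(node_bytes=1024, key_bytes=10, ptr_bytes=8)
print(n, n * 8 + (n - 1) * 10)
```

With these sizes n = 57, which uses 1016 of the 1024 bytes; n = 58 would need 1034 bytes.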

Question on B+tree
- SELECT * FROM Student WHERE sid > 60?
[Figure: root [70]; non-leaf nodes [50] [80]; leaf nodes [20 30] [50 60] [70] [80 90]]
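One common way to answer a range query like this is to search for the boundary key once and then walk the leaf chain through the "next leaf" pointers. The sketch below shows only the walk and assumes the starting leaf has already been located by a root-to-leaf search for 60. The `Leaf` class and the function names are hypothetical.

```python
class Leaf:
    """Leaf node with sorted keys and a pointer to the next leaf."""
    def __init__(self, keys):
        self.keys = keys
        self.next = None

def link(leaves):
    """Chain the leaves left to right, as the last pointer of each leaf does."""
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves

def scan_greater_than(start_leaf, bound):
    """Collect every key > bound by following next-leaf pointers from start_leaf."""
    result = []
    leaf = start_leaf
    while leaf is not None:
        result.extend(k for k in leaf.keys if k > bound)
        leaf = leaf.next
    return result

leaves = link([Leaf([20, 30]), Leaf([50, 60]), Leaf([70]), Leaf([80, 90])])
start = leaves[1]        # assumed result of searching the tree for 60
print(scan_greater_than(start, 60))
```

The non-leaf levels are visited only once; the rest of the work is a sequential pass over the leaves.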

Summary on tree index
- Issues to consider
  - Sparse vs. dense
  - Primary (clustering) vs. secondary (non-clustering)
- Indexed sequential file (ISAM)
  - Simple algorithm. Sequential blocks
  - Not suitable for dynamic environment
- B+trees
  - Balanced, minimum space guarantee
  - Insertion, deletion algorithms

Index Creation in SQL
- CREATE INDEX <indexname> ON <table>(<attr>, <attr>, ...)
- Example: CREATE INDEX stidx ON Student(sid)
  - Creates a B+tree on the attributes
  - Speeds up lookup on sid
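To try the statement above without a database server, the sketch below runs it against an in-memory SQLite database through Python's standard `sqlite3` module. SQLite is only a convenient stand-in and is not one of the systems discussed in these slides; the rows are the three Student tuples from the "Basic Problem" slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (sid INTEGER, name TEXT, GPA REAL)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [(20, "Elaine", 3.2), (70, "Peter", 2.6), (40, "Susan", 3.7)],
)

# The index-creation statement from the slide.
conn.execute("CREATE INDEX stidx ON Student(sid)")

# Ask the engine how it would run the lookup; the last column holds the plan text.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Student WHERE sid = 40"
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)
rows = conn.execute("SELECT * FROM Student WHERE sid = 40").fetchall()

print(plan_text)
print(rows)
```

The plan text should mention `stidx`, showing that the equality lookup goes through the index instead of scanning the table. The exact wording of the plan differs between SQLite versions.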

Primary (Clustering) Index
- MySQL: the primary key becomes the clustering index
- DB2: CREATE INDEX idx ON Student(sid) CLUSTER
  - Tuples in the table are sequenced by sid
- Oracle: Index-Organized Table (IOT)
  - CREATE TABLE T (... ) ORGANIZATION INDEX
  - B+tree on primary key
  - Tuples are stored at the leaf nodes of the B+tree
- Periodic reorganization may still be necessary to improve range scan performance

Next topic
- Hash index
  - Static hashing
  - Extendible hashing

What is a Hash Table?
- Hash function h(k): key → integer in [0…n], e.g., h("Susan") = 7
- Array for keys: T[0…n]
- Given a key k, store it in T[h(k)]
- Example: h(Susan) = 4, h(James) = 3, h(Neil) = 1
[Figure: array slots 0 to 5 with Neil in slot 1, James in slot 3, Susan in slot 4]

Hashing for DBMS (Static Hashing)
[Figure: search key → h(key) → one of the disk blocks (buckets) 0 to 4, each storing (key, record) pairs]

Overflow and Chaining
- Insert: h(a) = 1, h(b) = 2, h(c) = 1, h(d) = 0, h(e) = 1
- Delete: h(b) = 2, h(c) = 1
[Figure: buckets 0 to 3; bucket 0 holds d; bucket 1 holds a and c, with e in a chained overflow block; bucket 2 holds b]
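The insertions on this slide can be replayed with a small model of static hashing. The sketch assumes a capacity of two keys per bucket (inferred from the figure) and covers insertion only. The hash values are taken from the slide; the class and function names are made up for the illustration.

```python
BUCKET_CAPACITY = 2                      # assumed: two keys per disk block

class Bucket:
    """One disk block plus a pointer to its overflow block, if any."""
    def __init__(self):
        self.keys = []
        self.overflow = None

HASH = {"a": 1, "b": 2, "c": 1, "d": 0, "e": 1}   # h(key) values from the slide

def h(key):
    return HASH[key]

def insert(table, key):
    """Put key in its bucket, chaining a new overflow block when the chain is full."""
    bucket = table[h(key)]
    while len(bucket.keys) >= BUCKET_CAPACITY:
        if bucket.overflow is None:
            bucket.overflow = Bucket()
        bucket = bucket.overflow
    bucket.keys.append(key)

def chain(table, slot):
    """Return the contents of a primary bucket and all of its overflow blocks."""
    blocks, bucket = [], table[slot]
    while bucket is not None:
        blocks.append(list(bucket.keys))
        bucket = bucket.overflow
    return blocks

table = [Bucket() for _ in range(4)]
for key in "abcde":
    insert(table, key)

for slot in range(4):
    print(slot, chain(table, slot))
```

Every overflow block in a chain adds one more block read to a lookup in that bucket, which is the growth problem static hashing runs into.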

Major Problem of Static Hashing
- How to cope with growth?
  - Data tends to grow in size
  - Overflow blocks unavoidable
[Figure: hash buckets 10 20 30 33 40 50 60 70 80 90; overflow blocks 39 31 35 36 32 38 34]

Extendible Hashing (two ideas)
(a) Use i of the b bits output by the hash function h(k)
[Figure: h(k) outputs b bits, e.g. 00110101; i of them are used, and i grows over time]

Extendible Hashing (two ideas)
(b) Use a directory that maintains pointers to hash buckets (indirection)
[Figure: h(c) selects a directory entry, which points to the hash bucket holding c and e]

Example: h(k) is 4 bits; 2 keys/bucket
Insert 0111
[Figure: directory with i = 1; entry 0 → bucket (i = 1) holding 0001 and 0111; entry 1 → bucket (i = 1) holding 1001 and 1100]

Example: Insert 1010
1. The bucket holding 1001 and 1100 has no room for 1010: overflow! Increase i of the bucket and split it.
2. Redistribute keys based on the first i bits (i = 2): 1001 and 1010 go to one bucket, 1100 to the other.
3. Update the ptr in the directory to the new bucket. If there is no space, double the directory size (increase i).
4. The directory now has i = 2 with entries 00, 01, 10, 11. Copy pointers so that 00 and 01 both point to the bucket holding 0001 and 0111.
[After: directory i = 2; 00 and 01 → bucket (i = 1) 0001 0111; 10 → bucket (i = 2) 1001 1010; 11 → bucket (i = 2) 1100]

Example 2: Insert 0000
[Figure: directory i = 2; 00 and 01 → bucket (i = 1) 0001 0111; 10 → bucket (i = 2) 1001 1010; 11 → bucket (i = 2) 1100]
1. The bucket holding 0001 and 0111 has no room for 0000: overflow! Split the bucket and increase its i to 2.
2. Redistribute keys: 0000 and 0001 go to one bucket, 0111 to the other.
3. Update the ptr in the directory: 00 → bucket (i = 2) 0000 0001; 01 → bucket (i = 2) 0111.

Insert 0011
1. The bucket holding 0000 and 0001 has no room for 0011: overflow! Split the bucket, increase i, and redistribute keys.
2. The two resulting buckets have i = 3: one holds 0000 and 0001, the other holds 0011.
3. Update the ptr in the directory. If there is no space, double the directory: its i goes from 2 to 3.
[After: directory i = 3; 000 → bucket (i = 3) 0000 0001; 001 → bucket (i = 3) 0011; 010 and 011 → bucket (i = 2) 0111; 100 and 101 → bucket (i = 2) 1001 1010; 110 and 111 → bucket (i = 2) 1100]
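The whole insertion walkthrough can be replayed with a compact sketch of extendible hashing. It follows the slides in using the first i bits of a 4-bit hash value and two keys per bucket. Everything else is an assumption made for the illustration: the class and attribute names, storing hash values directly in place of (key, record) pairs, and raising an error where a real system would fall back to overflow blocks.

```python
HASH_BITS = 4            # h(k) is 4 bits, as in the example
BUCKET_CAPACITY = 2      # 2 keys per bucket, as in the example

class Bucket:
    def __init__(self, depth):
        self.depth = depth               # the bucket's own i
        self.keys = []

class ExtendibleHash:
    def __init__(self):
        self.depth = 0                   # the directory's i
        self.directory = [Bucket(0)]     # 2**depth pointers to buckets

    def _slot(self, h):
        return h >> (HASH_BITS - self.depth)      # first `depth` bits of h

    def insert(self, h):
        bucket = self.directory[self._slot(h)]
        if len(bucket.keys) < BUCKET_CAPACITY:
            bucket.keys.append(h)
            return
        if bucket.depth == HASH_BITS:
            raise OverflowError("all hash bits used; overflow chaining would be needed")
        if bucket.depth == self.depth:
            # No spare directory entry: double the directory and copy pointers.
            self.directory = [b for b in self.directory for _ in (0, 1)]
            self.depth += 1
        # Split the bucket: raise its i and redistribute keys on the new bit.
        bucket.depth += 1
        new_bucket = Bucket(bucket.depth)
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            bit = (k >> (HASH_BITS - bucket.depth)) & 1
            (new_bucket if bit else bucket).keys.append(k)
        # Redirect the directory entries whose new bit is 1 to the new bucket.
        for slot, b in enumerate(self.directory):
            if b is bucket and (slot >> (self.depth - bucket.depth)) & 1:
                self.directory[slot] = new_bucket
        self.insert(h)                   # retry; may trigger another split

table = ExtendibleHash()
for h in (0b0001, 0b1001, 0b1100, 0b0111, 0b1010, 0b0000, 0b0011):
    table.insert(h)

for slot, b in enumerate(table.directory):
    print(format(slot, "03b"), b.depth, [format(k, "04b") for k in sorted(b.keys)])
```

The first three insertions rebuild the starting state of the example (directory i = 1), and the remaining four are the ones shown on the slides. The final directory has i = 3, with pairs of entries sharing the three buckets whose own i is still 2.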

Extendible Hashing: Deletion
- Two options
  a) No merging of buckets
  b) Merge buckets and shrink the directory if possible

Delete 1010
[Figure: directory i = 2; 00 and 01 → bucket a (i = 1) 0001; 10 → bucket b (i = 2) 1001 1010; 11 → bucket c (i = 2) 1100]
1. Remove 1010 from bucket b, leaving 1001.
2. Can we merge a and b? b and c?
3. Decrease i and merge buckets: b (i = 1) now holds 1001 and 1100. Update the ptr in the directory.
4. Q: Can we shrink the directory?
[After: directory i = 1; 0 → bucket a (i = 1) 0001; 1 → bucket b (i = 1) 1001 1100]

Bucket Merge Condition
- Bucket merge condition
  - Bucket i's are the same
  - First (i-1) bits of the hash key are the same
- Directory shrink condition
  - All bucket i's are smaller than the directory i

Questions on Extendible Hashing
- Can we provide a minimum space guarantee?

Space Waste
[Figure: directory with i = 4; keys 00000, 00001, 00010; buckets with i = 4, 4, 3, 2, 1]

Hash index summary
- Static hashing
  - Overflow and chaining
- Extendible hashing
  - Can handle growing files
    - No periodic reorganizations
  - Indirection
    - Up to 2 disk accesses to access a key
  - Directory doubles in size
    - Not too bad if the data is not too large

Hashing vs. Tree
- Can an extendible-hash index support?
  SELECT * FROM R WHERE R.A > 5
- Which one is better, B+tree or extendible hashing?
  SELECT * FROM R WHERE R.A = 5