CS 245: Database System Principles

Transcription:

CS 245: Database System Principles
Notes 4: Indexing
Hector Garcia-Molina

Chapter 4: Indexing & Hashing
[Figure: an index takes a value and leads to the matching record.]

Topics
- Conventional indexes
- B-trees
- Hashing schemes

Sequential File
[Figure: sequential file; key values not recoverable from the transcription.]

Dense Index
[Figure: dense index over the sequential file.]

Sparse Index
[Figure: sparse index over the sequential file.]

Sparse 2nd level
[Figure: sparse second-level index over a first-level index over the sequential file; key values not recoverable from the transcription.]

Comment: {FILE, INDEX} may be contiguous or not (blocks chained).

Question: Can we build a dense, 2nd level index for a dense index?

Notes on pointers:
(1) Block pointer (sparse index) can be smaller than record pointer.
    [Figure: block pointer (BP) vs. record pointer (RP).]
(2) If file is contiguous, then we can omit pointers (i.e., compute them).
    [Figure: index keys K1, K2, K3, K4 next to file blocks R1, R2, R3, R4.]
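The dense and sparse lookups pictured above, and the computed-pointer idea in note (2), can be sketched in a few lines of Python. This is only an illustration: the block contents, the names `BLOCKS`, `DENSE`, `SPARSE`, and the helper functions are all hypothetical, not taken from the slides.

```python
from bisect import bisect_right

# Hypothetical sequential file: each block holds sorted keys (illustrative values).
BLOCKS = [[10, 20], [30, 40], [50, 60], [70, 80]]

# Dense index: one entry per record, key -> (block number, slot in block).
DENSE = {key: (b, s) for b, blk in enumerate(BLOCKS) for s, key in enumerate(blk)}

# Sparse index: first key of each block; list position doubles as block number.
SPARSE = [blk[0] for blk in BLOCKS]


def dense_lookup(key):
    """Answer from the index alone; the file is not touched on a miss."""
    return DENSE.get(key)


def sparse_lookup(key):
    """Pick the only block that could hold key, then search inside it."""
    b = bisect_right(SPARSE, key) - 1  # last entry whose first key <= key
    if b < 0:
        return None
    blk = BLOCKS[b]  # this step stands in for the file access
    return (b, blk.index(key)) if key in blk else None


def block_offset(k, block_size):
    """Byte offset of the k-th block (1-based) of a contiguous file."""
    return (k - 1) * block_size
```

The sparse index stores one entry per block rather than one per record, but it has to read a block even to report that a key is absent.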

[Figure, continued: index keys K1, K2, K3, K4 and file blocks R1, R2, R3, R4.]
Say: 24 B per block. If we want the K3 block, get it at offset (3-1)·24 = 48 bytes.

Sparse vs. Dense Tradeoff
- Sparse: less index space per record; can keep more of the index in memory.
- Dense: can tell if any record exists without accessing the file.
(Later: sparse is better for insertions; dense is needed for secondary indexes.)

Terms
- Index sequential file
- Search key (≠ primary key)
- Primary index (on sequencing field)
- Secondary index
- Dense index (all search key values in)
- Sparse index
- Multi-level index

Next:
- Duplicate keys
- Deletion/Insertion
- Secondary indexes

Duplicate keys
Dense index, one way to implement?
[Figure: dense index with one entry per record, duplicates included; key values not recoverable.]

Duplicate keys
Dense index, better way?
[Figure: dense index with one entry per distinct key value.]

Duplicate keys
Sparse index, one way?
[Figure: sparse index over a file with duplicate keys.]
Careful if looking for ... or ...! (the two key values were lost in transcription)

Duplicate keys
Sparse index, another way? Place first new key from block.
[Figure: sparse index whose entries hold the first new key of each block; one entry is flagged with "should this be ...?", value lost in transcription.]

Summary: Duplicate values, primary index
Index may point to first instance of each value only.
[Figure: file with records a, a, a, b; the index holds one entry per distinct value.]

Deletion from sparse index
[Figure sequence: delete one record, then another; the key values were lost in transcription.]
[Figure sequence: delete two records ("delete records ... & ..."); key values lost in transcription.]

Deletion from sparse index (continued)
[Figure sequence: deleting two records ("delete records ... & ..."); key values lost in transcription.]

Deletion from dense index
[Figure sequence: delete one record; key value lost in transcription.]

Insertion, sparse index case
Insert record 4.
[Figure sequence: sparse index and file before and after the insert.]
Our lucky day! We have free space where we need it!

Insertion, sparse index case
Insert record 15.
[Figure sequence: records shift to make room and the index is updated.]
- Illustrated: immediate reorganization
- Variation: insert new block (chained file), update index

Insertion, sparse index case
Insert record (key value lost in transcription).
[Figure sequence: the new record is placed in an overflow block.]
Overflow blocks (reorganize later...)

Insertion, dense index case
- Similar
- Often more expensive...

Secondary indexes
[Figure: file ordered on its sequence field.]

Secondary indexes: sparse index
[Figure: sparse index on a field other than the sequence field.]
Does not make sense!

Secondary indexes: dense index
[Figure: dense index on a field other than the sequence field, with a sparse high level on top.]

With secondary indexes:
- Lowest level is dense
- Other levels are sparse
Also: pointers are record pointers (not block pointers; not computed).

Duplicate values & secondary indexes
One option...
[Figure: one index entry per record, duplicates included.]

Duplicate values & secondary indexes
One option...
Problem: excess overhead!
- disk space
- search time

Another option...
Problem: variable size records in index!

Another idea (suggested in class): chain records with same key?
Problems:
- Need to add fields to records
- Need to follow chain to know records

Buckets
[Figure: index entries point to buckets.]

Why bucket idea is useful
Indexes:
- Name: primary
- Dept: secondary
- Floor: secondary
Records: EMP (name, dept, floor, ...)

Query: Get employees in (Toy Dept) ^ (2nd floor)
[Figure: the Dept. index entry "Toy" and the Floor index entry "2nd" each point through a bucket into EMP.]
Intersect Toy bucket and 2nd Floor bucket to get set of matching EMPs.

This idea used in text information retrieval
[Figure: inverted lists for "cat" and "dog" pointing to the documents "...the cat is fat...", "...was raining cats and dogs...", "...fido the dog...".]

IR QUERIES
- Find articles with cat and dog
- Find articles with cat or dog
- Find articles with cat and not dog
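The bucket intersection and the three IR queries can be sketched with Python sets standing in for inverted lists. The three document texts come from the slide; the document ids, the function `build_inverted_lists`, and the crude plural stripping are assumptions made only so the example runs.

```python
import re
from collections import defaultdict

# Document texts from the slide; the ids 1..3 are hypothetical.
DOCS = {
    1: "the cat is fat",
    2: "was raining cats and dogs",
    3: "fido the dog",
}


def build_inverted_lists(docs):
    """Map each term to the set of document ids that contain it."""
    inverted = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            # Crude normalization (an assumption): drop a trailing plural "s".
            term = word[:-1] if word.endswith("s") and len(word) > 3 else word
            inverted[term].add(doc_id)
    return inverted


INV = build_inverted_lists(DOCS)
cat, dog = INV["cat"], INV["dog"]

cat_and_dog = cat & dog      # intersect the two lists (buckets)
cat_or_dog = cat | dog       # union
cat_and_not_dog = cat - dog  # difference
```

The same intersection answers the EMP query: treat the Toy bucket and the 2nd Floor bucket as two sets of record pointers and take `&`.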

IR QUERIES
- Find articles with cat and dog
- Find articles with cat or dog
- Find articles with cat and not dog
- Find articles with cat in title
- Find articles with cat and dog within 5 words

Common technique: more info in inverted list
[Figure: postings for "cat" and "dog" carry a field (Title, Author, Abstract) and a position, and point to documents d1, d2, d3; the position values are only partly recoverable.]

Posting: an entry in inverted list. Represents occurrence of term in article.
Size of a list (in postings):
- 1: rare words or misspellings
- (larger value, digits partly lost in transcription): common words
Size of a posting: up to 15 bits (compressed); the lower end of the range was lost in transcription.

IR DISCUSSION
- Stop words
- Truncation
- Thesaurus
- Full text vs. abstracts
- Vector model

Vector space model
          w1 w2 w3 w4 w5 w6 w7
DOC   = < 1  0  0  1  1  0  0 >
Query = < 0  0  1  1  0  0  0 >
PRODUCT = 1 + ... = score
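A minimal sketch of the vector space score, assuming the plain unweighted dot product that the DOC and Query vectors suggest. The function name `score` is hypothetical; the two vectors are the ones from the slide.

```python
def score(doc, query):
    """Dot product of two equal-length term vectors."""
    if len(doc) != len(query):
        raise ValueError("vectors must cover the same terms")
    return sum(d * q for d, q in zip(doc, query))


# Term order: w1 .. w7
DOC = [1, 0, 0, 1, 1, 0, 0]
QUERY = [0, 0, 1, 1, 0, 0, 0]

doc_score = score(DOC, QUERY)  # only w4 is set in both vectors
```

With 0/1 vectors the score is just the number of shared terms, which is why common words need to be weighted down in practice.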

Tricks to weigh scores + normalize
e.g.: Match on common word not as useful as match on rare words...

How to process V.S. Queries?
      w1 w2 w3 w4 w5 w6
Q = < 0  0  0  1  1  0 >

Try Stanford Libraries
Try Google, Yahoo, ...

Summary so far
- Conventional index
  - Basic ideas: sparse, dense, multi-level
  - Duplicate keys
  - Deletion/Insertion
  - Secondary indexes
    - Buckets of postings list

Conventional indexes
Advantage:
- Simple
- Index is sequential file, good for scans
Disadvantage:
- Inserts expensive, and/or
- Lose sequentiality & balance

Example
[Figure: sequential index with continuous free space.]

Example
[Figure: sequential index with continuous free space, plus an overflow area (not sequential); key values only partly recoverable.]

Outline:
- Conventional indexes
- B-Trees (NEXT)
- Hashing schemes

NEXT: Another type of index
- Give up on sequentiality of index
- Try to get balance

B+Tree Example
[Figure: B+tree with a root, non-leaf nodes and leaves; the value of n and most key values were lost in transcription.]
Note: This index is called B+ tree, but Gradiance homeworks just call it B-tree.

Sample non-leaf
[Figure: non-leaf node with keys 57, 81, 95. Its four pointers lead to keys k < 57, 57 ≤ k < 81, 81 ≤ k < 95, and k ≥ 95.]

Sample leaf node
[Figure: leaf node with keys 57, 81, 95, reached from a non-leaf node. Its pointers lead to the records with keys 57, 81 and 95, and one more pointer leads to the next leaf in sequence.]

In textbook's notation
[Figure: a leaf and a non-leaf node drawn in the textbook's notation; the value of n was lost in transcription.]

Size of nodes (fixed):
- n+1 pointers
- n keys

Don't want nodes to be too empty
Use at least:
- Non-leaf: ⌈(n+1)/2⌉ pointers
- Leaf: ⌊(n+1)/2⌋ pointers to data
[Figure: full node vs. min. node, for a non-leaf and for a leaf; a pointer counts even if null.]

B+tree rules (tree of order n)
(1) All leaves at same lowest level (balanced tree)
(2) Pointers in leaves point to records, except for sequence pointer
(3) Number of pointers/keys for B+tree:

                       Max ptrs   Max keys   Min ptrs -> data   Min keys
  Non-leaf (non-root)  n+1        n          ⌈(n+1)/2⌉          ⌈(n+1)/2⌉ - 1
  Leaf (non-root)      n+1        n          ⌊(n+1)/2⌋          ⌊(n+1)/2⌋
  Root                 n+1        n          1                  1
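The occupancy rules can be written down as a small helper. This sketch assumes the usual textbook rounding (ceiling for non-leaf nodes, floor for leaves), since the rounding brackets are not visible in the transcribed slide; the function name `bplus_limits` is hypothetical.

```python
def bplus_limits(n):
    """Pointer/key limits for a B+tree of order n (n keys per node at most)."""
    half_up = (n + 2) // 2    # ceil((n+1)/2), assumed rounding for non-leaf nodes
    half_down = (n + 1) // 2  # floor((n+1)/2), assumed rounding for leaves
    return {
        "non-leaf": {"max_ptrs": n + 1, "max_keys": n,
                     "min_ptrs": half_up, "min_keys": half_up - 1},
        "leaf": {"max_ptrs": n + 1, "max_keys": n,
                 "min_ptrs": half_down, "min_keys": half_down},
        "root": {"max_ptrs": n + 1, "max_keys": n,
                 "min_ptrs": 1, "min_keys": 1},
    }


limits_n3 = bplus_limits(3)
limits_n4 = bplus_limits(4)
```

For odd n the two roundings agree on the pointer count; for even n a non-leaf node must keep one more pointer than a leaf.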

Insert into B+tree
(a) simple case: space available in leaf
(b) leaf overflow
(c) non-leaf overflow
(d) new root

(a) Insert key = 2 (n lost in transcription)
[Figure sequence: the key goes into a leaf that has room; most key values not recoverable.]

(b) Insert key = 7 (n lost in transcription)
[Figure sequence: the target leaf overflows and splits, and a new key is added to its parent; most key values not recoverable.]
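Cases (a) and (b) can be sketched for a single leaf. This is a simplified illustration, not the slides' algorithm verbatim: it handles only the leaf's key list, ignores record and sequence pointers, and the function name `insert_into_leaf` and the sample keys are hypothetical.

```python
from bisect import insort


def insert_into_leaf(keys, new_key, n):
    """Insert new_key into a sorted leaf that may hold at most n keys.

    Returns (left, right, separator). In the simple case right and
    separator are None. On overflow the leaf splits in two and the
    smallest key of the right half is the separator to add to the parent.
    """
    keys = list(keys)
    insort(keys, new_key)
    if len(keys) <= n:                 # case (a): space available in leaf
        return keys, None, None
    mid = (len(keys) + 1) // 2         # case (b): left half keeps the extra key
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]


# Illustrative keys only (n = 3 is an assumption).
simple = insert_into_leaf([30, 31], 32, n=3)
split = insert_into_leaf([3, 5, 11], 7, n=3)
```

If the parent has no room for the separator, the same split is repeated one level up, which leads to cases (c) and (d).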

(c) Insert key = 1 (n lost in transcription)
[Figure sequence: non-leaf overflow. The leaf splits, its parent overflows and splits in turn; surviving key values include 156 and 179.]

(d) New root (inserted key and n lost in transcription)
[Figure sequence: the insert propagates up to the root; most key values not recoverable.]

(d) New root (continued)
[Figure sequence: the old root splits and a new root is created.]

Deletion from B+tree
(a) Simple case - no example
(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys
(d) Cases (b) or (c) at non-leaf

(b) Coalesce with sibling
Delete (key lost in transcription), n=4
[Figure sequence: the underfull leaf is merged into its sibling.]

(c) Redistribute keys
Delete (key lost in transcription), n=4
[Figure: tree before the delete.]

(c) Redistribute keys (continued)
[Figure: a key moves over from the sibling and the parent key is updated.]

(d) Non-leaf coalesce
Delete 7, n=4
[Figure sequence: after the leaves coalesce, the non-leaf nodes coalesce too and a new root results; surviving key values include 14, 22, 26.]

B+tree deletions in practice
- Often, coalescing is not implemented
- Too hard and not worth it!

Comparison: B-trees vs. static indexed sequential file
Ref #1: Held & Stonebraker, "B-Trees Re-examined", CACM, Feb. 1978

Ref #1 claims:
- Concurrency control harder in B-Trees
- B-tree consumes more space

For their comparison:
- block = 512 bytes
- key = pointer = 4 bytes
- 4 data records per block

Example: 1 block static index
- 127 keys (k1, k2, k3, ...): (127+1)·4 = 512 bytes
- Pointers in index implicit!
- Up to 127 data blocks

Example: 1 block B-tree
- 63 keys (k1, k2, ..., k63) plus a next pointer: 63·(4+4)+8 = 512 bytes
- Pointers needed in B-tree because index blocks are not contiguous
- Up to 63 data blocks

Size comparison (Ref. #1)

            Static index              B-tree
  height    # data blocks             # data blocks
  2         2 -> 127                  2 -> 63
  3         128 -> 16,129             64 -> 3,968
  4         16,130 -> 2,048,383       3,969 -> 250,047
  5                                   250,048 -> 15,752,961

Ref. #1 analysis claims
For an 8,000 block file, after 2,000 inserts and after 16,000 lookups, the static index saves enough accesses to allow for reorganization.
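The block-capacity arithmetic behind the size comparison above can be checked directly. The 512-byte block and 4-byte key and pointer sizes are from the slide; the B-tree fanout follows from (512 - 8) / (4 + 4) = 63. The constant and function names are hypothetical, and treating the 8 extra bytes as fixed per-block overhead is an assumption.

```python
BLOCK = 512        # bytes per block
KEY = 4            # bytes per key
PTR = 4            # bytes per pointer
BTREE_EXTRA = 8    # extra bytes per B-tree block (assumed fixed overhead)


def static_index_fanout():
    """Keys per static index block: pointers are implicit, (k+1)*KEY <= BLOCK."""
    return BLOCK // KEY - 1


def btree_fanout():
    """Key/pointer pairs per B-tree block: k*(KEY+PTR) + extra <= BLOCK."""
    return (BLOCK - BTREE_EXTRA) // (KEY + PTR)


def max_data_blocks(fanout, height):
    """Largest file reachable by an index of the given height (height >= 2)."""
    return fanout ** (height - 1)


static_f = static_index_fanout()
btree_f = btree_fanout()
```

Because the B-tree spends half of each entry on an explicit pointer, it needs one more level than the static index for files between roughly 250,000 and 2 million blocks.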

Ref. #1 conclusion: Static index better!!

Ref #2: M. Stonebraker, "Retrospection on a database system," TODS, June 1980

Ref. #2 conclusion: B-trees better!!
- DBA does not know when to reorganize
- DBA does not know how full to load pages of new index
- Buffering
  - B-tree: has fixed buffer requirements
  - Static index: must read several overflow blocks to be efficient (large & variable size buffers needed for this)

Speaking of buffering
Is LRU a good policy for B+tree buffers?
Of course not! Should try to keep root in memory at all times (and perhaps some nodes from second level).

Interesting problem: For B+tree, how large should n be?
(n is the number of keys per node.)

Sample assumptions:
(1) Time to read a node from disk is $(S + Tn)$ msec.
(2) Once the block is in memory, use binary search to locate the key: $(a + b \log_2 n)$ msec, for some constants $a$, $b$. Assume $a \ll S$.
(3) Assume the B+tree is full, i.e., the number of nodes to examine is $\log_n N$, where $N$ = # records.

Can get: $f(n)$ = time to find a record.
[Figure: plot of f(n) against n, with the minimum marked at n_opt.]

Find $n_{opt}$ by $f'(n) = 0$.
Answer is $n_{opt}$ = "few hundred" (see homework for details).
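Under assumptions (1) to (3), the lookup time is the number of levels times the cost per node, f(n) = log_n N · (S + Tn + a + b log_2 n). The sketch below finds the minimizing n by brute force instead of solving f'(n) = 0. T, b and a follow the exercise values that appear in these notes; the values of S and N, and the names `lookup_time` and `best_n`, are assumptions for illustration only.

```python
import math


def lookup_time(n, N, S, T, a, b):
    """f(n): levels examined times the cost of reading and searching one node."""
    levels = math.log(N) / math.log(n)            # log_n N
    per_node = S + T * n + a + b * math.log2(n)   # disk read + binary search
    return levels * per_node


def best_n(N, S, T, a, b, n_max=5000):
    """Integer n in [2, n_max] that minimizes f(n), found by exhaustive search."""
    return min(range(2, n_max + 1), key=lambda n: lookup_time(n, N, S, T, a, b))


# T, b, a as in the exercise; S and N are assumed values.
PARAMS = dict(N=10_000_000, T=0.2, a=0.0, b=0.002)
n_slow_disk = best_n(S=1000.0, **PARAMS)
n_fast_disk = best_n(S=100.0, **PARAMS)
```

Comparing `n_fast_disk` with `n_slow_disk` is one way to explore the "disk gets faster" question: a smaller fixed cost S per read makes smaller nodes more attractive.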

Exercise
$f(n) = \log_n N \cdot [S + T n + a + b \log_2 n]$
T = 0.2, b = 0.002, a = 0 (the values of S and N were lost in transcription)

What happens to $n_{opt}$ as:
- Disk gets faster?
- CPU gets faster?

[Plots: f(n) against n, times in microseconds, for two file sizes N (in million records) with T = 0.2, b = 0.002, a = 0, and one plot where S varies; the values of S and N were lost in transcription.]

Variation on B+tree: B-tree (no +)
Idea:
- Avoid duplicate keys
- Have record pointers in non-leaf nodes

B-tree node
[Figure: non-leaf node holding K1 P1, K2 P2, K3 P3. Each Pi points to the record with key Ki; the tree pointers lead to keys x < K1, K1 < x < K2, K2 < x < K3, and x > K3.]

B-tree example, n=2
[Figure: example B-tree; surviving key values include 65, 85, 165.]
Sequence pointers not useful now! (but keep space for simplicity)

Note on inserts
Say we insert a record into a full leaf (key value and n lost in transcription).
[Figure: the leaf before and afterwards.]

So, for B-trees:

                        MAX                          MIN
                        Tree ptrs  Rec ptrs  Keys    Tree ptrs   Rec ptrs        Keys
  Non-leaf, non-root    n+1        n         n       ⌈(n+1)/2⌉   ⌈(n+1)/2⌉ - 1   ⌈(n+1)/2⌉ - 1
  Leaf, non-root        1          n         n       1           ⌊n/2⌋           ⌊n/2⌋
  Root, non-leaf        n+1        n         n       2           1               1
  Root, leaf            1          n         n       1           1               1

Tradeoffs:
- B-trees have faster lookup than B+trees
- In B-tree, non-leaf & leaf nodes are different sizes
- In B-tree, deletion is more complicated
B+trees preferred!

But note:
If blocks are fixed size (due to disk and buffering restrictions), then lookup for B+tree is actually better!!

Example:
- Pointers: 4 bytes
- Keys: 4 bytes
- Blocks: 100 bytes (just example)
- Look at full 2 level tree

B-tree:
Root has 8 keys + 8 record pointers + 9 son pointers = 8x4 + 8x4 + 9x4 = 100 bytes
Each of 9 sons: 12 rec. pointers (+12 keys) = 12x(4+4) + 4 = 100 bytes

2-level B-tree, max # records = 12x9 + 8 = 116

B+tree:
Root has 12 keys + 13 son pointers = 12x4 + 13x4 = 100 bytes
Each of 13 sons: 12 rec. ptrs (+12 keys) = 12x(4+4) + 4 = 100 bytes
2-level B+tree, max # records = 13x12 = 156

So...
[Figure: B+tree with 13 leaves holding 156 records; B-tree with 8 records in the root and 108 records in 9 leaves, total = 116.]

Conclusion: For fixed block size, B+ tree is better because it is bushier.
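The fixed-block comparison can be reproduced with a few integer divisions. The sizes (4-byte keys and pointers, 100-byte blocks) are the example's; the function names are hypothetical, and the layout assumed for each node type is spelled out in the comments.

```python
BLOCK = 100  # bytes per block (the slide's example size)
KEY = 4      # bytes per key
PTR = 4      # bytes per pointer


def btree_nonleaf_keys():
    """B-tree non-leaf: k keys, k record pointers, k+1 son pointers."""
    return (BLOCK - PTR) // (KEY + 2 * PTR)


def bplus_nonleaf_keys():
    """B+tree non-leaf: k keys and k+1 son pointers, no record pointers."""
    return (BLOCK - PTR) // (KEY + PTR)


def leaf_keys():
    """Leaf in either tree: k keys, k record pointers, one extra pointer."""
    return (BLOCK - PTR) // (KEY + PTR)


def btree_max_records():
    """Full 2-level B-tree: records in the leaves plus records at the root."""
    root = btree_nonleaf_keys()
    return leaf_keys() * (root + 1) + root


def bplus_max_records():
    """Full 2-level B+tree: all records are reached from the leaves."""
    return (bplus_nonleaf_keys() + 1) * leaf_keys()
```

The record pointers in the B-tree root cost it four sons, and those four leaves would have held more records than the root gains.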

An Interesting Problem...
What is a good index structure when:
- records tend to be inserted with keys that are larger than existing values? (e.g., banking records with growing date/time)
- we want to remove older data

One Solution: Multiple Indexes
Example: I1, I2

  day   days indexed (I1)   days indexed (I2)
  10    1,2,3,4,5           6,7,8,9,10
  11    11,2,3,4,5          6,7,8,9,10
  12    11,12,3,4,5         6,7,8,9,10
  13    11,12,13,4,5        6,7,8,9,10

- advantage: deletions/insertions from smaller index
- disadvantage: query multiple indexes

Another Solution (Wave Indexes)

  day   I1         I2      I3      I4
  10    1,2,3      4,5,6   7,8,9   10
  11    1,2,3      4,5,6   7,8,9   10,11
  12    1,2,3      4,5,6   7,8,9   10,11,12
  13    13         4,5,6   7,8,9   10,11,12
  14    13,14      4,5,6   7,8,9   10,11,12
  15    13,14,15   4,5,6   7,8,9   10,11,12
  16    13,14,15   16      7,8,9   10,11,12

- advantage: no deletions
- disadvantage: approximate windows

Outline/summary
- Conventional indexes
  - Sparse vs. dense
  - Primary vs. secondary
- B trees
  - B+trees vs. B-trees
  - B+trees vs. indexed sequential
- Hashing schemes --> Next