Indexing with B-trees


15-82 Advanced Topics in Database Systems Performance
Indexing with B-trees

Problem
Given a large collection of records, find similar/interesting things, i.e., allow fast, approximate queries.

Indexing
- primary key indexing
  - B-trees and variants
  - (static) hashing
  - extendible hashing
- secondary key indexing
- spatial access methods
- text
- ...

Primary Key Indexing
- E.g., find employee with ssn=12
- [Figure: table R with records R1, R2, ..., each with attributes SSN, a2, a3, a4, a5; a sequential scan over the records is contrasted with using an index, a B-tree on SSN, to find SSN=12.]

B-trees
- Most successful family of index schemes: B-trees, B+-trees, B*-trees
- Can be used for primary/secondary, or clustering/non-clustering index
- Balanced "n-way" search trees

B-trees: Example
- [Figure: an example B-tree; its order and key values are not recoverable from the transcription.]

B-tree Properties
In a B-tree of order n:
- key order preserved
- at most n pointers
- at least n/2 pointers (except root)
- all leaves at the same level
- if number of pointers is k, node has exactly k-1 keys
- (leaves are empty)
- [Figure: node layout p1, v1, v2, ..., v(n-1), pn]

B-tree Properties (cont.)
- block-aware nodes: each node -> disk page
- O(log(N)) for everything! (ins/del/search)
- typically, if the order is m = 50-100, then 2-3 levels
- utilization >= 50%, guaranteed; on average 69%

Exact-Match Queries
- E.g., ssn=8
- H steps (= disk accesses)

Range Queries
- E.g., 5<salary<8

Proximity Queries
- E.g., nearest neighbor searches: salary ~ 8

B-trees: Insertion
- Insert in leaf; on overflow, push middle up (recursively)
- split: preserves B-tree properties
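The exact-match and range queries above can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the `Node` class, the function names, and the example tree are all assumptions, and bottom-level nodes hold keys directly (the "empty leaves" of the theoretical picture are omitted). The step counter corresponds to the "H steps (= disk accesses)" remark if every node is one disk page.

```python
from bisect import bisect_left

class Node:
    """B-tree node: sorted keys; children is empty for a bottom-level node."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []

def search(root, key):
    """Exact-match query. Returns (found, steps); steps = nodes read on the path."""
    node, steps = root, 0
    while True:
        steps += 1
        i = bisect_left(node.keys, key)      # first position with keys[i] >= key
        if i < len(node.keys) and node.keys[i] == key:
            return True, steps
        if not node.children:                # bottom level reached: key is absent
            return False, steps
        node = node.children[i]              # pointer between keys i-1 and i

def range_query(node, lo, hi, out=None):
    """Collect keys with lo < key < hi in sorted order, skipping useless subtrees."""
    if out is None:
        out = []
    for i, k in enumerate(node.keys):
        if node.children and k > lo:         # subtree i only holds keys below k
            range_query(node.children[i], lo, hi, out)
        if lo < k < hi:
            out.append(k)
        if k >= hi:                          # everything further right is too large
            return out
    if node.children:
        range_query(node.children[-1], lo, hi, out)
    return out

# Hypothetical example tree (assumed values, not taken from the slides).
t0 = Node([6, 9], [Node([1, 3]), Node([7]), Node([13])])
print(search(t0, 7))             # (True, 2)
print(range_query(t0, 2, 10))    # [3, 6, 7, 9]
```

A proximity query such as "salary ~ 8" could reuse the same descent: locate the position where the value would be and compare its neighbors.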

B-trees: Insertion (cont.)
Easy case: Tree T0; insert 8
- [Figure: T0 before and after the insertion]

B-trees: Insertion (cont.)
Hardest case: Tree T0; insert 2
- [Figure: the leaf overflows; push middle up]
- [Figure: overflow again; push middle]
- [Figure: final state]
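The easy case and the hardest case can be reproduced with a short sketch. Everything here is an assumption for illustration: the `Node` class, `ORDER = 3`, the function names, and the contents of the example tree (the key values of T0 did not survive in the transcription). Duplicate keys are not handled.

```python
from bisect import bisect_left, insort

ORDER = 3   # assumed: at most ORDER pointers, hence at most ORDER - 1 keys per node

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []       # empty for a bottom-level node

def _insert(node, key):
    """Insert key below node. Returns (middle_key, new_right_node) if node split."""
    if not node.children:
        insort(node.keys, key)               # insert in leaf, keeping key order
    else:
        i = bisect_left(node.keys, key)
        split = _insert(node.children[i], key)
        if split:                            # child overflowed: absorb its middle key
            mid, right = split
            node.keys.insert(i, mid)
            node.children.insert(i + 1, right)
    if len(node.keys) < ORDER:               # no overflow
        return None
    m = len(node.keys) // 2                  # with two middles, either one is fine
    mid = node.keys[m]
    right = Node(node.keys[m + 1:], node.children[m + 1:])
    node.keys = node.keys[:m]
    node.children = node.children[:m + 1]
    return mid, right                        # push middle up

def insert(root, key):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key)
    if split:                                # root overflowed: height grows by one
        mid, right = split
        return Node([mid], [root, right])
    return root

# Hypothetical tree standing in for T0.
root = Node([6, 9], [Node([1, 3]), Node([7]), Node([13])])
root = insert(root, 8)    # easy case: the leaf [7] has room
root = insert(root, 2)    # leaf [1, 3] overflows, then the root overflows
print(root.keys, [c.keys for c in root.children])   # [6] [[2], [9]]
```

The tree only ever grows at the root, which is why all leaves stay at the same level.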

B-trees: Insertion
- Q: What if there are two middles? (e.g., order 4)
- A: either one is fine

B-trees: Insertion
- Insert in leaf; on overflow, push middle up (recursively: "propagate split")
- split: preserves all B-tree properties (!!)
- notice how it grows: height increases when root overflows & splits
- Automatic, incremental re-organization

Algorithm: Insertion of Key K
    find the correct leaf node L;
    if ( L overflows ) {
        split L by pushing middle key up to parent P;
        if ( P overflows ) {
            repeat the split recursively;
        }
    } else {
        add key K in node L;  // maintain key order in L
    }

B-trees: Deletion
Rough outline of algorithm:
- Delete key; on underflow, may need to merge
- In practice, some implementors just allow underflows to happen

B-trees: Deletion Cases
- Case 1: delete a key at a leaf; no underflow
- Case 2: delete non-leaf key; no underflow
- Case 3: delete leaf-key; underflow, and "rich sibling"
- Case 4: delete leaf-key; underflow, and "poor sibling"

B-trees: Deletion Case 1
Case 1: delete a key at a leaf
- Easiest case: no underflow (delete from T0)
- [Figure]

B-trees: Deletion Case 1 (cont.)
Easiest case: Tree T0; delete a key from a leaf, with no underflow
- [Figure]

B-trees: Deletion Case 2
Case 2: delete a key at a non-leaf; no underflow (e.g., delete from T0)
- Delete & promote, i.e.: [Figure sequence, ending in the FINAL TREE]
- Q: How to promote?
- A: pick the largest key from the left sub-tree (or the smallest from the right sub-tree)
- Observation: every deletion eventually becomes a deletion of a leaf key
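The "delete & promote" step of Case 2 can be sketched as follows, using the first option from the answer above (the largest key of the left sub-tree). The `Node` class, the function names, and the example tree are illustrative assumptions; the sketch covers only the no-underflow situation and hands the affected leaf back to the caller.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []   # empty for a bottom-level node

def rightmost_leaf(node):
    """Walk down the last pointer of each node until the bottom level."""
    while node.children:
        node = node.children[-1]
    return node

def delete_internal_key(node, i):
    """Delete keys[i] of a non-leaf node by promoting its in-order predecessor.

    The predecessor is the largest key of the left sub-tree, so the deletion
    turns into the deletion of a leaf key. Returns the leaf that lost a key;
    the caller is expected to check that leaf for underflow.
    """
    leaf = rightmost_leaf(node.children[i])
    node.keys[i] = leaf.keys.pop()       # promote and remove from the leaf
    return leaf

# Hypothetical tree (assumed values, not taken from the slides).
root = Node([6, 9], [Node([1, 3]), Node([7]), Node([13])])
leaf = delete_internal_key(root, 0)      # delete 6, promote 3
print(root.keys, leaf.keys)              # [3, 9] [1]
```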

B-trees: Deletion Case 2 (cont.)
Case 2: delete a key at a non-leaf; no underflow (e.g., delete from T0)
- Delete & promote, i.e.: [Figure]

B-trees: Deletion Cases (cont.)
- Case 1: delete a key at a leaf; no underflow
- Case 2: delete non-leaf key; no underflow
- Case 3: delete leaf-key; underflow, and "rich sibling"
- Case 4: delete leaf-key; underflow, and "poor sibling"

B-trees: Deletion Case 3
Case 3: underflow & rich sibling (e.g., delete from T0)
- Delete & borrow
- [Figure: the underflowing leaf and its rich sibling]

B-trees: Deletion Case 3 (cont.)
Case 3: underflow & rich sibling
- "rich" = can give a key, without underflowing
- borrowing a key: THROUGH the PARENT!
- [Figure: taking the key directly from the rich sibling, marked "NO!!"]

B-trees: Deletion Case 3 (cont.)
Case 3: underflow & rich sibling (e.g., delete from T0)
- Delete & borrow, THROUGH the parent
- [Figure sequence]

B-trees: Deletion Cases (cont.)
- Case 1: delete a key at a leaf; no underflow
- Case 2: delete non-leaf key; no underflow
- Case 3: delete leaf-key; underflow, and "rich sibling"
- Case 4: delete leaf-key; underflow, and "poor sibling"

B-trees: Deletion Case 4
Case 4: underflow & poor sibling (e.g., delete from T0)
- [Figure]

B-trees: Deletion Case 4 (cont.)
Case 4: underflow & poor sibling (e.g., delete from T0)
- A: merge w/ "poor" sibling
- Merge, by pulling a key from the parent
- exact reversal from insertion: "split and push up" vs. "merge and pull down"
- [Figure sequence, ending in the FINAL TREE]

B-trees: Deletion Case 4 (cont.)
Case 4: underflow & poor sibling -> pull key from parent, and merge
- Q: What if the parent underflows?
- A: repeat recursively

Algorithm: Deletion of Key K
    locate key K, in node N;
    if ( N is a non-leaf node ) {
        delete K from N;
        find the immediately larger key K1;
            /* which is guaranteed to be on a leaf node L */
        copy K1 in the old position of K;
        invoke DELETION on K1 from the leaf node L;
    } else {  /* N is a leaf node */
        ... (continued on the next slide)
    }

Deletion of Key K (cont.)
    if ( N underflows ) {
        let N1 be the sibling of N;
        if ( N1 is "rich" ) {  /* i.e., N1 can lend us a key */
            borrow a key from N1 THROUGH parent node;
        } else {  /* N1 is 1 key away from underflowing */
            MERGE: pull key from parent P,
                   merge it with keys of N and N1 into new node;
            if ( P underflows ) { repeat recursively }
        }
    }

B-trees in Practice
In practice: no empty leaves; pointers to records
- [Figure: the tree in "theory"]
- [Figure: the tree in "practice", with record pointers, e.g., on SSN]

B-trees in Practice (cont.)
In practice, the formats are:
- leaf nodes: (v1, rp1, v2, rp2, ..., vn, rpn)
- non-leaf nodes: (p1, v1, rp1, p2, v2, rp2, ...)

Overview
- B-trees
- B+-trees, B*-trees
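The underflow branch of the deletion algorithm (borrow through the parent when a sibling is rich, otherwise merge and pull a key down) can be sketched as below. This is a simplified illustration under stated assumptions: `MIN_KEYS = 1`, the `Node` class, the function name, and the example tree are hypothetical; the parent is assumed to have at least two children; and the recursive repair of an underflowing parent is left to the caller.

```python
MIN_KEYS = 1   # assumed minimum number of keys in a non-root node

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []

def fix_underflow(parent, i):
    """Repair parent.children[i], which holds fewer than MIN_KEYS keys.

    Returns 'borrow' or 'merge'. After a merge the caller must check
    whether the parent itself now underflows and, if so, repeat one level up.
    """
    kids = parent.children
    node = kids[i]
    left = kids[i - 1] if i > 0 else None
    right = kids[i + 1] if i + 1 < len(kids) else None

    if left is not None and len(left.keys) > MIN_KEYS:      # rich left sibling
        node.keys.insert(0, parent.keys[i - 1])             # parent key comes down
        parent.keys[i - 1] = left.keys.pop()                # sibling key goes up
        if left.children:
            node.children.insert(0, left.children.pop())
        return "borrow"
    if right is not None and len(right.keys) > MIN_KEYS:    # rich right sibling
        node.keys.append(parent.keys[i])
        parent.keys[i] = right.keys.pop(0)
        if right.children:
            node.children.append(right.children.pop(0))
        return "borrow"

    # Poor sibling(s): merge the node, the separating parent key, and a sibling.
    if right is None:                                       # rightmost child: use left
        i, node, right = i - 1, left, node
    node.keys += [parent.keys.pop(i)] + right.keys          # pull key down from parent
    node.children += right.children
    del kids[i + 1]
    return "merge"

# Hypothetical tree (assumed values, not taken from the slides).
root = Node([6, 9], [Node([1, 3]), Node([7]), Node([13])])
root.children[1].keys.remove(7)
print(fix_underflow(root, 1), root.keys, [c.keys for c in root.children])
# borrow [3, 9] [[1], [6], [13]]
```

Note that the borrowed key never moves sibling to sibling: the parent's separator comes down and the sibling's key replaces it, which keeps the key order intact.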

B+ trees: Motivation
B-tree: print keys in sorted order:
- [Figure]

B+ trees: Motivation (cont.)
- B-tree needs back-tracking; how to avoid it?
- [Figure]

Solution: B+-trees
- Facilitate sequential ops
- They string all leaf nodes together
- AND replicate keys from non-leaf nodes, to make sure every key appears at the leaf level

B+ trees
- [Figure: a B+-tree; pointers are labeled "<" and ">="]

B+ trees: Insertion
E.g., insert 2
- [Figure]

B*-trees: Motivation
- Splits drop utilization to 50%
- May increase height
- How to avoid them?
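The benefit of stringing the leaves together can be illustrated with a small sketch: a range scan descends once and then follows the leaf chain, with no back-tracking through non-leaf nodes. The `Leaf` and `Inner` classes, the function names, and the example keys are assumptions made for this illustration; the "<" / ">=" convention follows the figure labels.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None             # link to the next leaf (None for the last one)

class Inner:
    def __init__(self, keys, children):
        self.keys = keys             # separators: child i holds keys < keys[i],
        self.children = children     # child i+1 holds keys >= keys[i]

def find_leaf(node, key):
    """Descend to the leaf whose key range contains key."""
    while isinstance(node, Inner):
        node = node.children[bisect_right(node.keys, key)]
    return node

def range_scan(root, lo, hi):
    """Return all keys with lo <= key <= hi by walking the leaf chain."""
    out = []
    leaf = find_leaf(root, lo)
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return out
            if k >= lo:
                out.append(k)
        leaf = leaf.next             # sequential access: no back-tracking
    return out

# Hypothetical B+-tree: every key, including the separators 6 and 9,
# appears at the leaf level.
a, b, c = Leaf([1, 3]), Leaf([6, 7]), Leaf([9, 13])
a.next, b.next = b, c
root = Inner([6, 9], [a, b, c])
print(range_scan(root, 3, 9))        # [3, 6, 7, 9]
```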

B*-trees: Deferred Split!
- Instead of splitting, LEND keys to sibling! (through PARENT, of course!)
- [Figure sequence, ending in the FINAL TREE]

B*-trees: Advantages
Tree becomes:
- Shorter
- More packed
- Faster
Rare case: space utilization and speed improve together.

B*-trees: Deferred Split!
BUT: What if sibling has no room for "lending"?
2-to-3 split:
1. get the keys from the sibling
2. pool them with ours (and a key from the parent)
3. split in 3
Details: too messy (and even worse for deletion)

Conclusions
- Main ideas: recursive; block-aware; on overflow -> split; defer splits
- All B-tree variants have excellent, O(log N) worst-case performance for ins/del/search
- It's the prevailing indexing method

Performance Aspects of B-trees
Two parameters matter:
- Height H (maximum search path): $H = 1 + \log_{F^*}(N / C^*)$, where N is the number of tuples, C* is the average number of entries in a leaf node, and F* is the average number of entries in an index node.
- Size S (number of pages the tree occupies): $S = \sum_{i} (F^*)^{i-1}$, $1 \le i < H$
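A quick way to get a feel for the height formula is to count levels directly. The sketch below is an illustration under assumptions: the function name and the sample numbers are made up, and partially filled nodes are rounded up to whole nodes, which corresponds to reading the logarithm with a ceiling.

```python
import math

def btree_height(n_tuples, c_leaf, f_index):
    """Levels of a tree with n_tuples entries, c_leaf entries per leaf
    and fan-out f_index per index node (leaf level included)."""
    nodes = math.ceil(n_tuples / c_leaf)    # number of leaf pages, N / C*
    height = 1
    while nodes > 1:                        # add index levels until one root remains
        nodes = math.ceil(nodes / f_index)
        height += 1
    return height

# Hypothetical numbers: one million tuples, 50 entries per leaf, fan-out 100.
# 20,000 leaves -> 200 index nodes -> 2 index nodes -> 1 root.
print(btree_height(1_000_000, 50, 100))     # 4
```

With these numbers, log base 100 of 20,000 is about 2.15, so 1 plus the rounded-up logarithm also gives 4 levels.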

Reducing the Number of Leafs
- Shorten data length (values, tuples, pointers)
- Is it worthwhile to change the tuples to TIDs?
- No extra page accesses!
- From Gray & Reuter: 1.1 log F* x must hold, i.e., average fan-out really small or tuples > 1K

Increasing the Fanout
- Increase page size (hard)
- Compression
  - Prefix: store differences (suffixes)
  - Suffix: store prefixes
- Prefix compression: sequential scan, anchor keys

Lehman and Yao: CC (concurrency control) on B-trees
- safe node: node with < 2k entries
- unsafe node: node with = 2k entries
- Simple CC won't do. Why?

Example
- Transaction 1: read x; look for 15; get ptr to y
- Transaction 2: read x; read y; insert a key into y; split y into y and y'
- When transaction 1 then reads y, 15 is no longer there.
- [Figure: before, x holds key 15 and points to y = (8, 10, 12, 15); after the split, x holds keys 10 and 15, y keeps the keys up to 10, and y' = (12, 15).]

Previous B-tree CC algorithms
- Samadi 1976
  - lock the whole subtree of affected node
- Bayer & Schkolnick 1977
  - parameters on degree/type of consistency required
  - writer-exclusion locks (readers may proceed)
  - upper exclusive locks on modified nodes
- Miller & Snyder 1978
  - pioneer and follower locks
  - locked region moves up the tree
  - no modifications

B-link tree
- Node + P(2k+1): pointer to next node at the same level of tree
- Rightmost node's B-link is NULL
- IDEA: Splitting is implemented as: [Figure]
- legal to have left twin and no parent

Advantages
- Allows for temporary fix until all pointers are added correctly
- Link pointers should be used infrequently because splitting a node is a special case
- Level traversal comes for free as a side effect

Algorithms
- Search
  - No locks needed for reads
  - Just move right as well as down
- Insertions
  - Well-ordered locks
  - Use stack to remember ancestors
  - Split while preserving links
- Deletions
  - No underflows, no merging
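The "move right as well as down" search can be sketched as follows. This is a simplified, single-threaded illustration of the idea, not the Lehman and Yao algorithm in full: the `Node` layout (each inner key is the high key of the matching child), the function name, and the example keys are assumptions, and no actual concurrency or locking is modeled. The example mimics a reader that arrives at y after a split but before the parent x has been updated.

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys, children=None, link=None):
        self.keys = keys                  # leaf: stored keys; inner: high key per child
        self.children = children or []    # empty for a leaf
        self.link = link                  # right sibling on the same level, or None

    @property
    def high_key(self):
        return self.keys[-1]              # largest key this node is responsible for

def blink_search(node, key):
    """Lock-free style lookup: move right along links, then down."""
    while True:
        # A concurrent split may have moved our key range to a right twin.
        while key > node.high_key and node.link is not None:
            node = node.link
        if not node.children:
            return key in node.keys
        i = bisect_left(node.keys, key)   # first child whose high key is >= key
        if i == len(node.children):
            return False                  # beyond the rightmost node: key is absent
        node = node.children[i]

# Hypothetical state right after a split of y: y' exists and is linked,
# but parent x still has only the old pointer to y.
y_prime = Node([12, 15])
y = Node([8, 9, 10], link=y_prime)
x_stale = Node([15], [y])
print(blink_search(x_stale, 15))          # True: found by moving right from y to y'
```

Because the right twin is reachable through the link as soon as the split happens, the missing parent pointer is only a temporary inefficiency, not an error.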