CSE 530A. B+ Trees. Washington University Fall 2013



B Trees
A B tree is an ordered (non-binary) tree where the internal nodes can have a varying number of child nodes (within some range)

B Trees
When a key is inserted into or removed from a node, the number of child nodes changes
To maintain the proper range of number of keys, internal nodes are split or joined
The lower and upper bounds on number of keys per node are usually fixed
  The upper bound is most often twice the lower bound
  At most half the space is wasted in the nodes
Balance is kept by requiring that all leaf nodes have the same depth

B Trees
B tree of order n (as defined by Knuth):
1. Every node has at most n children
2. Every non-leaf node (except the root) has at least ⌈n/2⌉ children
3. The root has at least two children (if it is not a leaf)
4. A non-leaf node with k children contains k - 1 keys
5. All leaves are at the same level and have data

B Trees
Each internal node's elements (a.k.a. the keys) divide up the subtrees
E.g., if an internal node has 3 children then it must have 2 keys, k_1 and k_2, where k_1 < k_2
  All elements in the left subtree must be < k_1
  All elements in the middle subtree must be between k_1 and k_2
  All elements in the right subtree must be > k_2

B Trees
Internal nodes
  Non-leaf nodes that are not the root node
  Each has between the minimum and maximum number of children
  If the minimum number of children is defined as half of the maximum number of children then
    Each internal node is at least half full
    Two half-full nodes can be joined to make a legal (full) node
    A full node can be split into two legal nodes, if there is room to push the splitting key up into the parent
Root node
  Same upper limit but no lower limit
Leaf nodes
  Same restriction (lower and upper) on number of elements, but no child nodes

B+ Trees
PostgreSQL (and other databases) actually use B+ Trees
B+ Trees are a variant of B Trees
  All of the data are stored at the leaves
  All of the keys appear at the leaves (in sorted order)
  Some keys are duplicated in internal nodes
  Leaf nodes are linked together (in order) to create a linked list

B+ Trees
Example: the root holds keys 3 and 5
  Root:   [3 5]
  Leaves: [1 2] [3 4] [5 6 7]
  Subtree ranges, left to right: x < 3, 3 <= x < 5, x >= 5

Searching
To find key x in the tree
  Start at the root node
  Find the adjacent keys m and n in the node where m <= x < n
  Follow the pointer between them to the child node
  Repeat until a leaf is reached
  Find key x in the leaf
Note that search always goes all the way to the leaves
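The search steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class and the `search` function are hypothetical names, and a node is assumed to be a leaf when its `children` attribute is `None`. The tree built at the end is the small example with root keys 3 and 5.

```python
from bisect import bisect_right

class Node:
    """Hypothetical B+ tree node: sorted keys, plus children for inner nodes."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None marks a leaf (an assumption of this sketch)

def search(root, x):
    """Return True if key x is stored in a leaf below root."""
    node = root
    while node.children is not None:
        # bisect_right counts the keys <= x, which is exactly the index
        # of the child whose range satisfies m <= x < n.
        node = node.children[bisect_right(node.keys, x)]
    # The descent always ends at a leaf, even if x also appears in an inner node.
    return x in node.keys

# Example tree: root keys 3 and 5, three leaves.
root = Node([3, 5], [Node([1, 2]), Node([3, 4]), Node([5, 6, 7])])
found = search(root, 4)
missing = search(root, 8)
```

Using binary search inside a node is a design choice of the sketch; a linear scan over the keys would follow the same child pointer.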

Inserting
Find leaf where new key x belongs
If leaf is not full then add x to leaf
If leaf is full then split leaf and push splitting key up to parent
  If parent is full then split parent and push splitting key up
  If we need to add a key to the root and the root is full then split the root and create a new parent as root
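A minimal Python sketch of this insertion procedure follows. It makes several assumptions that are not stated in the slides: nodes hold at most 4 keys (`MAX_KEYS`), a leaf split copies the splitting key up while keeping it in the right leaf, an inner split moves the splitting key up, and the sibling links between leaves are left out for brevity. `Node`, `insert`, and `_insert` are hypothetical names.

```python
from bisect import bisect_right, insort

MAX_KEYS = 4  # assumed node capacity for this sketch

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None marks a leaf

def _insert(node, x):
    """Insert x below node. Return (splitting key, new right node) if node split."""
    if node.children is None:
        insort(node.keys, x)  # keep the leaf sorted
    else:
        i = bisect_right(node.keys, x)
        split = _insert(node.children[i], x)
        if split is None:
            return None
        key, right = split
        node.keys.insert(i, key)            # splitting key pushed up from the child
        node.children.insert(i + 1, right)
    if len(node.keys) <= MAX_KEYS:
        return None
    mid = len(node.keys) // 2
    up = node.keys[mid]
    if node.children is None:
        # Leaf split: the splitting key is copied up and stays in the right leaf.
        right = Node(node.keys[mid:])
        node.keys = node.keys[:mid]
    else:
        # Inner split: the splitting key moves up and stays in neither half.
        right = Node(node.keys[mid + 1:], node.children[mid + 1:])
        node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
    return up, right

def insert(root, x):
    """Insert x and return the root, which is new if the old root split."""
    split = _insert(root, x)
    if split is None:
        return root
    key, right = split
    return Node([key], [root, right])

# A tree with root keys 25, 50, 75 and four leaves, then three insertions.
root = Node([25, 50, 75], [Node([5, 10, 15, 20]), Node([25, 30]),
                            Node([50, 55, 60, 65]), Node([75, 80, 85, 90])])
for key in (28, 70, 95):
    root = insert(root, key)
```

Inserting 28 only touches one leaf, inserting 70 splits a leaf, and inserting 95 splits both a leaf and the root, so the tree grows by one level.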

Inserting
Insert 28
Leaf is not full so we can just add it
Before:
  Root:   [25 50 75]
  Leaves: [5 10 15 20] [25 30] [50 55 60 65] [75 80 85 90]
After:
  Root:   [25 50 75]
  Leaves: [5 10 15 20] [25 28 30] [50 55 60 65] [75 80 85 90]

Inserting
Insert 70
Should go in leaf with (50, 55, 60, 65), but leaf is full
Need to split the leaf and then insert
60 is pushed up to the parent
Before:
  Root:   [25 50 75]
  Leaves: [5 10 15 20] [25 28 30] [50 55 60 65] [75 80 85 90]
After:
  Root:   [25 50 60 75]
  Leaves: [5 10 15 20] [25 28 30] [50 55] [60 65 70] [75 80 85 90]

Inserting
Insert 95
Should go in leaf with (75, 80, 85, 90), but leaf is full
Need to split the leaf and then insert
85 is pushed up to the parent, but parent is full
Need to split parent node then insert 85
60 is then pushed up to the new root
Before:
  Root:   [25 50 60 75]
  Leaves: [5 10 15 20] [25 28 30] [50 55] [60 65 70] [75 80 85 90]
After the leaf split (85 still has to go into the full parent):
  Root:   [25 50 60 75]
  Leaves: [5 10 15 20] [25 28 30] [50 55] [60 65 70] [75 80] [85 90 95]
After the parent split:
  Root:        [60]
  Inner nodes: [25 50] [75 85]
  Leaves:      [5 10 15 20] [25 28 30] [50 55] [60 65 70] [75 80] [85 90 95]

Inserting Alternative
If inserting into a full node with a non-full sibling, we could shift keys from the full node to the non-full sibling
Requires modifying parent keys
Insert 95
Should go in leaf with (75, 80, 85, 90), but leaf is full
Shift 75 to left sibling
Modify key in parent node
Insert 95 in newly opened node
Before:
  Root:   [25 50 60 75]
  Leaves: [5 10 15 20] [25 28 30] [50 55] [60 65 70] [75 80 85 90]
After:
  Root:   [25 50 60 80]
  Leaves: [5 10 15 20] [25 28 30] [50 55] [60 65 70 75] [80 85 90 95]

Deleting
Find leaf containing key x
Delete x from leaf
  If x was leftmost key then replace x in inner nodes with new leftmost key
If leaf is not below lower limit
  If x was a key in the parent then fix parent
  Propagate change up to root if necessary
If leaf is below lower limit then
  If sibling is above lower limit then shift key from sibling
    Adjust keys in ancestors
  If sibling is at lower limit then merge nodes
    This removes a key from parent
    Merge parent with sibling if below limit and propagate up
    Potentially will remove the current root
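The leaf-level part of this procedure can be sketched as follows. This is a deliberately partial illustration, not a full delete: it handles one parent and its leaves only, so an underflowing parent or a stale copy of the key higher up in the tree is left to the caller. The assumptions are that leaves are plain sorted lists, that the lower limit is 2 keys (`MIN_KEYS`), and that each parent key equals the leftmost key of the leaf to its right. `delete_in_leaf` is a hypothetical name.

```python
MIN_KEYS = 2  # assumed lower limit on keys per leaf

def delete_in_leaf(parent_keys, leaves, i, x):
    """Delete x from leaves[i], then borrow from or merge with a sibling.

    parent_keys[j] separates leaves[j] and leaves[j + 1] and is assumed to
    equal the leftmost key of leaves[j + 1]. Both lists are changed in place.
    """
    leaf = leaves[i]
    leaf.remove(x)
    if len(leaf) < MIN_KEYS:
        if i + 1 < len(leaves) and len(leaves[i + 1]) > MIN_KEYS:
            leaf.append(leaves[i + 1].pop(0))      # shift a key from the right sibling
        elif i > 0 and len(leaves[i - 1]) > MIN_KEYS:
            leaf.insert(0, leaves[i - 1].pop())    # shift a key from the left sibling
        elif i + 1 < len(leaves):
            leaf.extend(leaves.pop(i + 1))         # merge with the right sibling
            del parent_keys[i]                     # the merge removes a parent key
        elif i > 0:
            leaves[i - 1].extend(leaves.pop(i))    # merge with the left sibling
            del parent_keys[i - 1]
    # Fix the parent: every separator must again be the leftmost key on its right.
    for j in range(len(parent_keys)):
        parent_keys[j] = leaves[j + 1][0]

# Delete 70 (no other change), then 25 (the parent key 25 becomes 28).
keys = [25, 50, 60, 80]
leaves = [[5, 10, 15, 20], [25, 28, 30], [50, 55], [60, 65, 70, 75], [80, 85, 90, 95]]
delete_in_leaf(keys, leaves, 3, 70)
delete_in_leaf(keys, leaves, 1, 25)
```

After a merge the caller still has to check whether the parent itself fell below its lower limit and repeat the borrow-or-merge step one level up.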

Deleting
Delete 70
70 is removed from leaf (60, 65, 70, 75)
No other change needed
Before:
  Root:   [25 50 60 80]
  Leaves: [5 10 15 20] [25 28 30] [50 55] [60 65 70 75] [80 85 90 95]
After:
  Root:   [25 50 60 80]
  Leaves: [5 10 15 20] [25 28 30] [50 55] [60 65 75] [80 85 90 95]

Deleting
Delete 25
25 is removed from leaf (25, 28, 30)
25 is used in inner nodes so the inner nodes must be fixed
Before:
  Root:   [25 50 60 80]
  Leaves: [5 10 15 20] [25 28 30] [50 55] [60 65 75] [80 85 90 95]
After:
  Root:   [28 50 60 80]
  Leaves: [5 10 15 20] [28 30] [50 55] [60 65 75] [80 85 90 95]

Deleting
Delete 60
Before:
  Root:        [60]
  Inner nodes: [25 50] [75 85]
  Leaves:      [5 10 15 20] [25 28 30] [50 55] [60 65] [75 80] [85 90 95]
Remove from leaf (60, 65)
And replace 60 in inner nodes with new leftmost key
  Root:        [65]
  Inner nodes: [25 50] [75 85]
  Leaves:      [5 10 15 20] [25 28 30] [50 55] [65] [75 80] [85 90 95]
Leaf is now below lower limit
Can't shift key from sibling as sibling is at lower limit
Combine with leaf (75, 80)
  Root:        [65]
  Inner nodes: [25 50] [75 85]
  Leaves:      [5 10 15 20] [25 28 30] [50 55] [65 75 80] [85 90 95]
Extra key now needs to be removed from parent
  Root:        [65]
  Inner nodes: [25 50] [85]
  Leaves:      [5 10 15 20] [25 28 30] [50 55] [65 75 80] [85 90 95]
Inner node is now below lower limit
Combine with sibling
This will eliminate root
  Root:   [25 50 65 85]
  Leaves: [5 10 15 20] [25 28 30] [50 55] [65 75 80] [85 90 95]

B+ Trees
For a B+ tree of order b (max of b children per node)
  Find, insert, and delete are all O(log_b n)
  Space is O(n)
  Range queries can be done in O(log_b n + k) for a range of k elements
    Range queries are queries that ask for all elements between two values
    Elements in a range are already in order
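The k term in the range-query bound comes from walking the linked list of leaves. The sketch below shows only that walk, assuming the O(log_b n) descent has already located the leaf where the lower bound belongs. `Leaf`, its `next` field, and `range_scan` are hypothetical names.

```python
from bisect import bisect_left

class Leaf:
    """Hypothetical leaf: sorted keys plus a link to the next leaf."""
    def __init__(self, keys):
        self.keys = keys
        self.next = None

def range_scan(start_leaf, lo, hi):
    """Yield every key k with lo <= k <= hi, in order.

    start_leaf is assumed to be the leaf where lo belongs, found by the
    usual root-to-leaf search.
    """
    leaf = start_leaf
    i = bisect_left(leaf.keys, lo)  # first position with key >= lo
    while leaf is not None:
        while i < len(leaf.keys):
            if leaf.keys[i] > hi:
                return              # past the upper bound: stop the scan
            yield leaf.keys[i]
            i += 1
        leaf, i = leaf.next, 0      # follow the sibling pointer

# Three linked leaves; scan the range 26..52 starting in the middle leaf.
leaves = [Leaf([5, 10, 15, 20]), Leaf([25, 28, 30]), Leaf([50, 55])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
result = list(range_scan(leaves[1], 26, 52))
```

Because the leaves are already in sorted order, the scan needs no extra sorting step and never revisits an inner node.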

Database Index
An inner node with n keys needs n + 1 pointers
A leaf node with n keys also needs n + 1 pointers (n pointers to the actual data and 1 pointer to its sibling)
For a key size k and a pointer size p
  A node holding n keys needs kn + p(n + 1) = (k + p)n + p bytes

Database Index
Key size varies depending on type
  Assume 8 bytes per key (long int)
Pointer size varies depending on architecture
  Assume 8 bytes (64 bits)
A node holding n keys needs a minimum of 16n + 8 bytes

Database Index
In practice, disk fetches are much, much more expensive than RAM reads
  Want to minimize disk fetches
No point in reading less than a page at a time from disk
If we make our B+ tree nodes the size of a page then
  Page size is typically 4 or 8 KB
  For 4 KB pages and 8-byte keys and pointers, a node can hold a maximum of 255 keys (16n + 8 <= 4096 gives n <= 255)
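The 255-key figure can be checked with a short calculation. The helper below simply inverts the node-size formula kn + p(n + 1) for a given page size. The function name `max_keys` and its parameter names are made up for this sketch.

```python
def max_keys(page_bytes, key_bytes=8, pointer_bytes=8):
    """Largest n such that key_bytes*n + pointer_bytes*(n + 1) <= page_bytes."""
    return (page_bytes - pointer_bytes) // (key_bytes + pointer_bytes)

keys_4k = max_keys(4096)  # 4 KB page
keys_8k = max_keys(8192)  # 8 KB page
```

With 8-byte keys and pointers this gives 255 keys for a 4 KB page and 511 keys for an 8 KB page. Real page formats also reserve header space, which this sketch ignores.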

Database Index
A tree just 3 levels deep can hold more than 16 million keys at the leaves
A tree just 4 levels deep can hold more than 4 billion keys at the leaves
Searching for a key in a 4-billion-record table takes just 4 page fetches using an index
A sequential scan of the table would take at least 16 million page fetches
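These capacity figures follow from the fan-out. As a rough check, assume every node is full, with 255 keys and therefore 256 children per inner node. The function name `leaf_keys` is hypothetical, and the scan estimate further assumes that a table page holds about as many records as a leaf holds keys.

```python
def leaf_keys(levels, max_keys=255):
    """Keys stored at the leaves of a completely full tree with `levels` levels."""
    fanout = max_keys + 1                        # children per inner node
    return fanout ** (levels - 1) * max_keys     # number of leaves times keys per leaf

three_levels = leaf_keys(3)
four_levels = leaf_keys(4)
# Pages touched by a sequential scan, assuming 255 records per page.
scan_pages = four_levels // 255
```

Under these assumptions a 3-level tree holds 16,711,680 keys, a 4-level tree holds 4,278,190,080 keys, and a full scan reads 16,777,216 pages compared with 4 page fetches through the index.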