B-Trees. Version of October 2, B-Trees Version of October 2, / 22

Similar documents
An AVL tree with N nodes is an excellent data. The Big-Oh analysis shows that most operations finish within O(log N) time

Module 4: Index Structures Lecture 13: Index structure. The Lecture Contains: Index structure. Binary search tree (BST) B-tree. B+-tree.

C SCI 335 Software Analysis & Design III Lecture Notes Prof. Stewart Weiss Chapter 4: B Trees

V Advanced Data Structures

Multi-way Search Trees! M-Way Search! M-Way Search Trees Representation!

Lecture 3: B-Trees. October Lecture 3: B-Trees

Data Structures and Algorithms

What is a Multi-way tree?

V Advanced Data Structures

Multiway searching. In the worst case of searching a complete binary search tree, we can make log(n) page faults Everyone knows what a page fault is?

CS350: Data Structures B-Trees

Introduction. for large input, even access time may be prohibitive we need data structures that exhibit times closer to O(log N) binary search tree

M-ary Search Tree. B-Trees. Solution: B-Trees. B-Tree: Example. B-Tree Properties. B-Trees (4.7 in Weiss)

9/29/2016. Chapter 4 Trees. Introduction. Terminology. Terminology. Terminology. Terminology

CS 350 : Data Structures B-Trees

Self-Balancing Search Trees. Chapter 11

Balanced Search Trees

(2,4) Trees. 2/22/2006 (2,4) Trees 1

M-ary Search Tree. B-Trees. B-Trees. Solution: B-Trees. B-Tree: Example. B-Tree Properties. Maximum branching factor of M Complete tree has height =

Trees. Courtesy to Goodrich, Tamassia and Olga Veksler

Algorithms. Deleting from Red-Black Trees B-Trees

TREES. Trees - Introduction

CS F-11 B-Trees 1

(2,4) Trees Goodrich, Tamassia (2,4) Trees 1

Data Structures Week #6. Special Trees

Comp 335 File Structures. B - Trees

Multi-way Search Trees

Uses for Trees About Trees Binary Trees. Trees. Seth Long. January 31, 2010

B-Trees. Disk Storage. What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree. Deletion in a B-tree

Data Structures Week #6. Special Trees

CS350: Data Structures Red-Black Trees

CSE 530A. B+ Trees. Washington University Fall 2013

Data Structures. Motivation

Lecture 11: Multiway and (2,4) Trees. Courtesy to Goodrich, Tamassia and Olga Veksler

B-trees. It also makes sense to have data structures that use the minimum addressable unit as their base node size.

CS 310 B-trees, Page 1. Motives. Large-scale databases are stored in disks/hard drives.

Augmenting Data Structures

CSE 326: Data Structures B-Trees and B+ Trees

Data Structures Week #6. Special Trees

Search Trees - 1 Venkatanatha Sarma Y

B-Trees and External Memory

Multi-way Search Trees. (Multi-way Search Trees) Data Structures and Programming Spring / 25

Algorithms in Systems Engineering ISE 172. Lecture 16. Dr. Ted Ralphs

B-Trees and External Memory

Laboratory Module X B TREES

Section 4 SOLUTION: AVL Trees & B-Trees

B-Trees Data structures on secondary storage primary memory main memory

2-3 Tree. Outline B-TREE. catch(...){ printf( "Assignment::SolveProblem() AAAA!"); } ADD SLIDES ON DISJOINT SETS

CMPS 2200 Fall 2017 Red-black trees Carola Wenk

Computational Optimization ISE 407. Lecture 16. Dr. Ted Ralphs

B-Trees & its Variants

CS Fall 2010 B-trees Carola Wenk

Trees. (Trees) Data Structures and Programming Spring / 28

13.4 Deletion in red-black trees

Multiway Search Trees. Multiway-Search Trees (cont d)

CSE 373 OCTOBER 25 TH B-TREES

Advanced Set Representation Methods

CS102 Binary Search Trees

CISC 235: Topic 4. Balanced Binary Search Trees

CMPS 2200 Fall 2017 B-trees Carola Wenk

Motivation for B-Trees

Trees. Eric McCreath

CSCI Trees. Mark Redekopp David Kempe

Extra: B+ Trees. Motivations. Differences between BST and B+ 10/27/2017. CS1: Java Programming Colorado State University

Algorithms. AVL Tree

Binary Trees, Binary Search Trees

Binary Search Trees. Analysis of Algorithms

CMPS 2200 Fall 2015 Red-black trees Carola Wenk

18.3 Deleting a key from a B-tree

Lecture 6: Analysis of Algorithms (CS )

Module 4: Dictionaries and Balanced Search Trees

Trees. Reading: Weiss, Chapter 4. Cpt S 223, Fall 2007 Copyright: Washington State University

COMP171. AVL-Trees (Part 1)

Trees can be used to store entire records from a database, serving as an in-memory representation of the collection of records in a file.

Search Trees. COMPSCI 355 Fall 2016

Recall: Properties of B-Trees

(2,4) Trees Goodrich, Tamassia. (2,4) Trees 1

B-Trees. CS321 Spring 2014 Steve Cutchin

Red-Black, Splay and Huffman Trees

Binary Tree. Preview. Binary Tree. Binary Tree. Binary Search Tree 10/2/2017. Binary Tree

CS127: B-Trees. B-Trees

Background: disk access vs. main memory access (1/2)

Splay Trees. (Splay Trees) Data Structures and Programming Spring / 27

Data Structure - Advanced Topics in Tree -

Sorted Arrays. Operation Access Search Selection Predecessor Successor Output (print) Insert Delete Extract-Min

13.4 Deletion in red-black trees

CS 3343 Fall 2007 Red-black trees Carola Wenk

CSE332: Data Abstractions Lecture 7: B Trees. James Fogarty Winter 2012

UNIT III BALANCED SEARCH TREES AND INDEXING

Binary Trees

Dynamic Access Binary Search Trees

DATA STRUCTURES AND ALGORITHMS. Hierarchical data structures: AVL tree, Bayer tree, Heap

Introduction to Indexing 2. Acknowledgements: Eamonn Keogh and Chotirat Ann Ratanamahatana

2-3 and Trees. COL 106 Shweta Agrawal, Amit Kumar, Dr. Ilyas Cicekli

Search Trees. The term refers to a family of implementations, that may have different properties. We will discuss:

Physical Level of Databases: B+-Trees

CIS265/ Trees Red-Black Trees. Some of the following material is from:

Design and Analysis of Algorithms Lecture- 9: B- Trees

B-Trees. Introduction. Definitions

CSCI2100B Data Structures Trees

Transcription:

B-Trees Version of October 2, 2014 B-Trees Version of October 2, 2014 1 / 22

Motivation An AVL tree can be an excellent data structure for implementing dictionary search, insertion and deletion Each operation on an n-node AVL tree takes O(log n) time This only works, though, long as the entire data structure fits into main memory When the data size is too large and data must reside on disk, AVL performance may deteriorate rapidly. B-Trees Version of October 2, 2014 2 / 22

A ractical Example For a typical machine Main memory: 100 nanoseconds per access (a nanosecond is 10 9 second) Hard disk: 0.01 seconds per access (seek time + rotational latency) HD is 5 orders of magnitude slower than main memory HD access reads a large block of data at one time Reading one byte and full block of data take the same time. Consider a database with 10 9 items (stored on disk) Tree would have height log 2 10 9 = 30 Operations on these BSTS would need 30 disk accesses a very slow 0.3 second! Want a way to substantially reduce number of disk accesses. B-Trees Version of October 2, 2014 3 / 22

From Binary to M-ary Idea: allow the nodes to have many children More branching Shallower Tree Fewer disk accesses As branching increases, tree height decreases (shallower) An m-ary tree allows m-way branching Each internal node has at most m 1 keys Complete m-ary tree has height log m n instead of log 2 n Example: if m = 100, then log 100 10 9 < 5 This reduces disk accesses and speeds up search significantly B-Trees Version of October 2, 2014 4 / 22

B-Trees A B-tree of (minimum) degree t 2 has following properties: 1 Every node x (except root) has between t and 2t children Node with n[x] keys has n[x] + 1 children. between t 1 and 2t 1 keys Root has at most 2t children at most 2t 1 keys 2 All leaves appear on the same level 3 Every node x has the following fields: a. n[x], the number of keys currently stored in node x b. the n[x] keys themselves, stored in nondecreasing order c. n[x] + 1 pointers c 1 [x], c 2 [x],..., c n[x]+1 [x] to its children (Leaf nodes have no children, so their c i fields are undefined) 4 Keys key i [x] separate ranges of keys in subtrees: if k i is a key stored in the subtree with root c i [x], then k 1 key 1 [x] k 2 key 2 [x] key n[x] [x] k n[x]+1 B-Trees Version of October 2, 2014 5 / 22

B-Tree Example C G M A B D E F J K L N O Q R S U W Y Z t = 2: the simplest B-tree Every node has at least one key. Every internal node has at least 2 children Every node has at most 3 keys. Every internal node has at most 4 children A node is full if it contains exactly 2t 1 keys (e.g., nodes colored in the above example) We choose t such that an internal node fits in one disk block. B-Trees Version of October 2, 2014 6 / 22

Height of B-Tree Consider the worst case the root contains one key all other nodes contain t 1 keys which implies, 1 node at depth 0; 2 nodes at depth 1; 2t nodes at depth 2; 2t 2 nodes at depth 3;...; 2t h 1 nodes at depth h Thus, for any n-key B-tree of minimum degree t 2 and height h h ( t n 1 + (t 1) 2t i 1 h ) 1 = 1 + 2(t 1) t 1 i=1 = 2t h 1. Therefore, h log t n+1 2. Compared with AVL trees, a factor of about log 2 t is saved in the number of nodes examined for most tree operations. B-Trees Version of October 2, 2014 7 / 22

Insertion Basically follows insertion strategy of binary search tree Insert the new key into an existing leaf node C G M A B D E F J K L N O Q R S U W Y Z Insert V C G M A B D E F J K L N O Q R S U V W Y Z B-Trees Version of October 2, 2014 8 / 22

Insertion: How to insert into a full node? Don t. Split full nodes BEFORE inserting into them! Given a nonfull internal node x, an index i, and a node y such that y = c i [x] is a full child of x. x key i 1 [x] G M key i [x] x key i 1 [x] key i [x] G K M key i+1 [x] y = c i [x] J K L y = c i [x] J L z = c i+1 [x] T 1 T 2 T 3 T 4 T 1 T 2 T 3 T 4 1 split the full node y (with 2t 1 keys) around its median key key t [y] into two nodes with t 1 keys each 2 move key t [y] up into y s parent x to separate the two nodes B-Trees Version of October 2, 2014 9 / 22

Insertion Strategy Question How can we insure that the parent of a full node is not full? Answer While moving down the tree, split every full node along the path from the root to the leaf where the new key will be inserted A key can be inserted into a B-tree in a single pass down the tree from the root to a leaf Splitting the root is the only way to increase the height of a B-tree B-Trees Version of October 2, 2014 10 / 22

Insertion: Example (a) initial tree C G M A B D E F J K L N O Q R S U W Y Z G C M A B D E F J K L N O Q R S U W Y Z (b) insert H: split the encountered full node B-Trees Version of October 2, 2014 11 / 22

Insertion: Example (c) insert H: split the encountered full node G C K M A B D E F J L N O Q R S U W Y Z G C K M A B D E F H J L N O Q R S U W Y Z (d) insert H: insert into an existing nonfull leaf node B-Trees Version of October 2, 2014 12 / 22

Deletion Trivial case: the leaf that contains the deleted key is not small (i.e., before deletion, it contains at least t keys) C G M A B D E F J K L N O Q R S U W Y Z Delete E C G M A B D F J K L N O Q R S U W Y Z B-Trees Version of October 2, 2014 13 / 22

Deletion Strategy Question How to delete key k moving down from root without backing up? Answer Remove(x, k) will remove k from subtree rooted at x.. Algorithm walks down tree towards k. Will (for non-root x) first ensure that x contains at least t keys. Then will either remove k or recursively call Remove(x, k ), where k is some key (possibly not k) and x is the root of subtree of x containing k. If k is in leaf x, condition ensures deletion is trivial Two other, more complicated cases, to consider Case 1 k is in the internal node x Case 2 k is not in the internal node x B-Trees Version of October 2, 2014 14 / 22

Deletion: Comments on root deletion When viewing the following, note that height of tree remains unchanged except in special cases of 1(c) and 2(c) when root x contains exactly one key root x s two children c 1 [x] and c 2 [x] each contain exactly t 1 keys. Both 1(c) and 2(c) will then merge c 1 [x] key[x] and c 2 [x] into one new root node and then proceed to delete k from the new tree rooted at this new node. We will not explicitly illustrate these cases in the following slides. B-Trees Version of October 2, 2014 15 / 22

Deletion: Case 1a Case 1: key k is in the internal node x a. If the child y that precedes k in node x has at least t keys 1 find the predecessor k of k in the subtree rooted at y 2 replace k by k in x 3 recursively delete k C G M A B D F J K L N O Q R S U W Y Z Delete G C F M A B D J K L N O Q R S U W Y Z the predecessor F of G is moved up to take G s position B-Trees Version of October 2, 2014 16 / 22

Deletion: Case 1b Case 1: key k is in the internal node x b. If the child z that follows k in node x has at least t keys 1 find the successor k of k in the subtree rooted at z 2 replace k by k in x 3 recursively delete k C F M A B D J K L N O Q R S U W Y Z Delete F C J M A B D K L N O Q R S U W Y Z the successor J of F is moved up to take F s position B-Trees Version of October 2, 2014 17 / 22

Deletion: Case 1c Case 1: key k is in the internal node x c. If both y and z have only t 1 keys 1 merge k and z into y (y now contains 2t 1 keys) 2 recursively delete k from y L deleted (trivial case) C J M A B D K N O Q R S U W Y Z Delete J C M A B D K N O Q R S U W Y Z J is pushed down to make node DJK, from where J is deleted B-Trees Version of October 2, 2014 18 / 22

Deletion: Case 2a Case 2: the key k is not in the internal node x, then determine the root c i [x] whose subtree contains k. If c i [x] has > t 1 keys then Recursively delete k from c i [x] B-Trees Version of October 2, 2014 19 / 22

Deletion: Case 2b Case 2: the key k is not in the internal node x, then determine the root c i [x] whose subtree contains k. If c i [x] has only t 1 keys b. If c i [x] has an immediate sibling with at least t keys 1 give c i [x] an extra key by moving a key from x down into c i [x] 2 move a key from c i [x] s immediate left or right sibling up into x 3 move the appropriate child pointer from the sibling into c i [x] 4 recursively delete k from the appropriate child of x A deleted (trivial case) C M B D K N O Q R S U W Y Z Delete B D M C K N O Q R S U W Y Z C is moved to fill B s position, and D is moved to fill C s B-Trees Version of October 2, 2014 20 / 22

Deletion: Case 2c Case 2: the key k is not in the internal node x, then determine the root c i [x] whose subtree contains k. If c i [x] has only t 1 keys c. and both of its immediate siblings have t 1 keys 1 merge c i [x] with one sibling 2 recursively delete k from the appropriate child of x D M C K N O Q R S U W Y Z Delete C M D K N O Q R S U W Y Z D is pushed down to get node CDK, from where C is deleted B-Trees Version of October 2, 2014 21 / 22

B-Trees: More Saw how to maintain a B-tree using log t n operations each operation requires constant number of disk reads. could also require many internal memory operations For large t; useful for storing large databases on disk with each node a disk page B-Trees created by Bayer and McCreight at Boeing in 1972 B + tree variant keeps data keys in leaves Simplest B-Tree is (2 3 4)-tree Balanced tree good for internal memory storage Another variation is (a, b)-trees: all non-root nodes have between a and b children Is a B-tree if b = 2a. Smallest (non B-Tree) version is (2, 3)-trees B-Trees Version of October 2, 2014 22 / 22