B-Trees. Based on materials by D. Frey and T. Anastasio


Large Trees
- Tailored toward applications where the tree doesn't fit in memory
  - operations in memory are much faster than disk accesses
  - want to limit the number of levels in the tree (because each new level requires a disk access)
  - keep the root and top level in memory

An Alternative to BSTs
- Up until now we assumed that each node in a BST stored the data.
- What about having the data stored only in the leaves?
- The internal nodes just guide our search to the leaf which contains the data we want.
- We'll restrict this discussion of such trees to those in which all leaves are at the same level.

Figure 1 - A BST with data stored in the leaves
  root:      20
  level 1:   12 | 40
  level 2:   8 | 17 | 33 | 45
  leaves:    1 2 5 7 | 9 10 | 12 15 | 18 19 | 20 27 29 | 33 37 | 40 41 | 45

Observations
- Store data only at leaves; all leaves at the same level
  - interior and exterior nodes have different structure
  - interior nodes store one key and two subtree pointers
  - all search paths have the same length: lg n (assuming one element per leaf)
  - can store multiple data elements in a leaf

M-Way Trees
- A generalization of the previous BST model
  - each interior node has M subtree pointers and M-1 keys
    - the previous BST would be called a 2-way tree or M-way tree of order 2
  - as M increases, height decreases: log_M n (assuming one element per leaf)
  - a perfect M-way tree of height h has M^h leaves

An M-Way Tree of Order 3
Figure 2 (below) shows the same data as Figure 1, stored in an M-way tree of order 3. In this example M = 3 and h = 2, so the tree can support 3^2 = 9 leaves, although it contains only 8. One way to look at the reduced path length with increasing M is that the number of nodes visited in searching for a leaf is smaller for large M. We'll see that when data is stored on disk, each node visited requires a disk access, so reducing the number of nodes visited is essential.

Figure 2 - An M-Way tree of order 3
  root:      12 20
  interior:  5 9 | 15 18 | 26
  leaves:    1 2 4 | 5 7 | 10 11 | 12 | 15 16 | 18 19 | 21 24 | 27 30 34 42

Searching in an M-Way Tree
- Different from standard BST search
  - search always terminates at a leaf node
  - might need to scan more than one element at a leaf
  - might need to scan more than one key at an interior node
- Trade-offs
  - tree height decreases as M increases
  - computation at each node during search increases as M increases

Searching an M-Way Tree

Search (MWayNode v, DataType element, boolean foundit)
    if (v == NULL) return failure;
    if (v is a leaf)
        search the list of values looking for element
        if found, return success
        otherwise return failure
    else (v is an interior node)
        search the keys to find which subtree element is in
        recursively search the subtree

Search Algorithm: Traversing the M-way Tree
  root:      18 32
  interior:  10 13 | 22 28 | 39
  leaves:    1 2 9 | 10 11 | 13 14 16 | 18 | 23 24 25 | 28 30 | 32 35 38 | 39 44
(Figure annotation: everything in the subtree to the left of a key is smaller than that key.)
- In any interior node, find the first key > search item, and traverse the link to the left of that key.
- Search for any item >= the last key in the subtree pointed to by the rightmost link.
- Continue until the search reaches a leaf.
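The traversal rule above can be sketched in Java roughly as follows. This is only an illustration: the `Node` type, its `leaf`/`interior` factory methods, and `sampleTree()` are hypothetical names introduced here (they are not the MWayNode class from the slides), and the grouping of the slide's key values into nodes is an assumption read off the figure.

```java
final class MWaySearch {
    /** Hypothetical node type used only for this sketch. */
    static final class Node {
        final int[] keys;       // interior only: sorted keys
        final Node[] children;  // interior only: keys.length + 1 subtrees
        final int[] values;     // leaf only: sorted data elements

        private Node(int[] keys, Node[] children, int[] values) {
            this.keys = keys;
            this.children = children;
            this.values = values;
        }
        static Node leaf(int... values) { return new Node(null, null, values); }
        static Node interior(int[] keys, Node... children) { return new Node(keys, children, null); }
        boolean isLeaf() { return values != null; }
    }

    /** Descends from v to a leaf, then scans that leaf for item. */
    static boolean contains(Node v, int item) {
        while (v != null && !v.isLeaf()) {
            int i = 0;
            // Find the first key > item and take the link to its left;
            // if there is no such key, i ends at the rightmost link.
            while (i < v.keys.length && v.keys[i] <= item) {
                i++;
            }
            v = v.children[i];
        }
        if (v == null) return false;
        for (int value : v.values) {
            if (value == item) return true;
        }
        return false;
    }

    /** The tree from this slide, as reconstructed from its key values (assumed grouping). */
    static Node sampleTree() {
        return Node.interior(new int[] {18, 32},
            Node.interior(new int[] {10, 13}, Node.leaf(1, 2, 9), Node.leaf(10, 11), Node.leaf(13, 14, 16)),
            Node.interior(new int[] {22, 28}, Node.leaf(18), Node.leaf(23, 24, 25), Node.leaf(28, 30)),
            Node.interior(new int[] {39}, Node.leaf(32, 35, 38), Node.leaf(39, 44)));
    }

    public static void main(String[] args) {
        Node root = sampleTree();
        System.out.println(contains(root, 24)); // true
        System.out.println(contains(root, 17)); // false
    }
}
```

A linear scan of the keys is used here for clarity; with large M a binary search within the node would be the more usual choice.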

Figure 3 - Searching in an M-way tree of order 4
  root:      22 36 48
  interior:  6 12 18 | 26 32 | 42 | 54
  leaves:    2 4 | 6 8 10 | 14 16 | 18 19 20 | 22 24 | 26 28 30 | 32 34 | 38 40 | 42 44 46 | 48 50 52 | 54 56

Is it worth it?
- Is it worthwhile to reduce the height of the search tree by letting M increase?
- Although the number of nodes visited decreases, the amount of computation at each node increases.
- Where's the payoff?

An example
- Consider storing 10^7 items in a balanced BST and in an M-way tree of order 10.
- The height of the BST will be lg(10^7) ~ 24.
- The height of the M-Way tree will be log_10(10^7) = 7 (assuming that we store just 1 record per leaf).
- However, in the BST, just one comparison will be done at each interior node, but in the M-Way tree, 9 will be done (worst case).
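The two heights in this example can be checked with a few lines of Java. This is a small sketch, assuming one record per leaf and a perfect tree; the class and method names are made up for illustration.

```java
final class TreeHeights {
    /** Smallest h with m^h >= n, i.e. ceil(log_m n), computed with integers only. */
    static int height(long n, int m) {
        int h = 0;
        long leaves = 1; // leaves a perfect m-way tree of height h can hold
        while (leaves < n) {
            leaves *= m;
            h++;
        }
        return h;
    }

    /** Worst-case key comparisons on one root-to-leaf path: (m - 1) per interior node. */
    static long worstCaseComparisons(long n, int m) {
        return (long) height(n, m) * (m - 1);
    }

    public static void main(String[] args) {
        long n = 10_000_000L;
        System.out.println(height(n, 2));                // 24
        System.out.println(height(n, 10));               // 7
        System.out.println(worstCaseComparisons(n, 2));  // 24
        System.out.println(worstCaseComparisons(n, 10)); // 63
    }
}
```

The order-10 tree visits 7 nodes instead of 24 but may do 63 comparisons instead of 24, which is the trade-off the next slide asks about.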

How can this be worth the price?
- Only if it somehow takes longer to descend the tree than it does to do the extra computation
- This is exactly the situation when the nodes are stored externally (e.g. on disk)
- Compared to disk access time, the time for extra computation is insignificant
- We can reduce the number of accesses by sizing the M-way tree to match the disk block and record size.

A Generic M-Way Tree Node

import java.util.ArrayList;
import java.util.List;

public class MWayNode<Ktype, Dtype> {
    // code for public interface here
    // constructors, accessors, mutators

    private boolean isleaf;                     // true if node is a leaf
    private int m;                              // the order of the node
    private int nkeys;                          // nr of actual keys used
    private ArrayList<Ktype> keys;              // array of keys (size = m - 1)
    private MWayNode<Ktype, Dtype>[] subtrees;  // array of pointers (size = m)
    private int nelems;                         // nr of possible elements in leaf
    private List<Dtype> data;                   // data storage if leaf
}

B-Tree Definition
A B-Tree of order M is an M-Way tree with the following constraints:
1. The root is either a leaf or has between 2 and M subtrees.
2. All interior nodes (except maybe the root) have between ⌈M/2⌉ and M subtrees (i.e. each interior node is at least half full).
3. All leaves are at the same level.
4. A leaf must store between ⌈L/2⌉ and L data elements, where L is a fixed constant >= 1 (i.e. each leaf is at least half full, except when the tree has fewer than L/2 elements).

A B-Tree example
- The following figure (the same tree as Figure 3) shows a B-Tree with M = 4 and L = 3
- The root node can have between 2 and M = 4 subtrees
- Each other interior node can have between ⌈M/2⌉ = ⌈4/2⌉ = 2 and M = 4 subtrees and up to M - 1 = 3 keys.
- Each exterior node (leaf) can hold between ⌈L/2⌉ = ⌈3/2⌉ = 2 and L = 3 data elements

Figure 4 - A B-Tree with M = 4 and L = 3
  root:      22 36 48
  interior:  6 12 18 | 26 32 | 42 | 54
  leaves:    2 4 | 6 8 10 | 14 16 | 18 19 20 | 22 24 | 26 28 30 | 32 34 | 38 40 | 42 44 46 | 48 50 52 | 54 56
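As a rough illustration of the occupancy constraints in the B-Tree definition, the following Java sketch checks whether a node's subtree count or a leaf's element count is legal. The class and method names are hypothetical, and the sketch ignores the special case of a very small tree whose only leaf holds fewer than L/2 elements.

```java
final class BTreeBounds {
    /** ceil(x / 2) for non-negative x. */
    static int ceilHalf(int x) {
        return (x + 1) / 2;
    }

    /** True if an interior node with this many subtrees is legal for order m. */
    static boolean interiorOk(int subtrees, int m, boolean isRoot) {
        int min = isRoot ? 2 : ceilHalf(m); // the root may have as few as 2 subtrees
        return subtrees >= min && subtrees <= m;
    }

    /** True if a leaf with this many elements is legal for leaf capacity l. */
    static boolean leafOk(int elements, int l) {
        return elements >= ceilHalf(l) && elements <= l;
    }

    public static void main(String[] args) {
        // Values from the M = 4, L = 3 example.
        System.out.println(interiorOk(2, 4, false)); // true
        System.out.println(interiorOk(5, 4, true));  // false
        System.out.println(leafOk(1, 3));            // false
    }
}
```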

Designing a B-Tree
- Recall that M-way trees (and therefore B-trees) are often used when there is too much data to fit in memory. Therefore each node and leaf access costs one disk access.
- When designing a B-Tree (choosing the values of M and L), we need to consider the size of the data stored in the leaves, the size of the keys and pointers stored in the interior nodes, and the size of a disk block.

Student Record Example
Suppose our B-Tree stores student records which contain name, address, and other data totaling 1024 bytes. Further assume that the key to each student record (perhaps the SSN) is 8 bytes long. Assume also that a pointer (really a disk block number, not a memory address) requires 4 bytes. And finally, assume that our disk block is 4096 bytes.

Calculating L
L is the number of data records that can be stored in each leaf. Since we want to do just one disk access per leaf, this is the same as the number of data records per disk block. Since a disk block is 4096 bytes and a data record is 1024 bytes, we choose L = 4096 / 1024 = 4 data records per leaf.

Calculating M
Each interior node contains M pointers and M-1 keys. To maximize M (and therefore keep the tree flat and wide) and yet do just one disk access per interior node, we have the following relationship:
    4M + 8(M - 1) <= 4096
    12M <= 4104
    M <= 342
So choose the largest possible M (making the tree as shallow as possible), which is 342.
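The same arithmetic for L and M can be written as a small Java sketch, assuming (as in the student record example) that one node must fit in exactly one disk block. The class, method, and parameter names are illustrative only.

```java
final class BTreeSizing {
    /** L: how many whole records fit in one disk block. */
    static int leafCapacity(int blockBytes, int recordBytes) {
        return blockBytes / recordBytes;
    }

    /**
     * M: the largest order such that M pointers and M - 1 keys fit in one block.
     * pointer*M + key*(M - 1) <= block  =>  M <= (block + key) / (pointer + key)
     */
    static int order(int blockBytes, int keyBytes, int pointerBytes) {
        return (blockBytes + keyBytes) / (pointerBytes + keyBytes);
    }

    public static void main(String[] args) {
        System.out.println(leafCapacity(4096, 1024)); // 4
        System.out.println(order(4096, 8, 4));        // 342
    }
}
```

With M = 342 the node uses 4 * 342 + 8 * 341 = 4096 bytes, so it fills the block exactly.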

Performance of our B-Tree
With M = 342, the height of our tree for N students will be about ⌈log_342(N/L)⌉ when the nodes are full. For example, with N = 100,000 (about 10 times the size of the UMBC student population), the height of the tree with M = 342 would be no more than 2, because log_342(25000) ≈ 1.74, which rounds up to 2. So any student record can be found in 3 disk accesses. If the root of the B-Tree is stored in memory, then only 2 disk accesses are needed.
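The height estimate can be reproduced with integer arithmetic. The sketch below assumes every leaf holds L records and every interior node has M subtrees (the fully packed case); half-full nodes would give a taller tree. The names are hypothetical.

```java
final class BTreeHeight {
    /** Number of interior levels above the leaves when all nodes are full. */
    static int interiorLevels(long records, int m, int l) {
        long leaves = (records + l - 1) / l; // ceil(N / L) leaves are needed
        int levels = 0;
        long reachable = 1;                  // leaves reachable through `levels` interior levels
        while (reachable < leaves) {
            reachable *= m;
            levels++;
        }
        return levels;
    }

    /** One disk access per interior level, plus one for the leaf itself. */
    static int diskAccesses(long records, int m, int l) {
        return interiorLevels(records, m, l) + 1;
    }

    public static void main(String[] args) {
        System.out.println(interiorLevels(100_000L, 342, 4)); // 2
        System.out.println(diskAccesses(100_000L, 342, 4));   // 3
    }
}
```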

Insertion of X in a B-Tree
- Search to find the leaf into which X should be inserted
- If the leaf has room (fewer than L elements), insert X and write the leaf back to the disk.
- If the leaf is full, split it into two leaves, each with half of the elements. Insert X into the appropriate new leaf and write the new leaves back to the disk.
  - Update the keys in the parent
  - If the parent node is already full, split it in the same manner
  - Splits may propagate all the way to the root, in which case the root is split (this is how the tree grows in height)

Insert 33 into this B-Tree
Figure 5 - Before inserting 33
  root:      22 36 48
  interior:  6 12 18 | 26 32 | 42 | 54
  leaves:    2 4 | 6 8 10 | 12 14 16 | 18 19 20 | 22 24 | 26 28 30 | 32 34 | 36 38 40 | 42 44 46 | 48 50 52 | 54 56

Inserting 33
- Traversing the tree from the root, we find that 33 is less than 36 and is greater than 22, leading us to the 2nd subtree. Since 33 is greater than 32, we are led to the 3rd leaf (the one containing 32 and 34).
- Since there is room for an additional data item in the leaf, it is inserted (in sorted order, which means reorganizing the leaf).

After inserting 33
Figure 6 - After inserting 33
  root:      22 36 48
  interior:  6 12 18 | 26 32 | 42 | 54
  leaves:    2 4 | 6 8 10 | 12 14 16 | 18 19 20 | 22 24 | 26 28 30 | 32 33 34 | 36 38 40 | 42 44 46 | 48 50 52 | 54 56

Now insert 35
- This item also belongs in the 3rd leaf of the 2nd subtree. However, that leaf is full.
- Split the leaf in two and update the parent to get the tree in Figure 7.

After inserting 35
Figure 7 - After inserting 35
  root:      22 36 48
  interior:  6 12 18 | 26 32 34 | 42 | 54
  leaves:    2 4 | 6 8 10 | 12 14 16 | 18 19 20 | 22 24 | 26 28 30 | 32 33 | 34 35 | 36 38 40 | 42 44 46 | 48 50 52 | 54 56
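The leaf split used when inserting 35 can be sketched in Java as below. This covers only the leaf-level step (insert into a full leaf, split, and report the key the parent gains); updating or splitting the parent is not shown. The `Split` type and the choice to give the left leaf the extra element when the count is odd are assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class LeafSplit {
    /** Result of splitting a leaf: the two halves and the new key for the parent. */
    static final class Split {
        final List<Integer> left;
        final List<Integer> right;
        final int separator; // smallest element of the right leaf

        Split(List<Integer> left, List<Integer> right, int separator) {
            this.left = left;
            this.right = right;
            this.separator = separator;
        }
    }

    /** Inserts x into a full, sorted leaf and splits the result into two leaves. */
    static Split insertAndSplit(List<Integer> fullLeaf, int x) {
        List<Integer> all = new ArrayList<>(fullLeaf);
        int pos = Collections.binarySearch(all, x);
        if (pos >= 0) {
            throw new IllegalArgumentException("duplicate element: " + x);
        }
        all.add(-pos - 1, x);              // keep the elements in sorted order
        int mid = (all.size() + 1) / 2;    // left half gets the extra element, if any
        List<Integer> left = new ArrayList<>(all.subList(0, mid));
        List<Integer> right = new ArrayList<>(all.subList(mid, all.size()));
        return new Split(left, right, right.get(0));
    }

    public static void main(String[] args) {
        List<Integer> leaf = new ArrayList<>();
        Collections.addAll(leaf, 32, 33, 34);
        Split s = insertAndSplit(leaf, 35);
        System.out.println(s.left + " " + s.right + " key=" + s.separator);
        // prints "[32, 33] [34, 35] key=34"
    }
}
```

The separator 34 is the key that appears in the parent node in Figure 7.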

Inserting 21
- This item belongs in the 4th leaf of the 1st subtree (the leaf containing 18, 19, 20).
- Since the leaf is full, we split it and update the keys in the parent.
- However, the parent is also full, so it must be split and its parent (the root) updated.
- But this would give the root 5 subtrees, which is not allowed, so the root must also be split.
- This is the only way the tree grows in height.

After inserting 21
Figure 8 - After inserting 21
  root:      36
  level 1:   18 22 | 48
  level 2:   6 12 | 20 | 26 32 34 | 42 | 54
  leaves:    2 4 | 6 8 10 | 12 14 16 | 18 19 | 20 21 | 22 24 | 26 28 30 | 32 33 | 34 35 | 36 38 40 | 42 44 46 | 48 50 52 | 54 56

B-tree Deletion
- Find the leaf containing the element to be deleted.
- If that leaf is still full enough (still has at least ⌈L/2⌉ elements after the removal), write it back to disk without that element. Then change the key in the ancestor if necessary.
- If the leaf is now too empty (has fewer than ⌈L/2⌉ elements), borrow an element from a neighbor.
  - If the neighbor would then be too empty, combine the two leaves into one.
  - This combining requires updating the parent, which may now have too few subtrees.
  - If necessary, continue the combining up the tree.
- Does it matter which neighbor we borrow from?
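The borrow-or-combine decision at the leaf level might look like the following Java sketch. It is a simplified illustration, not the slides' implementation: it always uses the right neighbor (the slides leave the choice of neighbor open), it works on plain sorted lists, and it leaves the parent key update to the caller. All names are hypothetical.

```java
import java.util.List;

final class LeafRepair {
    enum Outcome { UNCHANGED, BORROWED, MERGED }

    /** Minimum number of elements in a leaf: ceil(L / 2). */
    static int minFill(int l) {
        return (l + 1) / 2;
    }

    /**
     * Repairs a leaf after one of its elements has been deleted.
     * Both lists are sorted and are modified in place; after a merge the
     * right neighbor is empty and the parent must drop one subtree.
     */
    static Outcome repair(List<Integer> leaf, List<Integer> rightNeighbor, int l) {
        int min = minFill(l);
        if (leaf.size() >= min) {
            return Outcome.UNCHANGED;              // still at least half full
        }
        if (rightNeighbor.size() > min) {
            leaf.add(rightNeighbor.remove(0));     // borrow the neighbor's smallest element
            return Outcome.BORROWED;
        }
        leaf.addAll(rightNeighbor);                // neighbor cannot spare one: combine
        rightNeighbor.clear();
        return Outcome.MERGED;
    }
}
```

After a borrow, the parent key separating the two leaves has to change to the neighbor's new smallest element; after a merge, the parent loses a key and a subtree and may itself need repair.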