B+-trees. Kerttu Pollari-Malmi


This text is based partly on the course textbook by Cormen and partly on the old lecture slides written by Matti Luukkainen and Matti Nykänen.

1 Introduction

First, read the beginning of Chapter 18 of the textbook. It explains why B-trees are used when the search structure is on disk. In this course, we consider B+-trees instead of the classical B-trees introduced in the textbook. The difference is that in B+-trees only leaf nodes contain the actual key values. The non-leaf nodes of a B+-tree contain router values. Routers are entities of the same type as the key values, but they are not the keys stored in the search structure; they are only used to guide the search in the tree. In classical B-trees, key values are stored in both leaf and non-leaf nodes.

2 The structure of a B+-tree

A B+-tree T consists of nodes. One of the nodes is a special node T.root. If a node x is a non-leaf node, it has the following fields:

- x.n, the number of router values currently stored in node x.
- The router values stored in node x, in increasing order: $x.router_1 < x.router_2 < \dots < x.router_{x.n}$.
- x.leaf, a boolean field whose value is FALSE, meaning that x is a non-leaf node.
- x.n + 1 pointers $x.c_1, x.c_2, x.c_3, \dots, x.c_{x.n+1}$ to the children of x.

If the node x is a leaf node, it has the following fields:

- x.n, the number of key values currently stored in x.

- The key values stored in node x, in increasing order: $x.key_1 < x.key_2 < \dots < x.key_{x.n}$.
- x.leaf, a boolean field whose value is TRUE, meaning that x is a leaf node.

If we consider a non-leaf node x and its pointers $x.c_1, x.c_2, x.c_3, \dots, x.c_{x.n+1}$, the following condition holds:

$$k_1 \le x.router_1 < k_2 \le x.router_2 < k_3 \le \dots < k_{x.n} \le x.router_{x.n} < k_{x.n+1}$$

where $k_i$ is any key or router value in the subtree pointed to by $x.c_i$.

An example of a non-leaf node containing 5 router values:

    leaf = false, n = 5, routers: 10 18 45 66 98

Usually, we do not draw the values of the fields x.n and x.leaf:

    10 18 45 66 98

The leaf nodes contain only the fields n and leaf and the key values (and maybe some data connected to the key values, but for the sake of simplicity we do not draw that data):

    leaf = true, n = 8, keys: 19 21 23 24 27 29 44 45

Usually, we draw the leaf node without the fields n and leaf:

    19 21 23 24 27 29 44 45

Usually, each non-leaf node has dozens or hundreds of children. An example of a B+-tree (a bar separates the children of different second-level nodes):

    root:     [13 29 79]
    level 2:  [5] | [19] | [67] | [85]
    leaves:   [2 5] [6 7 11 12] | [14 19] [20 23] | [31 41 50 53 61 63] [68 73 75 76] | [80 83 84] [87 88]
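The node layout and the router condition above can be sketched in Python. This is only an illustration under assumptions: the class name `Node`, the helpers `L`, `N`, `subtree_values` and `check_routers`, and the choice to store routers and keys in the same `keys` list are not part of the course material.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One B+-tree node (hypothetical layout, not the course's own code)."""
    leaf: bool
    keys: List[int] = field(default_factory=list)      # routers if non-leaf, keys if leaf
    children: List["Node"] = field(default_factory=list)  # n + 1 children if non-leaf

    @property
    def n(self) -> int:
        return len(self.keys)


def L(*ks):
    """Build a leaf node."""
    return Node(True, list(ks))


def N(routers, kids):
    """Build a non-leaf node."""
    return Node(False, list(routers), list(kids))


def subtree_values(x):
    """All key and router values stored in the subtree rooted at x."""
    vals = list(x.keys)
    for c in x.children:
        vals.extend(subtree_values(c))
    return vals


def check_routers(x):
    """True if k_i <= router_i < k_{i+1} holds in every non-leaf node below x."""
    if x.keys != sorted(set(x.keys)):       # values must be strictly increasing
        return False
    if x.leaf:
        return True
    if len(x.children) != x.n + 1:
        return False
    for i, r in enumerate(x.keys):
        if any(k > r for k in subtree_values(x.children[i])):
            return False
        if any(k <= r for k in subtree_values(x.children[i + 1])):
            return False
    return all(check_routers(c) for c in x.children)


# The example tree drawn above.
example = N([13, 29, 79], [
    N([5], [L(2, 5), L(6, 7, 11, 12)]),
    N([19], [L(14, 19), L(20, 23)]),
    N([67], [L(31, 41, 50, 53, 61, 63), L(68, 73, 75, 76)]),
    N([85], [L(80, 83, 84), L(87, 88)]),
])
print(check_routers(example))  # prints True
```

The check recomputes subtree contents at every level, which is fine for a small illustration but would be too slow for a real index.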

3 Properties of the B+-tree nodes

The B+-tree has to satisfy the following balance conditions:

- Every path from the root node to a leaf node has equal length, i.e. every leaf node has the same depth, which is the height of the tree.
- Every node of the B+-tree except the root node is at least half-filled.

The latter condition can be formulated more exactly. Denote by s(x) the number of children of node x if x is a non-leaf node, and the number of keys stored in x if x is a leaf node. Let $t \ge 2$ be a fixed integer constant. For each node x of the B+-tree, the following holds:

- $1 \le s(x) \le 2t$, if x is the only node in the tree.
- $2 \le s(x) \le 2t$, if x is the root node and the tree also contains other nodes.
- $t \le s(x) \le 2t$, otherwise.

Because the B+-tree satisfies these balance conditions, we can prove that the height h of the B+-tree satisfies $h \le \log_t n$, where n is the number of keys stored in the tree. (A tree of height $h \ge 1$ has at least $2t^{h-1}$ leaves, each holding at least t keys, so $n \ge 2t^h$.)

4 Search in a B+-tree

The search operation starts in the root node and proceeds in every node x as follows:

- If x is a non-leaf node, we look for the first router value $x.router_i$ that is greater than or equal to the key k searched for. The search then continues in the node pointed to by $x.c_i$. If all router values in node x are smaller than k, we continue in the node pointed to by the last pointer in x.
- If x is a leaf node, we inspect whether k is stored in this node.

An example of a search operation, Search(23):

    root:     [13 29 79]
    level 2:  [5] | [19] | [67] | [85]
    leaves:   [2 5] [6 7 11 12] | [14 19] [20 23] | [31 41 50 53 61 63] [68 73 75 76] | [80 83 84] [87 88]

Search(23) first visits the root [13 29 79], where 29 is the first router value greater than or equal to 23. It then visits the node [19], where all router values are smaller than 23, so the last pointer is followed. Finally it visits the leaf [20 23], where 23 is found.

4.1 Code of the search operation

    BTreeSearch(T, k)
        x = T.root
        while not x.leaf
            i = 1
            while i ≤ x.n and k > x.router_i
                i = i + 1
            x = x.c_i
            DiskRead(x)
        i = 1        // we are in a leaf node
        while i ≤ x.n and k > x.key_i
            i = i + 1
        if i ≤ x.n and k == x.key_i
            return (x, i)
        else return NIL

4.2 Time complexity of the search operation

During the search operation, h nodes are read from disk into main memory, where h is the height of the B+-tree. As previously stated, the height of the B+-tree is $h = O(\log_t n)$, where n is the number of keys stored in the tree. In addition to the disk reads, the algorithm performs a linear search in every node read from disk. The time complexity of each linear search is O(t). Thus, the total time complexity of the B+-tree search operation is $O(t \log_t n)$.
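The BTreeSearch pseudocode above translates fairly directly into Python. The following is a minimal in-memory sketch: the class `Node` and the function `btree_search` are illustrative names, indices are 0-based rather than 1-based, and the disk read is only marked by a comment.

```python
class Node:
    """Hypothetical in-memory node: routers or keys in 'keys', children in 'children'."""

    def __init__(self, leaf, keys, children=()):
        self.leaf = leaf
        self.keys = list(keys)            # routers if non-leaf, keys if leaf
        self.children = list(children)    # len(keys) + 1 children if non-leaf


def btree_search(root, k):
    """Return (leaf, index) if k is stored in the tree, otherwise None."""
    x = root
    while not x.leaf:
        i = 0
        # Find the first router that is greater than or equal to k.
        while i < len(x.keys) and k > x.keys[i]:
            i += 1
        x = x.children[i]                 # DiskRead(x) would happen here
    i = 0                                 # we are in a leaf node
    while i < len(x.keys) and k > x.keys[i]:
        i += 1
    if i < len(x.keys) and x.keys[i] == k:
        return x, i
    return None


def leaf(*ks):
    return Node(True, ks)


# The example tree from Section 2.
root = Node(False, [13, 29, 79], [
    Node(False, [5], [leaf(2, 5), leaf(6, 7, 11, 12)]),
    Node(False, [19], [leaf(14, 19), leaf(20, 23)]),
    Node(False, [67], [leaf(31, 41, 50, 53, 61, 63), leaf(68, 73, 75, 76)]),
    Node(False, [85], [leaf(80, 83, 84), leaf(87, 88)]),
])

found = btree_search(root, 23)
print(found[0].keys, found[1])    # prints [20, 23] 1
print(btree_search(root, 13))     # prints None: 13 is only a router value
```

Searching for 13 fails even though 13 appears in the root, which illustrates that router values are not keys of the search structure.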

If the linear search inside each node is replaced by a binary search, the total time complexity of the B+-tree search operation becomes $O(\log_2 t \cdot \log_t n)$. However, in practice the disk read operations dominate the running time.

5 B+-tree insertion

The insertion of a key k into a B+-tree starts by searching for the leaf node x that should contain k. This is done in the same way as in the B+-tree search operation. If there is enough space in x for the key k, k is inserted and no other actions are needed.

If the node x is full before the insertion, it must be split as follows:

- We allocate memory for a new node y.
- The first t keys of node x are left in x.
- The last t keys of node x are transferred to the new node y.
- The key k is inserted into either x or y, according to the value of k.
- A pointer to the new node y and a new router value are inserted into node z, which is the parent of both x and y. A good choice for the new router value is the last key value in x.
- If there is not enough space for a new pointer and router value in z, z must be split.

In the worst case, all nodes on the path from the leaf to the root must be split.

An example of splitting a leaf node (here t = 3), Insert 59:

    before:              [31 41 50 53 61 63]
    after the split:     [31 41 50] [53 61 63]
    after inserting 59:  [31 41 50] [53 59 61 63]

In non-leaf nodes, the number of router values is one less than the number of pointers. Thus, if we split a non-leaf node, we have an extra router value. This router value is added to the parent of the split node, between the pointers to the original node and to its new sibling. An example of the situation where a new pointer and a new router value 59 are inserted into a non-leaf node z is presented below.

[Figure: inserting a new pointer and the router value 59 into a non-leaf node. Values shown: 29 79 29 67 79 67 29 67 79 59.]

If the parent of the split non-leaf node is full, we must split the parent as well to be able to add the new pointer and router value to it. In the worst case, all nodes on the path from the split node to the root have to be split. An example of the situation where inserting the value 59 into a leaf node leads to the splitting of three nodes (only part of the B+-tree is shown):

[Figure: Insert 59. Values shown: 29 13 29 79 13 67 79 67 59 50 53 61 63 50 53 59 61 63.]

The time complexity of the insertion algorithm is $O(t \log_t n)$, where the coefficient t is due to the operations performed on each node in main memory. In practice, the running time is dominated by the number of disk reads and writes needed, which is $O(\log_t n)$.

We have presented the algorithm that first performs the insertion into a leaf node and then goes back towards the root, splitting nodes when necessary. It is also possible to do the insertion in a single pass down the tree: if we encounter a full node, we split it on the way from the root down to the leaf. This means that we may do some unnecessary split operations, but on the other hand we guarantee that it is always possible to insert a key into a leaf node without going back towards the root after the insertion.

6 B+-tree deletion

We start the deletion by searching for the leaf node x containing the key to be deleted. Notice that we do not delete the same value from the non-leaf nodes, even if a non-leaf node contains this value as a router value. Because every leaf node contains more than one key value, we do not delete the node itself; we just remove the key value from the node. If the leaf node contains at least t keys after the deletion, we are done and no other actions are needed.

If the number of keys in the leaf node is t - 1 after the deletion, rebalancing operations are needed. When rebalancing a node x that contains t - 1 keys, we use a sibling y of x. The action performed depends on the number of keys in y.

If the node y contains at most t + 1 keys, the keys in x are transferred to node y. After that, the pointer to node x and one router value are removed from the parent z of x and y. This action is called fusing.

Delete 53 (here t = 2):

    before:          parent [67], leaves [50 53] [68 69]
    after deletion:  parent [67], leaves [50] [68 69]
    after fusing:    leaf [50 68 69]

If the parent z has fewer than t children after fusing, we must continue the rebalancing process in z, which is a non-leaf node. If fusing is applied to two non-leaf nodes, the router value between the pointers to these nodes in the parent is removed from the parent and added to the fused node. In the worst case, all nodes on the path from the fused leaf to the root must be fused. The figure below shows the situation where fusing is started after one key has been removed from the leaf on the left. (The figure contains only part of the B+-tree.)

[Figure: fusing that propagates from a leaf up to the root. Values shown level by level before fusing: T.root 90; 74 99; 67 80; leaf keys 50 68 69. After fusing: T.root 90 99; 74 80; leaf keys 50 68 69.]

If the node x contains t - 1 keys after the deletion and its sibling y contains at least t + 2 keys, fusing cannot be applied. In this case, we use sharing: some of the keys in y are transferred to node x so that the number of keys in both nodes is about the same.

[Figure: Delete 53. Values shown: parent router values 67, 67, 69; leaf keys 50 53 68 69 70, then 50 68 69 70, then 50 68 69 70.]

Because the parent of x and y does not lose any children in this process, we do not need any rebalancing in the parent. However, we must update the router value between the pointers to the nodes x and y.

It is also possible that we must apply sharing to non-leaf nodes, if fusing nodes lower in the tree has led to a situation where a non-leaf node has t - 1 children and its sibling has at least t + 2 children. Notice how the router values move between the nodes to which sharing is applied and their parent in the example below:

[Figure: sharing between two non-leaf nodes and the update of the router values in their parent. Values shown: 13 85 13 75 85 60 75 80 60 80.]

The time complexity of the delete operation is the same as that of the insertion:

- The total time complexity is $O(t \log_t n)$.
- The number of disk operations needed is $O(\log_t n)$. This is the dominating factor of the running time in practice.

It is also possible to perform the delete operation and the rebalancing in a single pass down the tree: each time we encounter a node having exactly t children (non-leaf node) or keys (leaf node) on the way from the root to the leaf in the search phase, we apply sharing or fusing to this node and its sibling. When we reach the leaf, we can always remove at least one key from it without any further rebalancing operations.
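As a closing illustration, the leaf-level steps of Sections 5 and 6 can be sketched in Python on plain sorted lists. Everything here is an assumption made for the example: the function names `insert_into_leaf` and `rebalance_leaves`, the return conventions, the tie-breaking rule for where k goes after a split, and the key values in the sharing example. Updating the parent node is left out.

```python
import bisect


def insert_into_leaf(keys, k, t):
    """Insert k into a sorted leaf (k is assumed not to be present yet).

    Returns (left, right, router). If no split was needed, right and
    router are None. Otherwise router is the last key of the left leaf.
    """
    if len(keys) < 2 * t:                 # enough space: no split
        new = list(keys)
        bisect.insort(new, k)
        return new, None, None
    left, right = list(keys[:t]), list(keys[t:])   # first t stay, last t move
    # Assumed rule: k goes left only if it is smaller than every key on the right.
    bisect.insort(left if k < right[0] else right, k)
    return left, right, left[-1]


def rebalance_leaves(left, right, t):
    """Rebalance two sibling leaves, one of which holds t - 1 keys.

    Returns (left, right, router). Fusing gives right = router = None.
    """
    combined = list(left) + list(right)   # sorted, since left keys < right keys
    if len(combined) <= 2 * t:            # sibling had at most t + 1 keys: fuse
        return combined, None, None
    mid = len(combined) // 2              # sibling had at least t + 2 keys: share
    return combined[:mid], combined[mid:], combined[mid - 1]


# Insert 59 into a full leaf with t = 3 (the split example of Section 5).
print(insert_into_leaf([31, 41, 50, 53, 61, 63], 59, 3))
# prints ([31, 41, 50], [53, 59, 61, 63], 50)

# Fusing with t = 2 (the "Delete 53" example of Section 6).
print(rebalance_leaves([50], [68, 69], 2))
# prints ([50, 68, 69], None, None)

# Sharing with t = 2; these key values are made up for the illustration.
print(rebalance_leaves([50], [68, 69, 70, 71], 2))
# prints ([50, 68], [69, 70, 71], 68)
```

Fusing is chosen exactly when the two leaves together hold at most 2t keys, which is the same condition as the sibling holding at most t + 1 keys. In the sharing case both halves end up with at least t keys.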