Another solution: B-trees


Trees III: B-trees

The balance problem
Binary search trees provide an efficient search mechanism only if they're balanced. Balance depends on the order in which nodes are added to a tree. We have seen red-black trees, one solution to this problem.

Another solution: B-trees
B-tree nodes hold data, typically several data items per node. Each node has either no children or several children. Insertion of data is governed by a set of rules outlined on the next several slides.

B-tree rules
Rule 0: Never talk about B-trees!
Rule 1: The root may have as few as one entry; every other node has at least MINIMUM entries.
Rule 2: The maximum number of entries in a node is MAXIMUM (2 * MINIMUM).
Rule 3: Each node of a B-tree contains a partially filled array of entries, sorted from smallest to largest.

B-tree rules
Rule 4: The number of subtrees below a nonleaf node is one more than the number of entries in the node. Example: if a node has 10 entries, it has 11 children. Subtrees of a node are organized according to rule 5 (next slide).

B-tree rules
Rule 5: For any nonleaf node, an entry at index n is greater than all entries in subtree n of the node and less than all entries in subtree n+1 of the node.
Rule 6: Every leaf has the same depth.
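The rules above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical list-based node type (RuleCheckNode, with MINIMUM = 2); it is not the BalancedSet class defined later in these slides, and it treats an empty root as a valid empty set.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node type for illustration only.
class RuleCheckNode {
    static final int MINIMUM = 2;
    static final int MAXIMUM = 2 * MINIMUM;

    final List<Integer> data = new ArrayList<>();
    final List<RuleCheckNode> children = new ArrayList<>();

    RuleCheckNode(int... entries) {
        for (int e : entries) data.add(e);
    }

    RuleCheckNode withChildren(RuleCheckNode... kids) {
        for (RuleCheckNode k : kids) children.add(k);
        return this;
    }

    // Returns the depth of the leaves below this node, or -1 if a rule is broken.
    // Every entry in this subtree must lie strictly between low and high.
    int check(boolean isRoot, long low, long high) {
        int n = data.size();
        if (isRoot && n == 0) return children.isEmpty() ? 0 : -1;  // empty set (assumption)
        if (n > MAXIMUM) return -1;                                // rule 2
        if (n < (isRoot ? 1 : MINIMUM)) return -1;                 // rule 1
        long prev = low;
        for (int e : data) {                                       // rule 3, plus the bounds from rule 5
            if (e <= prev || e >= high) return -1;
            prev = e;
        }
        if (children.isEmpty()) return 0;                          // leaf
        if (children.size() != n + 1) return -1;                   // rule 4
        int depth = -2;                                            // -2 means "not yet known"
        for (int i = 0; i <= n; i++) {
            long lo = (i == 0) ? low : data.get(i - 1);            // rule 5: bounds for subtree i
            long hi = (i == n) ? high : data.get(i);
            int d = children.get(i).check(false, lo, hi);
            if (d < 0 || (depth != -2 && d != depth)) return -1;   // rule 6
            depth = d;
        }
        return depth + 1;
    }

    boolean isValid() {
        return check(true, Long.MIN_VALUE, Long.MAX_VALUE) >= 0;
    }

    public static void main(String[] args) {
        RuleCheckNode good = new RuleCheckNode(10, 20).withChildren(
            new RuleCheckNode(2, 5), new RuleCheckNode(12, 15, 18), new RuleCheckNode(25, 30));
        RuleCheckNode bad = new RuleCheckNode(10, 20).withChildren(
            new RuleCheckNode(2, 5), new RuleCheckNode(12, 25), new RuleCheckNode(26, 30));
        System.out.println(good.isValid());  // true
        System.out.println(bad.isValid());   // false: 25 sits in the subtree between 10 and 20
    }
}
```

Passing the allowed range down the recursion is one way to enforce rule 5 for whole subtrees rather than only for a node's immediate children.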

Example B-tree (MINIMUM=2)

Searching for data in a B-tree
Check for the target in the root; if found there, return true. If the target isn't found in the root and the root has no children, return false. If the root has children but doesn't contain the target, make a recursive call to search the subtree that could contain the target.

Inserting an item in a B-tree
Easiest option: relax the rules. Make the array for each node size MAXIMUM+1. When the number of data values in a node exceeds MAXIMUM, split the node into 3 parts:
Middle value: goes up one level, into the parent node (the root of this subtree).
Other values: become the data arrays of two nodes, which will be the left and right subtrees of the index where the middle value went.

Examples: inserting an item
Before and after insertion of 30: in this example, the new value fits into a leaf node with no further adjustments needed.

Examples: inserting an item
Inserting another value (25) into the same leaf node requires that the middle value be pushed up a level.

Examples: inserting an item
If enough values are added, a middle value can be pushed up to the root node.

Examples: inserting an item
Once insertions have filled the root to capacity, an additional split causes the tree to grow upward (the example shown has MINIMUM=1 so that the tree fits on screen).

Methods needed for insertion
Public add method: performs loose insertion first, which may result in an excess entry; if there is an excess, grows the tree upward.
Private methods called by the public method: looseadd and fixexcess.

Loose insert method
Does most of the actual work of inserting the value:
Find the slot where the value should go and save this index; if the correct slot isn't in the root node, set the index to the root's count value.
If the index is within the root's data array and the root has no children, shift entries to the right to accommodate the new entry and increment the count.
If the root has children, make a recursive call to looseadd on the subset at index.

Fixing nodes with excess entries
Because each data array is sized at MAXIMUM+1, a node can contain one too many entries. A node with such an excess will always have an odd number of entries (MAXIMUM+1 = 2*MINIMUM+1). To fix it:
Push the middle data entry up to the parent node.
Split the remaining entries and associated subsets between the existing child and a new child.
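The split itself is simple array arithmetic. Here is a minimal sketch that operates on a bare array of MAXIMUM+1 sorted entries; the SplitSketch class and its Split result type are hypothetical names introduced for illustration, and MINIMUM = 2 is an assumption.

```java
import java.util.Arrays;

class SplitSketch {
    static final int MINIMUM = 2;
    static final int MAXIMUM = 2 * MINIMUM;

    // The entry that moves up to the parent, plus the two halves that stay below it.
    static final class Split {
        final int[] left;
        final int middle;
        final int[] right;

        Split(int[] left, int middle, int[] right) {
            this.left = left;
            this.middle = middle;
            this.right = right;
        }
    }

    // Splits a sorted array holding exactly MAXIMUM + 1 entries.
    static Split split(int[] overfull) {
        if (overfull.length != MAXIMUM + 1) {
            throw new IllegalArgumentException("expected exactly MAXIMUM + 1 entries");
        }
        int mid = MINIMUM;  // index of the middle entry among 2 * MINIMUM + 1 entries
        return new Split(
            Arrays.copyOfRange(overfull, 0, mid),
            overfull[mid],
            Arrays.copyOfRange(overfull, mid + 1, overfull.length));
    }

    public static void main(String[] args) {
        Split s = split(new int[] {10, 20, 25, 30, 40});
        System.out.println(Arrays.toString(s.left));   // [10, 20]
        System.out.println(s.middle);                  // 25
        System.out.println(Arrays.toString(s.right));  // [30, 40]
    }
}
```

Each half ends up with exactly MINIMUM entries, so both new nodes satisfy rule 1 immediately after the split.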

The fixexcess method
Called by the private looseadd method when a child node is involved. Called by the public add method when a call to looseadd causes the root node to have an excess entry.

Removing an item from a B-tree
Once again, the simplest method involves relaxing the rules. The public remove method calls a private loose erase method that may invalidate the B-tree: the root might be left with 0 entries, or the root of a subtree might have fewer than MINIMUM entries. If a loose erase causes either of these conditions, the tree must be restored.

Removing items from a B-tree
Example 1: removing a value from a leaf node with more than MINIMUM entries.

Removing items from a B-tree
Example 2: removing a value from an inner node with data available to borrow.

Removing items from a B-tree
Example 3: removal from an interior node with no place to borrow from.
Step 1: Find the data.

Step 2: As before, borrow data from another node; this time, the action leaves the child node deficient.
Step 3: Combine data from the parent, the child, and the child's neighbor to create a merged node.

Step 4: After the merge, the parent node is deficient.
Step 5: Perform another merge, this time with the parent, its parent, and its sibling.

Step 6: The result of the merge leaves the root node temporarily empty.
Step 7: Collapse the tree down one level.

Set implemented as B-tree
We will use the Set ADT to illustrate the use of a B-tree. The class we're defining (BalancedSet) describes a single object, the root node of a B-tree. Keep in mind that, as with most of the trees we have studied, the concept of a B-tree is inherently recursive; every node can be considered the root node of a subtree.

Invariant for BalancedSet class
Items in the set are stored in a B-tree; each child node is the root of a smaller B-tree.
A tally of the number of items in the root node is kept in member variable datacount.
The items in the root node are stored in the data array in data[0] ... data[datacount-1].
If the root has subtrees, they are stored in sets referenced from the subset array in subset[0] ... subset[childcount-1].

Set class definition
    public class BalancedSet implements Cloneable {
        private final int MINIMUM = 1;           // usually much larger in practice
        private final int MAXIMUM = 2 * MINIMUM;
        int datacount;                           // # of items stored at this node
        int[] data = new int[MAXIMUM + 1];       // one spare slot for a temporary excess entry
        int childcount;                          // # of children of this node
        BalancedSet[] subset = new BalancedSet[MAXIMUM + 2];
        // each element of subset is a reference to a set, represented
        // here as a partially filled array of sets

Set class definition - constructor
    public BalancedSet() {
        datacount = 0;
        childcount = 0;
    }

Searching for item in a B-tree
Check for the target in the root; if found there, return true. If the target isn't found in the root and the root has no children, return false. If the root has children but doesn't contain the target, make a recursive call to search the subtree that could contain the target.

Implementation of Set member method contains()
    public boolean contains(int target) {
        int i;
        // find the first entry that is not less than target
        for (i = 0; i < datacount && data[i] < target; i++);
        if (i < datacount && data[i] == target)  // found it (compare only against live entries)
            return true;
        if (childcount == 0)                     // this is a leaf: not found
            return false;
        return subset[i].contains(target);
    }
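To see the search in action without the insertion code, the tree can be built by hand. This is a minimal sketch: SearchSketchNode and its of/children builder helpers are hypothetical stand-ins for BalancedSet, with MINIMUM = 1 as on the slides.

```java
class SearchSketchNode {
    static final int MINIMUM = 1;
    static final int MAXIMUM = 2 * MINIMUM;

    int datacount;
    final int[] data = new int[MAXIMUM + 1];
    int childcount;
    final SearchSketchNode[] subset = new SearchSketchNode[MAXIMUM + 2];

    // Builder helpers for hand-made trees (an assumption; the slides build trees via add).
    static SearchSketchNode of(int... entries) {
        SearchSketchNode n = new SearchSketchNode();
        for (int e : entries) n.data[n.datacount++] = e;
        return n;
    }

    SearchSketchNode children(SearchSketchNode... kids) {
        for (SearchSketchNode k : kids) subset[childcount++] = k;
        return this;
    }

    boolean contains(int target) {
        int i = 0;
        while (i < datacount && data[i] < target) i++;        // first entry >= target
        if (i < datacount && data[i] == target) return true;  // found in this node
        if (childcount == 0) return false;                    // leaf: nowhere else to look
        return subset[i].contains(target);                    // only subtree i can hold target
    }

    public static void main(String[] args) {
        // Root [6]; left child [2, 4] over leaves [1] [3] [5]; right child [9] over [7, 8] [10].
        SearchSketchNode root = of(6).children(
            of(2, 4).children(of(1), of(3), of(5)),
            of(9).children(of(7, 8), of(10)));
        System.out.println(root.contains(8));   // true
        System.out.println(root.contains(11));  // false
    }
}
```

A search visits one node per level, which is why the later analysis ties the cost of contains to the height of the tree.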

Inserting an item into a B-tree
Easiest method: relax the rules! Perform loose insertion: allow the root node to end up with one entry too many. After loose insertion, split the root node if necessary, creating a new root node and increasing the height of the tree.

Methods needed for insertion
Public add method: performs loose insertion; if loose insertion results in an excess entry in the root node, grows the tree upward.
Private methods looseadd and fixexcess are called by the public method.

Loose insertion
Loose insertion does most of the work of inserting a value:
It finds the slot where the value should go, saving the index; if the correct slot is not found in the root, the index is set to the root's count value.
If the index is within the root's data array and the root has no children, it shifts entries to the right and adds the new entry, incrementing the count.
If the root has children, it makes a recursive call on the subset at index.

Pseudocode for looseadd
    private void looseadd(int entry) {
        int i;
        // find the first entry that is not less than the new entry
        for (i = 0; i < datacount && data[i] < entry; i++);
        if (i < datacount && data[i] == entry)
            return;                              // already in the set
        if (childcount == 0) {                   // add entry at this node
            for (int x = datacount; x > i; x--)
                data[x] = data[x-1];             // shift elements to make room
            data[i] = entry;
            datacount++;
        } else {                                 // add entry to a subset, then housekeep
            subset[i].looseadd(entry);
            if (subset[i].datacount > MAXIMUM)
                fixexcess(i);
        }
    }

Fixing nodes with excess entries
Loose insertion can result in a node containing one too many entries. A node with an excess will always have an odd number of entries. To fix it:
The middle entry is pushed up to the parent node.
The remaining entries, along with any subsets, are split between the existing child and a new child.

fixexcess method
Called by looseadd when a child node is involved. Called by add when the action of looseadd causes an excess entry in the root node (of the entire tree).

Pseudocode for fixexcess
    private void fixexcess(int i) {
        // make room in root's data array for new data, then copy
        // middle entry of subset[i] to root & increment root's datacount
        // split subset[i] into 2 subsets & copy data from original
        // subset into the splits
        // if subset[i] was not a leaf, copy its subsets into splits
        // created above & set their childcount values
        // make room in root's subset array for new children &
        // add new subsets to root's subset array
    }

Public add method
    public void add(int element) {
        looseadd(element);
        // add data, then check to see if node still OK; if not:
        if (datacount > MAXIMUM) {
            // get ready to split root node
            BalancedSet child = new BalancedSet();
            // transfer data to new child:
            for (int x = 0; x < datacount; x++)
                child.data[x] = data[x];
            for (int y = 0; y < childcount; y++)
                child.subset[y] = subset[y];
            // continued on next slide

Public add method
            // finish setting up child set:
            child.childcount = childcount;
            child.datacount = datacount;
            // reset current node as empty, with 1 child
            datacount = 0;
            childcount = 1;
            // make new child subset of current node
            subset[0] = child;
            // fix problem of empty root node
            fixexcess(0);
        }
    }
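The add, looseadd, and fixexcess slides can be combined into one runnable sketch. The class name InsertSketchSet, the collect and height helpers, and the choice to hand the root's arrays to the new child instead of copying them element by element are assumptions made for brevity; the fixexcess body is one possible reading of the pseudocode comments, not the original author's code.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: a hypothetical stand-in for the slides' BalancedSet.
class InsertSketchSet {
    static final int MINIMUM = 1;                 // usually much larger in practice
    static final int MAXIMUM = 2 * MINIMUM;

    int datacount;
    int[] data = new int[MAXIMUM + 1];            // one spare slot for a temporary excess
    int childcount;
    InsertSketchSet[] subset = new InsertSketchSet[MAXIMUM + 2];

    public void add(int element) {
        looseadd(element);
        if (datacount > MAXIMUM) {
            // Hand the root's contents to a new child, leaving the root empty with one child.
            InsertSketchSet child = new InsertSketchSet();
            child.data = data;
            child.datacount = datacount;
            child.subset = subset;
            child.childcount = childcount;
            data = new int[MAXIMUM + 1];
            subset = new InsertSketchSet[MAXIMUM + 2];
            datacount = 0;
            childcount = 1;
            subset[0] = child;
            fixexcess(0);                         // split the child; its middle entry moves into the root
        }
    }

    private void looseadd(int entry) {
        int i = 0;
        while (i < datacount && data[i] < entry) i++;
        if (i < datacount && data[i] == entry) return;          // already present
        if (childcount == 0) {
            System.arraycopy(data, i, data, i + 1, datacount - i);  // shift right to make room
            data[i] = entry;
            datacount++;
        } else {
            subset[i].looseadd(entry);
            if (subset[i].datacount > MAXIMUM) fixexcess(i);
        }
    }

    // Splits subset[i], which holds MAXIMUM + 1 entries, around its middle entry.
    private void fixexcess(int i) {
        InsertSketchSet left = subset[i];
        InsertSketchSet right = new InsertSketchSet();
        int middle = left.data[MINIMUM];

        // The upper MINIMUM entries move to the new right sibling.
        System.arraycopy(left.data, MINIMUM + 1, right.data, 0, MINIMUM);
        right.datacount = MINIMUM;
        left.datacount = MINIMUM;

        if (left.childcount > 0) {                // nonleaf: the upper MINIMUM + 1 children move too
            System.arraycopy(left.subset, MINIMUM + 1, right.subset, 0, MINIMUM + 1);
            for (int k = MINIMUM + 1; k < left.childcount; k++) left.subset[k] = null;
            right.childcount = MINIMUM + 1;
            left.childcount = MINIMUM + 1;
        }

        // Make room in this node for the middle entry and the new child.
        System.arraycopy(data, i, data, i + 1, datacount - i);
        data[i] = middle;
        datacount++;
        System.arraycopy(subset, i + 1, subset, i + 2, childcount - (i + 1));
        subset[i + 1] = right;
        childcount++;
    }

    // In-order traversal, used here only to inspect the result.
    void collect(List<Integer> out) {
        for (int i = 0; i < datacount; i++) {
            if (childcount > 0) subset[i].collect(out);
            out.add(data[i]);
        }
        if (childcount > 0) subset[datacount].collect(out);
    }

    // Height in edges; every leaf is at the same depth, so following subset[0] suffices.
    int height() {
        return childcount == 0 ? 0 : 1 + subset[0].height();
    }

    public static void main(String[] args) {
        InsertSketchSet set = new InsertSketchSet();
        for (int v = 1; v <= 7; v++) set.add(v);
        List<Integer> items = new ArrayList<>();
        set.collect(items);
        System.out.println(items);          // [1, 2, 3, 4, 5, 6, 7]
        System.out.println(set.height());   // 2
    }
}
```

Inserting 1 through 7 in sorted order would degrade a plain binary search tree into a chain of depth 6; here the splits keep the height at 2.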

Removing an item from a B-tree
Again, the simplest method involves relaxing the rules. Perform a loose erase, which may end up with an invalid B-tree: it might leave the root of the entire tree with 0 entries, or leave the root of a subtree with fewer than MINIMUM entries. After the loose erase, restore the B-tree.

Removing a B-tree entry
Several methods are involved; three are analogous to insertion methods:
remove: public method; performs a loose remove, then calls other methods as necessary to restore the B-tree.
looseremove: performs the actual removal of the data entry; may leave the B-tree invalid, with the root node having 0 entries or a subtree root having MINIMUM-1 entries.

Removing a B-tree entry
Additional removal methods:
fixshortage: deals with the problem of a subtree's root having MINIMUM-1 entries.
Other methods serve as helpers to fixshortage.

Pseudocode for public remove method
    public boolean remove(int target) {
        if (!looseremove(target))
            return false;                        // target not found
        if (datacount == 0 && childcount == 1) {
            // root was emptied by looseremove: shrink the tree by:
            // - setting a temporary reference to the only subset
            // - copying all member variables from temp to root
            // - deleting the original child node
        }
        return true;
    }
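The shrink step described in the comments above can be sketched as follows. ShrinkSketchNode and shrinkIfEmpty are hypothetical names, and MINIMUM = 1 is assumed for the array sizes. In Java there is no explicit delete: once the root has taken over the child's arrays, the old child object is simply left for the garbage collector.

```java
class ShrinkSketchNode {
    static final int MINIMUM = 1;
    static final int MAXIMUM = 2 * MINIMUM;

    int datacount;
    int[] data = new int[MAXIMUM + 1];
    int childcount;
    ShrinkSketchNode[] subset = new ShrinkSketchNode[MAXIMUM + 2];

    // Called on the root after a loose removal. If the root has no entries and a
    // single child, the root takes over that child's contents, so the tree loses a level.
    void shrinkIfEmpty() {
        if (datacount == 0 && childcount == 1) {
            ShrinkSketchNode only = subset[0];   // temporary reference to the only subset
            data = only.data;
            datacount = only.datacount;
            subset = only.subset;
            childcount = only.childcount;
        }
    }

    public static void main(String[] args) {
        ShrinkSketchNode child = new ShrinkSketchNode();
        child.data[0] = 3;
        child.data[1] = 7;
        child.datacount = 2;

        ShrinkSketchNode root = new ShrinkSketchNode();  // emptied by a loose removal
        root.subset[0] = child;
        root.childcount = 1;

        root.shrinkIfEmpty();
        System.out.println(root.datacount + " entries, first = " + root.data[0]);  // 2 entries, first = 3
    }
}
```

Copying the child's members into the root, rather than replacing the root, keeps every outside reference to the set object valid.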

Pseudocode for looseremove
    private boolean looseremove(int target) {
        find first index such that data[index] >= target;
            if no such index is found, index = datacount
        if (target not found and isleaf())
            return false;
        if (target found and isleaf()) {
            remove target from data array;
            shift contents to the left and decrement datacount
            return true;
        }

Pseudocode for looseremove
        if (target not found and root has children) {
            boolean removed = subset[index].looseremove(target);
            if (subset[index].datacount < MINIMUM)
                fixshortage(index);
            return removed;                      // false if the target was not in the subtree either
        }

Pseudocode for looseremove
        if (target found and root has children) {
            // replace the target with the largest entry of the subtree to its left
            data[index] = subset[index].removelargest();
            if (subset[index].datacount < MINIMUM)
                fixshortage(index);
            return true;
        }
    }

Action of fixshortage method
In order to remedy a shortage of entries in subset[n], do one of the following:
Borrow an entry from the node's left neighbor (subset[n-1]) or right neighbor (subset[n+1]), if either of these two has more than MINIMUM entries.
Combine subset[n] with either of its neighbors if they don't have excess entries to give.

Pseudocode for fixshortage
    public void fixshortage(int x) {
        if (x > 0 && subset[x-1].datacount > MINIMUM) {
            // borrow from the left neighbor
            shift existing entries in subset[x] over one, copy data[x-1] to
                subset[x].data[0] and increment subset[x].datacount
            data[x-1] = last item in subset[x-1].data; decrement subset[x-1].datacount
            if (!(subset[x-1].isleaf()))
                transfer last child of subset[x-1] to front of subset[x],
                    incrementing subset[x].childcount and decrementing subset[x-1].childcount
        }

Pseudocode for fixshortage
        else if (x+1 < childcount && subset[x+1].datacount > MINIMUM) {
            // borrow from the right neighbor
            increment subset[x].datacount and copy data[x] to
                subset[x].data[subset[x].datacount-1]
            data[x] = subset[x+1].data[0]; shift entries in subset[x+1].data
                to the left and decrement subset[x+1].datacount
            if (!(subset[x+1].isleaf()))
                transfer first child of subset[x+1] to end of subset[x],
                    incrementing subset[x].childcount and decrementing subset[x+1].childcount
        }
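Both borrowing cases amount to rotating one entry through the parent. The following is a minimal sketch on a hypothetical list-based node (BorrowSketchNode, MINIMUM = 2), chosen so that the shifting is handled by the list operations; the array-based BalancedSet would do the same moves with explicit shifts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical list-based node; not the array-based BalancedSet from the slides.
class BorrowSketchNode {
    static final int MINIMUM = 2;

    final List<Integer> data = new ArrayList<>();
    final List<BorrowSketchNode> children = new ArrayList<>();

    BorrowSketchNode(int... entries) {
        for (int e : entries) data.add(e);
    }

    BorrowSketchNode withChildren(BorrowSketchNode... kids) {
        for (BorrowSketchNode k : kids) children.add(k);
        return this;
    }

    boolean isLeaf() {
        return children.isEmpty();
    }

    // Separator data[x-1] drops into the front of children[x];
    // the left neighbor's last entry replaces it in this node.
    void borrowFromLeft(int x) {
        BorrowSketchNode left = children.get(x - 1);
        BorrowSketchNode shortNode = children.get(x);
        shortNode.data.add(0, data.get(x - 1));
        data.set(x - 1, left.data.remove(left.data.size() - 1));
        if (!left.isLeaf()) {
            shortNode.children.add(0, left.children.remove(left.children.size() - 1));
        }
    }

    // Separator data[x] drops onto the end of children[x];
    // the right neighbor's first entry replaces it in this node.
    void borrowFromRight(int x) {
        BorrowSketchNode right = children.get(x + 1);
        BorrowSketchNode shortNode = children.get(x);
        shortNode.data.add(data.get(x));
        data.set(x, right.data.remove(0));
        if (!right.isLeaf()) {
            shortNode.children.add(right.children.remove(0));
        }
    }

    public static void main(String[] args) {
        // The middle child [12] is one entry short of MINIMUM.
        BorrowSketchNode parent = new BorrowSketchNode(10, 20).withChildren(
            new BorrowSketchNode(2, 5, 8), new BorrowSketchNode(12), new BorrowSketchNode(25, 30));
        parent.borrowFromLeft(1);
        System.out.println(parent.data);                  // [8, 20]
        System.out.println(parent.children.get(0).data);  // [2, 5]
        System.out.println(parent.children.get(1).data);  // [10, 12]
    }
}
```

The borrowed entry has to pass through the parent; moving it directly between siblings would break rule 5.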

Pseudocode for fixshortage
        else if (x > 0 && subset[x-1].datacount == MINIMUM) {
            // combine subset[x] with its left neighbor
            add data[x-1] to the end of subset[x-1].data
            shift data array leftward, decrementing datacount and
                incrementing subset[x-1].datacount
            transfer all data items and children from subset[x] to end of subset[x-1];
                update values of subset[x-1].datacount and subset[x-1].childcount,
                and set subset[x].datacount and subset[x].childcount to 0
            delete subset[x], shift subset array to the left and decrement childcount
        }

Pseudocode for fixshortage
        else {
            // combine subset[x] with subset[x+1]; the work is similar to the
            // previous combination operation:
            borrow an entry (data[x]) from root and add it to subset[x]
            transfer all data items and children from subset[x+1] to subset[x], and
                zero out subset[x+1]'s childcount and datacount variables
            delete subset[x+1] and update root's subset information
        }
    }
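The two combining cases can be sketched in the same spirit. MergeSketchNode is a hypothetical list-based node with MINIMUM = 2; merging with the right neighbor is expressed here as merging the right neighbor into the short node, which is the same operation shifted by one index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical list-based node; not the array-based BalancedSet from the slides.
class MergeSketchNode {
    static final int MINIMUM = 2;

    final List<Integer> data = new ArrayList<>();
    final List<MergeSketchNode> children = new ArrayList<>();

    MergeSketchNode(int... entries) {
        for (int e : entries) data.add(e);
    }

    MergeSketchNode withChildren(MergeSketchNode... kids) {
        for (MergeSketchNode k : kids) children.add(k);
        return this;
    }

    // Folds children[x] into children[x-1], pulling separator data[x-1] down between them.
    void mergeWithLeft(int x) {
        MergeSketchNode left = children.get(x - 1);
        MergeSketchNode absorbed = children.remove(x);  // also shifts later children left
        left.data.add(data.remove(x - 1));              // separator leaves this node
        left.data.addAll(absorbed.data);
        left.children.addAll(absorbed.children);
    }

    // Folds children[x+1] into children[x], pulling separator data[x] down between them.
    void mergeWithRight(int x) {
        mergeWithLeft(x + 1);
    }

    public static void main(String[] args) {
        // The middle child [12] is short, and neither neighbor has an entry to spare.
        MergeSketchNode parent = new MergeSketchNode(10, 20).withChildren(
            new MergeSketchNode(2, 5), new MergeSketchNode(12), new MergeSketchNode(25, 30));
        parent.mergeWithLeft(1);
        System.out.println(parent.data);                  // [20]
        System.out.println(parent.children.get(0).data);  // [2, 5, 10, 12]
        System.out.println(parent.children.size());       // 2
    }
}
```

The merged node holds MINIMUM + 1 + (MINIMUM - 1) = MAXIMUM entries, so it never overflows. The parent loses one entry, which is how a shortage can propagate upward as in removal example 3.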

Worst-case times for tree operations
For a tree of depth d, all of the following take O(d) operations in the worst case:
adding an entry to a binary search tree, heap, or B-tree
deleting an entry from a binary search tree, heap, or B-tree
searching for an entry in a binary search tree or B-tree

Depth is not the whole story for binary search trees
Time analysis on the basis of depth is not always the most useful measure. The two binary search trees pictured on this slide (figure not reproduced) have the same depth.

Analysis based on number of entries in a BST
The maximum depth of a binary search tree with n entries is n-1 (because there must be at least one node at each level). So the worst-case time for a binary search tree, O(d), converts to O(n-1), or just O(n).
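The n-1 worst case is easy to reproduce. This is a minimal sketch of a plain, unbalanced binary search tree; the class and method names are hypothetical, and depth is counted in edges.

```java
class BstDepthSketch {
    static final class Node {
        final int key;
        Node left, right;

        Node(int key) {
            this.key = key;
        }
    }

    // Plain (unbalanced) BST insertion; duplicates are ignored.
    static Node insert(Node root, int key) {
        if (root == null) return new Node(key);
        Node cur = root;
        while (true) {
            if (key == cur.key) return root;
            if (key < cur.key) {
                if (cur.left == null) { cur.left = new Node(key); return root; }
                cur = cur.left;
            } else {
                if (cur.right == null) { cur.right = new Node(key); return root; }
                cur = cur.right;
            }
        }
    }

    // Depth counted in edges: a single node has depth 0, an empty tree -1.
    static int depth(Node root) {
        return root == null ? -1 : 1 + Math.max(depth(root.left), depth(root.right));
    }

    static Node build(int... keys) {
        Node root = null;
        for (int k : keys) root = insert(root, k);
        return root;
    }

    public static void main(String[] args) {
        System.out.println(depth(build(1, 2, 3, 4, 5, 6, 7)));  // 6, that is n - 1
        System.out.println(depth(build(4, 2, 6, 1, 3, 5, 7)));  // 2
    }
}
```

The same seven keys give depth 6 or depth 2 depending only on insertion order, which is the balance problem described at the start of these slides.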

Heap analysis
By definition, a heap is a complete binary tree. Maximum nodes at each level:
root node (level 0): 1 ($2^0$) node
root's children (level 1): 2 ($2^1$) nodes
root's grandchildren (level 2): 4 ($2^2$) nodes
At level d, there are at most $2^d$ nodes.

Heap analysis
So, for a heap to reach depth d, it must have at least $(1 + 2 + 4 + \cdots + 2^{d-1}) + 1$ nodes. Simplifying the formula:
$1 + 1 + 2 + 4 + \cdots + 2^{d-1}$
$= 2 + 2 + 4 + \cdots + 2^{d-1}$
$= 4 + 4 + \cdots + 2^{d-1}$
$= \cdots = 2^{d-1} + 2^{d-1} = 2^d$

Worst-case times for heap operations
Since $d = \log_2 2^d$ (d is the depth) and $n \ge 2^d$ (n is the number of nodes), we have $\log_2 n \ge \log_2 2^d$, so $\log_2 n \ge d$. Adding or deleting an entry is O(d); since $d \le \log_2 n$, the worst-case scenario for heap operations is $O(\log_2 n)$, or just O(log n).
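The bound can be observed by counting how many levels a new entry climbs. This is a minimal sketch assuming an array-backed min-heap (the slides do not fix min or max ordering); HeapDepthSketch and floorLog2 are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class HeapDepthSketch {
    final List<Integer> items = new ArrayList<>();

    // Adds a value to a min-heap and returns how many levels it moved up.
    int add(int value) {
        items.add(value);
        int i = items.size() - 1;
        int moves = 0;
        while (i > 0) {
            int parent = (i - 1) / 2;
            if (items.get(parent) <= items.get(i)) break;  // heap order restored
            Collections.swap(items, i, parent);
            i = parent;
            moves++;
        }
        return moves;
    }

    // floor(log2(n)) for n >= 1.
    static int floorLog2(int n) {
        return 31 - Integer.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        HeapDepthSketch heap = new HeapDepthSketch();
        int worst = 0;
        // Descending input forces every new entry to climb all the way to the root.
        for (int v = 1024; v >= 1; v--) {
            int moves = heap.add(v);
            if (moves > floorLog2(heap.items.size())) {
                throw new IllegalStateException("bound violated");
            }
            worst = Math.max(worst, moves);
        }
        System.out.println(worst);  // 10, which is log2(1024)
    }
}
```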

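B-tree analysis
For all three functions, the total number of steps is a constant (MAXIMUM in the worst case) times the height of the B-tree. Height is no more than $\log_M n$ (where M is MINIMUM, taken to be greater than 1, and n is the number of entries in the tree). More precisely, every nonleaf node other than the root has at least M+1 children, so the height is at most $\log_{M+1} n$, a bound that also covers M = 1. Thus, all three functions require no more than O(log n) operations.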
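To get a feel for how flat a B-tree stays, one can count the fewest entries a tree of a given height can hold: a root with 1 entry and 2 children, and every other node with MINIMUM entries and MINIMUM+1 children, which gives $2(M+1)^h - 1$ entries at height h. The sketch below inverts that count; the class and method names are hypothetical, and height is measured in edges.

```java
class BTreeHeightSketch {
    // Fewest entries a B-tree of the given height can hold: 2 * (MINIMUM + 1)^height - 1.
    static long minEntries(int minimum, int height) {
        long pow = 1;
        for (int k = 0; k < height; k++) pow *= (minimum + 1);
        return 2 * pow - 1;
    }

    // Largest height a B-tree holding n >= 1 entries can have.
    static int maxHeight(int minimum, long n) {
        int h = 0;
        while (minEntries(minimum, h + 1) <= n) h++;
        return h;
    }

    public static void main(String[] args) {
        long n = 1_000_000L;
        System.out.println(maxHeight(1, n));    // 18
        System.out.println(maxHeight(50, n));   // 3
        System.out.println(maxHeight(500, n));  // 2
    }
}
```

With MINIMUM = 50, a million entries fit in a tree at most three edges tall, which is why B-trees suit data stored on disk, where each level visited can cost a separate read.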

Significance of logarithms
Logarithmic algorithms are those (such as heap and B-tree operations) with a worst-case time of O(log n). For a logarithmic algorithm, doubling the input size n makes the time required increase by a (small) fixed number of operations.

Significance of logarithms
Example: adding a new entry to a heap with n entries. In the worst case, the algorithm may examine as many as $\log_2 n$ nodes. Doubling the number of nodes to 2n would require the algorithm to examine as many as $\log_2 2n$ nodes, but that is just 1 more than $\log_2 n$ (example: $\log_2 1024 = 10$, $\log_2 2048 = 11$).