CS 310 B-trees


CS 310 B-trees, Page 1. Motives

Large-scale databases are stored on disks (hard drives). Disks are quite different from main memory.

Data on a disk are accessed through a read-write head.
To read a piece of data, the read-write head must first be positioned right before the data; it then scans through the data.
Moving the read-write head is terribly slow compared to the speed of the processor.

Implication: What would happen if we implemented AVL trees on disks?

CS 310 B-trees, Page 2. A Wish List

To store a large amount of data on disks, we need a tree structure that has the following properties.

1. It arranges data in a way similar to BSTs to enable efficient searching.
2. It is balanced, though not necessarily in the sense of AVL trees.
3. It works well with large tree nodes.

CS 310 B-trees, Page 3. Definition

Given a positive integer constant called MINIMUM, a B-tree satisfies the following rules.

1. The root may have as few as one entry (or zero entries when the tree is empty); every other node has at least MINIMUM entries.
2. The maximum number of entries in a node, denoted MAXIMUM, is twice the value of MINIMUM.
3. The entries of each B-tree node are stored in an array in ascending order, with the smallest at index 0.
4. The number of subtrees below a non-leaf node is always one more than the number of entries in the node.

CS 310 B-trees, Page 4

5. For any non-leaf node:
   (a) an entry at index i is greater than all the entries in subtree number i of the node;
   (b) an entry at index i is less than all the entries in subtree number i + 1 of the node.
6. Every leaf in a B-tree has the same depth.

Note: C++ implementations of B-trees will NOT be covered in this course.

CS 310 B-trees, Page 5. An Example, MINIMUM = 2

[17]
    [7, 11]
        [2, 4, 6]  [8, 10]  [12, 14, 16]
    [25, 30, 35, 43]
        [18, 20, 22, 24]  [27, 29]  [32, 33]  [37, 39, 41]  [45, 46]

CS 310 B-trees, Page 6. Contents of a B-tree Node

data count: the number of items stored in this node
subtree count: the number of subtrees below this node
    at leaves, subtree count = 0
    at other nodes, subtree count = data count + 1
data[minimum*2]: holds the data entries
subtree[minimum*2 + 1]: points to the data count + 1 subtrees
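The six rules of the definition can be checked mechanically. The following is a minimal Python sketch, not part of the course material: the names `BTreeNode` and `check_btree` are hypothetical, entries are assumed to be distinct, and Python lists stand in for the fixed-size `data` and `subtree` arrays. It is applied to the MINIMUM = 2 example tree.

```python
class BTreeNode:
    """Hypothetical node type: sorted entries plus subtrees (empty list at a leaf)."""
    def __init__(self, data, subtrees=()):
        self.data = list(data)
        self.subtrees = list(subtrees)


def check_btree(node, minimum, is_root=True, low=None, high=None):
    """Return the number of levels below `node` if rules 1-6 hold; else raise AssertionError."""
    maximum = 2 * minimum                                    # rule 2
    fewest = 0 if is_root else minimum                       # rule 1
    assert fewest <= len(node.data) <= maximum
    assert all(a < b for a, b in zip(node.data, node.data[1:]))   # rule 3 (distinct entries assumed)
    # Rule 5, enforced as open bounds handed down from the ancestors.
    assert all((low is None or low < x) and (high is None or x < high)
               for x in node.data)
    if not node.subtrees:                                    # leaf
        return 0
    assert len(node.subtrees) == len(node.data) + 1          # rule 4
    bounds = [low] + node.data + [high]
    depths = {check_btree(child, minimum, False, bounds[i], bounds[i + 1])
              for i, child in enumerate(node.subtrees)}
    assert len(depths) == 1                                  # rule 6: all leaves equally deep
    return depths.pop() + 1


leaf = BTreeNode
example = BTreeNode([17], [
    BTreeNode([7, 11], [leaf([2, 4, 6]), leaf([8, 10]), leaf([12, 14, 16])]),
    BTreeNode([25, 30, 35, 43], [leaf([18, 20, 22, 24]), leaf([27, 29]),
                                 leaf([32, 33]), leaf([37, 39, 41]), leaf([45, 46])]),
])
print(check_btree(example, minimum=2))   # prints 2: the leaves sit two levels below the root
```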

CS 310 B-trees, Page 7. Printing a B-tree

    if (subtree count == 0)
        print out data[0] to data[data count-1]
    else {
        for i from 0 to data count-1 {
            print subtree[i] (recursively)
            print data[i]
        }
        print subtree[data count] (recursively)
    }

CS 310 B-trees, Page 8. Searching a B-tree

    if (target == data[k] for some k, 0 <= k < data count)
        return TRUE
    else if (subtree count == 0)
        return FALSE
    else {
        k = the smallest i such that target < data[i],
            or subtree count-1 if target is greater than the last entry
        search in subtree[k]
    }
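The two routines above translate almost line for line into Python. This is an illustrative sketch only: `BTreeNode`, `in_order`, and `search` are hypothetical names, and the standard-library `bisect_left` is used to find k inside a node, which is a substitution for the linear scan the pseudocode describes (both pick the same subtree).

```python
from bisect import bisect_left


class BTreeNode:
    """Hypothetical node type: sorted entries plus subtrees (empty list at a leaf)."""
    def __init__(self, data, subtrees=()):
        self.data = list(data)
        self.subtrees = list(subtrees)


def in_order(node):
    """Yield every entry in ascending order, mirroring the printing routine."""
    if not node.subtrees:
        yield from node.data
        return
    for i, entry in enumerate(node.data):
        yield from in_order(node.subtrees[i])
        yield entry
    yield from in_order(node.subtrees[len(node.data)])


def search(node, target):
    """Return True if target is stored in the subtree rooted at node."""
    k = bisect_left(node.data, target)          # first index with data[k] >= target
    if k < len(node.data) and node.data[k] == target:
        return True
    if not node.subtrees:
        return False
    return search(node.subtrees[k], target)     # k == len(data) selects the last subtree


leaf = BTreeNode
example = BTreeNode([17], [
    BTreeNode([7, 11], [leaf([2, 4, 6]), leaf([8, 10]), leaf([12, 14, 16])]),
    BTreeNode([25, 30, 35, 43], [leaf([18, 20, 22, 24]), leaf([27, 29]),
                                 leaf([32, 33]), leaf([37, 39, 41]), leaf([45, 46])]),
])
print(search(example, 33), search(example, 34))   # prints True False
```

Each recursive call of `search` touches one node, so a lookup reads one node per level of the tree.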

CS 310 B-trees, Page 9. Insertion to B-Trees

Example with MINIMUM = 3:

Insert 25:   [25]
Insert 15:   [15, 25]
Insert 32:   [15, 25, 32]
Insert 20:   [15, 20, 25, 32]
Insert 40:   [15, 20, 25, 32, 40]
Insert 50:   [15, 20, 25, 32, 40, 50]
Insert 45:   [15, 20, 25, 32, 40, 45, 50]
Split root:

[32]
    [15, 20, 25]  [40, 45, 50]

Only the root can sometimes hold fewer than MINIMUM entries; other nodes are created with at least MINIMUM entries.
The two leaves are at the same depth right from their creation.

CS 310 B-trees, Page 10. Splitting a Node

A node holds at most MAXIMUM entries. When an insertion leaves it with MAXIMUM + 1 entries, split it into two nodes of MINIMUM entries each and promote the middle entry.

CS 310 B-trees, Page 11. Observations

It is the middle entry that is promoted, not the newly inserted entry (though they could be the same entry).
The promoted entry becomes the new root if the split node is the root of the entire B-tree. Otherwise, the promoted entry joins the parent node of the split node.

CS 310 B-trees, Page 12. When a Promoted Entry Joins Its Parent

[Figure: the two halves of a split node, each holding MINIMUM entries, below the parent that receives the promoted entry.]

After the promotion, the parent may have more than MAXIMUM entries and need to be split itself. This split-and-promote process could propagate all the way up to the root.
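The split-and-promote process can be sketched as a short recursive insertion. This is one possible Python rendering, not the course's implementation: `BTreeNode`, `insert`, and `_insert` are hypothetical names, duplicates are assumed to be ignored, and Python lists are allowed to hold MAXIMUM + 1 entries for a moment before the split.

```python
from bisect import bisect_left


class BTreeNode:
    """Hypothetical node type: sorted entries plus subtrees (empty list at a leaf)."""
    def __init__(self, data, subtrees=()):
        self.data = list(data)
        self.subtrees = list(subtrees)


def insert(root, entry, minimum):
    """Insert entry and return the root, which is a new node if the old root was split."""
    promoted = _insert(root, entry, minimum)
    if promoted is None:
        return root
    middle, right = promoted
    return BTreeNode([middle], [root, right])    # the only way the tree grows taller


def _insert(node, entry, minimum):
    """Insert below node; return (middle entry, new right node) if node had to split."""
    k = bisect_left(node.data, entry)
    if k < len(node.data) and node.data[k] == entry:
        return None                              # assumption: duplicates are ignored
    if not node.subtrees:
        node.data.insert(k, entry)               # leaf: place the entry in sorted order
    else:
        promoted = _insert(node.subtrees[k], entry, minimum)
        if promoted is None:
            return None
        middle, right = promoted                 # the promoted entry joins this parent
        node.data.insert(k, middle)
        node.subtrees.insert(k + 1, right)
    if len(node.data) <= 2 * minimum:            # at most MAXIMUM entries: done
        return None
    middle = node.data[minimum]                  # MAXIMUM + 1 entries: split
    right = BTreeNode(node.data[minimum + 1:], node.subtrees[minimum + 1:])
    del node.data[minimum:]                      # node keeps the left MINIMUM entries
    del node.subtrees[minimum + 1:]
    return middle, right


root = BTreeNode([])
for entry in [25, 15, 32, 20, 40, 50, 45]:
    root = insert(root, entry, minimum=3)
print(root.data, [child.data for child in root.subtrees])
# prints [32] [[15, 20, 25], [40, 45, 50]]
```

Because `_insert` reports a split to its caller, a split in a leaf can cascade through every ancestor up to the root, where `insert` creates the new root.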

CS 310 B-trees, Page 13. Examples, MINIMUM = 1

Let us insert 55 into the following tree:

[30]
    [20]
        [10, 15]  [25]
    [40]
        [35, 38]  [45, 50]

CS 310 B-trees, Page 14. After the Insertion of 55

[30]
    [20]
        [10, 15]  [25]
    [40, 50]
        [35, 38]  [45]  [55]

Next, let us insert 37.

CS 310 B-trees, Page 15. Promote 37

[30]
    [20]
        [10, 15]  [25]
    [37, 40, 50]
        [35]  [38]  [45]  [55]

CS 310 B-trees, Page 16. Promote 40

[30, 40]
    [20]
        [10, 15]  [25]
    [37]
        [35]  [38]
    [50]
        [45]  [55]

Now, let us insert 5.

CS 310 B-trees, Page 17. After the Insertion of 5

[30, 40]
    [10, 20]
        [5]  [15]  [25]
    [37]
        [35]  [38]
    [50]
        [45]  [55]

Next, insert 18.

CS 310 B-trees, Page 18. After the Insertion of 18

[30, 40]
    [10, 20]
        [5]  [15, 18]  [25]
    [37]
        [35]  [38]
    [50]
        [45]  [55]

Let us insert 12.

CS 310 B-trees, Page 19. Promote 15

[30, 40]
    [10, 15, 20]
        [5]  [12]  [18]  [25]
    [37]
        [35]  [38]
    [50]
        [45]  [55]

CS 310 B-trees, Page 20. Promote 15 Again

[15, 30, 40]
    [10]
        [5]  [12]
    [20]
        [18]  [25]
    [37]
        [35]  [38]
    [50]
        [45]  [55]

CS 310 B-trees, Page 21. Promote 30

[30]
    [15]
        [10]
            [5]  [12]
        [20]
            [18]  [25]
    [40]
        [37]
            [35]  [38]
        [50]
            [45]  [55]

Note that the depths of the leaves are identical at all times. The only time a B-tree grows in height is when its root is split.

CS 310 B-trees, Page 22. Deletion from B-Trees

Try not to change the height of the tree for as long as possible. When a change is unavoidable, the depths of all nodes are reduced simultaneously.

The easy case: deleting 58 from the B-tree below (MINIMUM = 1).

[50]
    [30]
        [10]  [40]
    [60, 70]
        [55, 58]  [65]  [75, 80]

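CS 310 B-trees, Page 23. After the Deletion of 58

[50]
    [30]
        [10]  [40]
    [60, 70]
        [55]  [65]  [75, 80]

A more difficult case: deleting 65.

CS 310 B-trees, Page 24. Borrow from the Nearest Right Sibling

Condition: the nearest right sibling must have at least MINIMUM + 1 entries.

Before: the parent holds the separator A; the short node has MINIMUM - 1 entries; its right sibling has at least MINIMUM + 1 entries, the smallest of which is B.
After: B replaces A in the parent, so the parent's number of entries is not changed; A moves down into the short node, which now has MINIMUM entries; the right sibling still has at least MINIMUM entries.

Can you figure out how to borrow from the nearest left sibling?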
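The borrow step moves one entry through the parent. Below is a small Python sketch under stated assumptions: `BTreeNode` and `borrow_from_right` are hypothetical names, and `i` is the index of the short node among its parent's subtrees. The demonstration starts from the subtree under [60, 70] right after 65 has been removed from its leaf.

```python
class BTreeNode:
    """Hypothetical node type: sorted entries plus subtrees (empty list at a leaf)."""
    def __init__(self, data, subtrees=()):
        self.data = list(data)
        self.subtrees = list(subtrees)


def borrow_from_right(parent, i, minimum):
    """Repair a shortage in parent.subtrees[i] using its nearest right sibling."""
    node, right = parent.subtrees[i], parent.subtrees[i + 1]
    assert len(node.data) == minimum - 1        # the node is one entry short
    assert len(right.data) >= minimum + 1       # the sibling can spare an entry
    node.data.append(parent.data[i])            # separator A moves down
    parent.data[i] = right.data.pop(0)          # B, the sibling's smallest entry, moves up
    if right.subtrees:                          # non-leaf case: B's left subtree follows A
        node.subtrees.append(right.subtrees.pop(0))


# The leaf that held 65 is now empty; its right sibling [75, 80] has MINIMUM + 1 entries.
parent = BTreeNode([60, 70], [BTreeNode([55]), BTreeNode([]), BTreeNode([75, 80])])
borrow_from_right(parent, 1, minimum=1)
print(parent.data, [child.data for child in parent.subtrees])
# prints [60, 75] [[55], [70], [80]]
```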

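CS 310 B-trees, Page 25. After the Deletion of 65

[50]
    [30]
        [10]  [40]
    [60, 75]
        [55]  [70]  [80]

Next, delete 55.

CS 310 B-trees, Page 26. Merge with the Nearest Right Sibling

Condition: the nearest right sibling has exactly MINIMUM entries.

Before: the parent has at least MINIMUM entries, including A (the separator between the short node and its right sibling) and the next entry B; the short node has MINIMUM - 1 entries; the right sibling has MINIMUM entries.
After: A moves down and joins the two nodes into a single node with (MINIMUM - 1) + 1 + MINIMUM = 2*MINIMUM = MAXIMUM entries; the parent keeps B and has at least MINIMUM - 1 entries.

Can you figure out how to merge with the nearest left sibling?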
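The merge step can be sketched the same way. As before, this is an illustration rather than the course's code: `BTreeNode` and `merge_with_right` are hypothetical names, and `i` is the index of the short node among its parent's subtrees. The demonstration starts from the subtree under [60, 75] right after 55 has been removed from its leaf.

```python
class BTreeNode:
    """Hypothetical node type: sorted entries plus subtrees (empty list at a leaf)."""
    def __init__(self, data, subtrees=()):
        self.data = list(data)
        self.subtrees = list(subtrees)


def merge_with_right(parent, i, minimum):
    """Repair a shortage in parent.subtrees[i] by merging it with its right sibling."""
    node, right = parent.subtrees[i], parent.subtrees[i + 1]
    assert len(node.data) == minimum - 1        # the node is one entry short
    assert len(right.data) == minimum           # the sibling has nothing to spare
    node.data.append(parent.data.pop(i))        # separator A moves down
    node.data.extend(right.data)                # now 2 * MINIMUM = MAXIMUM entries
    node.subtrees.extend(right.subtrees)        # empty for leaves
    del parent.subtrees[i + 1]                  # the sibling disappears


# The leaf that held 55 is now empty; its right sibling [70] has exactly MINIMUM entries.
parent = BTreeNode([60, 75], [BTreeNode([]), BTreeNode([70]), BTreeNode([80])])
merge_with_right(parent, 0, minimum=1)
print(parent.data, [child.data for child in parent.subtrees])
# prints [75] [[60, 70], [80]]
```

The parent loses one entry in the process, which is why a merge may leave the parent itself short.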

CS 310 B-trees, Page 27. Observations

If the merge leaves the parent with only MINIMUM - 1 entries, then we need to fix the shortage problem of the parent too (by borrowing or merging). This merge process could propagate all the way up to the root.

CS 310 B-trees, Page 28. After the Deletion of 55

[50]
    [30]
        [10]  [40]
    [75]
        [60, 70]  [80]

Lastly, let us delete 40.

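CS 310 B-trees, Page 29

Removing 40 leaves its leaf empty. Merge with left:

[50]
    [30]
        [10]  [ ]
    [75]
        [60, 70]  [80]

The parent is now empty. Merge with right:

[50]
    [ ]
        [10, 30]
    [75]
        [60, 70]  [80]

The root is now empty. Discard empty root:

[ ]
    [50, 75]
        [10, 30]  [60, 70]  [80]

Result:

[50, 75]
    [10, 30]  [60, 70]  [80]

CS 310 B-trees, Page 30. Deletion from a Non-Leaf Node

Swap the deletion target with the maximum entry in the left subtree of the target, then delete the target from the leaf it has moved to.

Example: to delete 50 from the tree below, swap it with 49.

[50]
    [20, 30]
        [5, 10]  [11, 12]  [48, 49]
    [60, 70]
        [55, 58]  [65]  [75, 80]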
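Putting the deletion cases together gives a recursive routine. The Python sketch below is one way to organize it and is not taken from the course: all names (`BTreeNode`, `erase`, `_erase`, `_remove_max`, `_fix_shortage`) are hypothetical, and the order in which repairs are tried (borrow right, borrow left, merge right, merge left) is an assumption chosen because it reproduces the slides' MINIMUM = 1 example.

```python
from bisect import bisect_left


class BTreeNode:
    """Hypothetical node type: sorted entries plus subtrees (empty list at a leaf)."""
    def __init__(self, data, subtrees=()):
        self.data = list(data)
        self.subtrees = list(subtrees)


def erase(root, target, minimum):
    """Delete target if present and return the (possibly new) root."""
    _erase(root, target, minimum)
    if not root.data and root.subtrees:
        return root.subtrees[0]                  # discard empty root: height shrinks
    return root


def _erase(node, target, minimum):
    k = bisect_left(node.data, target)
    found = k < len(node.data) and node.data[k] == target
    if not node.subtrees:                        # leaf: just remove the entry
        if found:
            node.data.pop(k)
        return
    if found:                                    # non-leaf: take the max of the left subtree
        node.data[k] = _remove_max(node.subtrees[k], minimum)
    else:
        _erase(node.subtrees[k], target, minimum)
    if len(node.subtrees[k].data) < minimum:
        _fix_shortage(node, k, minimum)


def _remove_max(node, minimum):
    """Remove and return the largest entry below node, repairing shortages on the way up."""
    if not node.subtrees:
        return node.data.pop()
    last = len(node.subtrees) - 1
    biggest = _remove_max(node.subtrees[last], minimum)
    if len(node.subtrees[last].data) < minimum:
        _fix_shortage(node, last, minimum)
    return biggest


def _fix_shortage(parent, i, minimum):
    kids, node = parent.subtrees, parent.subtrees[i]
    if i + 1 < len(kids) and len(kids[i + 1].data) > minimum:    # borrow from right
        right = kids[i + 1]
        node.data.append(parent.data[i])
        parent.data[i] = right.data.pop(0)
        if right.subtrees:
            node.subtrees.append(right.subtrees.pop(0))
    elif i > 0 and len(kids[i - 1].data) > minimum:              # borrow from left
        left = kids[i - 1]
        node.data.insert(0, parent.data[i - 1])
        parent.data[i - 1] = left.data.pop()
        if left.subtrees:
            node.subtrees.insert(0, left.subtrees.pop())
    else:                                                        # merge kids[j], kids[j + 1]
        j = i if i + 1 < len(kids) else i - 1
        left, right = kids[j], kids[j + 1]
        left.data.append(parent.data.pop(j))
        left.data.extend(right.data)
        left.subtrees.extend(right.subtrees)
        del kids[j + 1]


root = BTreeNode([50], [
    BTreeNode([30], [BTreeNode([10]), BTreeNode([40])]),
    BTreeNode([60, 70], [BTreeNode([55, 58]), BTreeNode([65]), BTreeNode([75, 80])]),
])
for target in [58, 65, 55, 40]:
    root = erase(root, target, minimum=1)
print(root.data, [child.data for child in root.subtrees])
# prints [50, 75] [[10, 30], [60, 70], [80]]
```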

CS 310 B-trees, Page 31. Performance Analysis

Consider a B-tree with MINIMUM = 10.

What is the minimum number of entries in the tree when the height of the tree is 2?
Repeat the above problem for heights 3, 4, 5, and h.
What is the maximum height for B-trees that contain 1 billion entries?

CS 310 B-trees, Page 32. Counting Disk Accesses

Let h be the height of a given B-tree.

In the worst case, an insertion can require 2h disk accesses:
    exactly h accesses in the downward, searching process
    at most h accesses in the upward, splitting process

In the worst case, a deletion can also require 2h disk accesses:
    exactly h accesses in the downward, searching process
    at most h accesses in the upward, merging process

It is possible to reduce the worst cases to h for both insertion and deletion by allowing MAXIMUM = 2*MINIMUM + 1. Such B-tree variations are outside the scope of this course.
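The performance questions can be explored numerically. The sketch below rests on an assumption the slides do not state: height is taken to be the number of levels, so a tree consisting of a lone root has height 1 (this matches "exactly h accesses" for a search that reads one node per level). Under the other common convention, where a lone root has height 0, every height shifts by one. In the sparsest tree the root has 1 entry and 2 subtrees, and every other node has MINIMUM entries and MINIMUM + 1 subtrees, so level k (for k >= 2) has 2(MINIMUM + 1)^(k-2) nodes and the total is 2(MINIMUM + 1)^(h-1) - 1 entries. The function names are hypothetical.

```python
def min_entries(height, minimum):
    """Fewest entries in a B-tree with `height` levels (assumption: a lone root has height 1)."""
    if height == 0:
        return 0
    return 2 * (minimum + 1) ** (height - 1) - 1


def max_height(entries, minimum):
    """Largest number of levels a B-tree holding `entries` entries can have."""
    height = 0
    while min_entries(height + 1, minimum) <= entries:
        height += 1
    return height


for h in range(2, 6):
    print(h, min_entries(h, minimum=10))
# prints:
# 2 21
# 3 241
# 4 2661
# 5 29281

print(max_height(10**9, minimum=10))   # prints 9
```

Under this convention, a billion entries with MINIMUM = 10 never need more than 9 levels, so by the counts above an insertion or deletion costs at most 2 * 9 = 18 disk accesses.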