The B-Tree. Yufei Tao. ITEE University of Queensland. INFS4205/7205, Uni of Queensland

Similar documents
B-Trees. Disk Storage. What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree. Deletion in a B-tree

Priority Queues and Binary Heaps

Multi-way Search Trees. (Multi-way Search Trees) Data Structures and Programming Spring / 25

CSE 530A. B+ Trees. Washington University Fall 2013

Physical Level of Databases: B+-Trees

(2,4) Trees. 2/22/2006 (2,4) Trees 1

M-ary Search Tree. B-Trees. Solution: B-Trees. B-Tree: Example. B-Tree Properties. B-Trees (4.7 in Weiss)

Background: disk access vs. main memory access (1/2)

(2,4) Trees Goodrich, Tamassia (2,4) Trees 1

Module 4: Index Structures Lecture 13: Index structure. The Lecture Contains: Index structure. Binary search tree (BST) B-tree. B+-tree.

M-ary Search Tree. B-Trees. B-Trees. Solution: B-Trees. B-Tree: Example. B-Tree Properties. Maximum branching factor of M Complete tree has height =

Binary Heaps in Dynamic Arrays

Multi-way Search Trees

Multiway Search Trees. Multiway-Search Trees (cont d)

Remember. 376a. Database Design. Also. B + tree reminders. Algorithms for B + trees. Remember

2-3 Tree. Outline B-TREE. catch(...){ printf( "Assignment::SolveProblem() AAAA!"); } ADD SLIDES ON DISJOINT SETS

Lecture Notes: External Interval Tree. 1 External Interval Tree The Static Version

Lecture 11: Multiway and (2,4) Trees. Courtesy to Goodrich, Tamassia and Olga Veksler

2-3 and Trees. COL 106 Shweta Agrawal, Amit Kumar, Dr. Ilyas Cicekli

Search Trees - 1 Venkatanatha Sarma Y

B-Trees and External Memory

(2,4) Trees Goodrich, Tamassia. (2,4) Trees 1

High Dimensional Indexing by Clustering

Data Structures and Algorithms

B-Trees and External Memory

Oregon State University Practice problems 2 Winster Figure 1: The original B+ tree. Figure 2: The B+ tree after the changes in (1).

Chapter 18 Indexing Structures for Files. Indexes as Access Paths

We can use a max-heap to sort data.

B-Trees & its Variants

Design and Analysis of Algorithms Lecture- 9: B- Trees

B+ TREES Examples. B Trees are dynamic. That is, the height of the tree grows and contracts as records are added and deleted.

Trees. Courtesy to Goodrich, Tamassia and Olga Veksler

CS 310 B-trees, Page 1. Motives. Large-scale databases are stored in disks/hard drives.

Multi-way Search Trees! M-Way Search! M-Way Search Trees Representation!

THE B+ TREE INDEX. CS 564- Spring ACKs: Jignesh Patel, AnHai Doan

CSIT5300: Advanced Database Systems

CS350: Data Structures Red-Black Trees

Intro to DB CHAPTER 12 INDEXING & HASHING

EE 368. Weeks 5 (Notes)

CSE 326: Data Structures B-Trees and B+ Trees

Chapter 12: Indexing and Hashing (Cnt(

Multiway Search Trees

Lecture 6: External Interval Tree (Part II) 3 Making the external interval tree dynamic. 3.1 Dynamizing an underflow structure

Module 4: Tree-Structured Indexing

Algorithms. Deleting from Red-Black Trees B-Trees

CS F-11 B-Trees 1

Material You Need to Know

Multi-Way Search Trees

Multi-Way Search Trees

CS127: B-Trees. B-Trees

I/O-Algorithms Lars Arge

3. Fundamental Data Structures

Multidimensional Divide and Conquer 2 Spatial Joins

CS Fall 2010 B-trees Carola Wenk

Sorting and Searching

An AVL tree with N nodes is an excellent data. The Big-Oh analysis shows that most operations finish within O(log N) time

Introduction to Indexing R-trees. Hong Kong University of Science and Technology

Extra: B+ Trees. Motivations. Differences between BST and B+ 10/27/2017. CS1: Java Programming Colorado State University

TREES. Trees - Introduction

CSE 373 OCTOBER 25 TH B-TREES

CS 350 : Data Structures B-Trees

Problem. Indexing with B-trees. Indexing. Primary Key Indexing. B-trees: Example. B-trees. primary key indexing

Sorting and Searching

B-Trees. Version of October 2, B-Trees Version of October 2, / 22

Multiway searching. In the worst case of searching a complete binary search tree, we can make log(n) page faults Everyone knows what a page fault is?

Fundamentals of Data Structure

CMPS 2200 Fall 2017 B-trees Carola Wenk

Trees. (Trees) Data Structures and Programming Spring / 28

Recursion: The Beginning

CSCI Trees. Mark Redekopp David Kempe

In-Memory Searching. Linear Search. Binary Search. Binary Search Tree. k-d Tree. Hashing. Hash Collisions. Collision Strategies.

Multidimensional Indexing The R Tree

B-Trees. Introduction. Definitions

Chapter 11: Indexing and Hashing

Data Structure: Search Trees 2. Instructor: Prof. Young-guk Ha Dept. of Computer Science & Engineering

Lecture 6: Analysis of Algorithms (CS )

Augmenting Data Structures

B Tree. Also, every non leaf node must have at least two successors and all leaf nodes must be at the same level.

B-Tree. CS127 TAs. ** the best data structure ever

Chapter 12: Indexing and Hashing. Basic Concepts

Associate Professor Dr. Raed Ibraheem Hamed

13.4 Deletion in red-black trees

Chapter 12: Indexing and Hashing

COMP Analysis of Algorithms & Data Structures

Copyright The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

CS350: Data Structures B-Trees

13.4 Deletion in red-black trees

CS301 - Data Structures Glossary By

Chapter 17 Indexing Structures for Files and Physical Database Design

Algorithms for Memory Hierarchies Lecture 2

Database System Concepts, 6 th Ed. Silberschatz, Korth and Sudarshan See for conditions on re-use

More B-trees, Hash Tables, etc. CS157B Chris Pollett Feb 21, 2005.

Indexing and Hashing

Comp 335 File Structures. B - Trees

Search Trees. COMPSCI 355 Fall 2016

Red-Black Trees. Based on materials by Dennis Frey, Yun Peng, Jian Chen, and Daniel Hood

Introduction to Indexing 2. Acknowledgements: Eamonn Keogh and Chotirat Ann Ratanamahatana

Advanced Set Representation Methods

Computational Optimization ISE 407. Lecture 16. Dr. Ted Ralphs

9/29/2016. Chapter 4 Trees. Introduction. Terminology. Terminology. Terminology. Terminology

Transcription:

Yufei Tao ITEE University of Queensland

Before ascending into d-dimensional space R d with d > 1, this lecture will focus on one-dimensional space, i.e., d = 1. We will review the B-tree, which is a fundamental structure that can be used to process many types of queries on one-dimensional points.

Range Reporting Let S be a set of points in R. Given an interval q = [x, y], a range query returns S q, namely, all the points in S that are covered by q. Example Assume S = {1, 13, 17, 25, 36, 49, 52, 67}. For q = [5, 20], the result is {13, 17}. For q = [27, 30], the result is.

Computation model We assume that the input set S does not fit in memory, and thus needs to be stored in the disk. The disk has been formatted into blocks (also called pages), such that each block has the same size, e.g., 4096 bytes. An I/O either reads a block from the disk into memory, or writes a block of memory to the disk. We measure the cost of an operation in the number of I/Os that need to be performed.

Naively, we can solve any range query by scanning the whole S, but the cost is obviously prohibitive. The B-tree allows us to do much better.

B-tree Each leaf node has between B/2 and B data elements, where B is a parameter that is at least 3. The only exception takes place when the leaf is the root, in which case it can have any number of elements. All the leaf nodes are at the same level. Each internal node has between B/2 and B child nodes, except that the root can have as few as 2 child nodes.

B-tree (cont.) For any node u, denote by S u the set of elements in the subtree of u. Now let u be an internal node with child nodes v 1,..., v f (f B). All the data elements in S vi must be strictly smaller than any data element in S vj for any 1 i < j f. For each v i (i f ), u stores the smallest element r(v i ) in S vi, which is referred to as a routing element. Note that r(u) = r(v 1 ).

Example B = 3 u 1 1 71 u 2 u 3 1 17 49 71 84 1 13 17 25 36 49 52 67 71 77 84 95 u 4 u 5 u 6 u 7 u 8 In practice, the leaf nodes are connected in a doubly-linked list, sorted by their natural ordering.

Think: How to construct a B-tree if the dataset S has been sorted?

Successor and Predecessor Search Let S be a set of real values. The successor (or predecessor) of a value x in S is the smallest (or largest, resp.) value in S that is at least (most, resp.) x. For example, given S = {1, 13, 17, 25}, the successors of 10 and 13 are both 13, while the predecessor of 12 is 1. Given a B-tree T on S, we can find the successor of any x by visiting a single root-to-leaf path in T : 1 Set u to the root of T. 2 If u is a leaf, return the successor of x among the elements in u. 3 Otherwise, find the predecessor of x among the (routing) elements in u. Let it be the r(v) of some child node v of u. Set u to v and go to Step 2.

Example u 1, u 2, u 5 are accessed to retrieve the successor of 20. u 1 1 71 u 2 u 3 1 17 49 71 84 Cost = 3 I/Os. 1 13 17 25 36 49 52 67 71 77 84 95 u 4 u 5 u 6 u 7 u 8

Answering A Range Query q = [x, y] 1 Locate the leaf u containing the successor of x. 2 Report all elements in u that fall in q. 3 If no element in u is greater than y, set u to the succeeding leaf node, and go to Step 2.

Example The algorithm accesses u 1, u 2, u 5, u 6, and u 7 to answer a range query with q = [20, 75]. u 1 1 71 u 2 u 3 1 17 49 71 84 1 13 17 25 36 49 52 67 71 77 84 95 u 4 u 5 u 6 u 7 u 8 Think: How should the algorithm be adapted if leaf nodes are not linked together?

Insertion To insert a value x into a B-tree: 1 Find the leaf node u that should accommodate x without violating the B-tree definition. Add x to u. 2 If u has no more than B elements, the insertion is complete. Otherwise, u verflows.

Overflow Handling Let u be the node that overflows. 1 Create a new node u. 2 Split the elements of u in two halves, respecting their ordering. Put the first (second) half in u (u ). 3 Insert r(u ) into the parent p of u. 4 If p has no more than B elements, done. Otherwise, handle the overflow at p in the same way. The changes may propagate all the way up. If the root splits, create a new root.

Example The insertion of 20 makes u 5 overflow. u 1 1 71 u 2 u 3 1 17 49 71 84 1 13 17 20 25 36 49 52 67 71 77 84 95 u 4 u 5 u 6 u 7 u 8

Example (cont.) u 5 splits, and generates u 5. u 1 1 71 u 2 u 3 1 17 49 71 84 1 13 17 20 25 36 49 52 67 71 77 84 95 u 4 u 5 u 6 u 7 u 8 u 5

Example (cont.) r(u 5 ) = 25 is added to u 2, which overflows. u 1 1 71 u 2 u 3 1 17 25 49 71 84 1 13 17 20 25 36 49 52 67 71 77 84 95 u 4 u 5 u 6 u 7 u 8 u 5

Example (cont.) u 2 splits, and generates u 2. u 1 1 71 u 2 u 2 u 3 1 17 25 49 71 84 1 13 17 20 25 36 49 52 67 71 77 84 95 u 4 u 5 u 6 u 7 u 8 u 5

Example (cont.) p(u 2 ) is added the root. Done! u 1 1 25 71 u 2 u 3 1 17 25 49 u 2 71 84 1 13 17 20 25 36 49 52 67 71 77 84 95 u 4 u 5 u 6 u 7 u 8 u 5

Deletion To delete a value x from a B-tree: 1 Find the leaf node u that contains x. Remove x from u. 2 If x is the smallest element in u, adjust the routing elements in the ancestors of u appropriately. 3 If u is the root or has at least B/2 elements, the deletion is complete. Otherwise, u underflows.

Underflow Handling Let u be the node that underflows, and u be the right sibling of u (if u does not have a right sibling, set u to its left sibling, and swap u and u in the below): 1 If u and u contain no more than B elements in total, perform a merge: Put all the elements in u. Remove r(u ) from the parent p of u. If p underflows, handle it in the same way. 2 Otherwise, perform a share: Distribute the elements in u and u equally between them. Modify r(u ) in p. If the root ends up having only one child, remove the root.

Example Deletion of 77 makes u 7 underflow. u 1 1 71 u 2 u 3 1 17 49 71 84 1 13 17 25 36 49 52 67 71 77 84 95 u 4 u 5 u 6 u 7 u 8

Example (cont.) Merge u 7 and u 7, and remove r(u 7 ) = 84 from p 3, causing the underflow of u 3. u 1 1 71 u 2 u 3 1 17 49 71 84 1 13 17 25 36 49 52 67 71 84 95 u 4 u 5 u 6 u 7

Example (cont.) Perform a share between u 2 and u 3. Update r(u 3 ) = 49 in u 1. u 1 1 49 u 2 u 3 1 17 49 71 1 13 17 25 36 49 52 67 71 84 95 u 4 u 5 u 6 u 7