THE B+ TREE INDEX. CS 564- Spring ACKs: Jignesh Patel, AnHai Doan


Transcription:

THE B+ TREE INDEX CS 564- Spring 2018 ACKs: Jignesh Patel, AnHai Doan

WHAT IS THIS LECTURE ABOUT?
The B+ tree index:
- Basics
- Search/Insertion/Deletion
- Design & Cost

INDEX RECAP
We have the following query:
SELECT * FROM Sales WHERE price > 100;
How do we organize the file to answer this query efficiently?

INDEXES
- Hash index: good for equality search; in expectation, constant I/O cost for search and insert
- B+ tree index: good for range and equality search

B+ TREE BASICS

THE B+ TREE INDEX
- a dynamic tree-structured index that is adjusted to always be height-balanced
- 1 node = 1 physical page
- supports efficient equality and range search
- widely used in many DBMSs: SQLite uses it as the default index; also SQL Server, DB2, ...

B+ TREE INDEX: BASIC STRUCTURE
- a node corresponds to a disk page
- data entries exist only in the leaf nodes and are sorted according to the search key
[Figure: tree with the root node at the top, non-leaf nodes below it, and leaf nodes at the bottom]

B+ TREE: NODE
- Parameter d is the order of the tree
- Each non-leaf node contains $d \le m \le 2d$ entries (minimum 50% occupancy at all times)
- The root node can have $1 \le m \le 2d$ entries
[Figure: node with keys $k_1, k_2, \dots, k_m$]

NON-LEAF NODES
A non-leaf (or internal) node with m entries has m+1 pointers to lower-level nodes.
[Figure: node with keys $k_1, k_2, \dots, k_m$; the first pointer leads to a page with values $< k_1$, the second to a page with $k_1 \le$ values $< k_2$, and the last to a page with values $\ge k_m$]

LEAF NODES
A leaf node with m entries has:
- m pointers to the data records (rids)
- pointers to the next and previous leaves
[Figure: leaf node with entries $(r_1, k_1), (r_2, k_2), \dots, (r_m, k_m)$, each $r_i$ pointing to a record, plus a pointer to the previous page and a pointer to the next page]

B+ TREE OPERATIONS

B+ TREE OPERATIONS
A B+ tree supports the following operations:
- equality search
- range search
- insert
- delete
- bulk loading

SEARCH
- start from the root node
- examine the index entries in non-leaf nodes to find the correct child
- traverse down the tree until a leaf node is reached
- for equality search, we are done
- for range search, traverse the leaves sequentially using the previous/next pointers

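EQUALITY SEARCH: EXAMPLE
Search for key = 34. At the root (key 60), 34 < 60.
[Figure: B+ tree with root (60), non-leaf nodes (20, 40) and (70, 80, 90), and leaf keys 5 12 25 34 35 55 62 63 64]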

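EQUALITY SEARCH: EXAMPLE
Search for key = 34. At the root (key 60), 34 < 60; at the non-leaf node (20, 40), 20 <= 34 < 40.
[Figure: B+ tree with root (60), non-leaf nodes (20, 40) and (70, 80, 90), and leaf keys 5 12 25 34 35 55 62 63 64]
To locate the correct data entry in the leaf node, we can do either linear or binary search.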
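The descent in this example can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the lecture's implementation: the `Node` class, the `rids` strings, and the two-level tree fragment (only the left subtree of the example) are assumptions made for the sketch, and a real B+ tree would read one disk page per node.

```python
from bisect import bisect_left, bisect_right


class Node:
    """Hypothetical in-memory node; in a DBMS each node is one disk page."""

    def __init__(self, keys, children=None, rids=None):
        self.keys = keys          # sorted search keys
        self.children = children  # child nodes (non-leaf nodes only)
        self.rids = rids          # record ids (leaf nodes only)


def equality_search(root, key):
    node = root
    # Walk down: child i holds the values v with keys[i-1] <= v < keys[i].
    while node.children is not None:
        node = node.children[bisect_right(node.keys, key)]
    # Binary search inside the leaf.
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.rids[i]
    return None


# Fragment of the example tree: non-leaf node (20, 40) and its three leaves.
leaves = [
    Node([5, 12], rids=["r5", "r12"]),
    Node([25, 34, 35], rids=["r25", "r34", "r35"]),
    Node([55], rids=["r55"]),
]
root = Node([20, 40], children=leaves)
print(equality_search(root, 34))
```

Using `bisect_right` on the non-leaf keys sends a search key equal to a separator to the right-hand child, which matches the convention that the pointer between $k_1$ and $k_2$ covers $k_1 \le$ values $< k_2$.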

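RANGE SEARCH: EXAMPLE
Search for $34 \le \text{key} \le 63$. At the root (key 60), 34 < 60; at the non-leaf node (20, 40), 20 <= 34 < 40.
[Figure: B+ tree with root (60), non-leaf nodes (20, 40) and (70, 80, 90), and leaf keys 5 12 25 34 35 55 62 63 64]
After we find the leftmost point of the range, we traverse sequentially!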
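The sequential part of a range search can be sketched as a walk along the leaf chain. The `Leaf` class and the leaf grouping below are assumptions for illustration; the sketch also assumes the leaf containing the left end of the range has already been found by a root-to-leaf descent.

```python
class Leaf:
    """Hypothetical leaf page: sorted keys plus a pointer to the next leaf."""

    def __init__(self, keys):
        self.keys = keys
        self.next = None


def range_scan(start_leaf, lo, hi):
    """Collect all keys k with lo <= k <= hi, starting at start_leaf."""
    result = []
    leaf = start_leaf
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return result  # past the right end of the range: stop
            if k >= lo:
                result.append(k)
        leaf = leaf.next       # follow the next-leaf pointer
    return result


# Three consecutive leaves, chained left to right.
a, b, c = Leaf([25, 34, 35]), Leaf([55]), Leaf([62, 63, 64])
a.next, b.next = b, c
print(range_scan(a, 34, 63))
```

The scan never goes back up the tree: once the first leaf is known, every additional page read is a leaf page.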

INSERT
- find correct leaf node L
- insert data entry in L
- If L has enough space, DONE!
- Else, we must split L (into L and a new node L')
  - redistribute entries evenly, copy up the middle key
  - insert index entry pointing to L' into parent of L
- This can propagate recursively to other nodes!
- to split a non-leaf node, redistribute entries evenly, but push up the middle key

INSERT: EXAMPLE
order d = 2, insert 8
[Figure: root (13, 17, 24, 30); leaves (2 3 5 7), (14 16), (19 20 22), (24 27 29), (33 34 38 39)]

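INSERT: EXAMPLE
order d = 2, insert 8
The leaf node is full, so we must split it!
[Figure: root (13, 17, 24, 30); leaves (2 3 5 7), (14 16), (19 20 22), (24 27 29), (33 34 38 39). The leaf (2 3 5 7) plus the new key 8 is split into (2 3), with d entries, and (5 7 8), with d+1 entries]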

INSERT: EXAMPLE
order d = 2, insert 8
The middle key (5) must be copied up, but the root node is full as well!
[Figure: root would hold (5, 13, 17, 24, 30); leaves (2 3), (5 7 8), (14 16), (19 20 22), (24 27 29), (33 34 38 39)]

INSERT: EXAMPLE
order d = 2, insert 8
The new middle key (17) is now pushed up only (and not copied).
[Figure: new root (17); non-leaf nodes (5, 13) and (24, 30); leaves (2 3), (5 7 8), (14 16), (19 20 22), (24 27 29), (33 34 38 39)]
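The difference between copying up (leaf split) and pushing up (non-leaf split) can be shown on plain key lists. This is a sketch under simplifying assumptions: the function names are made up for illustration, and child pointers and rids are left out, so only the key movement is modelled.

```python
def split_leaf(keys, d):
    """Split an overfull leaf (2d+1 sorted keys).

    The middle key is COPIED up: it stays in the right leaf and also
    becomes the separator in the parent.
    """
    left, right = keys[:d], keys[d:]
    return left, right, right[0]


def split_non_leaf(keys, d):
    """Split an overfull non-leaf node (2d+1 sorted keys).

    The middle key is PUSHED up: it moves to the parent and appears in
    neither half.
    """
    left, middle, right = keys[:d], keys[d], keys[d + 1:]
    return left, right, middle


# The two splits from the insert-8 example (order d = 2).
print(split_leaf([2, 3, 5, 7, 8], 2))
print(split_non_leaf([5, 13, 17, 24, 30], 2))
```

The leaf keeps its copy of 5 because data entries live only in the leaves; the non-leaf key 17 is only a separator, so it does not need to remain in either half.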

INSERT PROPERTIES
The B+ tree insertion algorithm has several attractive qualities:
- roughly the same cost as exact (equality) search
- it is self-balancing: the tree remains balanced (with respect to height) even after multiple insertions

B+ TREE: DELETE
- find leaf node L where entry belongs
- remove the entry
- If L is at least half-full, DONE!
- If L has only d-1 entries:
  - Try to redistribute, borrowing from sibling
  - If redistribution fails, merge L and sibling
- If a merge occurred, we must delete an entry from the parent of L

DELETE: EXAMPLE 1
order d = 2, delete 22
[Figure: root (17); non-leaf nodes (5, 13) and (24, 30); leaves (2 3), (5 7 8), (14 16), (19 20 22), (24 27 29), (33 34 38 39)]
Since the node remains at least half-full after deleting 22, we simply remove it.

DELETE: EXAMPLE 1
order d = 2, delete 22
[Figure: root (17); non-leaf nodes (5, 13) and (24, 30); leaves (2 3), (5 7 8), (14 16), (19 20), (24 27 29), (33 34 38 39)]

DELETE: EXAMPLE 2
order d = 2, delete 20
[Figure: root (17); non-leaf nodes (5, 13) and (24, 30); leaves (2 3), (5 7 8), (14 16), (19 20), (24 27 29), (33 34 38 39)]
By removing 20 the node is not half-full anymore, so we attempt to redistribute!

DELETE: EXAMPLE 2
order d = 2, delete 20
By removing 20 the node is not half-full anymore, so we attempt to redistribute!
The middle key (27) is again copied up!
[Figure: root (17); non-leaf nodes (5, 13) and (27, 30); leaves (2 3), (5 7 8), (14 16), (19 24), (27 29), (33 34 38 39)]

DELETE: EXAMPLE 3
order d = 2, delete 24
In this case, we have to merge nodes! The entry with key = 27 must be removed from the parent as well.
[Figure: root (17); non-leaf nodes (5, 13) and (27, 30); leaves (2 3), (5 7 8), (14 16), (19 24), (27 29), (33 34 38 39)]

DELETE: EXAMPLE 3
order d = 2, delete 24
We are not done, since the resulting non-leaf node (30) is not half-full!
[Figure: root (17); non-leaf nodes (5, 13) and (30); leaves (2 3), (5 7 8), (14 16), (19 27 29), (33 34 38 39)]

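DELETE: EXAMPLE 3
order d = 2, delete 24
The non-leaf node (30) was not half-full, so it is merged with its sibling (5, 13); the key 17 from the old root comes down into the merged node, which becomes the new root.
[Figure: root (5, 13, 17, 30); leaves (2 3), (5 7 8), (14 16), (19 27 29), (33 34 38 39)]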
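The three delete examples differ only in what happens at the leaf after the key is removed. The sketch below models that decision on plain key lists. It is an illustration under assumptions: the function name and return format are hypothetical, only the right sibling is considered, and the follow-up work in the parent (replacing or deleting the separator, and possibly merging non-leaf nodes) is not modelled.

```python
def delete_from_leaf(leaf, right_sibling, key, d):
    """Return (action, new_leaf, new_sibling, new_separator).

    leaf and right_sibling are sorted key lists of two adjacent leaves;
    new_separator is the key the parent should use after a redistribution.
    """
    leaf = [k for k in leaf if k != key]
    if len(leaf) >= d:
        # Still at least half-full: nothing else to do.
        return "done", leaf, right_sibling, None
    if len(right_sibling) > d:
        # Sibling can spare an entry: borrow its smallest key.
        leaf = leaf + [right_sibling[0]]
        right_sibling = right_sibling[1:]
        return "redistribute", leaf, right_sibling, right_sibling[0]
    # Sibling has only d entries: merge; the parent loses one entry.
    return "merge", leaf + right_sibling, [], None


d = 2
print(delete_from_leaf([19, 20, 22], [24, 27, 29], 22, d))  # Example 1
print(delete_from_leaf([19, 20], [24, 27, 29], 20, d))      # Example 2
print(delete_from_leaf([19, 24], [27, 29], 24, d))          # Example 3
```

In Example 2 the returned separator 27 is the key that replaces 24 in the parent; in Example 3 the merge is what triggers the removal of the parent entry 27.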

B+ TREE: DELETE
- Redistribution of entries is also possible for non-leaf nodes
- We can also try to redistribute using all siblings, and not only the neighboring one

DUPLICATES
Duplicate keys: many data entries with the same key value
- Solution 1: All entries with a given key value reside on a single page
  - Use overflow pages
- Solution 2: Allow duplicate key values in data entries
  - Modify search operation

B+ TREE DESIGN & COST

B+ TREE DESIGN
How large is d?
Example:
- key size = 4 bytes
- pointer size = 8 bytes
- block size = 4096 bytes
We want each node to fit on a single block/page:
$2d \cdot 4 + (2d + 1) \cdot 8 \le 4096 \Rightarrow d \le 170$
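The same calculation can be written as a small helper, which makes it easy to try other key, pointer, and block sizes. The function name is an assumption for this sketch; it solves $2d \cdot \text{key} + (2d + 1) \cdot \text{pointer} \le \text{block}$ for the largest integer d and ignores any per-page header space.

```python
def max_order(key_size, pointer_size, block_size):
    """Largest d such that a node with 2d keys and 2d+1 pointers fits in a block.

    2d*key + (2d+1)*ptr <= block   <=>   d <= (block - ptr) / (2*(key + ptr))
    """
    return (block_size - pointer_size) // (2 * (key_size + pointer_size))


print(max_order(key_size=4, pointer_size=8, block_size=4096))
```

For the slide's parameters this gives d = 170: a node with 340 keys and 341 pointers uses 4088 bytes, while d = 171 would need 4112 bytes.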

B+ TREE: FANOUT
- Fanout: the number of pointers to child nodes coming out of a node
- compared to binary trees (fanout = 2), B+ trees have a high fanout (between d+1 and 2d+1)
- high fanout -> smaller depth -> less I/O per search
- The fanout of B+ trees is dynamic, but we will often assume it is constant to come up with approximate equations
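To see how strongly fanout affects depth, the sketch below counts how many levels a tree with constant fanout needs to reach a given number of entries. The function name and the figure of one million entries are assumptions chosen for illustration.

```python
def levels_needed(n_entries, fanout):
    """Smallest number of levels h such that fanout**h >= n_entries."""
    levels, reach = 0, 1
    while reach < n_entries:
        reach *= fanout
        levels += 1
    return levels


print(levels_needed(1_000_000, 2))    # binary tree
print(levels_needed(1_000_000, 133))  # B+ tree with fanout 133
```

With one node per page, each level costs one I/O, so for a million entries the gap is 20 page reads for a binary tree against 3 for a tree with fanout 133.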

B+ TREES IN PRACTICE
- typical order: d = 100
- typical fill factor = 67%
- average node fanout = 133
Typical B+ tree capacities:
- height 4: $133^4$ = 312,900,721 records
- height 3: $133^3$ = 2,352,637 records
The top levels can often be kept in the buffer pool (with 8 KB pages):
- level 1 = 1 page = 8 KB
- level 2 = 133 pages = about 1 MB
- level 3 = 17,689 pages = about 138 MB
The fill factor is the fraction of available slots in the B+ tree that are filled; it is usually < 1 to leave slack for (quicker) insertions!
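These figures follow directly from powers of the fanout and can be recomputed. The helper names are assumptions for this sketch; it takes the 8 KB page size from the slide and 1 MB = 1024 KB.

```python
PAGE_KB = 8  # page size used on the slide


def records_at_height(fanout, height):
    """Entries reachable through a tree of the given height."""
    return fanout ** height


def pages_at_level(fanout, level):
    """Number of pages on a level, with the root at level 1."""
    return fanout ** (level - 1)


def level_size_mb(fanout, level, page_kb=PAGE_KB):
    """Memory needed to cache one whole level, in MB (1 MB = 1024 KB)."""
    return pages_at_level(fanout, level) * page_kb / 1024


for h in (3, 4):
    print(h, records_at_height(133, h))
for level in (1, 2, 3):
    print(level, pages_at_level(133, level), round(level_size_mb(133, level), 2))
```

Caching the first two levels costs about 1 MB, so in practice only the last one or two levels of a search usually require disk I/O.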

COST MODEL FOR SEARCH
Parameters:
- f = fanout, which is in [d+1, 2d+1] (assume it is constant for our cost model)
- N = total number of pages we need to index
- F = fill-factor (usually ~ 2/3)
We need to index N/F pages. For different heights:
- h = 1 -> $f$ pages
- h = 2 -> $f^2$ pages
- h = k -> $f^k$ pages
So the height must be $h = \log_f (N/F)$.

COST MODEL FOR SEARCH
To do equality search:
- we read one page per level of the tree
- levels that we can fit in buffer are free!
- finally we read in the actual record
I/O cost = $\log_f (N/F) - L_B + 1$
If we have B available buffer pages, we can store $L_B$ levels of the B+ tree in memory: $L_B$ is the number of levels such that the sum of all the levels' nodes fits in the buffer:
$B \ge 1 + f + \cdots + f^{L_B - 1}$
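The cost formula can be evaluated numerically as a rough sanity check. The sketch below makes two assumptions that the slide leaves implicit: the height is rounded up to a whole number of levels, and the cost never drops below the single I/O for the record itself. All function names and the sample values of N and B are made up for illustration.

```python
def tree_height(N, F, f):
    """Smallest h with f**h >= N/F (the slide's log_f(N/F), rounded up)."""
    target = N / F
    h, reach = 0, 1
    while reach < target:
        reach *= f
        h += 1
    return h


def buffered_levels(B, f):
    """Largest L_B with 1 + f + ... + f**(L_B - 1) <= B."""
    levels, used, width = 0, 0, 1
    while used + width <= B:
        used += width   # cache this whole level
        width *= f      # the next level is f times wider
        levels += 1
    return levels


def equality_search_cost(N, F, f, B):
    """I/O cost = height - L_B + 1 (the +1 reads the actual record)."""
    return max(tree_height(N, F, f) - buffered_levels(B, f), 0) + 1


print(equality_search_cost(N=1_000_000, F=2 / 3, f=133, B=200))
```

With these sample values the tree has 3 levels, 200 buffer pages hold the top 2 levels (1 + 133 = 134 pages), and an equality search costs 2 I/Os: one leaf page and one record page.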

COST MODEL FOR SEARCH
To do range search:
- we read one page per level of the tree
- levels that we can fit in buffer are free!
- we read sequentially the pages in the range
I/O cost = $\log_f (N/F) - L_B + \mathrm{OUT}$
Here, OUT is the I/O cost of loading the additional leaf nodes we need to access plus the I/O cost of loading each page of the results.