Background: disk access vs. main memory access (1/2)


4.4 B-trees
Disk access vs. main memory access: background
B-tree concept
Node structure
Structural properties
Insertion operation
Deletion operation
Running time

Background: disk access vs. main memory access (1/2)
One disk page access takes as long as executing several hundred thousand machine instructions.
Accessing one random disk page takes about 10 milliseconds.
Accessing the next consecutive disk page takes much less.
Overall, disk access takes about 5 milliseconds per page.
So the performance measure for a disk-resident data structure is the disk I/O cost.

Background: disk access vs. main memory access (2/2)
A simple measure of the disk I/O cost is the number of disk pages accessed, where an access is a read or a write.
The performance metric for a main-memory data structure is the CPU cost.
A simple measure of the CPU cost is the number of machine instructions or of primitive operations (e.g., comparisons between two elements).

B-tree concept
A B-tree is:
  a disk-resident tree.
  a multi-way balanced tree for external (i.e., on-disk) searching.
  balanced to minimize the disk I/O cost.
A B-tree node represents a disk page.
The page size is typically on the order of hundreds to thousands of bytes (e.g., 512, 1024, ..., 8192).

B-tree structure: example
Figure 4.60: B-tree of order 5
In practice, the order is much higher than 5.

B-tree node structure
A B-tree internal node is an index node made of entries with two types of fields:
  p_i (0 ≤ i ≤ M-1) is a pointer (i.e., link).
  k_i (1 ≤ i ≤ M-1) is a key.
(Internal) Node: p_0 k_1 p_1 k_2 p_2 ... k_{M-1} p_{M-1}
class Entry { Key k; Node p; }
class Node { Node p; Entry a[1..M-1]; }

B-tree node structure
A B-tree leaf node is a data node made of data records.
Note: What's called a leaf node in the textbook is actually not part of a B-tree. It represents the data records on which a B-tree is built as an index structure. (Well, let's follow the textbook anyway.)
(Leaf) Node: r_1 r_2 ... r_L
class Entry { } // data record
class Node { Entry a[1..L]; }

B-tree structural properties
Every node is between 50% and 100% full.
  A leaf node contains between ⌈L/2⌉ and L data items. (L: load factor of a leaf node)
  An internal node (except the root) contains ⌈M/2⌉ to M pointers (i.e., ⌈M/2⌉-1 to M-1 keys). (M: load factor of an internal node)
The i-th key value is the smallest key value in the (i+1)-th subtree.
The root node is either a leaf node or contains 2 to M pointers (i.e., 1 to M-1 keys).
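The node layout and the key-ordering property can be illustrated with a minimal in-memory sketch. It is not the textbook's implementation: the class names `Internal` and `Leaf`, the `search` function, and the use of plain integers as stand-ins for keys and data records are all assumptions made for illustration. Each node visited is counted as one page access.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Leaf:
    # Data records r_1..r_L, kept sorted; integers stand in for records.
    records: List[int] = field(default_factory=list)


@dataclass
class Internal:
    # keys[i-1] holds k_i and children[i] holds p_i,
    # so len(children) == len(keys) + 1.
    keys: List[int]
    children: List[Union["Internal", Leaf]]


def search(node, key):
    """Return (leaf, pages_read) for the leaf that would hold `key`."""
    pages_read = 1  # visiting the root counts as one page access
    while isinstance(node, Internal):
        # k_i is the smallest key under p_i, so a key equal to k_i
        # belongs to the subtree on the right of k_i.
        i = bisect_right(node.keys, key)
        node = node.children[i]
        pages_read += 1
    return node, pages_read


# Hypothetical two-level tree with separator keys 10 and 20.
root = Internal(keys=[10, 20],
                children=[Leaf([1, 5]), Leaf([10, 15]), Leaf([20, 30])])
leaf, pages = search(root, 10)
print(leaf.records, pages)
```

Using `bisect_right` rather than `bisect_left` is what sends a key equal to k_i into the subtree that starts with k_i.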

B-tree parameters
Consider:
  disk page size B = 8 Kbytes = 8192 bytes
  pointer size P = 4 bytes
  key size K = 8 bytes
  data record size R = 520 bytes
Then,
  for a leaf node, L = ⌊B/R⌋ = 15.
  for an internal node, M = ⌊(B-P)/(P+K)⌋ + 1 = 683.

B-tree node merge and split
The key to keeping a B-tree balanced is the node split (on insertion) and node merge (on deletion) operations.
If a node is 100% full, then split it before inserting a new key.
If a node is 50% full, then merge it with a neighboring node before deleting another key.
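The values of L and M for the 8 Kbyte page can be re-derived with integer division. The short sketch below does that; the function names `leaf_capacity` and `internal_fanout` are made up for illustration.

```python
def leaf_capacity(page_size, record_size):
    """Number of whole data records that fit in one page."""
    return page_size // record_size


def internal_fanout(page_size, pointer_size, key_size):
    """Number of pointers in one internal page.

    One pointer (p_0) is set aside first; the rest of the page is
    filled with (key, pointer) entries.
    """
    return (page_size - pointer_size) // (pointer_size + key_size) + 1


B, P, K, R = 8192, 4, 8, 520
L = leaf_capacity(B, R)        # 8192 // 520 = 15
M = internal_fanout(B, P, K)   # 8188 // 12 + 1 = 683
print(L, M)
```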

Side note: bounds for split or merge
In some environments (e.g., a DBMS), users are allowed to adjust the lower bound and the upper bound to something other than 50% and 100%, respectively.
In that case, the upper bound and the lower bound can be set to control the overhead of node splits and merges.
  If the upper bound is too low, node splits occur too frequently!
  If the lower bound is too high, node merges occur too frequently!

B-tree insertion algorithm
Assume duplicate keys are ignored.
1. Search the tree for a leaf node into which to insert the key, and insert it if the key is new.
2. If the node overflows as a result, then
  (1) split the node (i.e., acquire a new node);
  (2) move half of the entries to the new node;
  (3) if the node is a leaf, copy the smallest key of the new node to the parent; if it is an internal node, move the smallest key of the new node to the parent.
  If the parent node overflows as a result, repeat step 2.
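Steps 1 and 2 of the insertion algorithm can be sketched for a single leaf. This is a simplified illustration, not the textbook's code: the function name `insert_into_leaf` is hypothetical, the leaf is modeled as a sorted Python list of integer keys, and propagating the separator into the parent (and any parent split) is left out.

```python
from bisect import bisect_left


def insert_into_leaf(records, key, capacity):
    """Insert `key` into a sorted leaf, splitting it on overflow.

    Returns (left, right, separator). When no split is needed,
    `right` and `separator` are None.
    """
    pos = bisect_left(records, key)
    if pos < len(records) and records[pos] == key:
        return records, None, None       # duplicate keys are ignored
    records = records[:pos] + [key] + records[pos:]
    if len(records) <= capacity:
        return records, None, None       # no overflow, nothing else to do
    mid = len(records) // 2
    left, right = records[:mid], records[mid:]   # move half to the new node
    # For a leaf, the smallest key of the new node is copied (not moved)
    # up to the parent, so it stays in `right` as well.
    return left, right, right[0]


# Hypothetical leaf with capacity L = 5 that is already full.
left, right, separator = insert_into_leaf([10, 20, 30, 40, 50], 25, 5)
print(left, right, separator)
```

For an internal node the same split would move the middle key up instead of copying it, so the key would not remain in the new right node.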

B-tree insertion example (1/4)
Figure 4.60
Insert 57.

B-tree insertion example (2/4)
Figure 4.61
Now, insert 55.

B-tree insertion example (3/4)
Figure 4.62
Inserting 55 caused a split into two leaf nodes.
Now, insert 40.

B-tree insertion example (4/4)
Figure 4.63
Inserting 40 caused a split into two leaf nodes and then a split of the parent node.

B-tree deletion algorithm
1. Search for a leaf node containing the key to be deleted and, if found, remove the key from the node.
2. If the node is less than half full as a result,
  (1) look to the siblings for one that has enough to share (i.e., more than half left after giving);
  (2) if neither can afford it, merge with one of them and remove the smallest key of the larger sibling from the parent node. (Here, the larger sibling refers to the one with the larger keys.)
3. If the parent node is less than half full as a result, repeat step 2.

B-tree deletion example
Figure 4.64: B-tree after deleting 99 from the B-tree in Figure 4.63
The last node at the leaf level underflows, and so merges with its neighbor.
This merge then causes its parent to underflow, so the parent gets one entry from its neighbor.
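Step 2 of the deletion algorithm (borrow from a sibling if possible, otherwise merge) can be sketched for two adjacent leaves. This is only an illustration under stated assumptions: the function name `rebalance_leaves` is hypothetical, leaves are sorted Python lists of integer keys, a sibling is assumed able to share when it holds more than ⌈L/2⌉ records, and updating the parent node is not shown.

```python
import math


def rebalance_leaves(left, right, capacity):
    """Repair two adjacent sorted leaves after one of them underflowed.

    Returns (leaves, separator): two leaves plus their new separator key
    when a record was borrowed (or nothing was wrong), or one merged leaf
    and None when neither leaf could spare a record.
    """
    minimum = math.ceil(capacity / 2)
    if len(left) < minimum and len(right) > minimum:
        left = left + [right[0]]      # borrow the smallest record on the right
        right = right[1:]
    elif len(right) < minimum and len(left) > minimum:
        right = [left[-1]] + right    # borrow the largest record on the left
        left = left[:-1]
    elif len(left) < minimum or len(right) < minimum:
        # Neither leaf can afford to give: merge them. The parent would
        # then drop the key that used to separate the two leaves.
        return [left + right], None
    return [left, right], right[0]


# Hypothetical leaves with capacity L = 5, so each needs at least 3 records.
print(rebalance_leaves([10, 20], [30, 40, 50, 60], 5))   # borrow case
print(rebalance_leaves([10, 20], [30, 40, 50], 5))       # merge case
```

In the merge case the dropped separator is the smallest key of the right-hand (larger) leaf, which matches step 2(2); if that removal leaves the parent under half full, the same borrow-or-merge decision repeats one level up.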

Running time of B-tree operations
Given a B-tree of order M with N data nodes, the number of disk pages accessed for a search equals the height of the B-tree.
The height is
  log_{M/2} N in the worst case (every node is half full).
  log_M N in the best case (every node is completely full).
  log_{(2/3)M} N in the average case, confirmed empirically by repeating a large number of random insertions and deletions.
Q: What is the average height if N = 1000000 and M = 500?
A: log_{(2/3)500} 1000000 ≈ 2.38, which rounds up to 3. (Only three for a million data nodes!)
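The three height formulas can be evaluated numerically for N = 1000000 and M = 500. The helper name `estimated_height` and its `fill` parameter (the assumed fraction of each node that is in use) are made up for this sketch.

```python
import math


def estimated_height(n, order, fill):
    """Real-valued height estimate: log of n to the base (fill * order)."""
    return math.log(n) / math.log(fill * order)


N, M = 1_000_000, 500
for label, fill in [("worst", 1 / 2), ("average", 2 / 3), ("best", 1.0)]:
    h = estimated_height(N, M, fill)
    # The number of levels actually traversed is the estimate rounded up.
    print(label, round(h, 2), math.ceil(h))
```

All three estimates round up to 3 levels for these values, so the fill factor barely changes the number of page accesses when the order is this large.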