CSE 326: Data Structures B-Trees and B+ Trees


Steve Seitz, Winter 2009

Announcements (2/4/09)
- Midterm on Friday
- Special office hour: 4:00-5:00 Thursday in Jaech Gallery (6th floor of CSE building). This is in addition to my usual 11am office hour.

Traversing very large datasets
Suppose we had very many pieces of data (as in a database), e.g., n = 2^30 ≈ 10^9.
How many (worst case) hops through the tree to find a node?
- BST
- AVL
- Splay

Memory considerations
What is in a tree node? In an object?
  Node: Object obj; Node left; Node right; Node parent;
  Object: Key key; ...data...
Suppose the data is 1KB.
- How much space does the tree take?
- How much of the data can live in 1GB of RAM?

CPU cycles to access:
  Registers: 1
  L1 Cache: 2
  L2 Cache: [value not recoverable]
  Main memory: 250
  Disk: random [...],000,000; streamed 5000

Minimizing random disk access
In our example, almost all of our data structure is on disk. Thus, hopping through a tree amounts to random accesses to disk. Ouch!
How can we address this problem?

M-ary Search Tree
Suppose we devised a search tree with branching factor M:
- Complete tree has height:
- # hops for find:
- Runtime of find:

B+ Trees (book calls these B-trees)
- Each internal node still has (up to) M-1 keys.
- Order property: the subtree between two keys x and y contains leaves with values v such that x ≤ v < y. Note the "≤".
- Leaf nodes have up to L sorted keys.
[Figure: an internal node with keys including 7 and 21; its subtrees cover ranges of the form x ≤ v < y, the last being 21 ≤ x.]
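The order property is what makes find work: at each internal node, a search for v follows the one child whose key range contains v. A minimal sketch, assuming hypothetical `Leaf` and `Internal` classes and made-up key values (none of these names come from the slides):

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, keys):
        self.keys = sorted(keys)      # up to L sorted keys

class Internal:
    def __init__(self, keys, children):
        assert len(children) == len(keys) + 1
        self.keys = keys              # up to M-1 sorted keys
        self.children = children      # up to M subtrees

def find(node, v):
    """Return (found, hops), where hops counts the nodes visited."""
    hops = 1
    while isinstance(node, Internal):
        # bisect_right picks child i with keys[i-1] <= v < keys[i],
        # which is exactly the order property x <= v < y.
        node = node.children[bisect_right(node.keys, v)]
        hops += 1
    return v in node.keys, hops

# A tiny illustrative tree with M = 3 and L = 3 (values are assumptions).
root = Internal([7, 21], [Leaf([1, 4]), Leaf([7, 9, 12]), Leaf([21, 30])])
print(find(root, 9))    # (True, 2)
print(find(root, 21))   # (True, 2)
print(find(root, 5))    # (False, 2)
```

Using `bisect_right` rather than `bisect_left` is what sends a value equal to a key into the subtree on that key's right, matching the "≤" on the left end of each range.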

B+ Tree Structure Properties
- Root (special case): has between 2 and M children (or root could be a leaf)
- Internal nodes: store up to M-1 keys; have between ⌈M/2⌉ and M children
- Leaf nodes: where data is stored; all at the same depth; contain between ⌈L/2⌉ and L data items

B+ Tree: Example
B+ Tree with M = 4 (# pointers in internal node) and L = 5 (# data items in leaf)
[Figure: example B+ tree. Leaves hold data objects such as (1, AB..), (2, GH..), (4, XY..), which I'll ignore in slides. All leaves at the same depth.]
Definition for later: a "neighbor" is the next sibling to the left or right.

Disk Friendliness
What makes B+ trees disk-friendly?
1. Many keys stored in a node
   - All brought to memory/cache in one disk access.
2. Internal nodes contain only keys; only leaf nodes contain keys and actual data
   - Much of the tree structure can be loaded into memory irrespective of data object size
   - Data actually resides on disk

B+ trees vs. AVL trees
Suppose again we have n = 2^30 ≈ 10^9 items:
- Depth of AVL Tree
- Depth of B+ Tree with M = 256, L = 256
Great, but how do we actually make a B+ tree and keep it balanced?
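The depth comparison can be worked out numerically. The sketch below is an illustration under stated assumptions: the AVL figure uses the standard worst-case bound of roughly 1.44 log2 n (not given on the slides), and the B+ tree figure assumes the sparsest legal tree, with a 2-child root, every other internal node at ⌈M/2⌉ children, and every leaf half full. The function names are hypothetical.

```python
import math

def avl_depth_bound(n):
    # Standard worst-case AVL height bound, roughly 1.44 * log2(n).
    return 1.44 * math.log2(n)

def bplus_worst_depth(n, M, L):
    """Largest possible number of internal levels above the leaves."""
    max_leaves = n // math.ceil(L / 2)   # most leaves: every leaf half full
    if max_leaves < 2:
        return 0                         # the root is itself a leaf
    fanout = math.ceil(M / 2)            # fewest children per internal node
    depth, min_leaves = 1, 2             # a root with only 2 children
    while min_leaves * fanout <= max_leaves:
        min_leaves *= fanout             # add one more sparse internal level
        depth += 1
    return depth

n = 2 ** 30
print(f"AVL bound: about {avl_depth_bound(n):.1f} levels")
print(f"B+ tree (M = L = 256): at most {bplus_worst_depth(n, 256, 256)} levels")
```

Under these assumptions the AVL tree can be around 43 levels deep, while the B+ tree needs at most 4 internal levels, so a find touches a handful of nodes instead of dozens.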

Building a B+ Tree with Insertions
[Figure sequence: successive Insert() calls on an initially empty B+ tree ("The empty B-Tree"); recoverable steps include Insert(2) and Insert(6). Other key values and the M = L setting are not recoverable.]

[Figure: a further batch of insertions into the tree built so far; most key values are not recoverable.]

Insertion Algorithm
1. Insert the key in its leaf in sorted order
2. If the leaf ends up with L+1 items, overflow!
   - Split the leaf into two nodes:
     - original with ⌈(L+1)/2⌉ smaller keys
     - new one with ⌊(L+1)/2⌋ larger keys
   - Add the new child to the parent
   - If the parent ends up with M+1 children, overflow!
3. If an internal node ends up with M+1 children, overflow!
   - Split the node into two nodes:
     - original with ⌈(M+1)/2⌉ children with smaller keys
     - new one with ⌊(M+1)/2⌋ children with larger keys
   - Add the new child to the parent
   - If the parent ends up with M+1 items, overflow!
4. Split an overflowed root in two and hang the new nodes under a new root
   - This makes the tree deeper!
5. Propagate keys up tree.

And Now for Deletion
[Figure sequence: Delete() calls on the example tree, starting with Delete(2); remaining key values are not recoverable.]
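Returning to steps 1 and 2 of the insertion algorithm, the leaf-level work can be sketched as follows. This is a minimal illustration, not the course's implementation: a leaf is modeled as a plain sorted list, `insert_into_leaf` is a hypothetical name, and the parent update is left to the caller.

```python
from bisect import insort

def insert_into_leaf(leaf, key, L):
    """Insert key into a sorted leaf (a list); split it on overflow.

    Returns (leaf, None, None) when no split is needed, otherwise
    (left, right, separator), where separator is the smallest key in
    the new right leaf and is what the parent would record.
    """
    insort(leaf, key)                    # step 1: keep the leaf sorted
    if len(leaf) <= L:
        return leaf, None, None
    # Step 2: overflow with L+1 items. The original keeps the
    # ceil((L+1)/2) smaller keys; the new leaf gets the rest.
    keep = (len(leaf) + 1) // 2
    left, right = leaf[:keep], leaf[keep:]
    return left, right, right[0]

# Illustrative values with L = 3 (assumed, not from the slides).
print(insert_into_leaf([3, 14], 18, L=3))      # ([3, 14, 18], None, None)
print(insert_into_leaf([3, 14, 18], 15, L=3))  # ([3, 14], [15, 18], 15)
```

Copying the right leaf's smallest key up as the separator keeps the order property x ≤ v < y intact in the parent.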

[Figure sequence: further Delete() calls on the same tree, showing the tree after each deletion; key values and the M = L setting are not recoverable.]
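A deletion sequence like this exercises two repairs at the leaf level: adopting a key from a neighbor, or merging with it when the neighbor has nothing to spare. The sketch below is an assumption-laden illustration: leaves are plain sorted lists, only the right neighbor is considered, `delete_from_leaf` is a hypothetical name, and the parent update is left to the caller.

```python
import math

def delete_from_leaf(leaf, right_neighbor, key, L):
    """Delete key from leaf and repair underflow using the right neighbor.

    Returns (leaf, right_neighbor, action); right_neighbor is None
    after a merge because that node no longer exists.
    """
    min_items = math.ceil(L / 2)
    leaf = [k for k in leaf if k != key]
    if len(leaf) >= min_items:
        return leaf, right_neighbor, "none"
    if len(right_neighbor) > min_items:
        # Adopt: take the neighbor's smallest key. The parent's
        # separator between the two leaves must then be updated.
        return leaf + right_neighbor[:1], right_neighbor[1:], "adopt"
    # Merge: the neighbor is at its minimum, so the combined leaf
    # has at most L items and fits in one node.
    return leaf + right_neighbor, None, "merge"

# Illustrative values with L = 3, so each leaf needs at least 2 items.
print(delete_from_leaf([2, 6], [8, 9, 12], 2, L=3))  # ([6, 8], [9, 12], 'adopt')
print(delete_from_leaf([6, 8], [9, 12], 6, L=3))     # ([8, 9, 12], None, 'merge')
```

A merge removes a child from the parent, which is how underflow can propagate upward.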

Deletion Algorithm
1. Remove the key from its leaf
2. If the leaf ends up with fewer than ⌈L/2⌉ items, underflow!
   - Adopt data from a neighbor; update the parent
   - If adopting won't work, delete node and merge with neighbor
   - If the parent ends up with fewer than ⌈M/2⌉ children, underflow!

Deletion Slide Two
3. If an internal node ends up with fewer than ⌈M/2⌉ children, underflow!
   - Adopt from a neighbor; update the parent
   - If adoption won't work, merge with neighbor
   - If the parent ends up with fewer than ⌈M/2⌉ children, underflow!
4. If the root ends up with only one child, make the child the new root of the tree
   - This reduces the height of the tree!
5. Propagate keys up through tree.

Thinking about B+ Trees
- B+ Tree insertion can cause (expensive) splitting and propagation up the tree
- B+ Tree deletion can cause (cheap) adoption or (expensive) merging and propagation up the tree
- Split/merge/propagation is rare if M and L are large (Why?)
- Pick branching factor M and data items/leaf L such that each node takes one full page/block of memory/disk.
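Picking M and L to fill a page is a small arithmetic exercise. The sketch below assumes an internal node stores M pointers and M-1 keys and a leaf stores L whole records; the 4096-byte page and 8-byte key and pointer sizes are hypothetical, while the 1KB record matches the earlier "suppose the data is 1KB" example.

```python
def pick_m_and_l(page_bytes, key_bytes, pointer_bytes, record_bytes):
    """Largest M and L such that one node fits in one page."""
    # Internal node: M*pointer + (M-1)*key <= page
    #   =>  M <= (page + key) / (pointer + key)
    M = (page_bytes + key_bytes) // (pointer_bytes + key_bytes)
    # Leaf: L records must fit in one page.
    L = page_bytes // record_bytes
    return M, L

# Assumed sizes: 4096-byte page, 8-byte keys and pointers, 1KB records.
print(pick_m_and_l(4096, 8, 8, 1024))   # (256, 4)
```

With these sizes M comes out far larger than L, which reflects the point that internal nodes hold only keys and so pack many more entries per page than leaves holding full records.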

Complexity
- Find:
- Insert:
  - find:
  - insert in leaf:
  - split/propagate up:
Claim: O(M) costs are negligible

Tree Names You Might Encounter
B-Trees
- More general form of B+ trees; allows data at internal nodes too
- Range of children is (key1, key2) rather than [key1, key2)
B-Trees with M = 3, L = x are called 2-3 trees
- Internal nodes can have 2 or 3 children
B-Trees with M = 4, L = x are called 2-3-4 trees
- Internal nodes can have 2, 3, or 4 children
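For the blanks on the Complexity slide, the usual bounds can be filled in as follows. These are the standard textbook results, offered as an assumption since the slide itself leaves them open; they suppose binary search within each node and a tree of height about log_M n.

```latex
\begin{align*}
\text{Find:} \quad
  & O(\log_2 M \cdot \log_M n) = O(\log_2 n)
    && \text{binary search over at most } M-1 \text{ keys per level} \\
\text{Insert, find step:} \quad
  & O(\log_2 M \cdot \log_M n) \\
\text{Insert, in leaf:} \quad
  & O(L)
    && \text{shift items to keep the leaf sorted} \\
\text{Insert, split/propagate up:} \quad
  & O(M \cdot \log_M n)
    && \text{at most one } O(M) \text{ split per level} \\
\text{Insert, total:} \quad
  & O(L + M \cdot \log_M n)
\end{align*}
```

The claim that O(M) costs are negligible fits this picture: the in-memory work per node is small next to a disk access, so the count that matters is the roughly log_M n nodes touched.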