Trees. Courtesy to Goodrich, Tamassia and Olga Veksler

Similar documents
Lecture 11: Multiway and (2,4) Trees. Courtesy to Goodrich, Tamassia and Olga Veksler

(2,4) Trees Goodrich, Tamassia (2,4) Trees 1

(2,4) Trees. 2/22/2006 (2,4) Trees 1

Multiway Search Trees. Multiway-Search Trees (cont d)

B-Trees. Disk Storage. What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree. Deletion in a B-tree

B-Trees and External Memory

B-Trees and External Memory

Multi-Way Search Trees

Search Trees - 1 Venkatanatha Sarma Y

Multi-way Search Trees! M-Way Search! M-Way Search Trees Representation!

Multiway searching. In the worst case of searching a complete binary search tree, we can make log(n) page faults Everyone knows what a page fault is?

Motivation for B-Trees

Multi-Way Search Trees

B-Trees. Version of October 2, B-Trees Version of October 2, / 22

Multi-way Search Trees. (Multi-way Search Trees) Data Structures and Programming Spring / 25

Module 4: Index Structures Lecture 13: Index structure. The Lecture Contains: Index structure. Binary search tree (BST) B-tree. B+-tree.

(2,4) Trees Goodrich, Tamassia. (2,4) Trees 1

M-ary Search Tree. B-Trees. Solution: B-Trees. B-Tree: Example. B-Tree Properties. B-Trees (4.7 in Weiss)

What is a Multi-way tree?

M-ary Search Tree. B-Trees. B-Trees. Solution: B-Trees. B-Tree: Example. B-Tree Properties. Maximum branching factor of M Complete tree has height =

Augmenting Data Structures

An AVL tree with N nodes is an excellent data. The Big-Oh analysis shows that most operations finish within O(log N) time

Background: disk access vs. main memory access (1/2)

Data Structure: Search Trees 2. Instructor: Prof. Young-guk Ha Dept. of Computer Science & Engineering

Balanced Search Trees

Balanced Trees Part One

CSE 530A. B+ Trees. Washington University Fall 2013

Copyright The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

CSE 373 OCTOBER 25 TH B-TREES

Algorithms. AVL Tree

2-3 Tree. Outline B-TREE. catch(...){ printf( "Assignment::SolveProblem() AAAA!"); } ADD SLIDES ON DISJOINT SETS

Multi-Way Search Tree

C SCI 335 Software Analysis & Design III Lecture Notes Prof. Stewart Weiss Chapter 4: B Trees

Data Structures and Algorithms

CS 350 : Data Structures B-Trees

CSE 326: Data Structures B-Trees and B+ Trees

CS350: Data Structures B-Trees

Comp 335 File Structures. B - Trees

Lecture 3: B-Trees. October Lecture 3: B-Trees

Uses for Trees About Trees Binary Trees. Trees. Seth Long. January 31, 2010

Search Trees. COMPSCI 355 Fall 2016

CSE332: Data Abstractions Lecture 7: B Trees. James Fogarty Winter 2012

B-Trees & its Variants

Red-black trees (19.5), B-trees (19.8), trees

Self-Balancing Search Trees. Chapter 11

Search Trees - 2. Venkatanatha Sarma Y. Lecture delivered by: Assistant Professor MSRSAS-Bangalore. M.S Ramaiah School of Advanced Studies - Bangalore

CS 310 B-trees, Page 1. Motives. Large-scale databases are stored in disks/hard drives.

Recall: Properties of B-Trees

TREES. Trees - Introduction

CHAPTER 10 AVL TREES. 3 8 z 4

Advanced Set Representation Methods

Trees. (Trees) Data Structures and Programming Spring / 28

Binary Search Trees. Analysis of Algorithms

Multiway Search Trees

DATA STRUCTURES AND ALGORITHMS. Hierarchical data structures: AVL tree, Bayer tree, Heap

Programming II (CS300)

Design and Analysis of Algorithms Lecture- 9: B- Trees

Module 4: Dictionaries and Balanced Search Trees

CS127: B-Trees. B-Trees

9/29/2016. Chapter 4 Trees. Introduction. Terminology. Terminology. Terminology. Terminology

AVL Trees Goodrich, Tamassia, Goldwasser AVL Trees 1

Main Memory and the CPU Cache

CS F-11 B-Trees 1

AVL Tree Definition. An example of an AVL tree where the heights are shown next to the nodes. Adelson-Velsky and Landis

CSC Design and Analysis of Algorithms

B-Trees. Introduction. Definitions

Introduction. for large input, even access time may be prohibitive we need data structures that exhibit times closer to O(log N) binary search tree

Balanced Binary Search Trees. Victor Gao

B+ Tree Review. CSE332: Data Abstractions Lecture 10: More B Trees; Hashing. Can do a little better with insert. Adoption for insert

CSC Design and Analysis of Algorithms. Lecture 7. Transform and Conquer I Algorithm Design Technique. Transform and Conquer

Multi-Way Search Tree

CSIT5300: Advanced Database Systems

Trees. Truong Tuan Anh CSE-HCMUT

Binary Trees. BSTs. For example: Jargon: Data Structures & Algorithms. root node. level: internal node. edge.

Analysis of Algorithms

Physical Level of Databases: B+-Trees

Extra: B+ Trees. Motivations. Differences between BST and B+ 10/27/2017. CS1: Java Programming Colorado State University

Splay Trees 3/20/14. Splay Trees. Splay Trees are Binary Search Trees. note that two keys of equal value may be wellseparated (7,T) (1,Q) (1,C) (5,H)

CSCI 136 Data Structures & Advanced Programming. Lecture 25 Fall 2018 Instructor: B 2

CS Fall 2010 B-trees Carola Wenk

amiri advanced databases '05

The B-Tree. Yufei Tao. ITEE University of Queensland. INFS4205/7205, Uni of Queensland

Hash Tables. CS 311 Data Structures and Algorithms Lecture Slides. Wednesday, April 22, Glenn G. Chappell

CMPS 2200 Fall 2017 B-trees Carola Wenk

Chapter 12: Indexing and Hashing. Basic Concepts

CISC 235: Topic 4. Balanced Binary Search Trees

2-3 and Trees. COL 106 Shweta Agrawal, Amit Kumar, Dr. Ilyas Cicekli

CMPS 2200 Fall 2017 Red-black trees Carola Wenk

COSC 2007 Data Structures II Final Exam. Part 1: multiple choice (1 mark each, total 30 marks, circle the correct answer)

CS 234. Module 6. October 25, CS 234 Module 6 ADT Dictionary 1 / 22

Section 4 SOLUTION: AVL Trees & B-Trees

Chapter 12: Indexing and Hashing

Chapter 11: Indexing and Hashing

Computational Optimization ISE 407. Lecture 16. Dr. Ted Ralphs

Trees can be used to store entire records from a database, serving as an in-memory representation of the collection of records in a file.

Database Systems. File Organization-2. A.R. Hurson 323 CS Building

Binary Search Trees Treesort

Lecture 5. Treaps Find, insert, delete, split, and join in treaps Randomized search trees Randomized search tree time costs

Introduction to Indexing 2. Acknowledgements: Eamonn Keogh and Chotirat Ann Ratanamahatana

Trees. Eric McCreath

Transcription:

Lecture 12: BT Trees Courtesy to Goodrich, Tamassia and Olga Veksler Instructor: Yuzhen Xie

Outline B-tree Special case of multiway search trees used when data must be stored on the disk, i.e. too large to fit in the memory 2

Reasons for using B-Trees Accessing disk is MUCH slower than memory access One disk access takes around 8ms This is about the same as executing 200,000 CPU instructions It is worth executing lots of instructions to avoid a single disk access! Because accessing disk is so slow, data on a disk is stored in blocks of size B Typical block size is between 1024 bytes and 8192 bytes A whole block is read per one disk access When searching a tree, accessing each node most likely will require reading that node from the disk The larger the tree height, the more disk accesses we need Use a tree (B-tree) with high branching factor so that the height of the tree is smaller, thus requiring smaller number of disc accesses assuming a tree node fits into 1 disc block 3

Definition of a B-tree A B-tree of order d is a multiway search tree st: s.t.: 1. Each internal node (except the root) has between d/2 and d children (and therefore each node stores between d/2-1 to d - 1 entries). 2. All external nodes have the same depth B-tree of order 5: between 3 and 5 children 2 6 8 10 12 13 15 27 32 33 B-tree is particularly suited for huge dictionaries that do not fit into main memory Height of a B-tree is ( n) log n O = O log d / 2 log d / 2 log a Here we used the property of logarithms: = logb a log b

Side Note on B-trees The only restriction on the root is that it has at least 2 children There is no restriction on the root to have between d/2 and d children to allow small B-trees For example, below is the only way to have a B-tree of order 5 storing 5 items, notice the root has only 2 children, whereas all other internal nodes must have between 3 and 5 children 11 2 6 27 32

Operations in B-Tree Since B-tree is a multi-way search tree, the search operation is exactly like in multi-way search trees Insert and delete operations require only minor changes from that of the (2,4)-tree Perform splits as necessary when inserting Perform transfer and fusion as necessary when deleting 6

Insertion Example Let B-tree be of order d = 6 Internal nodes must have between 3 and 6 children Insert new entry in the node returned by search, like in (2,4)- trees Overflow occurs when a node becomes a 7-node 1 3 5 12 14 15 20 22 27 32 33 v insert 13 overflow! v 1 3 5 12 13 14 15 20 22 27 32 33

Insertion Example To handle overflow at node v, perform a split on v (d=6): Middle entry m = (d+1)/2 in v goes to parent of v, in correct place Entries 1 through (d+1)/2-1 together with children 1 though (d+1)/2 go to new node v ; v becomes child to the left of m Entries (d+1)/2 +1 through d together with children (d+1)/2 +1 through d+1 go to new node v ; v becomes child to the right of m overflow! v 1 3 5 12 13 14 15 20 22 27 32 33 1 3 5 11 14 24 v v 12 13 15 20 22 27 32 33 after split

Delete Basically identical to (2,4)-trees Ensure that entry to be deleted is at node with leaf children If not, swap the entry with its inorder successor Delete the entry at node v and the appropriate leaf child Underflow occurs if number of children left is less than d/2. Handle underflow with either transfer and fusion (transfer is preferable) Transfer, if an adjacent sibling w has at least d/2 +1 entries v gets an entry which is between v and w from its parent u, sibling w gives the entry to parent u as a replacement, this is the entry of w which is closest to the entry moved from u to v After transfer, no new underflows Fusion if both adjacent siblings have less than d/2 +1 entries merge node u with an adjacent sibling w into new entry v v gets an entry from the parent. This is the entry between the old nodes v and w After fusion, underflow at a parent node may occur.

Deletion Example: Degree d=6 1 3 5 12 14 27 32 33 w u 12 24 1 3 5 27 32 33 14 v transfer 5 24 1 3 12 14 27 32 33 underflow!

Deletion Example: Degree d=6 2 4 12 14 22 24 fusion 24 2 12 14 22 24 underflow! 2 11 12 14 22 24

B-Tree Complexity Analysis for Dictionary in Main Memory Assume time spent at each node is O(d) We can implement entries at each node as an ordered dictionary using search table implementation height of B-tree with n entries is O((log n)/(log d/2 ) Thus search, insert, and delete take O((d log n)/(log d/2 ) time If the tree does fit into the memory, we are better off using a (2,4)-tree,, where search, insert, and delete have complexity O(log n) O((d log n)/(log d/2 ) = O((log n) [d /log d/2 ]) is worse than O(log n) ) 12

B-Tree Complexity Analysis for Dictionary Stored on a Disk If the data is stored on a disk, we have to do complexity analysis differently Let f be the time for one CPU operation, and s be the time for 1 disc access f is MUCH smaller than s, approximately s = 200,000 f When performing search, insert, and delete we spent d time at each node, and perform (log n)/(log d/2 ) disc accesses We are ignoring constants for this analysis Thus the time to perform search, insert, and delete is Time(d)= ( ) f d (log n)/(log )( g d/2 )+ s(log ( g n)/(log )( g d/2 ) fast main memory operations slow disk operations Suppose f =1, s = 200,000. 000 What s the optimal value of d to minimize the Time(d) as a function of d? This is your favorite Calculus 1 problem. Taking derivative of Time(d) and setting it to 0 we get d = 4000 and setting it to 0, we get d 4000 Should make d = 4000 (if a node storing 4000 entries fits into disc block size)

Reasons for using B-Trees Choose d so that the d children references and the d - 1 entries stored at a node can all fit into a single disk block Since each internal node has at least d/2 children, each disk block used to support B-tree is at least half full not a lot of disk space is wasted by being empty Thus the disk block size B is O(d), andanyany search/update operation requires only O(log d/2 n)=o(log B n) time 14

Number of Entries in a Multiway Tree The number of entries in a multiway tree is the number of external nodes 1 Proof: Put one stone on each leaf and pass the stones from the leaves up the tree. Each internal node keeps 1 stone per entry, and passes the rest to its parent. Since #entries = #children 1, each internal node will pass only 1 stone to its parent. This process stops at the root, and the root will pass 1 stone outside the tree. At the end, each entry has 1 stone, and 1 stone is outside the tree. 2 6 8 10 12 13 15 27 32 33

Number of Entries in a Multiway Tree 2 6 8 10 12 13 15 27 32 33 2 6 8 10 12 13 15 27 32 33

Number of Entries in a Multiway Tree 2 6 8 10 12 13 15 27 32 33 2 6 8 10 12 13 15 27 32 33

Reasons for using B-Trees For example, if we use a B-tree of order 200, this size is usually small enough so that we can transfer each node in one disc read operation A B-tree of order 200 and height 3 can hold 200 3 1 entries (approximately 8 million) and any item can be accessed with 2 disc reads Assuming we hold the root in memory, which makes sense since root is node which is always gets accessed for any operation Level 0: 200 0 =1 internal nodes Level 1: 200 1 internal nodes Level 2: 200 2 internal nodes Level 3: 200 3 external nodes

Comparing Trees (Finally Done with Search Trees) Binary Trees Binary search tree Not necessarily balanced if keys are inserted/deleted randomly, performance can be close to optimal, O(log n). However in the worst case, performance is O(n) AVL-tree Guaranteed to be balanced with O(log n) performance, but more complicated to maintain i than BST Multi-way trees (2,4) tree is balanced. Guaranteed O(log n) time performance, but is complicated to implement. B-Trees is balanced, useful for dictionary located in external (disk) memory. 19