Disk Accesses. CS 361, Lecture 25. B-Tree Properties. Outline

Transcription:

CS 361, Lecture 25
Jared Saia
University of New Mexico

Outline
B-Trees, Skip Lists, Graph Theory Intro.

B-Trees
B-Trees are balanced search trees designed to work well on disks. B-Trees are not binary trees: each node can have many children. Each node of a B-Tree potentially contains several keys, not just one. When doing searches, we decide which child link to follow by finding the correct interval of our search key in the key set of the current node.

Disk Accesses
Consider any search tree. The number of disk accesses per search will dominate the run time. Unless the entire tree is in memory, there will usually be a disk access every time an arbitrary node is examined. The number of disk accesses for most operations on a B-tree is proportional to the height of the B-tree. I.e., the info on each node of a B-tree can be stored in main memory.

B-Tree Properties
The following is true for every node $x$:
$x$ stores keys $key_1(x), \ldots, key_l(x)$ in sorted order (nondecreasing).
$x$ contains pointers $c_1(x), \ldots, c_{l+1}(x)$ to its children.
Let $k_i$ be any key stored in the subtree rooted at the $i$-th child of $x$; then $k_1 \le key_1(x) \le k_2 \le key_2(x) \le \cdots \le key_l(x) \le k_{l+1}$.

All leaves have the same depth. There are lower and upper bounds on the number of keys a node can contain, given as a function of a fixed integer $t$:
If the tree is non-empty, the root must have at least one key and, unless it is a leaf, at least 2 children.
Every node other than the root must have at least $t-1$ keys, and all internal nodes other than the root must have at least $t$ children.
Every node can contain at most $2t-1$ keys, and so any internal node can have at most $2t$ children.
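The search rule above (follow the child whose key interval contains the search key) can be sketched in Python. This is a minimal illustration, not code from the lecture: the names `BTreeNode` and `btree_search` are hypothetical, the tree is built by hand rather than by insertion, and each loop iteration stands in for one node examined (one potential disk access).

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class BTreeNode:
    """Hypothetical in-memory B-tree node: sorted keys plus len(keys)+1 children."""
    keys: List[int]
    children: List["BTreeNode"] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children


def btree_search(root: BTreeNode, k: int) -> Optional[Tuple[BTreeNode, int]]:
    """Return (node, index) with node.keys[index] == k, or None if k is absent."""
    node = root
    while True:
        # First position whose key is >= k; this also selects the child interval.
        i = bisect_left(node.keys, k)
        if i < len(node.keys) and node.keys[i] == k:
            return node, i
        if node.is_leaf:
            return None
        # Every key in children[i] lies between keys[i-1] and keys[i].
        node = node.children[i]


# A small hand-built tree with t = 2 (each non-root node holds 1 to 3 keys).
root = BTreeNode(
    keys=[10, 20],
    children=[BTreeNode([1, 5]), BTreeNode([12, 15]), BTreeNode([25, 30])],
)
```

Calling `btree_search(root, 15)` descends into the middle child because 15 lies between 10 and 20, so the number of nodes examined is at most the height of the tree plus one.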

Note
The above properties imply that the height of a B-tree is no more than $\log_t \frac{n+1}{2}$, for $t \ge 2$, where $n$ is the number of keys. If we make $t$ larger, we can save a larger (constant) fraction over RB-trees in the number of nodes examined. A (2-3-4)-tree is just a B-tree with $t = 2$.

High Level Analysis
We will now show that for any B-Tree with height $h$ and $n$ keys, $h \le \log_t \frac{n+1}{2}$, where $t \ge 2$. Consider a B-Tree of height $h > 1$.
Q1: What is the minimum number of nodes at depth 1, 2, and 3?
Q2: What is the minimum number of nodes at depth $i$?
Q3: Now give a lower bound for the total number of keys (e.g. $n \ge\ ???$).
Q: Show how to solve for $h$ in this inequality to get an upper bound on $h$.

Splay Trees
A Splay Tree is a kind of BST where the standard operations run in $O(\log n)$ amortized time. This means that over $l$ operations (e.g. Insert, Lookup, Delete, etc.), where $l$ is sufficiently large, the total cost is $O(l \log n)$. In other words, the average cost per operation is $O(\log n)$. However, a single operation could still take $O(n)$ time. In practice, they are very fast.

Skip Lists
Technically, not a BST, but they implement all of the same operations. Very elegant randomized data structure, simple to code but analysis is subtle. They guarantee that, with high probability, all the major operations take $O(\log n)$ time. We'll discuss them more next class.

Comparison of various BSTs
RB-Trees: + guarantee $O(\log n)$ time for each operation, easy to augment, - high constants.
AVL-Trees: + guarantee $O(\log n)$ time for each operation, - high constants.
B-Trees: + guarantee $O(\log n)$ time for each operation, work well for trees that won't fit in memory, - inserts and deletes are more complicated.
Splay Trees: + small constants, - amortized guarantees only.
Skip Lists: + easy to implement, - runtime guarantees are probabilistic only.

Which Data Structure to use?
Splay trees work very well in practice; the hidden constants are small. Unfortunately, they cannot guarantee that every operation takes $O(\log n)$ time. When this guarantee is required, B-Trees are best when the entire tree will not be stored in memory. If the entire tree will be stored in memory, RB-Trees, AVL-Trees, and Skip Lists are good.
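Returning to the height bound from the High Level Analysis, one way to answer Q1 to Q3 is the following. The root has at least 2 children and every other internal node at least $t$, so depth $i \ge 1$ holds at least $2t^{i-1}$ nodes, each with at least $t-1$ keys. Summing gives $n \ge 1 + (t-1)\sum_{i=1}^{h} 2t^{i-1} = 2t^h - 1$, which rearranges to $h \le \log_t \frac{n+1}{2}$. The Python sketch below checks this arithmetic numerically; the function names are hypothetical and the root is assumed to be at depth 0.

```python
import math


def min_keys(t: int, h: int) -> int:
    """Fewest keys a B-tree of minimum degree t and height h can hold.

    The root contributes 1 key; depth i >= 1 contributes at least
    2 * t**(i-1) nodes with t - 1 keys each.
    """
    return 1 + (t - 1) * sum(2 * t ** (i - 1) for i in range(1, h + 1))


def height_bound(t: int, n: int) -> float:
    """Upper bound log_t((n + 1) / 2) on the height of a B-tree with n keys."""
    return math.log((n + 1) / 2, t)


# The sparsest tree of each height meets the bound (up to float rounding).
for t in (2, 3, 10):
    for h in range(1, 6):
        n = min_keys(t, h)
        assert n == 2 * t ** h - 1
        assert h <= height_bound(t, n) + 1e-9
```

Because the sparsest tree of height $h$ already holds $2t^h - 1$ keys, any tree with fewer keys must be shorter, which is why a larger $t$ lowers the height.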

Skip Lists
A skip list is basically a collection of doubly-linked lists, $L_1, L_2, \ldots, L_x$, for some integer $x$. Each list has a special head and tail node; the keys of these nodes are assumed to be $-\text{MAXINT}$ and $+\text{MAXINT}$ respectively. The keys in each list are in sorted order (non-decreasing).

Every key is in the list $L_1$. For all $i \ge 2$, if a key $x$ is in the list $L_i$, it is also in $L_{i-1}$. Further, there are up and down pointers between the $x$ in $L_i$ and the $x$ in $L_{i-1}$. All the head (tail) nodes from neighboring lists are interconnected.

Example
[Figure: an example skip list; the key values did not survive extraction.]

Search

    Search(k){
      pleft = L_x.head;
      for (i = x; i >= 1; i--){
        Search from pleft in L_i to get the rightmost elem, r, with key <= k;
        if (i > 1) pleft = pointer to r in L_(i-1);
        else pleft = r;
      }
      if (pleft.key == k) return pleft
      else return nil
    }

Insert
$p$ is a constant between 0 and 1, typically $p = 1/2$.

    Insert(k){
      First call Search(k), let pleft be the rightmost elem <= k in L_1
      Insert k in L_1, to the right of pleft
      i = 2;
      while (rand() < p){
        insert k in the appropriate place in L_i;
        i = i + 1;
      }
    }
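The Search and Insert procedures above can be sketched as runnable Python. This is a simplified illustration under stated assumptions, not the lecture's implementation: nodes carry only `right` and `down` pointers (no `left`/`up`), `None` plays the role of the $+\text{MAXINT}$ tail, and the class and method names (`SkipList`, `_predecessors`, `level_keys`) are hypothetical.

```python
import random
from typing import List, Optional


class Node:
    def __init__(self, key, down: Optional["Node"] = None):
        self.key = key
        self.right: Optional["Node"] = None  # next node in the same list
        self.down = down                     # same key, one list lower


class SkipList:
    def __init__(self, p: float = 0.5, seed: Optional[int] = None):
        self.p = p
        self.rng = random.Random(seed)
        # heads[0] is the head of L_1; each head has key -infinity.
        self.heads: List[Node] = [Node(float("-inf"))]

    def _predecessors(self, k) -> List[Node]:
        """Rightmost node with key <= k in every list, bottom list first."""
        preds = []
        node: Optional[Node] = self.heads[-1]
        while node is not None:
            while node.right is not None and node.right.key <= k:
                node = node.right
            preds.append(node)
            node = node.down
        preds.reverse()
        return preds

    def search(self, k) -> Optional[Node]:
        node = self._predecessors(k)[0]
        return node if node.key == k else None

    def insert(self, k) -> None:
        preds = self._predecessors(k)
        level, below = 0, None
        while True:
            if level == len(self.heads):
                # Start a new top list whose head links down to the old top.
                self.heads.append(Node(float("-inf"), down=self.heads[-1]))
                preds.append(self.heads[-1])
            new = Node(k, down=below)
            new.right = preds[level].right
            preds[level].right = new
            below, level = new, level + 1
            # Promote to the next list with probability p.
            if self.rng.random() >= self.p:
                break

    def level_keys(self, i: int) -> list:
        """Keys stored in list L_(i+1), left to right (head excluded)."""
        out, node = [], self.heads[i].right
        while node is not None:
            out.append(node.key)
            node = node.right
        return out
```

Collecting the predecessors once during the top-down search lets `insert` splice the new key into each list without searching again, which mirrors the "insert k in the appropriate place in L_i" step.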

Deletion
Deletion is very simple. First do a search for the key to be deleted. Then delete that key from all the lists it appears in from the bottom up, making sure to zip up the lists after the deletion.

A trick for computing expectations of discrete positive random variables: let $X$ be a discrete r.v. that takes on values from 1 to $n$. Then
$$E(X) = \sum_{i=1}^{n} P(X \ge i).$$

Why?
$$\sum_{i=1}^{n} P(X \ge i) = 1 \cdot P(X = 1) + 2 \cdot P(X = 2) + \cdots + n \cdot P(X = n) = E(X),$$
since $P(X = j)$ is counted once in each of the terms $P(X \ge 1), \ldots, P(X \ge j)$.

Q: How much memory do we expect a skip list to use up?
Let $X_i$ be the number of lists elem $i$ is inserted in.
Q: What is $P(X_i \ge 1)$, $P(X_i \ge 2)$, $P(X_i \ge 3)$?
Q: What is $P(X_i \ge k)$ for general $k$?
Q: What is $E(X_i)$?
Q: Let $X = \sum_{i=1}^{n} X_i$. What is $E(X)$?

Height of Skip Lists
Assume there are $n$ nodes in the list.
Q: What is the probability that a particular key $i$ achieves height $k \log n$ for some constant $k$?
A: If $p = 1/2$, $P(X_i \ge k \log n) = \frac{1}{n^k}$. (Strictly, every key is in $L_1$ with certainty, so $P(X_i \ge m) = 2^{-(m-1)}$ and the stated value is exact only up to a factor of 2, which does not change the asymptotics.)

Height of Skip Lists
Q: What is the probability that any of the nodes achieve height higher than $k \log n$?
A: We want
$$P(X_1 \ge k \log n \text{ or } X_2 \ge k \log n \text{ or } \ldots \text{ or } X_n \ge k \log n).$$
By a Union Bound, this probability is no more than
$$P(X_1 \ge k \log n) + P(X_2 \ge k \log n) + \cdots + P(X_n \ge k \log n),$$
which equals $n \cdot \frac{1}{n^k} = n^{1-k}$.
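The tail-sum trick and the memory question can be checked exactly with rational arithmetic. The sketch below assumes each key is promoted independently with probability $p$, so $P(X_i \ge k) = p^{k-1}$; it also caps the number of lists at a hypothetical maximum `m` so the distribution is finite. Under that assumption both ways of computing $E(X_i)$ agree and approach $\frac{1}{1-p} = 2$ for $p = 1/2$, which by linearity of expectation suggests $E(X) \approx 2n$ nodes in total.

```python
from fractions import Fraction


def capped_pmf(p: Fraction, m: int) -> dict:
    """P(X = k) for k = 1..m: promoted with probability p, at most m lists."""
    pmf = {k: p ** (k - 1) * (1 - p) for k in range(1, m)}
    pmf[m] = p ** (m - 1)  # all remaining probability mass sits at the cap
    return pmf


def expectation_direct(pmf: dict) -> Fraction:
    """E(X) from the definition: sum of k * P(X = k)."""
    return sum(k * q for k, q in pmf.items())


def expectation_tail_sum(pmf: dict) -> Fraction:
    """E(X) from the trick: sum over i of P(X >= i)."""
    m = max(pmf)
    return sum(
        sum(q for j, q in pmf.items() if j >= i) for i in range(1, m + 1)
    )


pmf = capped_pmf(Fraction(1, 2), 20)
assert sum(pmf.values()) == 1
assert expectation_direct(pmf) == expectation_tail_sum(pmf)
```

Using `Fraction` instead of floats makes the equality check exact, so the identity is verified rather than approximated.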

Height of Skip Lists
If we choose $k$ to be, say, 10, this probability gets very small as $n$ gets large. In particular, the probability of having a skip list of height exceeding $k \log n$ is $o(1)$. So we say that the height of the skip list is $O(\log n)$ with high probability.

Search Time
Note that the expected number of siblings of a node, $x$, at any level $i$ is 2. Why? Because for a node to be a sibling of $x$ at level $i$, it must have failed to advance to the next level. The first node that advances to the next level ends the possibility of further siblings. This is the same as asking for the expected number of times we need to flip a coin to get a heads: the answer is 2.

Search Time
The expected number of siblings of a node, $x$, at any level $i$ is 2. The number of levels is $O(\log n)$ with high probability. From these two facts, we can argue that the expected search time is $O(\log n)$. (Warning: the argument is not as simple as multiplying these two values. We can't do this since the two random variables are not independent.)
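To see how loose or tight the union bound on the height is, the following Python sketch compares it with the exact probability, assuming the $n$ keys are promoted independently with probability $p$ so that a single key appears in more than $m$ lists with probability $p^m$. The function names are hypothetical. With $p = 1/2$ and $m = k \log_2 n$, the bound is exactly $n^{1-k}$.

```python
def exact_tail(n: int, m: int, p: float = 0.5) -> float:
    """P(at least one of n independent keys appears in more than m lists)."""
    return 1 - (1 - p ** m) ** n


def union_bound(n: int, m: int, p: float = 0.5) -> float:
    """Union bound on the same event: n * P(one key exceeds m lists)."""
    return min(1.0, n * p ** m)


# n = 1024 keys, k = 2, so m = k * log2(n) = 20 lists.
n, k = 1024, 2
m = k * 10
assert union_bound(n, m) == n ** (1 - k)
assert 0 < exact_tail(n, m) <= union_bound(n, m) + 1e-12
```

For these values the two quantities are nearly equal, which indicates that the union bound loses very little when the individual probabilities are small.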