CSE 100: B+ TREE, 2-3 TREE, NP- COMPLETNESS

Similar documents
Lecture 13. Reading: Weiss, Ch. 9, Ch 8 CSE 100, UCSD: LEC 13. Page 1 of 29

Suppose you are accessing elements of an array: ... or suppose you are dereferencing pointers: temp->next->next = elem->prev->prev;

Department of Computer Applications. MCA 312: Design and Analysis of Algorithms. [Part I : Medium Answer Type Questions] UNIT I

Final Examination CSE 100 UCSD (Practice)

CSE 100 Minimum Spanning Trees Prim s and Kruskal

Chapter 9 Graph Algorithms

B-Trees. Disk Storage. What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree. Deletion in a B-tree

Chapter 9 Graph Algorithms

CSE100 Practice Final Exam Section C Fall 2015: Dec 10 th, Problem Topic Points Possible Points Earned Grader

11/22/2016. Chapter 9 Graph Algorithms. Introduction. Definitions. Definitions. Definitions. Definitions

Chapter 9 Graph Algorithms

Best known solution time is Ω(V!) Check every permutation of vertices to see if there is a graph edge between adjacent vertices

Unit 8: Coping with NP-Completeness. Complexity classes Reducibility and NP-completeness proofs Coping with NP-complete problems. Y.-W.

What is a Multi-way tree?

Algorithm Design Techniques. Hwansoo Han

Computer Science E-22 Practice Final Exam

Data Structures and Algorithms

CSE 100: GRAPH ALGORITHMS

Introduction to Computer Science

An AVL tree with N nodes is an excellent data. The Big-Oh analysis shows that most operations finish within O(log N) time

Chapter 8. NP-complete problems

Balanced Search Trees

M-ary Search Tree. B-Trees. Solution: B-Trees. B-Tree: Example. B-Tree Properties. B-Trees (4.7 in Weiss)

CAD Algorithms. Categorizing Algorithms

CSE 373: Data Structures and Algorithms

1. AVL Trees (10 Points)

9. Which situation is true regarding a cascading cut that produces c trees for a Fibonacci heap?

CSE 530A. B+ Trees. Washington University Fall 2013

1 5,9,2,7,6,10,4,3,8,1 The first number (5) is automatically the first number of the sorted list

C SCI 335 Software Analysis & Design III Lecture Notes Prof. Stewart Weiss Chapter 4: B Trees

Module 4: Index Structures Lecture 13: Index structure. The Lecture Contains: Index structure. Binary search tree (BST) B-tree. B+-tree.

B-Trees and External Memory

Reference Sheet for CO142.2 Discrete Mathematics II

n 2 C. Θ n ( ) Ο f ( n) B. n 2 Ω( n logn)

M-ary Search Tree. B-Trees. B-Trees. Solution: B-Trees. B-Tree: Example. B-Tree Properties. Maximum branching factor of M Complete tree has height =

B-Trees. Introduction. Definitions

B-Trees and External Memory

Exam 3 Practice Problems

CSE 100 Tutor Review Section. Anish, Daniel, Galen, Raymond

Multiple-choice (35 pt.)

Coping with the Limitations of Algorithm Power Exact Solution Strategies Backtracking Backtracking : A Scenario

CS 6783 (Applied Algorithms) Lecture 5

CS521 \ Notes for the Final Exam

1 Format. 2 Topics Covered. 2.1 Minimal Spanning Trees. 2.2 Union Find. 2.3 Greedy. CS 124 Quiz 2 Review 3/25/18

CSE 373: Data Structures and Algorithms

CS350: Data Structures B-Trees

Trees, Trees and More Trees

( ) n 3. n 2 ( ) D. Ο

looking ahead to see the optimum

Trees. (Trees) Data Structures and Programming Spring / 28

Recall: Properties of B-Trees

P and NP CISC4080, Computer Algorithms CIS, Fordham Univ. Instructor: X. Zhang

) $ f ( n) " %( g( n)

logn D. Θ C. Θ n 2 ( ) ( ) f n B. nlogn Ο n2 n 2 D. Ο & % ( C. Θ # ( D. Θ n ( ) Ω f ( n)

Multiway searching. In the worst case of searching a complete binary search tree, we can make log(n) page faults Everyone knows what a page fault is?

CSE 332 Spring 2013: Midterm Exam (closed book, closed notes, no calculators)

& ( D. " mnp ' ( ) n 3. n 2. ( ) C. " n

V Advanced Data Structures

Total Score /1 /20 /41 /15 /23 Grader

( ) D. Θ ( ) ( ) Ο f ( n) ( ) Ω. C. T n C. Θ. B. n logn Ο

CLASS: II YEAR / IV SEMESTER CSE CS 6402-DESIGN AND ANALYSIS OF ALGORITHM UNIT I INTRODUCTION

P and NP CISC5835, Algorithms for Big Data CIS, Fordham Univ. Instructor: X. Zhang

Algorithms. Deleting from Red-Black Trees B-Trees

Analysis of Algorithms

Lecture 8: The Traveling Salesman Problem

V Advanced Data Structures

Section 05: Solutions

B-Trees. Based on materials by D. Frey and T. Anastasio

Lecture 5. Treaps Find, insert, delete, split, and join in treaps Randomized search trees Randomized search tree time costs

1. O(log n), because at worst you will only need to downheap the height of the heap.

CS 251, LE 2 Fall MIDTERM 2 Tuesday, November 1, 2016 Version 00 - KEY

CS 350 : Data Structures B-Trees

We will show that the height of a RB tree on n vertices is approximately 2*log n. In class I presented a simple structural proof of this claim:

n 2 ( ) ( ) + n is in Θ n logn

4.1.2 Merge Sort Sorting Lower Bound Counting Sort Sorting in Practice Solving Problems by Sorting...

Multi-way Search Trees. (Multi-way Search Trees) Data Structures and Programming Spring / 25

CSE 332 Autumn 2013: Midterm Exam (closed book, closed notes, no calculators)

Thomas H. Cormen Charles E. Leiserson Ronald L. Rivest. Introduction to Algorithms

Note that this is a rep invariant! The type system doesn t enforce this but you need it to be true. Should use repok to check in debug version.

6.856 Randomized Algorithms

CS161 - Final Exam Computer Science Department, Stanford University August 16, 2008

Introduction to Indexing 2. Acknowledgements: Eamonn Keogh and Chotirat Ann Ratanamahatana

1. [1 pt] What is the solution to the recurrence T(n) = 2T(n-1) + 1, T(1) = 1

Data Structures in Java. Session 23 Instructor: Bert Huang

Search Trees. Computer Science S-111 Harvard University David G. Sullivan, Ph.D. Binary Search Trees

CSE 417 Branch & Bound (pt 4) Branch & Bound

CS 112 Final May 8, 2008 (Lightly edited for 2012 Practice) Name: BU ID: Instructions

Weighted Graphs and Greedy Algorithms

PSD1A. DESIGN AND ANALYSIS OF ALGORITHMS Unit : I-V

Questions? You are given the complete graph of Facebook. What questions would you ask? (What questions could we hope to answer?)

9. The expected time for insertion sort for n keys is in which set? (All n! input permutations are equally likely.)

COSC 2007 Data Structures II Final Exam. Part 1: multiple choice (1 mark each, total 30 marks, circle the correct answer)

5105 BHARATHIDASAN ENGINEERING COLLEGE

Examples of P vs NP: More Problems

Exam Sample Questions CS3212: Algorithms and Data Structures

Red-Black trees are usually described as obeying the following rules :

managing an evolving set of connected components implementing a Union-Find data structure implementing Kruskal s algorithm

Foundations of Discrete Mathematics

CIS265/ Trees Red-Black Trees. Some of the following material is from:

Recitation 9. Prelim Review

Transcription:

CSE 100: B+ TREE, 2-3 TREE, NP- COMPLETNESS

Analyzing find in B-trees Since the B-tree is perfectly height-balanced, the worst case time cost for find is O(logN) Best case: If every internal node is completely filled N <= m H -1 H >= log m (N+1) Worst case (d = ceil(m/2)) N >= 2 d (H-1) 1 H <= log d ((N+1)/2) H = 1 for root node

B+-tree with parameter M, L There are several variants of the basic B tree idea B+ trees is one of them In a B+ tree, data records are stored only in the leaves A leaf always holds between ceil(l/2) and L data records (inclusive) All the leaves are at the same level Internal nodes store key values which guide searching in the tree An internal node always has between ceil(m/2) and M children (inclusive) (except the root, which, if it is not a leaf, can have between 2 and M children) An internal node holds one fewer keys than it has children: the leftmost child has no key stored for it every other child has a key stored which is equal to the smallest key in the subtree rooted at that child

A B+-tree with M=4, L=3 Every internal node must have 2, 3, or 4 children Every leaf is at the same level (level 2 in the tree shown), and every leaf must have 2 or 3 data records (keys only shown)

A B+-tree with M=4, L=3 Insert following sequence of numbers: 12, 13, 4, 8, 1, 15, 18, 19

Designing a B+ tree Suppose each key takes K bytes, and each pointer to a child node takes P bytes then each internal node must be able to hold M*P + (M-1)*K bytes Suppose each data record takes R bytes then each leaf node must be able to hold R * L bytes If a disk block contains B bytes, then you should have M and L as large as possible and still satisfy these inequalities ( assume R<=B; if not, you can use multiple blocks for each leaf ): B >= M*P + (M-1)*K, and so: M <= (B+K)/(P+K) B >= R*L, and so: L <= B/R

Designing B+ tree Suppose each data record takes 6 bytes and each disk block is 36 bytes. What should be the value of L in the B+ tree? A. 1 B. 4 C. 5 D. 6

Designing B+ tree Suppose both the key and the child pointer use 4 bytes and each disk block is 36 bytes. What should be the value of M in the B+ tree? A. 1 B. 4 C. 5 D. 6

Designing a B+-tree, cont d The branching factor in a B+-tree is at least M/2 (except possibly at the root), and there are at least L/2 records in each leaf node So, if you are storing N data records, there will be at most 2N/L leaf nodes There are at most (2N/L) / (M/2) nodes at the level above the leaves; etc. Therefore: the height of the B-tree with N data records is at most log M/2 (2N/L) = log M/2 N - log M/2 L + log M/2 2 In a typical application, with 32-bit int keys, 32-bit disk block pointers, 1024-byte disk blocks, and 256-byte data records, we would have: M = 128, L = 4 And the height of the B+-tree storing N data records would be at most log 64 N How does all this relate to the performance of operations on the B+tree?

Analyzing find in B+-trees: more detail Assume a two-level memory structure: reading a disk block takes T d time units; accessing a variable in memory takes T m time units The height of the tree is at most log M/2 N; this is the maximum number of disk blocks that need to be accessed when doing a find so the time taken with disk accesses during a find is at most T d * log M/2 N The number of items in an internal node is at most M; assuming they are sorted, it takes at most log 2 M steps to search within a node, using binary search so the time taken doing search within internal nodes during a find is no more than T m * log 2 M * log M/2 N Which about the same as T m * log 2 N since log M/2 N= log 2 N / (log 2 M/2) The number of items in a leaf node is at most L; assuming they are sorted, it takes at most log 2 L steps to search within a node, using binary search so the time taken searching within a leaf node during a find is at most T m * log 2 L

Analyzing find in B+-trees: still more detail Putting all that together, the time to find a key in a B-tree with parameters M,L, and memory and disk access times Tm and Td is at most T d * log M/2 N + T m * log 2 M * log M/2 N + T m * log 2 L This formula shows the breakdown of find time cost, distinguishing between time due to disk access (the first term) and time due to memory access (the last 2 terms) This more complete analysis can help understand issues of designing large database systems using B+-tree design For example, from the formula we can see that disk accesses (the first term) will dominate the total runtime cost for large N if log This will almost certainly be true, for typical inexpensive disks and memory and reasonable choices of M And so the biggest performance improvements will come from faster disk systems, or by storing as many top levels of the tree as possible in memory instead of on disk

2-3 Tree Every internal node has either two or three children We say that T is a 2-3 tree if and only if one of the following statements hold T is empty T is a 2-node r with data element a. If r has left child L and right child R, then L and R are non-empty 2-3 trees of the same height, a is greater than each element in L, and a is less than or equal to each data element in R. T is a 3-node r with data elements a and b, where a < b. If r has left child L, middle child M, and right child R, then L, M, and R are non-empty 2-3 trees of equal height, a is greater than each data element in L and less than or equal to each data element in M, and b is greater than each data element in M and less than or equal to each data element in R.

Insert in a 2-3 tree We will indicate leaf nodes with rectangles, and interior nodes with ellipses; and for simplicity we will just consider integer keys, with no associated data records Starting with an empty tree, insert 22; then 41 Insert 22: 22, Insert 41: 22, 41 Now insert 17...

Insert in a 2-3 tree, cont d Inserting the third key would overflow the root/leaf! This is not a legal 2-3 tree: Insert 17: 17, 22, 41 This node must be split into two. Since this is the root that is being split, a new root is formed: the tree grows in height 22, 17, 41,

Insert in a 2-3 tree, still cont d Suppose insertions have continued until this tree is produced: 50, 20, 90, 97 10, 15 40, 60,70 95, 98, What happens if 5 is inserted?

Insert in a 2-3 tree, still cont d 50, 10, 20 90, 97 5, 15 40, 60,70 95, 98, Now insert 65.

Insert in a 2-3 tree, still cont d 50, 90 10, 20 65, 97 5, 15 40, 60, 70, 95, 98,

2-3 tree converted to red-black tree 50 90 10 20 65, 97 5 15 40 60 70 95 98

(2-3-4) B-trees 10 3 5 8 13 1 2 4 6 7 9 11 12 14 15

(2-3-4) B-trees == RBTs! 10 3 5 8 13 1 2 4 6 7 9 11 12 14 15

(2-3-4) B-trees == RBTs! 10 3 5 8 13 1 2 4 6 7 9 11 12 14 15 2 10 5 13 3 8 12 4 7 9 11 14 15 So which are asymptotically faster? RB-trees or 2-3-4 (B) trees? A. RB trees B. B-trees C. They are the same 1 6

A note about graph algorithm time costs The graph algorithms we will study have fast algorithms: Find shortest path in unweighted graphs Solved by basic breadth-first search: O( V + E ) worst case Find shortest path in weighted graphs Solved by Dijkstra s algorithm: O( E log V ) worst case Find minimum-cost spanning tree in weighted graphs Solved by Prim s or Kruskal s algorithm: O( E log V ) worst case The greedy algorithms used for solving these problems have polynomial time cost functions in the worst case since E <= V 2, Dijkstra s, Prim s and Kruskal s algorithms are O( V 3 ) As a result, these problems can be solved in a reasonable amount of time, even for large graphs; they are considered to be tractable problems However, many graph problems do not have any known polynomial time solutions...!

Intractable graph problems For many interesting graph problems, the best known algorithms to solve them have exponential time costs O(2 V ) In the worst case, these intractable problems simply cannot be solved exactly, except for quite small graphs (e.g. fewer than 100 vertices, even on the world s fastest computers); the best known algorithms for these problems take too long to run For these problems, simple greedy best-first algorithms do not work... Essentially the best approach known for solving them exactly is basically to try all the possibilities, and there can be exponentially many possibilities to try These intractable graph problems are often members of the class called NP-complete problems, which includes many non-graph problems as well...

NP-complete problems A problem that can be exactly solved in time that is a polynomial function of the size of the problem is in the class P (for Polynomial time) A problem whose solution can be checked for correctness in time that is a polynomial function of the size of the problem is in the class NP (for Nondeterministic Polynomial time) A nondeterministic computer could guess the solution to the problem, and then check if it is a solution in polynomial time, and never give a wrong answer Note that the class P is contained in NP A problem that is in NP, and is as hard as any problem in NP (an algorithm for it is also essentially an algorithm for any NP problem) is NP-complete For all the NP-complete problems, the best known algorithms take exponential time in the worst case......however, nobody has yet been able to prove that there are no polynomial time algorithms for them! If you find one, you will be instantly very very famous What are some of the NP-complete graph problems?...

Examples of intractable graph problems Here are a few examples of the many graph problems that are NP-complete, and so seem to require O(2 V ) time worst-case: Hamiltonian circuit : Given a graph, say whether the graph has a cycle that includes all the vertices of the graph exactly once. Travelling salesman : Given a weighted graph, find the Hamiltonian circuit that has the smallest total cost. Longest path : Given a graph and two vertices s and d, find the longest path from s to d that doesn t contain any cycles. (But note that shortest path is solvable in polynomial time!) Shortest total path length spanning tree : Given a graph, find the spanning tree that has the smallest total path lengths between all pairs of vertices Steiner tree : Given a graph (V,E) and a subset S of V, find the minimum-cost spanning tree that spans every vertex in S (and may also span some other vertices) (but note that if S=V, the problem is solvable in polynomial time!)

P vs NP Is P a subset of NP? A. True B. False

The problem with intractable problems If a problem is NP-complete, the best known algorithms to solve it requires exponentially many steps in the worst case Simple greedy algorithms do not work for these problems backtracking, or some other way of looking at, and checking, possible alternatives is usually required...... and there are exponentially many alternatives to check! For example: The problem has N boolean variables, and you need to check the 2 N possible different assignments of truth values to them The problem has N items, and you need to check each of the 2 N different subsets of those items Because of the exponential time costs of the best known solutions to these problems, you have to either... restrict yourself to small instances of the problems, or try to find approximate algorithms that are fast, but not always exactly correct

P vs NP Is P a subset of NP-complete? A. True B. False

Final Exam 4 Parts Part 1: Basic knowledge of data structures and C++ 20% to 40% of final score Multiple choice Part 2: Application, Comparison and Implementation of the data structures we have covered 20% to 40% of final score Short answers Part 3: Simulating algorithms and run time analysis 20% to 40% of final score Short answers Part 4: C++ and programming assignments

What would you like to review next class? A. Multiway tries and ternary tries B. Skip lists C. Red-Black trees D. B trees E. Hashing