CSCI 104 Log Structured Merge Trees. Mark Redekopp



1 CSCI 104 Log Structured Merge Trees. Mark Redekopp

2 Series Summation Review
Let n = 1 + 2 + 4 + ... + 2^k = Σ_{i=0}^{k} 2^i. What is n? n = 2^(k+1) − 1.
What is log₂(1) + log₂(2) + log₂(4) + log₂(8) + ... + log₂(2^k)? It is 0 + 1 + 2 + 3 + ... + k = Σ_{i=0}^{k} i, which is O(k²).
So then what if k = log(n), as in log₂(1) + log₂(2) + log₂(4) + log₂(8) + ... + log₂(2^(log n))? That sum is O(log² n).
Arithmetic series: Σ_{i=1}^{n} i = n(n+1)/2 = Θ(n²).
Geometric series: Σ_{i=1}^{n} c^i = (c^(n+1) − 1)/(c − 1) = Θ(c^n).

3 Merge Trees Overview
Consider a list of (pointers to) arrays with the following constraints:
- Each array is sorted, though no ordering constraints exist between arrays
- The array at list index k is of exactly size 2^k or empty
[Figure: an example list with positions 0 through 3 holding arrays of size 1, 2, 4, and 8, or NULL if empty. An array at list location k can be of size 2^k or empty. Note: these are the keys for a set (or key,value pairs for a map).]
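A minimal C++ sketch of this layout, assuming int keys; the struct name LsmSet and the member name levels are illustrative, not from the slides. An empty vector at position k plays the role of the NULL slot.

```cpp
#include <cstddef>
#include <vector>

// Illustrative layout only (not the course's code): levels[k] is either empty
// (the NULL case) or a sorted array of exactly 2^k keys.
struct LsmSet {
    std::vector<std::vector<int>> levels;   // levels[k].size() is 0 or (1 << k)

    // Verify the structural invariants described above.
    bool wellFormed() const {
        for (std::size_t k = 0; k < levels.size(); ++k) {
            const std::vector<int>& a = levels[k];
            if (a.empty()) continue;                              // empty slot is allowed
            if (a.size() != (std::size_t{1} << k)) return false;  // must hold exactly 2^k keys
            for (std::size_t i = 1; i < a.size(); ++i)
                if (a[i - 1] > a[i]) return false;                // each level must be sorted
        }
        return true;
    }
};
```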

4 Merge Trees Size
Define n as the number of keys in the entire structure and k as the size of the list (i.e. the number of positions in the list).
Given a list of size k, how many total values, n, may be stored?
n = 1 + 2 + 4 + ... + 2^(k-1) = Σ_{i=0}^{k-1} 2^i = 2^k − 1

5 Merge Trees: Find Operation
To find an element (or check whether it exists):
- Iterate through the arrays in order (i.e. start with the array at list position 0, then the array at list position 1, etc.)
- In each array, perform a binary search
- If you reach the end of the list of arrays without finding the value, it does not exist in the set/map
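A direct translation of this search, as a sketch assuming the levels layout from the earlier sketch (a sorted std::vector<int> per position, empty meaning NULL); lsmFind is an illustrative name.

```cpp
#include <algorithm>
#include <vector>

// Sketch of find: scan levels 0, 1, 2, ... and binary search each non-empty one.
bool lsmFind(const std::vector<std::vector<int>>& levels, int key) {
    for (const std::vector<int>& arr : levels) {
        if (arr.empty()) continue;                        // NULL slot, nothing to search
        if (std::binary_search(arr.begin(), arr.end(), key))
            return true;                                  // found in this level
    }
    return false;                                         // exhausted all levels: not present
}
```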

6 Find Runtime
What is the worst-case runtime of find? It occurs when the item is not present, which forces a binary search on every array:
T(n) = log₂(1) + log₂(2) + ... + log₂(2^(k-1)) = 0 + 1 + 2 + ... + (k−1) = Σ_{i=0}^{k-1} i = O(k²)
Let's put that in terms of the number of elements in the structure (i.e. n). Recall k = log₂(n+1), so find is O(log²(n)).
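Spelled out as one chain (with k list positions and, from the previous slide, n = 2^k − 1 keys):

```latex
T(n) \;=\; \sum_{i=0}^{k-1} \log_2\!\left(2^{i}\right)
      \;=\; \sum_{i=0}^{k-1} i
      \;=\; \frac{k(k-1)}{2} \;=\; O(k^2),
\qquad k = \log_2(n+1)
\;\Rightarrow\; T(n) = O\!\left(\log^2 n\right).
```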

7 Improving Find's Runtime
While we might be okay with O(log²(n)), how might we improve the find runtime in the general case?
Hint: I would be willing to pay O(1) to know that a key is not in a particular array, without having to perform a binary search.
A Bloom filter could be maintained alongside each array, allowing us to skip the binary search in arrays that cannot contain the key.
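One way to realize the hint, as a sketch: keep a small membership filter next to each array and only binary search when the filter reports the key may be present. The TinyBloom type below (two hash probes into a fixed 1024-bit array) is a deliberately simplified, invented stand-in for a real Bloom filter, not the filter used by the slides or by any particular LSM implementation; filters would also need to be rebuilt whenever a level's array changes.

```cpp
#include <algorithm>
#include <bitset>
#include <cstddef>
#include <functional>
#include <vector>

// Toy Bloom filter: never reports "absent" for a key that was added,
// but may occasionally report "maybe present" for a key that was not.
struct TinyBloom {
    std::bitset<1024> bits;
    static std::size_t probe(int key, int seed) {
        return std::hash<long long>{}((static_cast<long long>(key) << 8) ^ seed) % 1024;
    }
    void add(int key)              { bits.set(probe(key, 1)); bits.set(probe(key, 2)); }
    bool mayContain(int key) const { return bits.test(probe(key, 1)) && bits.test(probe(key, 2)); }
};

// Find that pays O(1) per level to skip arrays that cannot contain the key.
bool lsmFindFiltered(const std::vector<std::vector<int>>& levels,
                     const std::vector<TinyBloom>& filters, int key) {
    for (std::size_t k = 0; k < levels.size(); ++k) {
        if (levels[k].empty() || !filters[k].mayContain(key)) continue;  // cheap skip
        if (std::binary_search(levels[k].begin(), levels[k].end(), key))
            return true;
    }
    return false;
}
```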

8 Insertion Algorithm
Let j be the smallest integer such that array j is empty (i.e. the first empty slot in the list of arrays). An insertion will cause:
- Location j's array to become filled
- Locations 0 through j−1 to become empty
[Figure: before/after view of one insertion; an array at list location k can be of size 2^k or empty.]

9 Insertion Algorithm
Starting at array 0, iteratively merge the previously merged array with the next array, stopping when an empty location is encountered.
[Figure: a cascading insertion: list position 0 is full, so merge two arrays of size 1; position 1 is full, so merge two arrays of size 2; the merged array lands in the first empty position.]
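A sketch of the cascading merge, assuming the same levels layout as the earlier sketches; the use of std::merge is illustrative, since the slides do not prescribe a particular merge routine.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Sketch of insert: carry a merged "run" upward until an empty level is found.
void lsmInsert(std::vector<std::vector<int>>& levels, int key) {
    std::vector<int> carry{key};                        // size 1 = 2^0, trivially sorted
    for (std::size_t k = 0; ; ++k) {
        if (k == levels.size()) levels.emplace_back();  // grow the list if needed
        if (levels[k].empty()) {                        // first empty slot: place and stop
            levels[k] = std::move(carry);
            return;
        }
        std::vector<int> merged;                        // merge two sorted runs of size 2^k
        merged.reserve(carry.size() + levels[k].size());
        std::merge(carry.begin(), carry.end(),
                   levels[k].begin(), levels[k].end(),
                   std::back_inserter(merged));
        levels[k].clear();                              // level k becomes empty
        carry = std::move(merged);                      // carry a run of size 2^(k+1) upward
    }
}
```

Called repeatedly starting from an empty structure, this reproduces the stop pattern shown on the following slides: inserts 1 through 4 stop at positions 0, 1, 0, 2, and so on.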

10 Insert Examples
[Figure: a sequence of insertions into a small structure, each annotated with its cost and the list position at which the cascade of merges stops, e.g. "Cost = 1 / Stop @ 0" for an insert that lands directly in an empty position 0.]

11 Insertion Runtime: First Look
Best case? The first list entry (position 0) is empty, allowing direct insertion in O(1).
Worst case? All list entries (arrays) are full, so we have to merge at every location. In this case we end with an array of size n = 2^k in position k. Also recall that merging two sorted arrays of size m/2 each is Θ(m). So the total cost of all the merges is 1 + 2 + 4 + 8 + ... + 2^k = Θ(2^(k+1)) = Θ(n).
But if the worst case occurs, how soon can it occur again? The costs vary from one insert to the next, which makes this a good place to use amortized analysis.

12 Total Cost for n Insertions
Reminder: an insert that stops at location k requires 1 + 2 + 4 + ... + 2^(k-1) + 2^k = 2^(k+1) − 1 = O(2^(k+1)) merge steps.
Total cost of n = 16 insertions:
Stop at: 0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 4
Half of the inserts stop at position 0 (cost 1 each), a quarter stop at position 1 (cost 3 each), an eighth stop at position 2 (cost 7 each), and so on:
Total cost ≈ 1·(n/2) + 3·(n/4) + 7·(n/8) + ... = n·log₂(n) + n
Amortized cost = total cost / n operations ≈ log₂(n) + 1 = O(log₂(n))
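The grouping above can be written as a single sum over stop positions; the count n/2^(j+1) of inserts stopping at position j and the cost 2^(j+1) − 1 per such insert come from the slides, while the exact evaluation below assumes n is a power of two (the last insert is the one that stops at position log₂ n):

```latex
\text{Total}
  \;=\; \sum_{j=0}^{\log_2 n \,-\, 1} \frac{n}{2^{\,j+1}}\left(2^{\,j+1}-1\right)
        \;+\; \left(2^{\log_2 n + 1} - 1\right)
  \;=\; n\log_2 n + n
\quad\Longrightarrow\quad
\frac{\text{Total}}{n} \;=\; \log_2 n + 1 \;=\; O(\log n).
```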

13 Amortized Analysis of Insert
We have said that when you end (place an array) in position k, you have to do O(2^(k+1)) work for all the merges. How often do we end in position k?
- The 0th position will be free with probability 1/2 (p = 0.5)
- We will stop at the 1st position with probability 1/4 (p = 0.25)
- We will stop at the 2nd position with probability 1/8 (p = 0.125)
- We will stop at the kth position with probability 1/2^(k+1) = 2^-(k+1)
So we pay 2^(k+1) with probability 2^-(k+1). Suppose we have n items in the structure (i.e. the maximum k is log n); the expected cost of inserting a new element is
Σ_{k=0}^{log(n)} 2^(k+1) · 2^-(k+1) = Σ_{k=0}^{log(n)} 1 = log(n) + 1 = O(log n)

14 Summary
Variants of log structured merge trees have found popular usage in industry.
Pros:
- The starting array size can be fairly large (the size of memory on a single server)
- Large arrays (produced by merging) are stored on disk
- Ease of implementation
- Sequential access of the arrays helps lower the constant factors
Operations:
- Find: O(log²(n))
- Insert: amortized O(log(n))
- Remove: often not considered/supported
More reading: http://www.benstopford.com/2015/02/14/log-structured-merge-trees/