Dynamic Indexability and the Optimality of B-trees and Hash Tables

Size: px
Start display at page:

Download "Dynamic Indexability and the Optimality of B-trees and Hash Tables"

Transcription

1 Dynamic Indexability and the Optimality of B-trees and Hash Tables Ke Yi Hong Kong University of Science and Technology Dynamic Indexability and Lower Bounds for Dynamic One-Dimensional Range Query Indexes, PODS 09 Dynamic External Hashing: The Limit of Buffering, with Zhewei Wei and Qin Zhang, SPAA 09 + some latest development 1-1

2 An index is... An index is a single number calculated from a set of prices Dow Jones, S & P, Hang Seng 2-1

3 An index is... An index is a single number calculated from a set of prices Dow Jones, S & P, Hang Seng An index is a list of keywords and their page numbers in a book An index is an exponent An index is a finger An index is a list of academic publications and their citations 2-2

4 An index is... An index is a single number calculated from a set of prices Dow Jones, S & P, Hang Seng An index is a list of keywords and their page numbers in a book An index is an exponent An index is a finger An index is a list of academic publications and their citations An index (search engine) is an inverted list from keywords to web pages An index (database) is a (disk-based) data structure that improves the speed of data retrieval operations (queries) on a database table. 2-3

5 Hash Table and B-tree Hash tables and B-trees are taught to undergrads and actually used in all database systems 3-1

6 Hash Table and B-tree Hash tables and B-trees are taught to undergrads and actually used in all database systems B-tree: lookups and range queries; Hash table: lookups 3-2

7 Hash Table and B-tree Hash tables and B-trees are taught to undergrads and actually used in all database systems B-tree: lookups and range queries; Hash table: lookups External memory model (I/O model): Memory Memory of size M Each I/O reads/writes a block Disk Disk partitioned into blocks of size B 3-3

8 4-1 The B-tree

9 The B-tree A range query in O(log B N + K/B) I/Os K: output size 4-2

10 The B-tree memory A range query in O(log B N + K/B) I/Os K: output size log B N log B M = log B N M 4-3

11 The B-tree memory A range query in O(log B N + K/B) I/Os K: output size log B N log B M = log B N M The height of B-tree never goes beyond 5 (e.g., if B = 100, then a B-tree with 5 levels stores n = 10 billion records). We will N assume log B M = O(1). 4-4

12 External Hashing h(x) = last digit of x null null null null null null null 55 null null 5-1

13 External Hashing h(x) = last digit of x null null null null null null null 55 null null Ideal hash function assumption: h maps each object to a hash value uniformly independently at random 5-2

14 External Hashing h(x) = last digit of x null null null null null null null 55 null null Ideal hash function assumption: h maps each object to a hash value uniformly independently at random Expected average cost of a successful (or unsuccessful) lookup is 1 + 1/2 Ω(B) disk accesses, provided the load factor is less than a constant smaller than 1 [Knuth, 1973] 5-3

15 Exact Numbers Calculated by Knuth The Art of Computer Programming, volume 3, 1998, page

16 Exact Numbers Calculated by Knuth Extremely close to ideal The Art of Computer Programming, volume 3, 1998, page

17 Now Let s Go Dynamic Focus on insertions first: Both the B-tree and hash table do a search first, then insert into the appropriate block B-tree: Split blocks when necessary Hashing: Rebuild the hash table when too full; extensible hashing [Fagin, Nievergelt, Pippenger, Strong, 79]; linear hashing [Litwin, 80] 7-1

18 Now Let s Go Dynamic Focus on insertions first: Both the B-tree and hash table do a search first, then insert into the appropriate block B-tree: Split blocks when necessary Hashing: Rebuild the hash table when too full; extensible hashing [Fagin, Nievergelt, Pippenger, Strong, 79]; linear hashing [Litwin, 80] These resizing operations only add O(1/B) I/Os amortized per insertion; bottleneck is the first search + insert 7-2

19 Now Let s Go Dynamic Focus on insertions first: Both the B-tree and hash table do a search first, then insert into the appropriate block B-tree: Split blocks when necessary Hashing: Rebuild the hash table when too full; extensible hashing [Fagin, Nievergelt, Pippenger, Strong, 79]; linear hashing [Litwin, 80] These resizing operations only add O(1/B) I/Os amortized per insertion; bottleneck is the first search + insert Cannot hope for lower than 1 I/O per insertion only if the changes must be committed to disk right away (necessary?) 7-3

20 Now Let s Go Dynamic Focus on insertions first: Both the B-tree and hash table do a search first, then insert into the appropriate block B-tree: Split blocks when necessary Hashing: Rebuild the hash table when too full; extensible hashing [Fagin, Nievergelt, Pippenger, Strong, 79]; linear hashing [Litwin, 80] These resizing operations only add O(1/B) I/Os amortized per insertion; bottleneck is the first search + insert Cannot hope for lower than 1 I/O per insertion only if the changes must be committed to disk right away (necessary?) Otherwise we probably can lower the amortized insertion cost by buffering, like numerous problems in external memory, e.g. stack, priority queue,... All of them support an insertion in O(1/B) I/Os the best possible 7-4

21 Dynamic B-trees Dynamic Hash Tables 8-1

22 Dynamic B-trees for Fast Insertions LSM-tree [O Neil, Cheng, Gawlick, O Neil, Acta Informatica 96]: Logarithmic method + B-tree M lm memory l 2 M 9-1

23 Dynamic B-trees for Fast Insertions LSM-tree [O Neil, Cheng, Gawlick, O Neil, Acta Informatica 96]: Logarithmic method + B-tree M lm memory Insertion: O( l B log l N M ) l 2 M Query: O(log l N M ) (omit the K B output term) 9-2

24 Dynamic B-trees for Fast Insertions LSM-tree [O Neil, Cheng, Gawlick, O Neil, Acta Informatica 96]: Logarithmic method + B-tree M lm memory Insertion: O( l B log l N M ) l 2 M Query: O(log l N M ) (omit the K B output term) Stepped merge tree [Jagadish, Narayan, Seshadri, Sudarshan, Kannegantil, VLDB 97]: variant of LSM-tree Insertion: O( 1 B log l N M ) Query: O(l log l N M ) 9-3

25 Dynamic B-trees for Fast Insertions LSM-tree [O Neil, Cheng, Gawlick, O Neil, Acta Informatica 96]: Logarithmic method + B-tree M lm memory Insertion: O( l B log l N M ) l 2 M Query: O(log l N M ) (omit the K B output term) Stepped merge tree [Jagadish, Narayan, Seshadri, Sudarshan, Kannegantil, VLDB 97]: variant of LSM-tree Insertion: O( 1 B log l N M ) Query: O(l log l N M ) Usually l is set to be a constant, then they both have O( 1 B log N M ) insertion and O(log N M ) query 9-4

26 More Dynamic B-trees Buffer-tree (buffered-repository tree) [Arge, WADS 95; Buchsbaum, Goldwasser, Venkatasubramanian, Westbrook, SODA 00] Streaming B-tree [Bender, Farach-Colton, Fineman, Fogel, Kuszmaul, Nelson, SPAA 07] Y-tree [Jermaine, Datta, Omiecinski, VLDB 99] 10-1

27 More Dynamic B-trees Buffer-tree (buffered-repository tree) [Arge, WADS 95; Buchsbaum, Goldwasser, Venkatasubramanian, Westbrook, SODA 00] Streaming B-tree [Bender, Farach-Colton, Fineman, Fogel, Kuszmaul, Nelson, SPAA 07] Y-tree [Jermaine, Datta, Omiecinski, VLDB 99] q log B u 1 B log B 1 1 B Bɛ B ɛ 1 B 10-2

28 More Dynamic B-trees Buffer-tree (buffered-repository tree) [Arge, WADS 95; Buchsbaum, Goldwasser, Venkatasubramanian, Westbrook, SODA 00] Streaming B-tree [Bender, Farach-Colton, Fineman, Fogel, Kuszmaul, Nelson, SPAA 07] Y-tree [Jermaine, Datta, Omiecinski, VLDB 99] q log B u 1 B log B 1 1 B Bɛ B ɛ 1 B Deletions? Standard trick: inserting delete signals 10-3

29 More Dynamic B-trees Buffer-tree (buffered-repository tree) [Arge, WADS 95; Buchsbaum, Goldwasser, Venkatasubramanian, Westbrook, SODA 00] Streaming B-tree [Bender, Farach-Colton, Fineman, Fogel, Kuszmaul, Nelson, SPAA 07] Y-tree [Jermaine, Datta, Omiecinski, VLDB 99] Cache-oblivious model [Demaine, Fineman, Iacono, Langerman, Munro, SODA 10] q log B Deletions? Standard trick: inserting delete signals No better solutions known... u 1 B log B 1 1 B Bɛ B ɛ 1 B 10-4

30 Compare with the rich results in RAM! Range reporting O( log N/ log log N) insertion and query [Andersson, Thorup, JACM 07] O(log N/ log log N) insertion and O(log log N) query [Mortensen, Pagh, Pǎtraşcu, STOC 05] Other results that depend on the word size w Predecessor Θ( log N/ log log N) insertion and query [Andersson, Thorup, JACM 07] Partial-sum Θ(log N) insertion query [Pǎtraşcu, Demaine, SODA 04] 11-1

31 12-1 Are the EM and DB people just dumb?

32 Our Main Result For any dynamic range query index with a query cost of q and an amortized insertion cost of u, the following tradeoff holds { q log(ub/q) = Ω(log B), for q < α log B, α is any constant; ub log q = Ω(log B), for all q. 13-1

33 Our Main Result For any dynamic range query index with a query cost of q and an amortized insertion cost of u, the following tradeoff holds { q log(ub/q) = Ω(log B), for q < α log B, α is any constant; ub log q = Ω(log B), for all q. Assuming log B N M Current upper bounds: = O(1), all the bounds are tight! q log B u 1 B log B 1 1 B Bɛ B ɛ 1 B 13-2

34 Our Main Result For any dynamic range query index with a query cost of q and an amortized insertion cost of u, the following tradeoff holds { q log(ub/q) = Ω(log B), for q < α log B, α is any constant; ub log q = Ω(log B), for all q. Assuming log B N M Current upper bounds: log N M = O(1), all the bounds are tight! q log B u 1 B log B 1 1 B Bɛ B ɛ 1 B 1 B log N M Can t be true for B = o( log n log log n), since the exponential tree achieves u = q = O( log n/ log log n) [Andersson, Thorup, JACM 07]. (n = N/M) 13-3

35 The real question How large does B need to be for buffer-tree to be optimal for range reporting? Known: somewhere between Ω( log n log log n) and O(n ɛ ) 14-1

36 Lower Bound Model: Dynamic Indexability Indexability: [Hellerstein, Koutsoupias, Papadimitriou, PODS 97, JACM 02] 15-1

37 Lower Bound Model: Dynamic Indexability Indexability: [Hellerstein, Koutsoupias, Papadimitriou, PODS 97, JACM 02] Objects are stored in disk blocks of size up to B, possibly with redundancy. 15-2

38 Lower Bound Model: Dynamic Indexability Indexability: [Hellerstein, Koutsoupias, Papadimitriou, PODS 97, JACM 02] a query reports {2,3,4,5} Objects are stored in disk blocks of size up to B, possibly with redundancy. 15-3

39 Lower Bound Model: Dynamic Indexability Indexability: [Hellerstein, Koutsoupias, Papadimitriou, PODS 97, JACM 02] a query reports {2,3,4,5} cost = Objects are stored in disk blocks of size up to B, possibly with redundancy. The query cost is the minimum number of blocks that can cover all the required results (search time ignored!). 15-4

40 Lower Bound Model: Dynamic Indexability Indexability: [Hellerstein, Koutsoupias, Papadimitriou, PODS 97, JACM 02] a query reports {2,3,4,5} cost = Objects are stored in disk blocks of size up to B, possibly with redundancy. The query cost is the minimum number of blocks that can cover all the required results (search time ignored!). Similar in spirit to popular lower bound models: model, semigroup model cell probe 15-5

41 Previous Results on Static Indexability Nearly all external indexing lower bounds are under this model Tradeoff between space (s) and query time (q) 16-1

42 Previous Results on Static Indexability Nearly all external indexing lower bounds are under this model Tradeoff between space (s) and query time (q) 2D range queries: s/n log q = Ω(log(N/B)) [Hellerstein, Koutsoupias, Papadimitriou, PODS 97], [Koutsoupias, Taylor, PODS 98], [Arge, Samoladas, Vitter, PODS 99] 16-2

43 Previous Results on Static Indexability Nearly all external indexing lower bounds are under this model Tradeoff between space (s) and query time (q) 2D range queries: s/n log q = Ω(log(N/B)) [Hellerstein, Koutsoupias, Papadimitriou, PODS 97], [Koutsoupias, Taylor, PODS 98], [Arge, Samoladas, Vitter, PODS 99] 2D stabbing queries: q log(s/n) = Ω(log(N/B)) [Arge, Samoladas, Yi, ESA 04, Algorithmica 99] 16-3

44 Previous Results on Static Indexability Nearly all external indexing lower bounds are under this model Tradeoff between space (s) and query time (q) 2D range queries: s/n log q = Ω(log(N/B)) [Hellerstein, Koutsoupias, Papadimitriou, PODS 97], [Koutsoupias, Taylor, PODS 98], [Arge, Samoladas, Vitter, PODS 99] 2D stabbing queries: q log(s/n) = Ω(log(N/B)) [Arge, Samoladas, Yi, ESA 04, Algorithmica 99] 1D range queries: s = N, q = 1 trivially 16-4

45 Previous Results on Static Indexability Nearly all external indexing lower bounds are under this model Tradeoff between space (s) and query time (q) 2D range queries: s/n log q = Ω(log(N/B)) [Hellerstein, Koutsoupias, Papadimitriou, PODS 97], [Koutsoupias, Taylor, PODS 98], [Arge, Samoladas, Vitter, PODS 99] 2D stabbing queries: q log(s/n) = Ω(log(N/B)) [Arge, Samoladas, Yi, ESA 04, Algorithmica 99] 1D range queries: s = N, q = 1 trivially Adding dynamization makes it much more interesting! 16-5

46 Dynamic Indexability Still consider only insertions 17-1

47 Dynamic Indexability Still consider only insertions time t: memory of size M blocks of size B = snapshot 17-2

48 Dynamic Indexability Still consider only insertions time t: memory of size M blocks of size B = snapshot time t + 1: inserted 17-3

49 Dynamic Indexability Still consider only insertions time t: memory of size M blocks of size B = snapshot time t + 1: inserted time t + 2: inserted 17-4

50 Dynamic Indexability Still consider only insertions time t: memory of size M blocks of size B = snapshot time t + 1: inserted time t + 2: inserted transition cost =

51 Dynamic Indexability Still consider only insertions time t: memory of size M blocks of size B = snapshot time t + 1: inserted time t + 2: inserted transition cost = 2 Update cost: u = amortized transition cost per insertion 17-6

52 The Ball-Shuffling Problem B balls q bins 18-1

53 The Ball-Shuffling Problem B balls q bins cost =

54 The Ball-Shuffling Problem B balls q bins cost = 1 cost = 2 cost of putting the ball directly into a bin = # balls in the bin

55 The Ball-Shuffling Problem B balls q bins 19-1

56 The Ball-Shuffling Problem B balls q bins Shuffle: cost =

57 The Ball-Shuffling Problem B balls q bins Shuffle: cost = 5 Cost of shuffling = # balls in the involved bins 19-3

58 The Ball-Shuffling Problem B balls q bins Shuffle: cost = 5 Cost of shuffling = # balls in the involved bins Putting a ball directly into a bin is a special shuffle 19-4

59 The Ball-Shuffling Problem B balls q bins Shuffle: cost = 5 Cost of shuffling = # balls in the involved bins Putting a ball directly into a bin is a special shuffle Goal: Accommodating all B balls using q bins with minimum cost 19-5

60 The Workload Construction round 1: keys time 20-1

61 The Workload Construction round 1: keys round 2: time 20-2

62 The Workload Construction round 1: keys round 2: round 3: round B: time 20-3

63 The Workload Construction round 1: keys round 2: round 3: round B: time Queries that we require the index to cover with q blocks # queries 2MB 20-4

64 The Workload Construction round 1: round 2: round 3: snapshot snapshot snapshot keys time round B: snapshot Queries that we require the index to cover with q blocks # queries 2MB Snapshots of the dynamic index considered 20-5

65 The Workload Construction round 1: round 2: round 3: round B: keys There exists a query such that The B objects of the query reside in q blocks in all snapshots time All of its objects are on disk in all B snapshots (we have MB queries) The index moves its objects ub 2 times in total 21-1

66 The Reduction An index with update cost u and query A gives us a solution to the ball-shuffling game with cost ub 2 for B balls and q bins 22-1

67 The Reduction An index with update cost u and query A gives us a solution to the ball-shuffling game with cost ub 2 for B balls and q bins Lower bound on the ball-shuffling problem: Theorem: The cost of any solution for the ball-shuffling problem is at least { Ω(q B 1+Ω(1/q) ), for q < α log B where α is any constant; Ω(B log q B), for any q. 22-2

68 The Reduction An index with update cost u and query A gives us a solution to the ball-shuffling game with cost ub 2 for B balls and q bins Lower bound on the ball-shuffling problem: Theorem: The cost of any solution for the ball-shuffling problem is at least { Ω(q B 1+Ω(1/q) ), for q < α log B where α is any constant; Ω(B log q B), for any q. { q log(ub/q) = Ω(log B), for q < α log B, α is any constant; ub log q = Ω(log B), for all q. 22-3

69 Ball-Shuffling Lower Bounds Theorem: The cost of any solution for the ball-shuffling problem is at least { Ω(q B 1+Ω(1/q) ), for q < α log B where α is any constant; Ω(B log q B), for any q. cost lower bound q 23-1

70 Ball-Shuffling Lower Bounds Theorem: The cost of any solution for the ball-shuffling problem is at least { Ω(q B 1+Ω(1/q) ), for q < α log B where α is any constant; Ω(B log q B), for any q. cost lower bound B 2 1 q 23-2

71 Ball-Shuffling Lower Bounds Theorem: The cost of any solution for the ball-shuffling problem is at least { Ω(q B 1+Ω(1/q) ), for q < α log B where α is any constant; Ω(B log q B), for any q. cost lower bound B 2 B 4/3 1 2 q 23-3

72 Ball-Shuffling Lower Bounds Theorem: The cost of any solution for the ball-shuffling problem is at least { Ω(q B 1+Ω(1/q) ), for q < α log B where α is any constant; Ω(B log q B), for any q. cost lower bound B 2 B 4/3 B log B 1 2 log B q 23-4

73 Ball-Shuffling Lower Bounds Theorem: The cost of any solution for the ball-shuffling problem is at least { Ω(q B 1+Ω(1/q) ), for q < α log B where α is any constant; Ω(B log q B), for any q. cost lower bound B 2 B 4/3 B log B B 1 2 log B B ɛ q 23-5

74 Dynamic B-trees Dynamic Hash Tables 24-1

75 Dynamic Hash Tables B-tree query I/O: O(log B N M ) Hash table query I/O: 1 + 1/2 Ω(B) ; insertion the same 25-1

76 Dynamic Hash Tables B-tree query I/O: O(log B N M ) Hash table query I/O: 1 + 1/2 Ω(B) ; insertion the same A long-time conjecture in the external memory community: The insertion cost must be Ω(1) I/Os if the query cost is required to be O(1) I/Os. 25-2

77 Dynamic Hash Tables B-tree query I/O: O(log B N M ) Hash table query I/O: 1 + 1/2 Ω(B) ; insertion the same A long-time conjecture in the external memory community: The insertion cost must be Ω(1) I/Os if the query cost is required to be O(1) I/Os. Buffering is useless? 25-3

78 Dynamic Hash Tables (for successful queries) Logarithmic method (folklore?) m memory 2m 4m 8m 26-1

79 Dynamic Hash Tables (for successful queries) Logarithmic method (folklore?) Insertion: O( 1 B log N M ) Expected average query: O(1) m 2m memory 4m 8m 26-2

80 Dynamic Hash Tables (for successful queries) Logarithmic method (folklore?) Insertion: O( 1 B log N M ) Expected average query: O(1) Improving query time Idea: Keep one table large enough m 2m memory 4m 8m 26-3

81 Dynamic Hash Tables (for successful queries) Logarithmic method (folklore?) Insertion: O( 1 B log N M ) Expected average query: O(1) m 2m memory 4m 8m Improving query time For some parameter β = B c, c 1 Idea: Keep one table large enough x x/β 26-4

82 Dynamic Hash Tables (for successful queries) Logarithmic method (folklore?) Insertion: O( 1 B log N M ) Expected average query: O(1) m 2m memory 4m 8m Improving query time For some parameter β = B c, c 1 Idea: Keep one table large enough x x/β 26-5

83 Dynamic Hash Tables (for successful queries) Logarithmic method (folklore?) Insertion: O( 1 B log N M ) Expected average query: O(1) m 2m memory 4m 8m Improving query time Idea: Keep one table large enough For some parameter β = B c, c 1 x x/β 2x 26-6

84 Dynamic Hash Tables (for successful queries) Logarithmic method (folklore?) Insertion: O( 1 B log N M ) Expected average query: O(1) m 2m memory 4m 8m Improving query time For some parameter β = B c, c 1 Idea: Keep one table large enough x x/β Insertion: O(B c 1 ) 2x 26-7

85 Dynamic Hash Tables (for successful queries) Logarithmic method (folklore?) Insertion: O( 1 B log N M ) Expected average query: O(1) m 2m memory 4m 8m Improving query time For some parameter β = B c, c 1 Idea: Keep one table large enough x x/β Insertion: O(B c 1 ) Query: 1+O(1/B c ) 2x 26-8

86 Dynamic Hash Tables (for successful queries) Logarithmic method (folklore?) Insertion: O( 1 B log N M ) Expected average query: O(1) m 2m memory 4m 8m Improving query time For some parameter β = B c, c 1 Idea: Keep one table large enough x x/β Insertion: O(B c 1 ) Query: 1+O(1/B c ) 2x Still far from the target 1+1/Ω(2 B ) 26-9

87 Query-Insertion Tradeoff for Successful queries [Wei, Yi, Zhang, SPAA 09] Insertion standard hashing 1 + 1/2 Ω(B) upper bounds lower bounds 1 O(1/B (c 1)/4 ) O(1) Ω(1) O(B c 1 ) 1 Ω(B c 1 ) 1+Θ(1/B c ) 1 + Θ(1/B c ), c < 1 c > Θ(1/B) Query 27-1

88 Indexability Too Strong! Naïve solution: For every B items, write to a block. Query cost is 1, insertion is 1/B 28-1

89 Indexability Too Strong! Naïve solution: For every B items, write to a block. Query cost is 1, insertion is 1/B Too many possible mappings! 28-2

90 Indexability Too Strong! Naïve solution: For every B items, write to a block. Query cost is 1, insertion is 1/B Too many possible mappings! Indexabilty + information-theoretical argument If with only the information in memory, the hash table cannot locate the item, then querying it takes at least 2 I/Os. 28-3

91 The Abstraction Consider the layout of a hash table at any snapshot. Denote all the blocks on disk by B 1, B 2,..., B d. Let f : U {1,..., d} be any function computable within memory. We divide items inserted into 3 zones with respect to f. Memory zone M: set of items stored in memory. t q = 0. Fast zone F : set of items x such that x B f(x). t q = 1. Slow zone S: The rest of items. t q =

92 The Key The hash table can employ a family F of at most 2 M distinct f s. Note that the current f adopted by the hash table is dependent upon the already inserted items, but the family F has to be fixed beforehand. 30-1

93 How about All Queries? (Latest results) We are essentially talking about the membership problem Can t use indexability model Have to use cell probe model 31-1

94 All queries (the membership problem) (The cell probe model) Query truncated buffer tree n ɛ upper bounds lower bounds buffer tree l B log l n, log l n hashing 1 + 1/2 Ω(B) 0 1 B log M B n 1 B ɛ B log n log n 1 B Insert 32-1

95 All queries (the membership problem) (The cell probe model) Query truncated buffer tree n ɛ upper bounds lower bounds buffer tree l B log l n, log l n [Yi, Zhang, SODA 10] hashing B log M B n 1 B log n B ɛ log n B 1 + 1/2 Ω(B) Insert 32-2

96 All queries (the membership problem) (The cell probe model) Query truncated buffer tree n ɛ upper bounds lower bounds buffer tree log B log n n l B log l n, log l n [Yi, Zhang, SODA 10] hashing B log M B [Verbin, Zhang, STOC 10] n 1 B log n B ɛ log n B 1 + 1/2 Ω(B) Insert 32-3

97 THE BIG BOLD CONJECTURE All these fundamental data structure problems have the same query-update tradeoff in external memory when u = o(1), for sufficiently large B. Partial-sum: all B; Range reporting: B > n ɛ ; Predecessor: unknown. 33-1

98 THE BIG BOLD CONJECTURE All these fundamental data structure problems have the same query-update tradeoff in external memory when u = o(1), for sufficiently large B. Partial-sum: all B; Range reporting: B > n ɛ ; Predecessor: unknown. Strong implication: The buffer tree (and many of the log method based structures) is simple, practical, versatile, and optimal! 33-2

99 The End T HAN K YOU Q and A 34-1

Dynamic Indexability and Lower Bounds for Dynamic One-Dimensional Range Query Indexes

Dynamic Indexability and Lower Bounds for Dynamic One-Dimensional Range Query Indexes Dynamic Indexability and Lower Bounds for Dynamic One-Dimensional Range Query Indexes Ke Yi HKUST 1-1 First Annual SIGMOD Programming Contest (to be held at SIGMOD 2009) Student teams from degree granting

More information

On the Cell Probe Complexity of Dynamic Membership or

On the Cell Probe Complexity of Dynamic Membership or On the Cell Probe Complexity of Dynamic Membership or Can We Batch Up Updates in External Memory? Ke Yi and Qin Zhang Hong Kong University of Science & Technology SODA 2010 Jan. 17, 2010 1-1 The power

More information

Hashing Based Dictionaries in Different Memory Models. Zhewei Wei

Hashing Based Dictionaries in Different Memory Models. Zhewei Wei Hashing Based Dictionaries in Different Memory Models Zhewei Wei Outline Introduction RAM model I/O model Cache-oblivious model Open questions Outline Introduction RAM model I/O model Cache-oblivious model

More information

Write-Optimization in B-Trees. Bradley C. Kuszmaul MIT & Tokutek

Write-Optimization in B-Trees. Bradley C. Kuszmaul MIT & Tokutek Write-Optimization in B-Trees Bradley C. Kuszmaul MIT & Tokutek The Setup What and Why B-trees? B-trees are a family of data structures. B-trees implement an ordered map. Implement GET, PUT, NEXT, PREV.

More information

The Batched Predecessor Problem in External Memory

The Batched Predecessor Problem in External Memory The Batched Predecessor Problem in External Memory Michael A. Bender 12, Martín Farach-Colton 23, Mayank Goswami 4, Dzejla Medjedovic 5, Pablo Montes 1, and Meng-Tsung Tsai 3 1 Stony Brook University,

More information

Lecture 22 November 19, 2015

Lecture 22 November 19, 2015 CS 229r: Algorithms for ig Data Fall 2015 Prof. Jelani Nelson Lecture 22 November 19, 2015 Scribe: Johnny Ho 1 Overview Today we re starting a completely new topic, which is the external memory model,

More information

Lecture 8 13 March, 2012

Lecture 8 13 March, 2012 6.851: Advanced Data Structures Spring 2012 Prof. Erik Demaine Lecture 8 13 March, 2012 1 From Last Lectures... In the previous lecture, we discussed the External Memory and Cache Oblivious memory models.

More information

Indexing Big Data. Big data problem. Michael A. Bender. data ingestion. answers. oy vey ??? ????????? query processor. data indexing.

Indexing Big Data. Big data problem. Michael A. Bender. data ingestion. answers. oy vey ??? ????????? query processor. data indexing. Indexing Big Data Michael A. Bender Big data problem data ingestion answers oy vey 365 data indexing query processor 42???????????? queries The problem with big data is microdata Microdata is small data.

More information

Lecture 19 Apr 25, 2007

Lecture 19 Apr 25, 2007 6.851: Advanced Data Structures Spring 2007 Prof. Erik Demaine Lecture 19 Apr 25, 2007 Scribe: Aditya Rathnam 1 Overview Previously we worked in the RA or cell probe models, in which the cost of an algorithm

More information

Databases & External Memory Indexes, Write Optimization, and Crypto-searches

Databases & External Memory Indexes, Write Optimization, and Crypto-searches Databases & External Memory Indexes, Write Optimization, and Crypto-searches Michael A. Bender Tokutek & Stony Brook Martin Farach-Colton Tokutek & Rutgers What s a Database? DBs are systems for: Storing

More information

Algorithms and Data Structures: Efficient and Cache-Oblivious

Algorithms and Data Structures: Efficient and Cache-Oblivious 7 Ritika Angrish and Dr. Deepak Garg Algorithms and Data Structures: Efficient and Cache-Oblivious Ritika Angrish* and Dr. Deepak Garg Department of Computer Science and Engineering, Thapar University,

More information

Lecture April, 2010

Lecture April, 2010 6.851: Advanced Data Structures Spring 2010 Prof. Eri Demaine Lecture 20 22 April, 2010 1 Memory Hierarchies and Models of Them So far in class, we have wored with models of computation lie the word RAM

More information

Report on Cache-Oblivious Priority Queue and Graph Algorithm Applications[1]

Report on Cache-Oblivious Priority Queue and Graph Algorithm Applications[1] Report on Cache-Oblivious Priority Queue and Graph Algorithm Applications[1] Marc André Tanner May 30, 2014 Abstract This report contains two main sections: In section 1 the cache-oblivious computational

More information

A tale of two trees Exploring write optimized index structures. Ryan Johnson

A tale of two trees Exploring write optimized index structures. Ryan Johnson 1 A tale of two trees Exploring write optimized index structures Ryan Johnson The I/O model of computation 2 M bytes of RAM Blocks of size B N bytes on disk Goal: minimize block transfers to/from disk

More information

BetrFS: A Right-Optimized Write-Optimized File System

BetrFS: A Right-Optimized Write-Optimized File System BetrFS: A Right-Optimized Write-Optimized File System Amogh Akshintala, Michael Bender, Kanchan Chandnani, Pooja Deo, Martin Farach-Colton, William Jannen, Rob Johnson, Zardosht Kasheff, Bradley C. Kuszmaul,

More information

The TokuFS Streaming File System

The TokuFS Streaming File System The TokuFS Streaming File System John Esmet Rutgers Michael A. Bender Stony Brook Martin Farach-Colton Rutgers Bradley C. Kuszmaul MIT Abstract The TokuFS file system outperforms write-optimized file systems

More information

Lecture 7 8 March, 2012

Lecture 7 8 March, 2012 6.851: Advanced Data Structures Spring 2012 Lecture 7 8 arch, 2012 Prof. Erik Demaine Scribe: Claudio A Andreoni 2012, Sebastien Dabdoub 2012, Usman asood 2012, Eric Liu 2010, Aditya Rathnam 2007 1 emory

More information

38 Cache-Oblivious Data Structures

38 Cache-Oblivious Data Structures 38 Cache-Oblivious Data Structures Lars Arge Duke University Gerth Stølting Brodal University of Aarhus Rolf Fagerberg University of Southern Denmark 38.1 The Cache-Oblivious Model... 38-1 38.2 Fundamental

More information

The History of I/O Models Erik Demaine

The History of I/O Models Erik Demaine The History of I/O Models Erik Demaine MASSACHUSETTS INSTITUTE OF TECHNOLOGY Memory Hierarchies in Practice CPU 750 ps Registers 100B Level 1 Cache 100KB Level 2 Cache 1MB 10GB 14 ns Main Memory 1EB-1ZB

More information

Dynamic Range Majority Data Structures

Dynamic Range Majority Data Structures Dynamic Range Majority Data Structures Amr Elmasry 1, Meng He 2, J. Ian Munro 3, and Patrick K. Nicholson 3 1 Department of Computer Science, University of Copenhagen, Denmark 2 Faculty of Computer Science,

More information

Lecture Notes: External Interval Tree. 1 External Interval Tree The Static Version

Lecture Notes: External Interval Tree. 1 External Interval Tree The Static Version Lecture Notes: External Interval Tree Yufei Tao Department of Computer Science and Engineering Chinese University of Hong Kong taoyf@cse.cuhk.edu.hk This lecture discusses the stabbing problem. Let I be

More information

Optimal Tracking of Distributed Heavy Hitters and Quantiles

Optimal Tracking of Distributed Heavy Hitters and Quantiles Optimal Tracking of Distributed Heavy Hitters and Quantiles Qin Zhang Hong Kong University of Science & Technology Joint work with Ke Yi PODS 2009 June 30, 2009 - 2- Models and Problems Distributed streaming

More information

ICS 691: Advanced Data Structures Spring Lecture 8

ICS 691: Advanced Data Structures Spring Lecture 8 ICS 691: Advanced Data Structures Spring 2016 Prof. odari Sitchinava Lecture 8 Scribe: Ben Karsin 1 Overview In the last lecture we continued looking at arborally satisfied sets and their equivalence to

More information

Computational Geometry in the Parallel External Memory Model

Computational Geometry in the Parallel External Memory Model Computational Geometry in the Parallel External Memory Model Nodari Sitchinava Institute for Theoretical Informatics Karlsruhe Institute of Technology nodari@ira.uka.de 1 Introduction Continued advances

More information

Optimality in External Memory Hashing

Optimality in External Memory Hashing Optimality in External Memory Hashing Morten Skaarup Jensen and Rasmus Pagh April 2, 2007 Abstract Hash tables on external memory are commonly used for indexing in database management systems. In this

More information

Introduction. hashing performs basic operations, such as insertion, better than other ADTs we ve seen so far

Introduction. hashing performs basic operations, such as insertion, better than other ADTs we ve seen so far Chapter 5 Hashing 2 Introduction hashing performs basic operations, such as insertion, deletion, and finds in average time better than other ADTs we ve seen so far 3 Hashing a hash table is merely an hashing

More information

A Distribution-Sensitive Dictionary with Low Space Overhead

A Distribution-Sensitive Dictionary with Low Space Overhead A Distribution-Sensitive Dictionary with Low Space Overhead Prosenjit Bose, John Howat, and Pat Morin School of Computer Science, Carleton University 1125 Colonel By Dr., Ottawa, Ontario, CANADA, K1S 5B6

More information

Massive Data Algorithmics. Lecture 12: Cache-Oblivious Model

Massive Data Algorithmics. Lecture 12: Cache-Oblivious Model Typical Computer Hierarchical Memory Basics Data moved between adjacent memory level in blocks A Trivial Program A Trivial Program: d = 1 A Trivial Program: d = 1 A Trivial Program: n = 2 24 A Trivial

More information

Cache-Oblivious Persistence

Cache-Oblivious Persistence Cache-Oblivious Persistence Pooya Davoodi 1,, Jeremy T. Fineman 2,, John Iacono 1,,andÖzgür Özkan 1 1 New York University 2 Georgetown University Abstract. Partial persistence is a general transformation

More information

Theory and Implementation of Dynamic Data Structures for the GPU

Theory and Implementation of Dynamic Data Structures for the GPU Theory and Implementation of Dynamic Data Structures for the GPU John Owens UC Davis Martín Farach-Colton Rutgers NVIDIA OptiX & the BVH Tero Karras. Maximizing parallelism in the construction of BVHs,

More information

The Algorithmics of Write Optimization

The Algorithmics of Write Optimization The Algorithmics of Write Optimization Michael A. Bender Stony Brook University Birds-eye view of data storage incoming data queries stored data answers Birds-eye view of data storage? incoming data queries

More information

Integer Algorithms and Data Structures

Integer Algorithms and Data Structures Integer Algorithms and Data Structures and why we should care about them Vladimír Čunát Department of Theoretical Computer Science and Mathematical Logic Doctoral Seminar 2010/11 Outline Introduction Motivation

More information

Planar Point Location in Sublogarithmic Time

Planar Point Location in Sublogarithmic Time Planar Point Location in Sublogarithmic Time Mihai Pătraşcu (MIT) Point Location in o(log n) Time, Voronoi Diagrams in o(n log n) Time, and Other Transdichotomous Results in Computational Geometry Timothy

More information

systems & research project

systems & research project class 4 systems & research project prof. HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS265/ index index knows order about the data data filtering data: point/range queries index data A B C sorted A B C initial

More information

1 Probability Review. CS 124 Section #8 Hashing, Skip Lists 3/20/17. Expectation (weighted average): the expectation of a random quantity X is:

1 Probability Review. CS 124 Section #8 Hashing, Skip Lists 3/20/17. Expectation (weighted average): the expectation of a random quantity X is: CS 124 Section #8 Hashing, Skip Lists 3/20/17 1 Probability Review Expectation (weighted average): the expectation of a random quantity X is: x= x P (X = x) For each value x that X can take on, we look

More information

Module 5: Hashing. CS Data Structures and Data Management. Reza Dorrigiv, Daniel Roche. School of Computer Science, University of Waterloo

Module 5: Hashing. CS Data Structures and Data Management. Reza Dorrigiv, Daniel Roche. School of Computer Science, University of Waterloo Module 5: Hashing CS 240 - Data Structures and Data Management Reza Dorrigiv, Daniel Roche School of Computer Science, University of Waterloo Winter 2010 Reza Dorrigiv, Daniel Roche (CS, UW) CS240 - Module

More information

Multi-core Computing Lecture 2

Multi-core Computing Lecture 2 Multi-core Computing Lecture 2 MADALGO Summer School 2012 Algorithms for Modern Parallel and Distributed Models Phillip B. Gibbons Intel Labs Pittsburgh August 21, 2012 Multi-core Computing Lectures: Progress-to-date

More information

Predecessor Data Structures. Philip Bille

Predecessor Data Structures. Philip Bille Predecessor Data Structures Philip Bille Outline Predecessor problem First tradeoffs Simple tries x-fast tries y-fast tries Predecessor Problem Predecessor Problem The predecessor problem: Maintain a set

More information

6 Distributed data management I Hashing

6 Distributed data management I Hashing 6 Distributed data management I Hashing There are two major approaches for the management of data in distributed systems: hashing and caching. The hashing approach tries to minimize the use of communication

More information

Path Queries in Weighted Trees

Path Queries in Weighted Trees Path Queries in Weighted Trees Meng He 1, J. Ian Munro 2, and Gelin Zhou 2 1 Faculty of Computer Science, Dalhousie University, Canada. mhe@cs.dal.ca 2 David R. Cheriton School of Computer Science, University

More information

Hashing. Yufei Tao. Department of Computer Science and Engineering Chinese University of Hong Kong

Hashing. Yufei Tao. Department of Computer Science and Engineering Chinese University of Hong Kong Department of Computer Science and Engineering Chinese University of Hong Kong In this lecture, we will revisit the dictionary search problem, where we want to locate an integer v in a set of size n or

More information

Nearest Neighbor with KD Trees

Nearest Neighbor with KD Trees Case Study 2: Document Retrieval Finding Similar Documents Using Nearest Neighbors Machine Learning/Statistics for Big Data CSE599C1/STAT592, University of Washington Emily Fox January 22 nd, 2013 1 Nearest

More information

Cache-Oblivious Algorithms A Unified Approach to Hierarchical Memory Algorithms

Cache-Oblivious Algorithms A Unified Approach to Hierarchical Memory Algorithms Cache-Oblivious Algorithms A Unified Approach to Hierarchical Memory Algorithms Aarhus University Cache-Oblivious Current Trends Algorithms in Algorithms, - A Unified Complexity Approach to Theory, Hierarchical

More information

arxiv: v1 [cs.cr] 19 Sep 2017

arxiv: v1 [cs.cr] 19 Sep 2017 BIOS ORAM: Improved Privacy-Preserving Data Access for Parameterized Outsourced Storage arxiv:1709.06534v1 [cs.cr] 19 Sep 2017 Michael T. Goodrich University of California, Irvine Dept. of Computer Science

More information

INSERTION SORT is O(n log n)

INSERTION SORT is O(n log n) Michael A. Bender Department of Computer Science, SUNY Stony Brook bender@cs.sunysb.edu Martín Farach-Colton Department of Computer Science, Rutgers University farach@cs.rutgers.edu Miguel Mosteiro Department

More information

39 B-trees and Cache-Oblivious B-trees with Different-Sized Atomic Keys

39 B-trees and Cache-Oblivious B-trees with Different-Sized Atomic Keys 39 B-trees and Cache-Oblivious B-trees with Different-Sized Atomic Keys Michael A. Bender, Department of Computer Science, Stony Brook University Roozbeh Ebrahimi, Google Inc. Haodong Hu, School of Information

More information

A Multi Join Algorithm Utilizing Double Indices

A Multi Join Algorithm Utilizing Double Indices Journal of Convergence Information Technology Volume 4, Number 4, December 2009 Hanan Ahmed Hossni Mahmoud Abd Alla Information Technology Department College of Computer and Information Sciences King Saud

More information

How TokuDB Fractal TreeTM. Indexes Work. Bradley C. Kuszmaul. Guest Lecture in MIT Performance Engineering, 18 November 2010.

How TokuDB Fractal TreeTM. Indexes Work. Bradley C. Kuszmaul. Guest Lecture in MIT Performance Engineering, 18 November 2010. 6.172 How Fractal Trees Work 1 How TokuDB Fractal TreeTM Indexes Work Bradley C. Kuszmaul Guest Lecture in MIT 6.172 Performance Engineering, 18 November 2010. 6.172 How Fractal Trees Work 2 I m an MIT

More information

CACHE-OBLIVIOUS MAPS. Edward Kmett McGraw Hill Financial. Saturday, October 26, 13

CACHE-OBLIVIOUS MAPS. Edward Kmett McGraw Hill Financial. Saturday, October 26, 13 CACHE-OBLIVIOUS MAPS Edward Kmett McGraw Hill Financial CACHE-OBLIVIOUS MAPS Edward Kmett McGraw Hill Financial CACHE-OBLIVIOUS MAPS Indexing and Machine Models Cache-Oblivious Lookahead Arrays Amortization

More information

Funnel Heap - A Cache Oblivious Priority Queue

Funnel Heap - A Cache Oblivious Priority Queue Alcom-FT Technical Report Series ALCOMFT-TR-02-136 Funnel Heap - A Cache Oblivious Priority Queue Gerth Stølting Brodal, Rolf Fagerberg Abstract The cache oblivious model of computation is a two-level

More information

Chapter 5 Hashing. Introduction. Hashing. Hashing Functions. hashing performs basic operations, such as insertion,

Chapter 5 Hashing. Introduction. Hashing. Hashing Functions. hashing performs basic operations, such as insertion, Introduction Chapter 5 Hashing hashing performs basic operations, such as insertion, deletion, and finds in average time 2 Hashing a hash table is merely an of some fixed size hashing converts into locations

More information

Data Structures for Range Minimum Queries in Multidimensional Arrays

Data Structures for Range Minimum Queries in Multidimensional Arrays Data Structures for Range Minimum Queries in Multidimensional Arrays Hao Yuan Mikhail J. Atallah Purdue University January 17, 2010 Yuan & Atallah (PURDUE) Multidimensional RMQ January 17, 2010 1 / 28

More information

Outline for Today. How can we speed up operations that work on integer data? A simple data structure for ordered dictionaries.

Outline for Today. How can we speed up operations that work on integer data? A simple data structure for ordered dictionaries. van Emde Boas Trees Outline for Today Data Structures on Integers How can we speed up operations that work on integer data? Tiered Bitvectors A simple data structure for ordered dictionaries. van Emde

More information

Flash memory: Models and Algorithms. Guest Lecturer: Deepak Ajwani

Flash memory: Models and Algorithms. Guest Lecturer: Deepak Ajwani Flash memory: Models and Algorithms Guest Lecturer: Deepak Ajwani Flash Memory Flash Memory Flash Memory Flash Memory Flash Memory Flash Memory Flash Memory Solid State Disks A data-storage device that

More information

Nearest Neighbor with KD Trees

Nearest Neighbor with KD Trees Case Study 2: Document Retrieval Finding Similar Documents Using Nearest Neighbors Machine Learning/Statistics for Big Data CSE599C1/STAT592, University of Washington Emily Fox January 22 nd, 2013 1 Nearest

More information

Searchable Symmetric Encryption: Optimal Locality in Linear Space via Two-Dimensional Balanced Allocations

Searchable Symmetric Encryption: Optimal Locality in Linear Space via Two-Dimensional Balanced Allocations Searchable Symmetric Encryption: Optimal Locality in Linear Space via Two-Dimensional Balanced Allocations Gilad Asharov Cornell-Tech Moni Naor Gil Segev Ido Shahaf (Hebrew University) Weizmann Hebrew

More information

Fast Prefix Search in Little Space, with Applications

Fast Prefix Search in Little Space, with Applications Fast Prefix Search in Little Space, with Applications Djamal Belazzougui 1, Paolo Boldi 2, Rasmus Pagh 3, and Sebastiano Vigna 1 1 Université Paris Diderot Paris 7, France 2 Dipartimento di Scienze dell

More information

Locality- Sensitive Hashing Random Projections for NN Search

Locality- Sensitive Hashing Random Projections for NN Search Case Study 2: Document Retrieval Locality- Sensitive Hashing Random Projections for NN Search Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade April 18, 2017 Sham Kakade

More information

A NEW SORTING ALGORITHM FOR ACCELERATING JOIN-BASED QUERIES

A NEW SORTING ALGORITHM FOR ACCELERATING JOIN-BASED QUERIES Mathematical and Computational Applications, Vol. 15, No. 2, pp. 208-217, 2010. Association for Scientific Research A NEW SORTING ALGORITHM FOR ACCELERATING JOIN-BASED QUERIES Hassan I. Mathkour Department

More information

Comparing the strength of query types in property testing: The case of testing k-colorability

Comparing the strength of query types in property testing: The case of testing k-colorability Comparing the strength of query types in property testing: The case of testing k-colorability Ido Ben-Eliezer Tali Kaufman Michael Krivelevich Dana Ron Abstract We study the power of four query models

More information

arxiv: v1 [cs.ds] 21 Jun 2010

arxiv: v1 [cs.ds] 21 Jun 2010 Dynamic Range Reporting in External Memory Yakov Nekrich Department of Computer Science, University of Bonn. arxiv:1006.4093v1 [cs.ds] 21 Jun 2010 Abstract In this paper we describe a dynamic external

More information

High Performance Data Structures: Theory and Practice. Roman Elizarov Devexperts, 2006

High Performance Data Structures: Theory and Practice. Roman Elizarov Devexperts, 2006 High Performance Data Structures: Theory and Practice Roman Elizarov Devexperts, 2006 e1 High performance? NO!!! We ll talk about different High Performance. One that aimed at avoiding all those racks

More information

Practice Midterm Exam Solutions

Practice Midterm Exam Solutions CSE 332: Data Abstractions Autumn 2015 Practice Midterm Exam Solutions Name: Sample Solutions ID #: 1234567 TA: The Best Section: A9 INSTRUCTIONS: You have 50 minutes to complete the exam. The exam is

More information

I/O-Algorithms Lars Arge Aarhus University

I/O-Algorithms Lars Arge Aarhus University I/O-Algorithms Aarhus University April 10, 2008 I/O-Model Block I/O D Parameters N = # elements in problem instance B = # elements that fits in disk block M = # elements that fits in main memory M T =

More information

Lecture Notes: Range Searching with Linear Space

Lecture Notes: Range Searching with Linear Space Lecture Notes: Range Searching with Linear Space Yufei Tao Department of Computer Science and Engineering Chinese University of Hong Kong taoyf@cse.cuhk.edu.hk In this lecture, we will continue our discussion

More information

External-Memory Search Trees with Fast Insertions. Jelani Nelson

External-Memory Search Trees with Fast Insertions. Jelani Nelson External-Memory Search Trees with Fast Insertions by Jelani Nelson Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of

More information

I/O Model. Cache-Oblivious Algorithms : Algorithms in the Real World. Advantages of Cache-Oblivious Algorithms 4/9/13

I/O Model. Cache-Oblivious Algorithms : Algorithms in the Real World. Advantages of Cache-Oblivious Algorithms 4/9/13 I/O Model 15-853: Algorithms in the Real World Locality II: Cache-oblivious algorithms Matrix multiplication Distribution sort Static searching Abstracts a single level of the memory hierarchy Fast memory

More information

Theoretical Computer Science

Theoretical Computer Science Theoretical Computer Science 412 (2011) 3579 3588 Contents lists available at ScienceDirect Theoretical Computer Science journal homepage: www.elsevier.com/locate/tcs Cache-oblivious index for approximate

More information

arxiv: v2 [cs.ds] 1 Jul 2014

arxiv: v2 [cs.ds] 1 Jul 2014 Cache-Oblivious Persistence Pooya Davoodi 1, Jeremy T. Fineman 2, John Iacono 1, and Özgür Özkan1 1 New York University 2 Georgetown University arxiv:1402.5492v2 [cs.ds] 1 Jul 2014 Abstract Partial persistence

More information

Introduction to I/O Efficient Algorithms (External Memory Model)

Introduction to I/O Efficient Algorithms (External Memory Model) Introduction to I/O Efficient Algorithms (External Memory Model) Jeff M. Phillips August 30, 2013 Von Neumann Architecture Model: CPU and Memory Read, Write, Operations (+,,,...) constant time polynomially

More information

BRICS Research Activities Algorithms

BRICS Research Activities Algorithms BRICS Research Activities Algorithms Gerth Stølting Brodal BRICS Retreat, Sandbjerg, 21 23 October 2002 1 Outline of Talk The Algorithms Group Courses Algorithm Events Expertise within BRICS Examples Algorithms

More information

Dynamic Graph Algorithms and Complexity

Dynamic Graph Algorithms and Complexity Dynamic Graph Algorithms and Complexity Danupon Nanongkai KTH, Sweden Disclaimer: Errors are possible. Please double check the cited sources for accuracy. 1 About this talk I will try to answer these questions.?

More information

Simple and Semi-Dynamic Structures for Cache-Oblivious Planar Orthogonal Range Searching

Simple and Semi-Dynamic Structures for Cache-Oblivious Planar Orthogonal Range Searching Simple and Semi-Dynamic Structures for Cache-Oblivious Planar Orthogonal Range Searching ABSTRACT Lars Arge Department of Computer Science University of Aarhus IT-Parken, Aabogade 34 DK-8200 Aarhus N Denmark

More information

Applications of Geometric Spanner

Applications of Geometric Spanner Title: Name: Affil./Addr. 1: Affil./Addr. 2: Affil./Addr. 3: Keywords: SumOriWork: Applications of Geometric Spanner Networks Joachim Gudmundsson 1, Giri Narasimhan 2, Michiel Smid 3 School of Information

More information

Hierarchical Memory. Modern machines have complicated memory hierarchy

Hierarchical Memory. Modern machines have complicated memory hierarchy Hierarchical Memory Modern machines have complicated memory hierarchy Levels get larger and slower further away from CPU Data moved between levels using large blocks Lecture 2: Slow IO Review Disk access

More information

Memory Management. Memory Management

Memory Management. Memory Management Memory Management Gordon College Stephen Brinton Memory Management Background Swapping Contiguous Allocation Paging Segmentation Segmentation with Paging 1 Background Program must be brought into memory

More information

Introduction to Randomized Algorithms

Introduction to Randomized Algorithms Introduction to Randomized Algorithms Gopinath Mishra Advanced Computing and Microelectronics Unit Indian Statistical Institute Kolkata 700108, India. Organization 1 Introduction 2 Some basic ideas from

More information

Dynamic Optimality Almost

Dynamic Optimality Almost Dynamic Optimality Almost Erik D. Demaine Dion Harmon John Iacono Mihai Pǎtraşcu Abstract We present an O(lg lg n)-competitive online binary search tree, improving upon the best previous (trivial) competitive

More information

On Asymptotic Cost of Triangle Listing in Random Graphs

On Asymptotic Cost of Triangle Listing in Random Graphs On Asymptotic Cost of Triangle Listing in Random Graphs Di Xiao, Yi Cui, Daren B.H. Cline, Dmitri Loguinov Internet Research Lab Department of Computer Science and Engineering Texas A&M University May

More information

Uniform Hashing in Constant Time and Linear Space

Uniform Hashing in Constant Time and Linear Space Uniform Hashing in Constant Time and Linear Space Anna Östlin IT University of Copenhagen Glentevej 67, 2400 København NV Denmark annao@itu.dk Rasmus Pagh IT University of Copenhagen Glentevej 67, 2400

More information

Algorithms in Systems Engineering IE172. Midterm Review. Dr. Ted Ralphs

Algorithms in Systems Engineering IE172. Midterm Review. Dr. Ted Ralphs Algorithms in Systems Engineering IE172 Midterm Review Dr. Ted Ralphs IE172 Midterm Review 1 Textbook Sections Covered on Midterm Chapters 1-5 IE172 Review: Algorithms and Programming 2 Introduction to

More information

Module 5: Hash-Based Indexing

Module 5: Hash-Based Indexing Module 5: Hash-Based Indexing Module Outline 5.1 General Remarks on Hashing 5. Static Hashing 5.3 Extendible Hashing 5.4 Linear Hashing Web Forms Transaction Manager Lock Manager Plan Executor Operator

More information

I/O-Efficient Data Structures for Colored Range and Prefix Reporting

I/O-Efficient Data Structures for Colored Range and Prefix Reporting I/O-Efficient Data Structures for Colored Range and Prefix Reporting Kasper Green Larsen MADALGO Aarhus University, Denmark larsen@cs.au.dk Rasmus Pagh IT University of Copenhagen Copenhagen, Denmark pagh@itu.dk

More information

Memory management: outline

Memory management: outline Memory management: outline Concepts Swapping Paging o Multi-level paging o TLB & inverted page tables 1 Memory size/requirements are growing 1951: the UNIVAC computer: 1000 72-bit words! 1971: the Cray

More information

Memory management: outline

Memory management: outline Memory management: outline Concepts Swapping Paging o Multi-level paging o TLB & inverted page tables 1 Memory size/requirements are growing 1951: the UNIVAC computer: 1000 72-bit words! 1971: the Cray

More information

Deterministic Leader Election in O(D + log n) Rounds with Messages of size O(1)

Deterministic Leader Election in O(D + log n) Rounds with Messages of size O(1) Deterministic Leader Election in O(D + log n) Rounds with Messages of size O(1) A. Casteigts, Y. Métivier, J.M. Robson, A. Zemmari LaBRI - University of Bordeaux DISC 2016, Paris A. Casteigts, Y. Métivier,

More information

Optimal Sampling from Distributed Streams

Optimal Sampling from Distributed Streams Optimal Sampling from Distributed Streams Graham Cormode AT&T Labs-Research Joint work with S. Muthukrishnan (Rutgers) Ke Yi (HKUST) Qin Zhang (HKUST) 1-1 Reservoir sampling [Waterman??; Vitter 85] Maintain

More information

Constant Price of Anarchy in Network Creation Games via Public Service Advertising

Constant Price of Anarchy in Network Creation Games via Public Service Advertising Constant Price of Anarchy in Network Creation Games via Public Service Advertising Erik D. Demaine and Morteza Zadimoghaddam MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA 02139,

More information

Cache-Oblivious Index for Approximate String Matching

Cache-Oblivious Index for Approximate String Matching Cache-Oblivious Index for Approximate String Matching Wing-Kai Hon 1, Tak-Wah Lam 2, Rahul Shah 3, Siu-Lung Tam 2, and Jeffrey Scott Vitter 3 1 Department of Computer Science, National Tsing Hua University,

More information

Cache-Aware and Cache-Oblivious Adaptive Sorting

Cache-Aware and Cache-Oblivious Adaptive Sorting Cache-Aware and Cache-Oblivious Adaptive Sorting Gerth Stølting rodal 1,, Rolf Fagerberg 2,, and Gabriel Moruz 1 1 RICS, Department of Computer Science, University of Aarhus, IT Parken, Åbogade 34, DK-8200

More information

Chapter 17: Parallel Databases

Chapter 17: Parallel Databases Chapter 17: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery Parallelism Intraoperation Parallelism Interoperation Parallelism Design of Parallel Systems Database Systems

More information

Lecture 24 November 24, 2015

Lecture 24 November 24, 2015 CS 229r: Algorithms for Big Data Fall 2015 Prof. Jelani Nelson Lecture 24 November 24, 2015 Scribes: Zhengyu Wang 1 Cache-oblivious Model Last time we talked about disk access model (as known as DAM, or

More information

Deletion Without Rebalancing in Multiway Search Trees

Deletion Without Rebalancing in Multiway Search Trees Deletion Without Rebalancing in Multiway Search Trees Siddhartha Sen 1,3 and Robert E. Tarjan 1,2,3 1 Princeton University, {sssix,ret}@cs.princeton.edu 2 HP Laboratories, Palo Alto CA 94304 Abstract.

More information

Lecture 6: External Interval Tree (Part II) 3 Making the external interval tree dynamic. 3.1 Dynamizing an underflow structure

Lecture 6: External Interval Tree (Part II) 3 Making the external interval tree dynamic. 3.1 Dynamizing an underflow structure Lecture 6: External Interval Tree (Part II) Yufei Tao Division of Web Science and Technology Korea Advanced Institute of Science and Technology taoyf@cse.cuhk.edu.hk 3 Making the external interval tree

More information

Optimal Parallel Randomized Renaming

Optimal Parallel Randomized Renaming Optimal Parallel Randomized Renaming Martin Farach S. Muthukrishnan September 11, 1995 Abstract We consider the Renaming Problem, a basic processing step in string algorithms, for which we give a simultaneously

More information

How TokuDB Fractal TreeTM. Indexes Work. Bradley C. Kuszmaul. MySQL UC 2010 How Fractal Trees Work 1

How TokuDB Fractal TreeTM. Indexes Work. Bradley C. Kuszmaul. MySQL UC 2010 How Fractal Trees Work 1 MySQL UC 2010 How Fractal Trees Work 1 How TokuDB Fractal TreeTM Indexes Work Bradley C. Kuszmaul MySQL UC 2010 How Fractal Trees Work 2 More Information You can download this talk and others at http://tokutek.com/technology

More information

CSE332: Data Abstractions Lecture 7: B Trees. James Fogarty Winter 2012

CSE332: Data Abstractions Lecture 7: B Trees. James Fogarty Winter 2012 CSE2: Data Abstractions Lecture 7: B Trees James Fogarty Winter 20 The Dictionary (a.k.a. Map) ADT Data: Set of (key, value) pairs keys must be comparable insert(jfogarty,.) Operations: insert(key,value)

More information

Bkd-tree: A Dynamic Scalable kd-tree

Bkd-tree: A Dynamic Scalable kd-tree Bkd-tree: A Dynamic Scalable kd-tree Octavian Procopiuc Pankaj K. Agarwal Lars Arge Jeffrey Scott Vitter July 1, 22 Abstract In this paper we propose a new index structure, called the Bkd-tree, for indexing

More information

Geometric data structures:

Geometric data structures: Geometric data structures: Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade Sham Kakade 2017 1 Announcements: HW3 posted Today: Review: LSH for Euclidean distance Other

More information

Extendible Chained Bucket Hashing for Main Memory Databases. Abstract

Extendible Chained Bucket Hashing for Main Memory Databases. Abstract Extendible Chained Bucket Hashing for Main Memory Databases Pyung-Chul Kim *, Kee-Wook Rim, Jin-Pyo Hong Electronics and Telecommunications Research Institute (ETRI) P.O. Box 106, Yusong, Taejon, 305-600,

More information