Dynamic Indexability and the Optimality of B-trees and Hash Tables

Ke Yi
Hong Kong University of Science and Technology
Dynamic Indexability and Lower Bounds for Dynamic One-Dimensional Range Query Indexes, PODS 09
Dynamic External Hashing: The Limit of Buffering, with Zhewei Wei and Qin Zhang, SPAA 09
+ some latest developments

An index is...
An index is a single number calculated from a set of prices: Dow Jones, S&P, Hang Seng.
An index is a list of keywords and their page numbers in a book.
An index is an exponent.
An index is a finger.
An index is a list of academic publications and their citations.
An index (search engine) is an inverted list from keywords to web pages.
An index (database) is a (disk-based) data structure that improves the speed of data retrieval operations (queries) on a database table.

Hash Table and B-tree
Hash tables and B-trees are taught to undergrads and actually used in all database systems.
B-tree: lookups and range queries. Hash table: lookups.
External memory model (I/O model): a memory of size $M$; a disk partitioned into blocks of size $B$; each I/O reads or writes one block.

The B-tree
A range query takes $O(\log_B N + K/B)$ I/Os, where $K$ is the output size.
With the top levels of the tree kept in memory, the number of levels on disk is $\log_B N - \log_B M = \log_B \frac{N}{M}$.
The height of a B-tree never goes beyond 5 (e.g., if $B = 100$, then a B-tree with 5 levels stores $N = 10$ billion records).
We will assume $\log_B \frac{N}{M} = O(1)$.
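
The height arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function name `btree_levels` and the memory size `10**6` are illustrative assumptions, not values from the talk, and the model treats every node as having exactly $B$ children.

```python
def btree_levels(num_records: int, fanout: int) -> int:
    """Smallest number of levels h such that fanout**h >= num_records."""
    levels, capacity = 0, 1
    while capacity < num_records:
        capacity *= fanout
        levels += 1
    return levels


def levels_on_disk(num_records: int, memory_records: int, fanout: int) -> int:
    """Levels left on disk when the top of the tree fits in memory.

    Mirrors log_B N - log_B M = log_B (N / M); N / M is rounded up.
    """
    ratio = -(-num_records // memory_records)  # ceiling division
    return btree_levels(ratio, fanout)


if __name__ == "__main__":
    # B = 100 and N = 10 billion are the slide's numbers.
    print(btree_levels(10**10, 100))
    # M = 10**6 is a hypothetical memory size chosen for illustration.
    print(levels_on_disk(10**10, 10**6, 100))
```

With a memory of a million records, only two of the five levels would have to be read from disk, which is the sense in which $\log_B \frac{N}{M}$ is treated as a constant.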

External Hashing
[Figure: an example hash table with $h(x)$ = last digit of $x$; most slots are null, and 55 is one of the stored keys.]
Ideal hash function assumption: $h$ maps each object to a hash value uniformly and independently at random.
The expected average cost of a successful (or unsuccessful) lookup is $1 + 1/2^{\Omega(B)}$ disk accesses, provided the load factor is bounded by a constant smaller than 1 [Knuth, 1973].
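
A small simulation makes the "almost exactly one I/O" claim concrete. The sketch below is not from the talk: it assumes each bucket is a chain of blocks of capacity $B$, that an item stays where it was first placed, and it models the ideal hash function with a seeded pseudo-random generator. The function name and parameters are illustrative.

```python
import random


def average_lookup_cost(num_items, num_buckets, block_size, seed=0):
    """Average I/Os of a successful lookup under chained-block hashing.

    An item that is the p-th arrival in its bucket sits in block
    ceil(p / block_size) of that bucket's chain, so finding it costs
    that many block reads.
    """
    rng = random.Random(seed)
    loads = [0] * num_buckets
    total_cost = 0
    for _ in range(num_items):
        bucket = rng.randrange(num_buckets)  # stand-in for an ideal h(x)
        loads[bucket] += 1
        total_cost += -(-loads[bucket] // block_size)  # ceiling division
    return total_cost / num_items


if __name__ == "__main__":
    # Load factor 50_000 / (2_000 * 50) = 0.5
    print(average_lookup_cost(50_000, 2_000, 50))
```

At load factor 0.5 with $B = 50$, a bucket overflows only if it receives more than twice its expected load, which is why the measured average stays essentially at 1.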

Exact Numbers Calculated by Knuth
Extremely close to ideal.
The Art of Computer Programming, volume 3, 1998, page

Now Let's Go Dynamic
Focus on insertions first: both the B-tree and the hash table do a search first, then insert into the appropriate block.
B-tree: split blocks when necessary.
Hashing: rebuild the hash table when too full; extensible hashing [Fagin, Nievergelt, Pippenger, Strong, 79]; linear hashing [Litwin, 80].
These resizing operations add only $O(1/B)$ I/Os amortized per insertion; the bottleneck is the first search + insert.
We cannot hope for fewer than 1 I/O per insertion, but only if the changes must be committed to disk right away (necessary?).
Otherwise we can probably lower the amortized insertion cost by buffering, as in numerous problems in external memory, e.g., stack, priority queue, ... All of them support an insertion in $O(1/B)$ I/Os, the best possible.
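
To illustrate what buffering buys, here is a minimal sketch of an external-memory stack that keeps at most $2B$ items in memory. It is an illustration under stated assumptions, not code from the talk: the class name `BufferedStack`, the in-memory list standing in for the disk, and the I/O counter are all hypothetical.

```python
class BufferedStack:
    """Stack with an in-memory buffer of at most 2B items.

    Disk is modelled as a list of blocks of exactly B items;
    every block read or write counts as one I/O.
    """

    def __init__(self, block_size):
        self.B = block_size
        self.buffer = []   # most recent items, kept in memory
        self.disk = []     # older items, grouped into blocks
        self.io_count = 0

    def push(self, item):
        if len(self.buffer) == 2 * self.B:
            # Buffer full: write the oldest B items out as one block.
            self.disk.append(self.buffer[:self.B])
            self.buffer = self.buffer[self.B:]
            self.io_count += 1
        self.buffer.append(item)

    def pop(self):
        if not self.buffer:
            if not self.disk:
                raise IndexError("pop from empty stack")
            # Buffer empty: read the most recently written block back.
            self.buffer = self.disk.pop()
            self.io_count += 1
        return self.buffer.pop()


if __name__ == "__main__":
    stack = BufferedStack(block_size=100)
    for i in range(10_000):
        stack.push(i)
    print("I/Os after 10,000 pushes:", stack.io_count)
```

Because a block transfer happens only after at least $B$ operations since the previous one, the amortized cost per push or pop is $O(1/B)$ I/Os rather than 1.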

Dynamic B-trees; Dynamic Hash Tables

Dynamic B-trees for Fast Insertions
LSM-tree [O'Neil, Cheng, Gawlick, O'Neil, Acta Informatica 96]: logarithmic method + B-tree.
[Figure: levels of sizes $M$ (in memory), $\ell M$, $\ell^2 M$, ...]
Insertion: $O(\frac{\ell}{B} \log_\ell \frac{N}{M})$. Query: $O(\log_\ell \frac{N}{M})$ (omitting the $\frac{K}{B}$ output term).
Stepped merge tree [Jagadish, Narayan, Seshadri, Sudarshan, Kanneganti, VLDB 97]: a variant of the LSM-tree.
Insertion: $O(\frac{1}{B} \log_\ell \frac{N}{M})$. Query: $O(\ell \log_\ell \frac{N}{M})$.
Usually $\ell$ is set to be a constant; then they both have $O(\frac{1}{B} \log \frac{N}{M})$ insertion and $O(\log \frac{N}{M})$ query.
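
The logarithmic method behind these structures can be sketched in a few dozen lines. The code below is a simplified illustration, not the LSM-tree as published: levels are plain sorted lists instead of B-trees, level $i$ is assumed to hold at most $M \ell^{i+1}$ keys, only block writes are counted, and all names (`LogMethodIndex`, `growth`, `write_ios`) are hypothetical.

```python
import random
from bisect import bisect_left, bisect_right
from heapq import merge


class LogMethodIndex:
    """Memory buffer of M keys plus sorted levels of growing capacity."""

    def __init__(self, memory_size, growth, block_size):
        self.M = memory_size
        self.growth = growth      # the parameter l on the slide
        self.B = block_size
        self.memtable = []        # unsorted in-memory buffer
        self.levels = []          # levels[i] is a sorted list on "disk"
        self.write_ios = 0        # block writes only; reads are ignored

    def insert(self, key):
        self.memtable.append(key)
        if len(self.memtable) == self.M:
            self._merge_down(sorted(self.memtable))
            self.memtable = []

    def _merge_down(self, run):
        """Merge a sorted run into level 0, cascading while levels overflow."""
        i = 0
        while True:
            if i == len(self.levels):
                self.levels.append([])
            run = list(merge(self.levels[i], run))
            self.levels[i] = []
            if len(run) <= self.M * self.growth ** (i + 1):
                self.levels[i] = run
                self.write_ios += -(-len(run) // self.B)  # ceil(len / B)
                return
            i += 1  # level i overflowed: push everything one level down

    def range_query(self, lo, hi):
        """Report all keys in [lo, hi]; every level must be consulted."""
        hits = [k for k in self.memtable if lo <= k <= hi]
        for level in self.levels:
            hits.extend(level[bisect_left(level, lo):bisect_right(level, hi)])
        return sorted(hits)


def demo(n=10_000):
    keys = list(range(n))
    random.Random(1).shuffle(keys)
    index = LogMethodIndex(memory_size=64, growth=4, block_size=16)
    for k in keys:
        index.insert(k)
    return index
```

The sketch shows the tradeoff in miniature: insertions are cheap because keys are written in whole blocks during merges, while a query pays for visiting every level.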

More Dynamic B-trees
Buffer tree (buffered repository tree) [Arge, WADS 95; Buchsbaum, Goldwasser, Venkatasubramanian, Westbrook, SODA 00]
Streaming B-tree [Bender, Farach-Colton, Fineman, Fogel, Kuszmaul, Nelson, SPAA 07]
Y-tree [Jermaine, Datta, Omiecinski, VLDB 99]
Cache-oblivious model [Demaine, Fineman, Iacono, Langerman, Munro, SODA 10]
Query cost $q$ and insertion cost $u$ achieved:
$q = \log B$, $u = \frac{1}{B} \log B$
$q = 1$, $u = \frac{1}{B} B^\epsilon$
$q = B^\epsilon$, $u = \frac{1}{B}$
Deletions? Standard trick: inserting delete signals.
No better solutions known...

Compare with the rich results in RAM!
Range reporting:
$O(\sqrt{\log N / \log\log N})$ insertion and query [Andersson, Thorup, JACM 07]
$O(\log N / \log\log N)$ insertion and $O(\log\log N)$ query [Mortensen, Pagh, Pǎtraşcu, STOC 05]
Other results that depend on the word size $w$
Predecessor: $\Theta(\sqrt{\log N / \log\log N})$ insertion and query [Andersson, Thorup, JACM 07]
Partial-sum: $\Theta(\log N)$ insertion and query [Pǎtraşcu, Demaine, SODA 04]

Are the EM and DB people just dumb?

Our Main Result
For any dynamic range query index with a query cost of $q$ and an amortized insertion cost of $u$, the following tradeoff holds:
$$\begin{cases} q \cdot \log(uB/q) = \Omega(\log B), & \text{for } q < \alpha \log B,\ \alpha \text{ is any constant;} \\ uB \cdot \log q = \Omega(\log B), & \text{for all } q. \end{cases}$$
Assuming $\log_B \frac{N}{M} = O(1)$, all the bounds are tight!
Current upper bounds:
$q = \log B$ (i.e., $\log \frac{N}{M}$), $u = \frac{1}{B} \log B$ (i.e., $\frac{1}{B} \log \frac{N}{M}$)
$q = 1$, $u = \frac{1}{B} B^\epsilon$
$q = B^\epsilon$, $u = \frac{1}{B}$
Can't be true for $B = o(\sqrt{\log n \log\log n})$, since the exponential tree achieves $u = q = O(\sqrt{\log n / \log\log n})$ [Andersson, Thorup, JACM 07]. ($n = N/M$)
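
As a sanity check on the tightness claim, each of the three upper bounds can be plugged into the tradeoff. The derivation below is a reader's sketch rather than part of the talk; it suppresses constant factors and assumes $0 < \epsilon < 1$ is a fixed constant.

```latex
\begin{align*}
% q = \Theta(\log B), just inside the range of the first inequality
q \log\frac{uB}{q} = \Omega(\log B)
  &\;\Rightarrow\; \log\frac{uB}{q} = \Omega(1)
  \;\Rightarrow\; u = \Omega\!\left(\frac{q}{B}\right)
  = \Omega\!\left(\frac{1}{B}\log B\right), \\
% q = O(1), first inequality
q \log\frac{uB}{q} = \Omega(\log B)
  &\;\Rightarrow\; \log(uB) = \Omega(\log B)
  \;\Rightarrow\; u = \Omega\!\left(\frac{1}{B} B^{\epsilon}\right)
  \text{ for some constant } \epsilon > 0, \\
% q = B^\epsilon, second inequality
uB \log q = \Omega(\log B)
  &\;\Rightarrow\; uB \cdot \epsilon \log B = \Omega(\log B)
  \;\Rightarrow\; u = \Omega\!\left(\frac{1}{B}\right).
\end{align*}
```

Each line matches the corresponding upper bound up to constant factors, which is what "tight" means here.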

The real question
How large does $B$ need to be for the buffer tree to be optimal for range reporting?
Known: somewhere between $\Omega(\sqrt{\log n \log\log n})$ and $O(n^\epsilon)$.

Lower Bound Model: Dynamic Indexability
Indexability: [Hellerstein, Koutsoupias, Papadimitriou, PODS 97, JACM 02]
Objects are stored in disk blocks of size up to $B$, possibly with redundancy.
[Figure: a query reports {2,3,4,5}; cost = ]
The query cost is the minimum number of blocks that can cover all the required results (search time ignored!).
Similar in spirit to popular lower bound models: cell probe model, semigroup model.

Previous Results on Static Indexability
Nearly all external indexing lower bounds are under this model.
Tradeoff between space ($s$) and query time ($q$).
2D range queries: $\frac{s}{N} \cdot \log q = \Omega(\log(N/B))$ [Hellerstein, Koutsoupias, Papadimitriou, PODS 97], [Koutsoupias, Taylor, PODS 98], [Arge, Samoladas, Vitter, PODS 99]
2D stabbing queries: $q \cdot \log(s/N) = \Omega(\log(N/B))$ [Arge, Samoladas, Yi, ESA 04, Algorithmica 09]
1D range queries: $s = N$, $q = 1$ trivially.
Adding dynamization makes it much more interesting!

Dynamic Indexability
Still consider only insertions.
A snapshot at time $t$ consists of a memory of size $M$ and disk blocks of size $B$.
[Figure: snapshots at times $t$, $t + 1$, and $t + 2$, with an object inserted at each step; transition cost = 2.]
Update cost: $u$ = amortized transition cost per insertion.

The Ball-Shuffling Problem
$B$ balls, $q$ bins.
[Figure: balls put directly into bins, with example costs 1 and 2.]
Cost of putting the ball directly into a bin = # balls in the bin.
[Figure: a shuffle with cost = 5.]
Cost of shuffling = # balls in the involved bins.
Putting a ball directly into a bin is a special shuffle.
Goal: accommodate all $B$ balls using $q$ bins with minimum cost.
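
The cost model of the game is easy to simulate. The sketch below is an illustration only: it assumes that each step adds exactly one new ball, that the cost of a step counts the new ball too (so a ball placed into an empty bin costs 1, matching the slide's example), and the class and function names are hypothetical.

```python
class BallShuffling:
    """Tracks bin occupancies and the accumulated cost of the game."""

    def __init__(self, num_bins):
        self.bins = [0] * num_bins
        self.cost = 0

    def shuffle(self, involved, new_counts):
        """Add one ball and redistribute the balls of the involved bins.

        Cost = number of balls in the involved bins after the step.
        """
        total = sum(self.bins[i] for i in involved) + 1
        if sum(new_counts) != total or len(new_counts) != len(involved):
            raise ValueError("new_counts must redistribute exactly the involved balls")
        for i, count in zip(involved, new_counts):
            self.bins[i] = count
        self.cost += total

    def put(self, bin_index):
        """Putting a ball directly into a bin is a shuffle of that one bin."""
        self.shuffle([bin_index], [self.bins[bin_index] + 1])


def round_robin_cost(num_balls, num_bins):
    """Cost of the simple strategy that never moves a ball once placed."""
    game = BallShuffling(num_bins)
    for ball in range(num_balls):
        game.put(ball % num_bins)
    return game.cost


if __name__ == "__main__":
    print(round_robin_cost(10, 1))   # one bin: 1 + 2 + ... + B
    print(round_robin_cost(10, 10))  # one bin per ball: B
```

The two extremes line up with the endpoints of the lower-bound curve: a single bin forces a cost of order $B^2$, while $B$ bins allow a cost of $B$.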

The Workload Construction
[Figure: keys on one axis and time on the other; insertions arrive in rounds 1, 2, 3, ..., $B$, and a snapshot of the dynamic index is considered after each round.]
Queries that we require the index to cover with $q$ blocks: # queries $2MB$.
There exists a query such that:
The $B$ objects of the query reside in $q$ blocks in all snapshots.
All of its objects are on disk in all $B$ snapshots (we have $MB$ queries).
The index moves its objects $uB^2$ times in total.

The Reduction
An index with update cost $u$ and query cost $q$ gives us a solution to the ball-shuffling game with cost $uB^2$ for $B$ balls and $q$ bins.
Lower bound on the ball-shuffling problem:
Theorem: The cost of any solution for the ball-shuffling problem is at least
$$\begin{cases} \Omega(q \cdot B^{1+\Omega(1/q)}), & \text{for } q < \alpha \log B \text{ where } \alpha \text{ is any constant;} \\ \Omega(B \log_q B), & \text{for any } q. \end{cases}$$
This implies
$$\begin{cases} q \cdot \log(uB/q) = \Omega(\log B), & \text{for } q < \alpha \log B,\ \alpha \text{ is any constant;} \\ uB \cdot \log q = \Omega(\log B), & \text{for all } q. \end{cases}$$
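
The step from the ball-shuffling theorem to the index tradeoff is short algebra. The following is a reader's sketch, ignoring constant factors, that sets the game cost $uB^2$ obtained from the index against each lower bound in turn.

```latex
\begin{align*}
uB^2 = \Omega\!\left(q \, B^{1+\Omega(1/q)}\right)
  &\;\Rightarrow\; \frac{uB}{q} = \Omega\!\left(B^{\Omega(1/q)}\right)
  \;\Rightarrow\; \log\frac{uB}{q} = \Omega\!\left(\frac{\log B}{q}\right)
  \;\Rightarrow\; q \log\frac{uB}{q} = \Omega(\log B), \\
uB^2 = \Omega\!\left(B \log_q B\right)
  &\;\Rightarrow\; uB = \Omega\!\left(\frac{\log B}{\log q}\right)
  \;\Rightarrow\; uB \log q = \Omega(\log B).
\end{align*}
```

The first line uses the bound that holds for $q < \alpha \log B$, the second the bound that holds for every $q$.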

Ball-Shuffling Lower Bounds
Theorem: The cost of any solution for the ball-shuffling problem is at least
$$\begin{cases} \Omega(q \cdot B^{1+\Omega(1/q)}), & \text{for } q < \alpha \log B \text{ where } \alpha \text{ is any constant;} \\ \Omega(B \log_q B), & \text{for any } q. \end{cases}$$
[Plot: cost lower bound as a function of $q$: $B^2$ at $q = 1$, $B^{4/3}$ at $q = 2$, $B \log B$ at $q = \log B$, and $B$ at $q = B^\epsilon$.]

Dynamic B-trees; Dynamic Hash Tables

Dynamic Hash Tables
B-tree query I/O: $O(\log_B \frac{N}{M})$
Hash table query I/O: $1 + 1/2^{\Omega(B)}$; insertion the same.
A long-time conjecture in the external memory community: the insertion cost must be $\Omega(1)$ I/Os if the query cost is required to be $O(1)$ I/Os.
Buffering is useless?

Dynamic Hash Tables (for successful queries)
Logarithmic method (folklore?)
[Figure: hash tables of sizes $m$ (in memory), $2m$, $4m$, $8m$.]
Insertion: $O(\frac{1}{B} \log \frac{N}{M})$. Expected average query: $O(1)$.
Improving query time. Idea: keep one table large enough.
For some parameter $\beta = B^c$, $c \le 1$.
[Figure: tables of sizes $x$, $x/\beta$, and $2x$.]
Insertion: $O(B^{c-1})$. Query: $1 + O(1/B^c)$.
Still far from the target $1 + 1/2^{\Omega(B)}$.
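
One way to read the two bounds for the "one large table" idea is the following back-of-the-envelope calculation. It is an interpretation, not a statement from the talk: it assumes a main table holding about $x$ items, side tables holding at most $x/\beta$ items in total, and a merge of the side tables into the main table costing $O(x/B)$ I/Os each time they fill up.

```latex
\begin{align*}
\text{amortized insertion cost}
  &= O\!\left(\frac{x/B}{x/\beta}\right)
   = O\!\left(\frac{\beta}{B}\right)
   = O\!\left(B^{c-1}\right), \\
\text{expected successful query cost}
  &= 1 + O\!\left(\frac{x/\beta}{x}\right)
   = 1 + O\!\left(\frac{1}{\beta}\right)
   = 1 + O\!\left(\frac{1}{B^{c}}\right).
\end{align*}
```

Under this reading, a uniformly chosen stored item lies outside the main table with probability at most about $1/\beta$, which accounts for the extra term in the query cost.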
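
Query-Insertion Tradeoff for Successful Queries [Wei, Yi, Zhang, SPAA 09]
[Plot: insertion cost versus query cost, with upper bounds and lower bounds.]
Standard hashing: query $1 + 1/2^{\Omega(B)}$, insertion 1.
Query $1 + \Theta(1/B^c)$, $c > 1$: insertion lower bound $1 - O(1/B^{(c-1)/4})$.
Query $1 + \Theta(1/B)$: insertion upper bound $O(1)$, lower bound $\Omega(1)$.
Query $1 + \Theta(1/B^c)$, $c < 1$: insertion upper bound $O(B^{c-1})$, lower bound $\Omega(B^{c-1})$.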

Indexability Too Strong!
Naïve solution: for every $B$ items, write to a block. Query cost is 1, insertion is $1/B$.
Too many possible mappings!
Indexability + information-theoretic argument:
If, with only the information in memory, the hash table cannot locate the item, then querying it takes at least 2 I/Os.

The Abstraction
Consider the layout of a hash table at any snapshot. Denote all the blocks on disk by $B_1, B_2, \ldots, B_d$. Let $f : U \to \{1, \ldots, d\}$ be any function computable within memory. We divide the inserted items into 3 zones with respect to $f$.
Memory zone $M$: set of items stored in memory. $t_q = 0$.
Fast zone $F$: set of items $x$ such that $x \in B_{f(x)}$. $t_q = 1$.
Slow zone $S$: the rest of the items. $t_q =$
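
The three zones can be made concrete with a toy classifier. This is a sketch under assumptions: blocks are Python sets indexed from 0 rather than 1, the address function `f` is an arbitrary example, and the cost of 2 charged to slow-zone items is the "at least 2 I/Os" figure from the preceding slide, so the returned average is only a lower bound.

```python
def classify_zones(items, memory, blocks, f):
    """Split items into memory, fast, and slow zones with respect to f.

    memory: set of items held in memory
    blocks: list of sets, blocks[i] being the contents of disk block i
    f:      function computable in memory, mapping an item to a block index
    """
    zones = {"memory": set(), "fast": set(), "slow": set()}
    for x in items:
        if x in memory:
            zones["memory"].add(x)          # found without any I/O
        elif x in blocks[f(x)]:
            zones["fast"].add(x)            # f points at the right block
        else:
            zones["slow"].add(x)            # f is wrong: at least 2 I/Os
    return zones


def average_query_lower_bound(zones):
    """Lower bound on the average successful query cost in I/Os."""
    n = sum(len(zone) for zone in zones.values())
    return (len(zones["fast"]) + 2 * len(zones["slow"])) / n


if __name__ == "__main__":
    # Hypothetical layout: 10 items, 2 of them in memory, 3 disk blocks.
    blocks = [{3, 6, 9, 4}, {7, 2}, {5, 8}]
    zones = classify_zones(range(10), {0, 1}, blocks, lambda x: x % 3)
    print(zones, average_query_lower_bound(zones))
```

Keeping the average query cost near 1 therefore forces almost all disk-resident items into the fast zone.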

The Key
The hash table can employ a family $\mathcal{F}$ of at most $2^M$ distinct $f$'s.
Note that the current $f$ adopted by the hash table depends on the already inserted items, but the family $\mathcal{F}$ has to be fixed beforehand.

How about All Queries? (Latest results)
We are essentially talking about the membership problem.
Can't use the indexability model.
Have to use the cell probe model.

All Queries (the membership problem; the cell probe model)
[Plot: query cost versus insertion cost, with upper bounds and lower bounds. Upper bounds: the truncated buffer tree; the buffer tree with insertion $\frac{\ell}{B} \log_\ell n$ and query $\log_\ell n$; hashing with query $1 + 1/2^{\Omega(B)}$. Lower bounds: [Yi, Zhang, SODA 10] and [Verbin, Zhang, STOC 10].]

THE BIG BOLD CONJECTURE
All these fundamental data structure problems have the same query-update tradeoff in external memory when $u = o(1)$, for sufficiently large $B$.
Partial-sum: all $B$; Range reporting: $B > n^\epsilon$; Predecessor: unknown.
Strong implication: the buffer tree (and many of the log-method-based structures) is simple, practical, versatile, and optimal!

The End
THANK YOU
Q and A
More information6 Distributed data management I Hashing
6 Distributed data management I Hashing There are two major approaches for the management of data in distributed systems: hashing and caching. The hashing approach tries to minimize the use of communication
More informationPath Queries in Weighted Trees
Path Queries in Weighted Trees Meng He 1, J. Ian Munro 2, and Gelin Zhou 2 1 Faculty of Computer Science, Dalhousie University, Canada. mhe@cs.dal.ca 2 David R. Cheriton School of Computer Science, University
More informationHashing. Yufei Tao. Department of Computer Science and Engineering Chinese University of Hong Kong
Department of Computer Science and Engineering Chinese University of Hong Kong In this lecture, we will revisit the dictionary search problem, where we want to locate an integer v in a set of size n or
More informationNearest Neighbor with KD Trees
Case Study 2: Document Retrieval Finding Similar Documents Using Nearest Neighbors Machine Learning/Statistics for Big Data CSE599C1/STAT592, University of Washington Emily Fox January 22 nd, 2013 1 Nearest
More informationCache-Oblivious Algorithms A Unified Approach to Hierarchical Memory Algorithms
Cache-Oblivious Algorithms A Unified Approach to Hierarchical Memory Algorithms Aarhus University Cache-Oblivious Current Trends Algorithms in Algorithms, - A Unified Complexity Approach to Theory, Hierarchical
More informationarxiv: v1 [cs.cr] 19 Sep 2017
BIOS ORAM: Improved Privacy-Preserving Data Access for Parameterized Outsourced Storage arxiv:1709.06534v1 [cs.cr] 19 Sep 2017 Michael T. Goodrich University of California, Irvine Dept. of Computer Science
More informationINSERTION SORT is O(n log n)
Michael A. Bender Department of Computer Science, SUNY Stony Brook bender@cs.sunysb.edu Martín Farach-Colton Department of Computer Science, Rutgers University farach@cs.rutgers.edu Miguel Mosteiro Department
More information39 B-trees and Cache-Oblivious B-trees with Different-Sized Atomic Keys
39 B-trees and Cache-Oblivious B-trees with Different-Sized Atomic Keys Michael A. Bender, Department of Computer Science, Stony Brook University Roozbeh Ebrahimi, Google Inc. Haodong Hu, School of Information
More informationA Multi Join Algorithm Utilizing Double Indices
Journal of Convergence Information Technology Volume 4, Number 4, December 2009 Hanan Ahmed Hossni Mahmoud Abd Alla Information Technology Department College of Computer and Information Sciences King Saud
More informationHow TokuDB Fractal TreeTM. Indexes Work. Bradley C. Kuszmaul. Guest Lecture in MIT Performance Engineering, 18 November 2010.
6.172 How Fractal Trees Work 1 How TokuDB Fractal TreeTM Indexes Work Bradley C. Kuszmaul Guest Lecture in MIT 6.172 Performance Engineering, 18 November 2010. 6.172 How Fractal Trees Work 2 I m an MIT
More informationCACHE-OBLIVIOUS MAPS. Edward Kmett McGraw Hill Financial. Saturday, October 26, 13
CACHE-OBLIVIOUS MAPS Edward Kmett McGraw Hill Financial CACHE-OBLIVIOUS MAPS Edward Kmett McGraw Hill Financial CACHE-OBLIVIOUS MAPS Indexing and Machine Models Cache-Oblivious Lookahead Arrays Amortization
More informationFunnel Heap - A Cache Oblivious Priority Queue
Alcom-FT Technical Report Series ALCOMFT-TR-02-136 Funnel Heap - A Cache Oblivious Priority Queue Gerth Stølting Brodal, Rolf Fagerberg Abstract The cache oblivious model of computation is a two-level
More informationChapter 5 Hashing. Introduction. Hashing. Hashing Functions. hashing performs basic operations, such as insertion,
Introduction Chapter 5 Hashing hashing performs basic operations, such as insertion, deletion, and finds in average time 2 Hashing a hash table is merely an of some fixed size hashing converts into locations
More informationData Structures for Range Minimum Queries in Multidimensional Arrays
Data Structures for Range Minimum Queries in Multidimensional Arrays Hao Yuan Mikhail J. Atallah Purdue University January 17, 2010 Yuan & Atallah (PURDUE) Multidimensional RMQ January 17, 2010 1 / 28
More informationOutline for Today. How can we speed up operations that work on integer data? A simple data structure for ordered dictionaries.
van Emde Boas Trees Outline for Today Data Structures on Integers How can we speed up operations that work on integer data? Tiered Bitvectors A simple data structure for ordered dictionaries. van Emde
More informationFlash memory: Models and Algorithms. Guest Lecturer: Deepak Ajwani
Flash memory: Models and Algorithms Guest Lecturer: Deepak Ajwani Flash Memory Flash Memory Flash Memory Flash Memory Flash Memory Flash Memory Flash Memory Solid State Disks A data-storage device that
More informationNearest Neighbor with KD Trees
Case Study 2: Document Retrieval Finding Similar Documents Using Nearest Neighbors Machine Learning/Statistics for Big Data CSE599C1/STAT592, University of Washington Emily Fox January 22 nd, 2013 1 Nearest
More informationSearchable Symmetric Encryption: Optimal Locality in Linear Space via Two-Dimensional Balanced Allocations
Searchable Symmetric Encryption: Optimal Locality in Linear Space via Two-Dimensional Balanced Allocations Gilad Asharov Cornell-Tech Moni Naor Gil Segev Ido Shahaf (Hebrew University) Weizmann Hebrew
More informationFast Prefix Search in Little Space, with Applications
Fast Prefix Search in Little Space, with Applications Djamal Belazzougui 1, Paolo Boldi 2, Rasmus Pagh 3, and Sebastiano Vigna 1 1 Université Paris Diderot Paris 7, France 2 Dipartimento di Scienze dell
More informationLocality- Sensitive Hashing Random Projections for NN Search
Case Study 2: Document Retrieval Locality- Sensitive Hashing Random Projections for NN Search Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade April 18, 2017 Sham Kakade
More informationA NEW SORTING ALGORITHM FOR ACCELERATING JOIN-BASED QUERIES
Mathematical and Computational Applications, Vol. 15, No. 2, pp. 208-217, 2010. Association for Scientific Research A NEW SORTING ALGORITHM FOR ACCELERATING JOIN-BASED QUERIES Hassan I. Mathkour Department
More informationComparing the strength of query types in property testing: The case of testing k-colorability
Comparing the strength of query types in property testing: The case of testing k-colorability Ido Ben-Eliezer Tali Kaufman Michael Krivelevich Dana Ron Abstract We study the power of four query models
More informationarxiv: v1 [cs.ds] 21 Jun 2010
Dynamic Range Reporting in External Memory Yakov Nekrich Department of Computer Science, University of Bonn. arxiv:1006.4093v1 [cs.ds] 21 Jun 2010 Abstract In this paper we describe a dynamic external
More informationHigh Performance Data Structures: Theory and Practice. Roman Elizarov Devexperts, 2006
High Performance Data Structures: Theory and Practice Roman Elizarov Devexperts, 2006 e1 High performance? NO!!! We ll talk about different High Performance. One that aimed at avoiding all those racks
More informationPractice Midterm Exam Solutions
CSE 332: Data Abstractions Autumn 2015 Practice Midterm Exam Solutions Name: Sample Solutions ID #: 1234567 TA: The Best Section: A9 INSTRUCTIONS: You have 50 minutes to complete the exam. The exam is
More informationI/O-Algorithms Lars Arge Aarhus University
I/O-Algorithms Aarhus University April 10, 2008 I/O-Model Block I/O D Parameters N = # elements in problem instance B = # elements that fits in disk block M = # elements that fits in main memory M T =
More informationLecture Notes: Range Searching with Linear Space
Lecture Notes: Range Searching with Linear Space Yufei Tao Department of Computer Science and Engineering Chinese University of Hong Kong taoyf@cse.cuhk.edu.hk In this lecture, we will continue our discussion
More informationExternal-Memory Search Trees with Fast Insertions. Jelani Nelson
External-Memory Search Trees with Fast Insertions by Jelani Nelson Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of
More informationI/O Model. Cache-Oblivious Algorithms : Algorithms in the Real World. Advantages of Cache-Oblivious Algorithms 4/9/13
I/O Model 15-853: Algorithms in the Real World Locality II: Cache-oblivious algorithms Matrix multiplication Distribution sort Static searching Abstracts a single level of the memory hierarchy Fast memory
More informationTheoretical Computer Science
Theoretical Computer Science 412 (2011) 3579 3588 Contents lists available at ScienceDirect Theoretical Computer Science journal homepage: www.elsevier.com/locate/tcs Cache-oblivious index for approximate
More informationarxiv: v2 [cs.ds] 1 Jul 2014
Cache-Oblivious Persistence Pooya Davoodi 1, Jeremy T. Fineman 2, John Iacono 1, and Özgür Özkan1 1 New York University 2 Georgetown University arxiv:1402.5492v2 [cs.ds] 1 Jul 2014 Abstract Partial persistence
More informationIntroduction to I/O Efficient Algorithms (External Memory Model)
Introduction to I/O Efficient Algorithms (External Memory Model) Jeff M. Phillips August 30, 2013 Von Neumann Architecture Model: CPU and Memory Read, Write, Operations (+,,,...) constant time polynomially
More informationBRICS Research Activities Algorithms
BRICS Research Activities Algorithms Gerth Stølting Brodal BRICS Retreat, Sandbjerg, 21 23 October 2002 1 Outline of Talk The Algorithms Group Courses Algorithm Events Expertise within BRICS Examples Algorithms
More informationDynamic Graph Algorithms and Complexity
Dynamic Graph Algorithms and Complexity Danupon Nanongkai KTH, Sweden Disclaimer: Errors are possible. Please double check the cited sources for accuracy. 1 About this talk I will try to answer these questions.?
More informationSimple and Semi-Dynamic Structures for Cache-Oblivious Planar Orthogonal Range Searching
Simple and Semi-Dynamic Structures for Cache-Oblivious Planar Orthogonal Range Searching ABSTRACT Lars Arge Department of Computer Science University of Aarhus IT-Parken, Aabogade 34 DK-8200 Aarhus N Denmark
More informationApplications of Geometric Spanner
Title: Name: Affil./Addr. 1: Affil./Addr. 2: Affil./Addr. 3: Keywords: SumOriWork: Applications of Geometric Spanner Networks Joachim Gudmundsson 1, Giri Narasimhan 2, Michiel Smid 3 School of Information
More informationHierarchical Memory. Modern machines have complicated memory hierarchy
Hierarchical Memory Modern machines have complicated memory hierarchy Levels get larger and slower further away from CPU Data moved between levels using large blocks Lecture 2: Slow IO Review Disk access
More informationMemory Management. Memory Management
Memory Management Gordon College Stephen Brinton Memory Management Background Swapping Contiguous Allocation Paging Segmentation Segmentation with Paging 1 Background Program must be brought into memory
More informationIntroduction to Randomized Algorithms
Introduction to Randomized Algorithms Gopinath Mishra Advanced Computing and Microelectronics Unit Indian Statistical Institute Kolkata 700108, India. Organization 1 Introduction 2 Some basic ideas from
More informationDynamic Optimality Almost
Dynamic Optimality Almost Erik D. Demaine Dion Harmon John Iacono Mihai Pǎtraşcu Abstract We present an O(lg lg n)-competitive online binary search tree, improving upon the best previous (trivial) competitive
More informationOn Asymptotic Cost of Triangle Listing in Random Graphs
On Asymptotic Cost of Triangle Listing in Random Graphs Di Xiao, Yi Cui, Daren B.H. Cline, Dmitri Loguinov Internet Research Lab Department of Computer Science and Engineering Texas A&M University May
More informationUniform Hashing in Constant Time and Linear Space
Uniform Hashing in Constant Time and Linear Space Anna Östlin IT University of Copenhagen Glentevej 67, 2400 København NV Denmark annao@itu.dk Rasmus Pagh IT University of Copenhagen Glentevej 67, 2400
More informationAlgorithms in Systems Engineering IE172. Midterm Review. Dr. Ted Ralphs
Algorithms in Systems Engineering IE172 Midterm Review Dr. Ted Ralphs IE172 Midterm Review 1 Textbook Sections Covered on Midterm Chapters 1-5 IE172 Review: Algorithms and Programming 2 Introduction to
More informationModule 5: Hash-Based Indexing
Module 5: Hash-Based Indexing Module Outline 5.1 General Remarks on Hashing 5. Static Hashing 5.3 Extendible Hashing 5.4 Linear Hashing Web Forms Transaction Manager Lock Manager Plan Executor Operator
More informationI/O-Efficient Data Structures for Colored Range and Prefix Reporting
I/O-Efficient Data Structures for Colored Range and Prefix Reporting Kasper Green Larsen MADALGO Aarhus University, Denmark larsen@cs.au.dk Rasmus Pagh IT University of Copenhagen Copenhagen, Denmark pagh@itu.dk
More informationMemory management: outline
Memory management: outline Concepts Swapping Paging o Multi-level paging o TLB & inverted page tables 1 Memory size/requirements are growing 1951: the UNIVAC computer: 1000 72-bit words! 1971: the Cray
More informationMemory management: outline
Memory management: outline Concepts Swapping Paging o Multi-level paging o TLB & inverted page tables 1 Memory size/requirements are growing 1951: the UNIVAC computer: 1000 72-bit words! 1971: the Cray
More informationDeterministic Leader Election in O(D + log n) Rounds with Messages of size O(1)
Deterministic Leader Election in O(D + log n) Rounds with Messages of size O(1) A. Casteigts, Y. Métivier, J.M. Robson, A. Zemmari LaBRI - University of Bordeaux DISC 2016, Paris A. Casteigts, Y. Métivier,
More informationOptimal Sampling from Distributed Streams
Optimal Sampling from Distributed Streams Graham Cormode AT&T Labs-Research Joint work with S. Muthukrishnan (Rutgers) Ke Yi (HKUST) Qin Zhang (HKUST) 1-1 Reservoir sampling [Waterman??; Vitter 85] Maintain
More informationConstant Price of Anarchy in Network Creation Games via Public Service Advertising
Constant Price of Anarchy in Network Creation Games via Public Service Advertising Erik D. Demaine and Morteza Zadimoghaddam MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA 02139,
More informationCache-Oblivious Index for Approximate String Matching
Cache-Oblivious Index for Approximate String Matching Wing-Kai Hon 1, Tak-Wah Lam 2, Rahul Shah 3, Siu-Lung Tam 2, and Jeffrey Scott Vitter 3 1 Department of Computer Science, National Tsing Hua University,
More informationCache-Aware and Cache-Oblivious Adaptive Sorting
Cache-Aware and Cache-Oblivious Adaptive Sorting Gerth Stølting rodal 1,, Rolf Fagerberg 2,, and Gabriel Moruz 1 1 RICS, Department of Computer Science, University of Aarhus, IT Parken, Åbogade 34, DK-8200
More informationChapter 17: Parallel Databases
Chapter 17: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery Parallelism Intraoperation Parallelism Interoperation Parallelism Design of Parallel Systems Database Systems
More informationLecture 24 November 24, 2015
CS 229r: Algorithms for Big Data Fall 2015 Prof. Jelani Nelson Lecture 24 November 24, 2015 Scribes: Zhengyu Wang 1 Cache-oblivious Model Last time we talked about disk access model (as known as DAM, or
More informationDeletion Without Rebalancing in Multiway Search Trees
Deletion Without Rebalancing in Multiway Search Trees Siddhartha Sen 1,3 and Robert E. Tarjan 1,2,3 1 Princeton University, {sssix,ret}@cs.princeton.edu 2 HP Laboratories, Palo Alto CA 94304 Abstract.
More informationLecture 6: External Interval Tree (Part II) 3 Making the external interval tree dynamic. 3.1 Dynamizing an underflow structure
Lecture 6: External Interval Tree (Part II) Yufei Tao Division of Web Science and Technology Korea Advanced Institute of Science and Technology taoyf@cse.cuhk.edu.hk 3 Making the external interval tree
More informationOptimal Parallel Randomized Renaming
Optimal Parallel Randomized Renaming Martin Farach S. Muthukrishnan September 11, 1995 Abstract We consider the Renaming Problem, a basic processing step in string algorithms, for which we give a simultaneously
More informationHow TokuDB Fractal TreeTM. Indexes Work. Bradley C. Kuszmaul. MySQL UC 2010 How Fractal Trees Work 1
MySQL UC 2010 How Fractal Trees Work 1 How TokuDB Fractal TreeTM Indexes Work Bradley C. Kuszmaul MySQL UC 2010 How Fractal Trees Work 2 More Information You can download this talk and others at http://tokutek.com/technology
More informationCSE332: Data Abstractions Lecture 7: B Trees. James Fogarty Winter 2012
CSE2: Data Abstractions Lecture 7: B Trees James Fogarty Winter 20 The Dictionary (a.k.a. Map) ADT Data: Set of (key, value) pairs keys must be comparable insert(jfogarty,.) Operations: insert(key,value)
More informationBkd-tree: A Dynamic Scalable kd-tree
Bkd-tree: A Dynamic Scalable kd-tree Octavian Procopiuc Pankaj K. Agarwal Lars Arge Jeffrey Scott Vitter July 1, 22 Abstract In this paper we propose a new index structure, called the Bkd-tree, for indexing
More informationGeometric data structures:
Geometric data structures: Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade Sham Kakade 2017 1 Announcements: HW3 posted Today: Review: LSH for Euclidean distance Other
More informationExtendible Chained Bucket Hashing for Main Memory Databases. Abstract
Extendible Chained Bucket Hashing for Main Memory Databases Pyung-Chul Kim *, Kee-Wook Rim, Jin-Pyo Hong Electronics and Telecommunications Research Institute (ETRI) P.O. Box 106, Yusong, Taejon, 305-600,
More information