Dynamic Indexability and the Optimality of B-trees and Hash Tables

1 Dynamic Indexability and the Optimality of B-trees and Hash Tables
Ke Yi, Hong Kong University of Science and Technology
Dynamic Indexability and Lower Bounds for Dynamic One-Dimensional Range Query Indexes, PODS 09
Dynamic External Hashing: The Limit of Buffering, with Zhewei Wei and Qin Zhang, SPAA 09
+ some latest developments

4 An index is...
An index is a single number calculated from a set of prices (Dow Jones, S&P, Hang Seng).
An index is a list of keywords and their page numbers in a book.
An index is an exponent.
An index is a finger.
An index is a list of academic publications and their citations.
An index (search engine) is an inverted list from keywords to web pages.
An index (database) is a (disk-based) data structure that improves the speed of data retrieval operations (queries) on a database table.

7 Hash Table and B-tree
Hash tables and B-trees are taught to undergrads and actually used in all database systems.
B-tree: lookups and range queries; hash table: lookups.
External memory model (I/O model): a memory of size $M$; a disk partitioned into blocks of size $B$; each I/O reads or writes one block.

11 The B-tree
[Figure: a B-tree, with part of it marked "memory".]
A range query takes $O(\log_B N + K/B)$ I/Os, where $K$ is the output size.
$\log_B N - \log_B M = \log_B \frac{N}{M}$
The height of a B-tree never goes beyond 5 (e.g., if $B = 100$, then a B-tree with 5 levels stores $n = 10$ billion records). We will assume $\log_B \frac{N}{M} = O(1)$.
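
As a rough check of the height claim, the sketch below counts how many levels a tree with fan-out $B$ needs before it can hold $N$ records. The function name `btree_levels` and the assumption that every node has exactly $B$ children (a completely full tree) are illustrative simplifications, not something stated on the slide.

```python
def btree_levels(num_records: int, fanout: int) -> int:
    """Smallest number of levels h such that fanout**h >= num_records.

    Assumes an idealized tree in which every node has exactly `fanout`
    children; real B-trees are only partially full, so they can be
    somewhat taller than this estimate.
    """
    if num_records < 1 or fanout < 2:
        raise ValueError("need num_records >= 1 and fanout >= 2")
    levels, capacity = 0, 1
    while capacity < num_records:
        capacity *= fanout  # one more level multiplies the capacity by the fan-out
        levels += 1
    return levels


if __name__ == "__main__":
    # B = 100 and 10 billion records, the numbers used on the slide.
    print(btree_levels(10_000_000_000, 100))
```

With integer arithmetic the loop avoids the rounding problems that a floating-point logarithm can introduce at exact powers of the fan-out.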

14 External Hashing
[Figure: a hash table with $h(x)$ = last digit of $x$; most slots are null, and 55 is the only key that survives in the extracted text.]
Ideal hash function assumption: $h$ maps each object to a hash value uniformly and independently at random.
The expected average cost of a successful (or unsuccessful) lookup is $1 + 1/2^{\Omega(B)}$ disk accesses, provided the load factor is less than a constant smaller than 1 [Knuth, 1973].
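
To make the "disk accesses per lookup" measure concrete, here is a minimal sketch of an external hash table in which each bucket is a chain of blocks holding up to $B$ keys. The class name `ChainedBlockHashTable`, the chaining scheme, and the cost accounting (one I/O per block read) are assumptions chosen for illustration; the slide does not specify how overflow is handled.

```python
from typing import Callable, Dict, List


class ChainedBlockHashTable:
    """Toy external hash table: each bucket is a chain of blocks of size B."""

    def __init__(self, block_size: int, hash_fn: Callable[[int], int]):
        self.B = block_size
        self.h = hash_fn
        self.buckets: Dict[int, List[List[int]]] = {}

    def insert(self, key: int) -> None:
        chain = self.buckets.setdefault(self.h(key), [[]])
        if len(chain[-1]) == self.B:  # last block is full: add an overflow block
            chain.append([])
        chain[-1].append(key)

    def lookup_cost(self, key: int) -> int:
        """Number of blocks read to decide whether `key` is present."""
        chain = self.buckets.get(self.h(key), [[]])
        for reads, block in enumerate(chain, start=1):
            if key in block:
                return reads
        return len(chain)  # unsuccessful lookup: the whole chain was read


if __name__ == "__main__":
    table = ChainedBlockHashTable(block_size=2, hash_fn=lambda x: x % 10)
    for key in (55, 15, 25):  # all three keys end in 5 and share a bucket
        table.insert(key)
    print(table.lookup_cost(55), table.lookup_cost(25), table.lookup_cost(7))
```

A lookup costs more than one I/O only when its bucket has overflowed past one block, which under the ideal hash function assumption and a bounded load factor becomes exponentially unlikely as $B$ grows.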

16 Exact Numbers Calculated by Knuth
[Table of exact numbers not recovered.]
Extremely close to ideal.
The Art of Computer Programming, volume 3, 1998, page

20 Now Let's Go Dynamic
Focus on insertions first: both the B-tree and the hash table do a search first, then insert into the appropriate block.
B-tree: split blocks when necessary.
Hashing: rebuild the hash table when too full; extensible hashing [Fagin, Nievergelt, Pippenger, Strong, 79]; linear hashing [Litwin, 80].
These resizing operations only add $O(1/B)$ I/Os amortized per insertion; the bottleneck is the first search + insert.
We cannot hope for fewer than 1 I/O per insertion, but only if the changes must be committed to disk right away (necessary?).
Otherwise we can probably lower the amortized insertion cost by buffering, as in numerous problems in external memory, e.g. stack, priority queue, ... All of them support an insertion in $O(1/B)$ I/Os, the best possible.
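
The buffering idea can be sketched with the external stack mentioned above. This is a minimal illustration, assuming an in-memory buffer of up to $2B$ items and counting one I/O per block written or read; the class name `BufferedStack` and its `ios` counter are hypothetical names introduced here.

```python
from typing import List


class BufferedStack:
    """External-memory stack sketch with O(1/B) amortized I/Os per operation."""

    def __init__(self, block_size: int):
        self.B = block_size
        self.buffer: List[int] = []      # in memory, holds at most 2B items
        self.disk: List[List[int]] = []  # full blocks, oldest first
        self.ios = 0                     # number of block transfers so far

    def push(self, item: int) -> None:
        if len(self.buffer) == 2 * self.B:
            # Write the older half of the buffer to disk as one block.
            self.disk.append(self.buffer[: self.B])
            self.buffer = self.buffer[self.B:]
            self.ios += 1
        self.buffer.append(item)

    def pop(self) -> int:
        if not self.buffer:
            if not self.disk:
                raise IndexError("pop from empty stack")
            self.buffer = self.disk.pop()  # read the most recent block back
            self.ios += 1
        return self.buffer.pop()


if __name__ == "__main__":
    stack = BufferedStack(block_size=10)
    for i in range(1000):
        stack.push(i)
    print(stack.ios, "I/Os for 1000 pushes")
```

Keeping up to $2B$ items in memory means that after any block transfer at least $B$ further operations are needed before the next one, which is where the $O(1/B)$ amortized bound comes from.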

21 Dynamic B-trees / Dynamic Hash Tables

25 Dynamic B-trees for Fast Insertions
LSM-tree [O'Neil, Cheng, Gawlick, O'Neil, Acta Informatica 96]: logarithmic method + B-tree
[Figure: levels of size $M$ (in memory), $\ell M$, $\ell^2 M$, ...]
Insertion: $O(\frac{\ell}{B} \log_\ell \frac{N}{M})$
Query: $O(\log_\ell \frac{N}{M})$ (omitting the $\frac{K}{B}$ output term)
Stepped merge tree [Jagadish, Narayan, Seshadri, Sudarshan, Kanneganti, VLDB 97]: variant of LSM-tree
Insertion: $O(\frac{1}{B} \log_\ell \frac{N}{M})$
Query: $O(\ell \log_\ell \frac{N}{M})$
Usually $\ell$ is set to be a constant; then they both have $O(\frac{1}{B} \log \frac{N}{M})$ insertion and $O(\log \frac{N}{M})$ query.
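
The logarithmic method behind these structures can be sketched as follows. This is a simplified in-memory model, not the LSM-tree itself: each level is a plain sorted list rather than a B-tree, no I/Os are counted, and the names `LogMethodIndex`, `M` and `ell` are introduced here for illustration. Level $i$ is assumed to hold at most $\ell^{i+1} M$ keys; an overflowing level is merged into the next one.

```python
import bisect
from heapq import merge
from typing import List


class LogMethodIndex:
    """Logarithmic-method sketch: geometrically growing sorted levels."""

    def __init__(self, M: int = 4, ell: int = 2):
        self.M, self.ell = M, ell
        self.memory: List[int] = []        # unsorted buffer, fewer than M keys
        self.levels: List[List[int]] = []  # levels[i]: sorted, <= ell**(i+1) * M keys

    def insert(self, key: int) -> None:
        self.memory.append(key)
        if len(self.memory) < self.M:
            return
        carry, self.memory = sorted(self.memory), []
        i = 0
        while True:
            if i == len(self.levels):
                self.levels.append([])
            self.levels[i] = list(merge(self.levels[i], carry))
            if len(self.levels[i]) <= self.ell ** (i + 1) * self.M:
                return
            # Level i overflowed: push its whole content down one level.
            carry, self.levels[i] = self.levels[i], []
            i += 1

    def contains(self, key: int) -> bool:
        if key in self.memory:
            return True
        for level in self.levels:          # one search per level
            j = bisect.bisect_left(level, key)
            if j < len(level) and level[j] == key:
                return True
        return False


if __name__ == "__main__":
    index = LogMethodIndex(M=4, ell=2)
    for i in range(1000):
        index.insert((i * 7919) % 1000)    # a fixed permutation of 0..999
    print(len(index.levels), index.contains(123), index.contains(1000))
```

A query has to probe every level, which is the source of the $\log_\ell \frac{N}{M}$ factor in the query bound, while insertions are cheap because keys only move in large sequential merges.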

29 More Dynamic B-trees
Buffer-tree (buffered-repository tree) [Arge, WADS 95; Buchsbaum, Goldwasser, Venkatasubramanian, Westbrook, SODA 00]
Streaming B-tree [Bender, Farach-Colton, Fineman, Fogel, Kuszmaul, Nelson, SPAA 07]
Y-tree [Jermaine, Datta, Omiecinski, VLDB 99]
Cache-oblivious model [Demaine, Fineman, Iacono, Langerman, Munro, SODA 10]
Query cost $q$ and insertion cost $u$ achieved:
$q = \log B$, $u = \frac{1}{B} \log B$
$q = 1$, $u = \frac{1}{B} B^{\epsilon}$
$q = B^{\epsilon}$, $u = \frac{1}{B}$
Deletions? Standard trick: inserting delete signals.
No better solutions known...
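
The "delete signal" trick can be illustrated with a small append-only log in which a deletion is recorded as just another insertion. This is a sketch under simplifying assumptions: the names `SignalLog` and `TOMBSTONE` are hypothetical, the log lives in memory, and a real buffered index would resolve signals during its merges rather than by scanning.

```python
from typing import Dict, List, Optional, Tuple

TOMBSTONE = object()  # marker value standing in for a "delete signal"


class SignalLog:
    """Append-only record log in which a deletion is just another insertion."""

    def __init__(self) -> None:
        self.records: List[Tuple[int, object]] = []  # oldest first

    def insert(self, key: int, value: object) -> None:
        self.records.append((key, value))

    def delete(self, key: int) -> None:
        self.records.append((key, TOMBSTONE))  # no search for the old record

    def lookup(self, key: int) -> Optional[object]:
        # The newest record for a key decides whether it is present.
        for k, v in reversed(self.records):
            if k == key:
                return None if v is TOMBSTONE else v
        return None

    def compact(self) -> None:
        """Drop shadowed records and cancelled keys, as a merge step would."""
        latest: Dict[int, object] = {}
        for k, v in self.records:
            latest[k] = v
        self.records = [(k, v) for k, v in latest.items() if v is not TOMBSTONE]


if __name__ == "__main__":
    log = SignalLog()
    log.insert(1, "a")
    log.insert(2, "b")
    log.delete(1)
    print(log.lookup(1), log.lookup(2))
```

Because a delete never has to locate the record it cancels, it inherits the insertion cost of whatever buffered structure carries the signals.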

30 Compare with the rich results in RAM!
Range reporting:
$O(\sqrt{\log N / \log\log N})$ insertion and query [Andersson, Thorup, JACM 07]
$O(\log N / \log\log N)$ insertion and $O(\log\log N)$ query [Mortensen, Pagh, Pǎtraşcu, STOC 05]
Other results that depend on the word size $w$
Predecessor: $\Theta(\sqrt{\log N / \log\log N})$ insertion and query [Andersson, Thorup, JACM 07]
Partial-sum: $\Theta(\log N)$ insertion and query [Pǎtraşcu, Demaine, SODA 04]

31 Are the EM and DB people just dumb?

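34 Our Main Result
For any dynamic range query index with a query cost of $q$ and an amortized insertion cost of $u$, the following tradeoff holds:
$$\begin{cases} q \cdot \log(uB/q) = \Omega(\log B), & \text{for } q < \alpha \log B, \text{ where } \alpha \text{ is any constant};\\ uB \cdot \log q = \Omega(\log B), & \text{for all } q. \end{cases}$$
Assuming $\log_B \frac{N}{M} = O(1)$, all the bounds are tight!
Current upper bounds:
$q = \log B$ (in general $\log \frac{N}{M}$), $u = \frac{1}{B} \log B$ (in general $\frac{1}{B} \log \frac{N}{M}$)
$q = 1$, $u = \frac{1}{B} B^{\epsilon}$
$q = B^{\epsilon}$, $u = \frac{1}{B}$
Can't be true for $B = o(\sqrt{\log n \log\log n})$, since the exponential tree achieves $u = q = O(\sqrt{\log n / \log\log n})$ [Andersson, Thorup, JACM 07]. ($n = N/M$)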

35 The real question
How large does $B$ need to be for the buffer tree to be optimal for range reporting?
Known: somewhere between $\Omega(\sqrt{\log n \log\log n})$ and $O(n^{\epsilon})$.
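
The lower end of this range can be recovered by comparing the two structures quoted in the talk. A short derivation, writing $n = N/M$, using the buffer-tree bounds $q = O(\log n)$ and $u = O(\frac{1}{B} \log n)$ and the exponential-tree bound $u = q = O(\sqrt{\log n / \log\log n})$, and ignoring constant factors:

```latex
\begin{align*}
\frac{1}{B}\log n \;>\; \sqrt{\frac{\log n}{\log\log n}}
\quad\Longleftrightarrow\quad
B \;<\; \log n \cdot \sqrt{\frac{\log\log n}{\log n}}
\;=\; \sqrt{\log n \,\log\log n}.
\end{align*}
```

So for $B = o(\sqrt{\log n \log\log n})$ the exponential tree has both a smaller insertion cost and a smaller query cost than the buffer tree, which therefore cannot be optimal in that regime.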

40 Lower Bound Model: Dynamic Indexability
Indexability: [Hellerstein, Koutsoupias, Papadimitriou, PODS 97, JACM 02]
[Figure: objects laid out in disk blocks; a query reports {2,3,4,5}; the cost value is missing in the extracted text.]
Objects are stored in disk blocks of size up to $B$, possibly with redundancy.
The query cost is the minimum number of blocks that can cover all the required results (search time ignored!).
Similar in spirit to popular lower bound models: cell probe model, semigroup model.
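
The query cost in this model is a minimum set cover over the blocks. A minimal brute-force sketch, assuming blocks are given as Python sets of object identifiers; the function name `indexability_query_cost` and the example layout are made up for illustration, and the exhaustive search is only practical for tiny inputs.

```python
from itertools import combinations
from typing import List, Set


def indexability_query_cost(blocks: List[Set[int]], query: Set[int]) -> int:
    """Minimum number of blocks whose union contains every object in `query`."""
    useful = [b for b in blocks if b & query]  # other blocks never help
    for k in range(len(useful) + 1):
        for chosen in combinations(useful, k):
            if query <= set().union(*chosen):
                return k
    raise ValueError("query contains an object stored in no block")


if __name__ == "__main__":
    # Hypothetical layout with redundancy: object 3 is stored three times.
    layout = [{1, 2, 3}, {3, 4, 5}, {2, 3, 4}, {5, 6}]
    print(indexability_query_cost(layout, {2, 3, 4, 5}))
```

Note that the model charges nothing for finding the right blocks, which is what makes it suitable for lower bounds: any real index pays at least this much.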

45 Previous Results on Static Indexability
Nearly all external indexing lower bounds are under this model.
Tradeoff between space ($s$) and query time ($q$):
2D range queries: $(s/n) \log q = \Omega(\log(N/B))$ [Hellerstein, Koutsoupias, Papadimitriou, PODS 97], [Koutsoupias, Taylor, PODS 98], [Arge, Samoladas, Vitter, PODS 99]
2D stabbing queries: $q \log(s/n) = \Omega(\log(N/B))$ [Arge, Samoladas, Yi, ESA 04, Algorithmica 09]
1D range queries: $s = N$, $q = 1$ trivially
Adding dynamization makes it much more interesting!

51 Dynamic Indexability
Still consider only insertions.
[Figure: at time $t$, the memory of size $M$ together with the disk blocks of size $B$ forms a snapshot; one object is inserted at time $t + 1$ and another at time $t + 2$; transition cost = 2.]
Update cost: $u$ = amortized transition cost per insertion.
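
The slide does not spell out how the transition cost is measured. The sketch below assumes one natural reading: the cost of moving from one snapshot to the next is the number of disk blocks in the new snapshot that did not exist in the old one. The names `transition_cost` and `amortized_update_cost`, and the example snapshots, are hypothetical.

```python
from typing import FrozenSet, List, Set

Block = FrozenSet[int]


def transition_cost(before: Set[Block], after: Set[Block]) -> int:
    """Assumed measure: blocks present after the step but not before it."""
    return len(after - before)


def amortized_update_cost(snapshots: List[Set[Block]]) -> float:
    """Average transition cost over consecutive snapshots (one insertion each)."""
    steps = list(zip(snapshots, snapshots[1:]))
    return sum(transition_cost(a, b) for a, b in steps) / len(steps)


if __name__ == "__main__":
    t0 = {frozenset({1, 2}), frozenset({5, 6})}
    t1 = set(t0)                                       # object 3 stays in memory
    t2 = {frozenset({1, 2, 3}), frozenset({4, 5, 6})}  # 3 and 4 written out
    print(transition_cost(t0, t1), transition_cost(t1, t2))
    print(amortized_update_cost([t0, t1, t2]))
```

An insertion that is absorbed by the memory buffer changes no disk block and so costs nothing, which is how buffering can push the amortized cost $u$ below 1.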

54 The Ball-Shuffling Problem
$B$ balls, $q$ bins
[Figure: two direct placements, one with cost = 1 and one with cost = 2.]
Cost of putting the ball directly into a bin = # balls in the bin

59 The Ball-Shuffling Problem
$B$ balls, $q$ bins
[Figure: a shuffle with cost = 5.]
Cost of shuffling = # balls in the involved bins
Putting a ball directly into a bin is a special shuffle.
Goal: accommodate all $B$ balls using $q$ bins with minimum cost.
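
To get a feel for the costs involved, the sketch below plays the game with two simple strategies. It assumes that a bin's size is counted after the arriving ball has been added, which matches a first ball costing 1. The function names and the two-bin strategy (collect about $\sqrt{B}$ balls in a small bin, then shuffle them into the big bin) are illustrative choices, not strategies taken from the talk.

```python
from math import isqrt


def single_bin_cost(num_balls: int) -> int:
    """q = 1: every ball goes directly into the only bin."""
    size = cost = 0
    for _ in range(num_balls):
        size += 1
        cost += size  # direct placement costs the bin size, new ball included
    return cost


def two_bin_cost(num_balls: int) -> int:
    """q = 2: fill a small bin, then periodically shuffle it into a big bin."""
    threshold = max(1, isqrt(num_balls))
    big = small = cost = 0
    for _ in range(num_balls):
        if small + 1 >= threshold:
            # Shuffle involving both bins: everything ends up in the big bin.
            big += small + 1
            small = 0
            cost += big  # all balls in the involved bins are charged
        else:
            small += 1
            cost += small
    return cost


if __name__ == "__main__":
    print(single_bin_cost(100), two_bin_cost(100))
```

With one bin the total is $B(B+1)/2$, i.e. quadratic in $B$, while this two-bin strategy costs about $B^{3/2}$: a second bin already helps substantially.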

64 The Workload Construction
[Figure: keys plotted against time; keys are inserted in rounds 1, 2, 3, ..., $B$, and a snapshot is marked after each round.]
Queries that we require the index to cover with $q$ blocks; # queries 2MB
Snapshots of the dynamic index considered

65 The Workload Construction
[Figure: keys plotted against time, with rounds 1, 2, 3, ..., $B$.]
There exists a query such that:
The $B$ objects of the query reside in $q$ blocks in all snapshots.
All of its objects are on disk in all $B$ snapshots (we have MB queries).
The index moves its objects $uB^2$ times in total.

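68 The Reduction
An index with update cost $u$ and query cost $q$ gives us a solution to the ball-shuffling game with cost $uB^2$ for $B$ balls and $q$ bins.
Lower bound on the ball-shuffling problem:
Theorem: The cost of any solution for the ball-shuffling problem is at least
$$\begin{cases} \Omega(q \cdot B^{1+\Omega(1/q)}), & \text{for } q < \alpha \log B, \text{ where } \alpha \text{ is any constant};\\ \Omega(B \log_q B), & \text{for any } q. \end{cases}$$
This yields the tradeoff
$$\begin{cases} q \cdot \log(uB/q) = \Omega(\log B), & \text{for } q < \alpha \log B, \text{ where } \alpha \text{ is any constant};\\ uB \cdot \log q = \Omega(\log B), & \text{for all } q. \end{cases}$$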
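
The step from the ball-shuffling theorem to the index tradeoff is a matter of setting the game cost $uB^2$ against each lower bound and rearranging. A sketch of that calculation, with constant factors suppressed:

```latex
\begin{align*}
uB^2 = \Omega\!\left(q\,B^{1+\Omega(1/q)}\right)
&\;\Longrightarrow\; \frac{uB}{q} = \Omega\!\left(B^{\Omega(1/q)}\right)
\;\Longrightarrow\; q\,\log\frac{uB}{q} = \Omega(\log B),\\[4pt]
uB^2 = \Omega\!\left(B\log_q B\right)
&\;\Longrightarrow\; uB = \Omega\!\left(\frac{\log B}{\log q}\right)
\;\Longrightarrow\; uB\,\log q = \Omega(\log B).
\end{align*}
```

In the first line both sides are divided by $qB$ before taking logarithms; in the second, both sides are divided by $B$ and $\log_q B$ is rewritten as $\log B / \log q$.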

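73 Ball-Shuffling Lower Bounds
Theorem: The cost of any solution for the ball-shuffling problem is at least
$$\begin{cases} \Omega(q \cdot B^{1+\Omega(1/q)}), & \text{for } q < \alpha \log B, \text{ where } \alpha \text{ is any constant};\\ \Omega(B \log_q B), & \text{for any } q. \end{cases}$$
[Figure: cost lower bound as a function of $q$: $B^2$ at $q = 1$, $B^{4/3}$ at $q = 2$, $B \log B$ at $q = \log B$, and $B$ at $q = B^{\epsilon}$.]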

74 Dynamic B-trees / Dynamic Hash Tables

77 Dynamic Hash Tables
B-tree query I/O: $O(\log_B \frac{N}{M})$
Hash table query I/O: $1 + 1/2^{\Omega(B)}$; insertion the same
A long-time conjecture in the external memory community: the insertion cost must be $\Omega(1)$ I/Os if the query cost is required to be $O(1)$ I/Os.
Buffering is useless?

86 Dynamic Hash Tables (for successful queries)
Logarithmic method (folklore?)
[Figure: hash tables of sizes $m$ (in memory), $2m$, $4m$, $8m$.]
Insertion: $O(\frac{1}{B} \log \frac{N}{M})$
Expected average query: $O(1)$
Improving query time
Idea: keep one table large enough
For some parameter $\beta = B^c$, $c \le 1$
[Figure: tables labeled $x$ and $x/\beta$, and later $2x$.]
Insertion: $O(B^{c-1})$
Query: $1 + O(1/B^c)$
Still far from the target $1 + 1/2^{\Omega(B)}$
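
The "keep one table large enough" idea can be sketched with two tiers. This is a simplified model under stated assumptions, not the structure from the talk: the side tier is a plain set rather than a logarithmic-method structure, keys are assumed distinct, only the I/Os spent rebuilding the main table are counted, and a successful query is charged 1 I/O if the key is in the main table and 2 otherwise. The names `TwoTierTable`, `beta` and `merge_ios` are hypothetical.

```python
from math import ceil
from typing import Set


class TwoTierTable:
    """Main table holding most keys plus a side tier of about 1/beta of them."""

    def __init__(self, block_size: int, beta: int):
        self.B, self.beta = block_size, beta
        self.main: Set[int] = set()  # large table: 1 I/O per successful query
        self.side: Set[int] = set()  # recent keys: found only on a second probe
        self.merge_ios = 0           # I/Os spent rewriting the main table

    def insert(self, key: int) -> None:
        self.side.add(key)
        if len(self.side) >= max(self.B, len(self.main) // self.beta):
            # Side tier reached a 1/beta fraction of the main table: merge.
            self.main |= self.side
            self.side.clear()
            self.merge_ios += ceil(len(self.main) / self.B)

    def query_cost(self, key: int) -> int:
        return 1 if key in self.main else 2


if __name__ == "__main__":
    table = TwoTierTable(block_size=64, beta=8)
    n = 10_000
    for k in range(n):
        table.insert(k)
    average = sum(table.query_cost(k) for k in range(n)) / n
    print(round(average, 4), round(table.merge_ios / n, 4))
```

In this model the average successful query costs at most $1 + 1/\beta$ I/Os, and the merge work per insertion is about $\beta / B$, which becomes $B^{c-1}$ when $\beta = B^c$.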

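87 Query-Insertion Tradeoff for Successful Queries [Wei, Yi, Zhang, SPAA 09]
[Figure: insertion cost plotted against query cost, with upper bounds and lower bounds.]
Standard hashing: query $1 + 1/2^{\Omega(B)}$, insertion 1
Query $1 + \Theta(1/B^c)$, $c > 1$: insertion lower bound $1 - O(1/B^{(c-1)/4})$
Query $1 + \Theta(1/B)$: insertion upper bound $O(1)$, lower bound $\Omega(1)$
Query $1 + \Theta(1/B^c)$, $c < 1$: insertion upper bound $O(B^{c-1})$, lower bound $\Omega(B^{c-1})$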

90 Indexability Too Strong!
Naïve solution: for every $B$ items, write them to a block. Query cost is 1, insertion is $1/B$.
Too many possible mappings!
Indexability + information-theoretical argument
If, with only the information in memory, the hash table cannot locate the item, then querying it takes at least 2 I/Os.

91 The Abstraction
Consider the layout of a hash table at any snapshot. Denote all the blocks on disk by $B_1, B_2, \ldots, B_d$. Let $f : U \to \{1, \ldots, d\}$ be any function computable within memory. We divide the items inserted into 3 zones with respect to $f$.
Memory zone $M$: the set of items stored in memory. $t_q = 0$.
Fast zone $F$: the set of items $x$ such that $x \in B_{f(x)}$. $t_q = 1$.
Slow zone $S$: the rest of the items. t_q =
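
The three zones can be computed directly from a snapshot. A minimal sketch, assuming the disk layout is given as a dictionary from block index to a set of items and that `f` is an ordinary Python function; the names `classify_zones` and `min_average_query_cost` are hypothetical, and the charge of 2 I/Os for a slow-zone item is the "at least 2 I/Os" bound from the preceding slide, so the result is a lower bound rather than an exact cost.

```python
from typing import Callable, Dict, Iterable, Set


def classify_zones(
    memory: Set[int],
    blocks: Dict[int, Set[int]],
    f: Callable[[int], int],
    items: Iterable[int],
) -> Dict[str, Set[int]]:
    """Split the inserted items into memory, fast and slow zones w.r.t. f."""
    zones: Dict[str, Set[int]] = {"memory": set(), "fast": set(), "slow": set()}
    for x in items:
        if x in memory:
            zones["memory"].add(x)       # answered without any I/O
        elif x in blocks.get(f(x), set()):
            zones["fast"].add(x)         # f points at a block that contains x
        else:
            zones["slow"].add(x)         # f alone does not locate x
    return zones


def min_average_query_cost(zones: Dict[str, Set[int]]) -> float:
    """Lower bound on the average query cost: 0, 1 and at least 2 I/Os per zone."""
    total = sum(len(z) for z in zones.values())
    return (len(zones["fast"]) + 2 * len(zones["slow"])) / total


if __name__ == "__main__":
    memory = {1}
    blocks = {1: {2, 4, 7}, 2: {3, 5}}
    f = lambda x: 1 if x % 2 == 0 else 2  # even items -> block 1, odd -> block 2
    zones = classify_zones(memory, blocks, f, [1, 2, 3, 4, 5, 7])
    print(zones, min_average_query_cost(zones))
```

Item 7 lands in the slow zone here because it is stored in block 1 while `f` sends odd items to block 2.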

92 The Key
The hash table can employ a family $\mathcal{F}$ of at most $2^M$ distinct $f$'s.
Note that the current $f$ adopted by the hash table is dependent upon the already inserted items, but the family $\mathcal{F}$ has to be fixed beforehand.

93 How about All Queries? (Latest results)
We are essentially talking about the membership problem.
Can't use the indexability model.
Have to use the cell probe model.

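96 All queries (the membership problem) (The cell probe model)
[Figure: query cost plotted against insertion cost, with upper bounds and lower bounds.]
Upper bounds: truncated buffer tree (query $n^{\epsilon}$); buffer tree (insert $\frac{\ell}{B} \log_\ell n$, query $\log_\ell n$); hashing (query $1 + 1/2^{\Omega(B)}$)
Lower bounds: [Yi, Zhang, SODA 10]; [Verbin, Zhang, STOC 10]
Tick labels recovered from the axes: $\frac{1}{B} \log_{M/B} n$, $\frac{1}{B} \log n$, $\frac{B^{\epsilon}}{B} \log n$, $1$, $\log_{B \log n} n$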

98 THE BIG BOLD CONJECTURE
All these fundamental data structure problems have the same query-update tradeoff in external memory when $u = o(1)$, for sufficiently large $B$.
Partial-sum: all $B$; Range reporting: $B > n^{\epsilon}$; Predecessor: unknown.
Strong implication: the buffer tree (and many of the log-method-based structures) is simple, practical, versatile, and optimal!

99 The End
THANK YOU
Q and A
