Scalable Low-Latency Indexes for a Key-Value Store Ankita Kejriwal With Arjun Gopalan, Ashish Gupta, Zhihao Jia, Stephen Yang and John Ousterhout
Conjecture Can a key value store support strongly consistent secondary indexes while operating at low latency and large scale? SLIK Slide 2
Scalable Low-latency Indexes for a Key-value Store: SLIK Enables multiple secondary keys for each object Allows lookups and range queries on these keys Key design features: Scalability using independent partitioning Strong consistency using an ordered write approach Implemented in RAMCloud Low-latency, DRAM-based, distributed key-value store Performance: Summary of Results Scalability: Linear throughput increase with increasing number of partitions Low-latency:11-13 µs indexed reads, 29-37 µs durable writes/overwrites Latency approximately 2x non-indexed reads and writes SLIK Slide 3
Talk Outline Motivation Design Performance Related Work Summary SLIK Slide 4
Motivation Traditional RDBMs MySQL SLIK Slide 5
Motivation Traditional RDBMs MySQL + scalability - data models - consistency NoSQL Systems SLIK Slide 6
Motivation + consistency H-Base Espresso RAMCloud Traditional RDBMs MySQL + scalability - data models - consistency NoSQL Systems Yesquel Spanner Megastore HyperDex MongoDB H-Store Memcached PNUTS Tao + data models + low latency SLIK Slide 7
Motivation + consistency H-Base Espresso RAMCloud Traditional RDBMs MySQL + scalability - data models - consistency NoSQL Systems Yesquel Spanner Megastore HyperDex MongoDB H-Store Memcached PNUTS Tao + data models + low latency SLIK Slide 8
Talk Outline Motivation Design Performance Related Work Summary SLIK Slide 9
Design Data model Scalability Strong consistency Storage Durability Availability SLIK Slide 10
Design Data model Scalability Strong consistency Storage Durability Availability SLIK Slide 11
Design Data model Scalability Strong consistency Storage Durability Availability SLIK Slide 12
Scalability Nearly constant low latency irrespective of the server span Linear increase in throughput with the server span SLIK Slide 13
Index Partitioning: Colocation Colocate index entries and objects One of the keys used to partition the objects and indexes Server 1 Server 2 Server 3 Indexlet Indexlet Indexlet index key primary key a è 2 n è 3 q è 1 Tablet e è 4 g è 6 Tablet v è 5 b è 8 m è 7 Tablet 1 q rose 2 a tulip 3 n violet 4 e clover 5 v daily 6 g iris 7 m lily 8 b dahlia primary key index key value SLIK Slide 14
Index Partitioning: Colocation Colocate index entries and objects One of the keys used to partition the objects and indexes No association between index partitions and index key ranges Server 1 Server 2 Server 3 Indexlet Indexlet Indexlet index key primary key a è 2 n è 3 q è 1 Tablet e è 4 g è 6 Tablet v è 5 b è 8 m è 7 Tablet 1 q rose 2 a tulip 3 n violet 4 e clover 5 v daily 6 g iris 7 m lily 8 b dahlia primary key index key value Metadata: tablet & indexlet w/ pk 1 to 3: S 1 tablet & indexlet w/ pk 4 to 6: S 2 tablet & indexlet w/ pk >= 7: S 3 Slide 15
Index Partitioning: Colocation Client query: objects with index key between m - q Server 1 Server 2 Server 3 Indexlet Indexlet Indexlet a è 2 n è 3 q è 1 e è 4 g è 6 v è 5 b è 8 m è 7 Tablet 1 q rose 2 a tulip 3 n violet Tablet 4 e clover 5 v daily 6 g iris Tablet 7 m lily 8 b dahlia SLIK Slide 16
Index Partitioning: Colocation Client query: objects with index key between m - q Server 1 Server 2 Server 3 Indexlet Indexlet Indexlet a è 2 n è 3 q è 1 e è 4 g è 6 v è 5 b è 8 m è 7 Tablet 1 q rose 2 a tulip 3 n violet Tablet 4 e clover 5 v daily 6 g iris Tablet 7 m lily 8 b dahlia SLIK Slide 17
Index Partitioning: Colocation Client query: objects with index key between m - q Server 1 Server 2 Server 3 Indexlet Indexlet Indexlet a è 2 n è 3 q è 1 e è 4 g è 6 v è 5 b è 8 m è 7 Tablet 1 q rose 2 a tulip 3 n violet Tablet 4 e clover 5 v daily 6 g iris Tablet 7 m lily 8 b dahlia SLIK Slide 18
Index Partitioning: Colocation Client query: objects with index key between m - q Server 1 Server 2 Server 3 Indexlet Indexlet Indexlet a è 2 n è 3 q è 1 e è 4 g è 6 v è 5 b è 8 m è 7 Tablet 1 q rose 2 a tulip 3 n violet Tablet 4 e clover 5 v daily 6 g iris Tablet 7 m lily 8 b dahlia Not Scalable! Slide 19
Index Partitioning: Independent Partition each index and table independently Partition each index according to sort order for that index Server 4 Server 5 index key primary key Indexlet a è 2 b è 8 e è 4 g è 6 Indexlet m è 7 n è 3 q è 1 v è 5 Server 1 Server 2 Server 3 Tablet Tablet Tablet 1 q rose 4 e clover 7 m lily 2 a tulip 5 v daily 8 b dahlia 3 n violet 6 g iris primary key index key value SLIK Slide 20
Index Partitioning: Independent Partition each index and table independently Partition each index according to sort order for that index Server 4 Server 5 Indexlet Indexlet index key primary key a è 2 b è 8 e è 4 g è 6 Tablet Server 1 Server 2 Server 3 1 q rose 2 a tulip 3 n violet Tablet 4 e clover 5 v daily 6 g iris m è 7 n è 3 q è 1 v è 5 Tablet 7 m lily 8 b dahlia Metadata: tablet w/ pk 1 to 3: S 1 tablet w/ pk 4 to 6: S 2 tablet w/ pk >= 7: S 3 indexlet w/ sk a to g: S 4 indexlet w/ sk >= h: S 5 primary key index key value SLIK Slide 21
Index Partitioning: Independent Server 4 Server 5 Indexlet Indexlet a è 2 b è 8 e è 4 g è 6 m è 7 n è 3 q è 1 v è 5 Client query: objects with index key between m - q Server 1 Server 2 Server 3 Tablet Tablet Tablet 1 q rose 4 e clover 7 m lily 2 a tulip 5 v daily 8 b dahlia 3 n violet 6 g iris SLIK Slide 22
Index Partitioning: Independent Server 4 Server 5 Indexlet Indexlet a è 2 b è 8 e è 4 g è 6 m è 7 n è 3 q è 1 v è 5 Client query: objects with index key between m - q Server 1 Server 2 Server 3 Tablet Tablet Tablet 1 q rose 4 e clover 7 m lily 2 a tulip 5 v daily 8 b dahlia 3 n violet 6 g iris SLIK Slide 23
Index Partitioning: Independent Server 4 Server 5 Indexlet Indexlet a è 2 b è 8 e è 4 g è 6 m è 7 n è 3 q è 1 v è 5 Client query: objects with index key between m - q Server 1 Server 2 Server 3 Tablet Tablet Tablet 1 q rose 4 e clover 7 m lily 2 a tulip 5 v daily 8 b dahlia 3 n violet 6 g iris SLIK Slide 24
Index Partitioning: Independent Server 4 Server 5 Indexlet Indexlet a è 2 b è 8 e è 4 g è 6 m è 7 n è 3 q è 1 v è 5 Client query: objects with index key between m - q Server 1 Server 2 Server 3 Tablet Tablet Tablet 1 q rose 4 e clover 7 m lily 2 a tulip 5 v daily 8 b dahlia 3 n violet 6 g iris Scalable! Slide 25
Scalability Nearly constant low latency irrespective of the server span Linear increase in throughput with the server span SLIK Slide 26
Scalability Nearly constant low latency irrespective of the server span Linear increase in throughput with the server span Solution: Use independent partitioning But: indexed object writes: distributed operations Potential consistency issues between indexes and objects SLIK Slide 27
Design Data model Scalability Strong consistency Storage Durability Availability SLIK Slide 28
Design Data model Scalability Strong consistency Storage Durability Availability SLIK Slide 29
Consistency Properties If an object contains a given secondary key, then an index lookup with that key will return the object If an object is returned by index lookup, then this object contains a secondary key for that index within the specified range SLIK Slide 30
Consistency Properties If an object contains a given secondary key, then an index lookup with that key will return the object If an object is returned by index lookup, then this object contains a secondary key for that index within the specified range Bob Peggy Frank Alice Carol Trent SLIK Slide 31
Consistency Properties If an object contains a given secondary key, then an index lookup with that key will return the object If an object is returned by index lookup, then this object contains a secondary key for that index within the specified range Bob Alice Frank Trent Peggy Carol students with name between a d? Alice Carol? SLIK Slide 32
Consistency Properties If an object contains a given secondary key, then an index lookup with that key will return the object If an object is returned by index lookup, then this object contains a secondary key for that index within the specified range SLIK Slide 33
Consistency Properties If an object contains a given secondary key, then an index lookup with that key will return the object If an object is returned by index lookup, then this object contains a secondary key for that index within the specified range Bob Peggy Frank Alice Carol Trent SLIK Slide 34
Consistency Properties If an object contains a given secondary key, then an index lookup with that key will return the object If an object is returned by index lookup, then this object contains a secondary key for that index within the specified range Alice Bob Alice students Bob Frank Trent with name between a d? Carol Peggy Carol? Peggy SLIK Slide 35
Consistency Properties If an object contains a given secondary key, then an index lookup with that key will return the object If an object is returned by index lookup, then this object contains a secondary key for that index within the specified range Bob Alice Frank Trent Peggy Carol students with name between a d? Alice Bob Carol SLIK Slide 36
Consistency properties: Consistency If an object contains a given secondary key, then an index lookup with that key will return the object If an object is returned by index lookup, then this object contains a secondary key for that index within the specified range Solution: Longer index lifespan (via ordered writes) Object data is ground truth and index entries serve as hints Slide 37
Consistency properties: Consistency If an object contains a given secondary key, then an index lookup with that key will return the object If an object is returned by index lookup, then this object contains a secondary key for that index within the specified range Solution: Longer index lifespan (via ordered writes) Object data is ground truth and index entries serve as hints Index Entry Bob è Foo Object Foo Bob... Slide 38
Consistency properties: Consistency If an object contains a given secondary key, then an index lookup with that key will return the object If an object is returned by index lookup, then this object contains a secondary key for that index within the specified range Solution: Longer index lifespan (via ordered writes) Object data is ground truth and index entries serve as hints Index Entry Bob è Foo Sam è Foo 1. Add new index entry Object Foo Bob... Slide 39
Consistency properties: Consistency If an object contains a given secondary key, then an index lookup with that key will return the object If an object is returned by index lookup, then this object contains a secondary key for that index within the specified range Solution: Longer index lifespan (via ordered writes) Object data is ground truth and index entries serve as hints Index Entry Bob è Foo Sam è Foo 1. Add new index entry 2. Modify object Object Foo Sam... Slide 40
Consistency properties: Consistency If an object contains a given secondary key, then an index lookup with that key will return the object If an object is returned by index lookup, then this object contains a secondary key for that index within the specified range Solution: Longer index lifespan (via ordered writes) Object data is ground truth and index entries serve as hints Index Entry Object Foo Sam... Sam è Foo 1. Add new index entry 2. Modify object 3. Remove old index entry Slide 41
Consistency properties: Consistency If an object contains a given secondary key, then an index lookup with that key will return the object If an object is returned by index lookup, then this object contains a secondary key for that index within the specified range Solution: Longer index lifespan (via ordered writes) Object data is ground truth and index entries serve as hints Index Entry Bob è Foo Sam è Foo Object Foo Bob... Foo Sam... time write object modify object remove object Slide 42
Consistency properties: Consistency If an object contains a given secondary key, then an index lookup with that key will return the object If an object is returned by index lookup, then this object contains a secondary key for that index within the specified range Solution: Longer index lifespan (via ordered writes) Object data is ground truth and index entries serve as hints Index Entry Bob è Foo Sam è Foo Object Foo Bob... Foo Sam... time write object modify object remove object Slide 43
Consistency properties: Consistency If an object contains a given secondary key, then an index lookup with that key will return the object If an object is returned by index lookup, then this object contains a secondary key for that index within the specified range Solution: Longer index lifespan (via ordered writes) Object data is ground truth and index entries serve as hints commit point commit point x commit point Index Entry Bob è Foo Sam è Foo Object Foo Bob... Foo Sam... time write object modify object remove object Slide 44
Consistency properties: Consistency If an object contains a given secondary key, then an index lookup with that key will return the object If an object is returned by index lookup, then this object contains a secondary key for that index within the specified range Solution: Longer index lifespan (via ordered writes) Object data is ground truth and index entries serve as hints commit point commit point commit point Index Entry Bob è Foo Sam è Foo Object Foo Bob... Foo Sam... time write object modify object remove object Slide 45
Talk Outline Motivation Design Performance Related Work Summary SLIK Slide 46
Performance: Questions Does SLIK provide low latency? Does SLIK provide scalability? How does the performance of indexing with SLIK compare to other state-of-the-art systems? SLIK Slide 47
Performance: Systems for Comparison H-Store: Main memory database Data (and indexes) partitioned based on specified attribute Many parameters for tuning Got assistance from developers to tune for each test Examples: txn_incoming_delay, partitioning column HyperDex: Spaces containing objects Data (and indexes) partitioned using hyperspace hashing Each index contains all object data Designed to use disk for storage SLIK Slide 48
Hardware SLIK Slide 49
Latency Experiments: 1. Lookups: table with single secondary index 2. Overwrites: table with single secondary index 3. Overwrites: varying number of secondary indexes Configuration: Single client Single partition for table and (each) index Object: 30 B pk, 30 B sk, 100 B value SLIK: Three-way replication to durable backups H-Store: No replication, durability disabled, single server Slide 50
Lookup Latency 250 H-Store SLIK TCP SLIK (a) Lookup Latency (µs) 200 150 100 50 142.10 150.11 152.37 138.18 134.97 45.2 44.9 42.7 48.5 49.7 147.42 45.6 132.00 54.5 11.0 10.2 11.7 11.6 12.7 12.8 13.1 0 10 10 0 10 10 1 10 10 2 10 3 10 4 10 10 5 10 10 6 Size of Index (# objects) Slide 51
Lookup Latency 250 H-Store SLIK TCP SLIK (a) Lookup Latency (µs) 200 150 100 50 142.10 150.11 152.37 138.18 134.97 45.2 44.9 42.7 48.5 49.7 147.42 45.6 132.00 54.5 11.0 10.2 11.7 11.6 12.7 12.8 13.1 0 10 10 0 10 10 1 10 10 2 10 3 10 4 10 10 5 10 10 6 Size of Index (# objects) Slide 52
Overwrite Latency 250 H-Store SLIK TCP SLIK (b) Overwrite Latency (µs) 200 150 100 50 143.51 143.00 124.4 151.39 153.28 148.88 137.74 133.54 135.8 126.5 125.9 129.7 124.3 123.8 31.4 32.7 34.2 34.3 35.2 35.2 37.0 0 10 0 10 1 10 2 10 3 10 4 10 5 10 6 Size of Index (# objects) Slide 53
Multiple Secondary Indexes 1474.39 1682.16 1704.69 1756.99 Overwrite Latency (µs) 1000 100 10 265.91 273.41 285.99 287.72 181.98 152.97 138.5 139.1 156.2 165.2 164.3 175.3 175.7 179.1 182.0 184.6 33.0 35.3 39.8 39.0 42.7 42.4 46.4 47.3 49.5 51.2 H-Store via PK SLIK TCP H-Store via SK SLIK 0 1 2 3 4 5 6 7 8 9 10 Number of Indexes Slide 54
Multiple Secondary Indexes 1474.39 1682.16 1704.69 1756.99 Overwrite Latency (µs) 1000 100 10 265.91 273.41 285.99 287.72 181.98 152.97 138.5 139.1 156.2 165.2 164.3 175.3 175.7 179.1 182.0 184.6 33.0 35.3 39.8 39.0 42.7 42.4 46.4 47.3 49.5 51.2 H-Store via PK SLIK TCP H-Store via SK SLIK 0 1 2 3 4 5 6 7 8 9 10 Number of Indexes Slide 55
Multiple Secondary Indexes 1474.39 1682.16 1704.69 1756.99 Overwrite Latency (µs) 1000 100 10 265.91 273.41 285.99 287.72 181.98 152.97 138.5 139.1 156.2 165.2 164.3 175.3 175.7 179.1 182.0 184.6 33.0 35.3 39.8 39.0 42.7 42.4 46.4 47.3 49.5 51.2 H-Store via PK SLIK TCP H-Store via SK SLIK 0 1 2 3 4 5 6 7 8 9 10 Number of Indexes Slide 56
Scalability Compare: (a) partitioning approaches (b) systems Experiments: 1. Lookup throughput with increasing number of partitions 2. Lookup latency with increasing number of partitions Configuration: Single table with one secondary index Table and index partitioned across servers Object: 30 B pk, 30 B sk, 100 B value Throughput experiments: Loaded system Latency experiments: Unloaded system Slide 57
Scalability: Throughput Throughput (10 3 lookups/sec) 4000 3500 Independent Partitioning Colocation 3000 2782 2478 2500 2184 1859 2000 1558 1500 1249 1000 940 634 500 435 0 335 461 423 463 447 457 447 357 441 418 0 2 4 6 8 10 Number of Indexlets 3092 Slide 58
Scalability: Throughput Throughput (10 3 lookups/sec) H-Store 5000 4629 SLIK TCP 4248 SLIK 4000 3629 3199 3000 2655 2197 2000 1619 1352 1199 1127 1001 794 1000 653 580 430 220 96.90 0 0 2 4 6 8 10 12 14 16 18 20 312.35 347.32 396.27 Number of Servers 5069 1445 1663 1807 Slide 59
Scalability: Latency Lookup Latency (µs) 100 80 60 40 20 Colocation size 1 89.7 Colocation size 10 87.3 Independent size 1 Independent size 10 28.8 22.3 16.2 26.7 12.7 15.2 16.7 8.3 0 0 10 20 30 40 50 60 70 80 Number of Servers Slide 60
Scalability: Latency Average Latency per Lookup (µs) 300 250 200 150 100 50 H-Store SLIK TCP SLIK 267.4 269.8 267.2 243.4 240.3 215.6 199.2 179.6 152.6 119.4 112.7 114.4 113.3 90.6 91.0 81.6 69.1 49.9 55.6 58.2 13.1 13.3 13.9 13.7 14.4 14.7 14.5 14.4 14.6 14.7 0 0 1 2 3 4 5 6 7 8 9 10 Number of Indexlets Slide 61
Talk Outline Motivation Design Performance Related Work Summary SLIK Slide 62
Data storage system Related Work Data model (spectrum from key-value to relational) Consistency (spectrum from eventual to strong) Performance: latency and/or throughput SLIK Slide 63
Current Web Scale Datastores Better Strong Consistency Level Causal, SI, Define your own Eventual 100s 10s 1s 100ms 10ms 1ms 100µs 10µs Read / write latency (approx) Better SLIK Slide 64
Current Web Scale Datastores Better Strong Consistency Level Causal, SI, Define your own Eventual PNUTS CouchDB Espresso Tao 100s 10s 1s 100ms 10ms 1ms 100µs 10µs Read / write latency (approx) Better SLIK Slide 65
Current Web Scale Datastores Better Strong MongoDB H-Base Spanner H-Store HyperDex Consistency Level Causal, SI, Define your own Eventual PNUTS CouchDB Espresso Tao 100s 10s 1s 100ms 10ms 1ms 100µs 10µs Read / write latency (approx) Better SLIK Slide 66
Current Web Scale Datastores Better Strong MongoDB H-Base Spanner H-Store HyperDex Consistency Level Causal, SI, Define your own Eventual Cassandra Espresso PNUTS CouchDB Tao 100s 10s 1s 100ms 10ms 1ms 100µs 10µs Read / write latency (approx) Better SLIK Slide 67
Current Web Scale Datastores Better Strong MongoDB H-Base Spanner H-Store HyperDex SLIK Consistency Level Causal, SI, Define your own Eventual Cassandra Espresso PNUTS CouchDB Tao 100s 10s 1s 100ms 10ms 1ms 100µs 10µs Read / write latency (approx) Better SLIK Slide 68
Talk Outline Motivation Design Performance Related Work Summary SLIK Slide 69
Conjecture Can a key value store support strongly consistent secondary indexes while operating at low latency and large scale? SLIK Slide 70
Summary Lookups and range queries on secondary keys A key value store can support strongly consistent secondary indexes while operating at low latency and large scale. SLIK Slide 71
Summary By using ordered writes and treating indexes as hints Lookups and range queries on secondary keys A key value store can support strongly consistent secondary indexes while operating at low latency and large scale. SLIK Slide 72
Summary By using ordered writes and treating indexes as hints Lookups and range queries on secondary keys A key value store can support strongly consistent secondary indexes while operating at low latency and large scale. By using independent partitioning we get: linear throughput increase and minimal impact on latency as the scale increases SLIK Slide 73
Summary By using ordered writes and treating indexes as hints Lookups and range queries on secondary keys A key value store can support strongly consistent secondary indexes while operating at low latency and large scale. By using approaches that have minimal overheads we get: 11-13 µs lookups and 30-37 µs (over)writes By using independent partitioning we get: linear throughput increase and minimal impact on latency as the scale increases SLIK Slide 74
Summary By using ordered writes and treating indexes as hints Lookups and range queries on secondary keys A key value store can support strongly consistent secondary indexes while operating at low latency and large scale. By using approaches that have minimal overheads we get: 11-13 µs lookups and 30-37 µs (over)writes By using independent partitioning we get: linear throughput increase and minimal impact on latency as the scale increases SLIK Slide 75
Thank you! Code available free and open source: github.com/platformlab/ramcloud My papers and other information at: stanford.edu/~ankitak I can be reached at: ankitak@cs.stanford.edu