Closing the Performance Gap Between Volatile and Persistent K-V Stores

Size: px

Start display at page:

Download "Closing the Performance Gap Between Volatile and Persistent K-V Stores"

Ambrose Marsh
5 years ago
Views:

1 Closing the Performance Gap Between Volatile and Persistent K-V Stores Yihe Huang, Harvard University Matej Pavlovic, EPFL Virendra Marathe, Oracle Labs Margo Seltzer, Oracle Labs Tim Harris, Oracle Labs Steve Byan, Oracle Labs ATC 2018, 07/13/2018, Boston MA, USA 1

2 Key-Value Stores: Persistent or Fast? Persistent Slow or Volatile Fast 2

3 Key-Value Stores: Persistent or Fast? Persistent Slow or Volatile Fast What about both? 2

4 Non-Volatile Memory Faster than HDD/SSD Byte-addressable 3

5 Non-Volatile Memory Faster than HDD/SSD Slower than DRAM Byte-addressable Tricky to manage consistently 3

6 Non-Volatile Memory Faster than HDD/SSD Slower than DRAM Byte-addressable Tricky to manage consistently DRAM K-V stores don t easily translate to NVM 3

7 Outline Cross-Referencing Logs: Getting the best of DRAM and NVM Bullet: Fast and persistent key-value store 4

8 Outline Cross-Referencing Logs: Getting the best of DRAM and NVM Bullet: Fast and persistent key-value store 5

9 Caching Volatile store (frontend) caches Persistent store (backend) 6

10 Caching Volatile store (frontend) caches Persistent store (backend) NVM 7

11 Caching Volatile store (frontend) How? caches Persistent store (backend) NVM 7

12 Reading: Standard Volatile store (frontend) Lookup on cache miss Persistent store (backend) NVM 8

13 Writing: Logs Volatile store (frontend) 1. Persistently log op. 2. Update frontend 3. Return to client 4. Update backend in background Persistent store (backend) NVM 9

14 Writing: How? Logs Volatile store (frontend) 1. Persistently log op. 2. Update frontend 3. Return to client 4. Update backend in background Persistent store (backend) NVM 9

15 Writing: How? Logs Volatile store (frontend) 1. Persistently log op. 2. Update frontend 3. Return to client 4. Update backend in background Persistent store (backend) NVM Challenges: persistence, low latency, scalability, consistency 9

16 Cross-Referencing Logs Persistence: Place logs in NVM Low latency Few NVM accesses per log append Scalability Multiple logs (one per thread) Consistency Cross-log references 10

17 Cross-Referencing Logs Persistence: Place logs in NVM Low latency Few NVM accesses per log append Scalability Multiple logs (one per thread) Consistency Cross-log references 10

18 Cross-Referencing Logs Log entry: Key OP Applied Prev append consume 11

19 Cross-Referencing Logs Log entry: Key OP Applied Prev Key 1, OP: put A append consume 11

20 Cross-Referencing Logs Log entry: Key OP Applied Prev Key 1, OP: put A append Key 2, OP: put B consume 11

21 Cross-Referencing Logs Log entry: Key OP Applied Prev Key 2, OP: delete Key 1, OP: put A append Key 2, OP: put B consume 11

22 Cross-Referencing Logs Log entry: Key OP Applied Prev Key 2, OP: delete Key 1, OP: put A append Key 2, OP: put B consume 11

23 Cross-Referencing Logs Log entry: Key OP Applied Prev Key 2, OP: delete Key 1, OP: put A append Key 2, OP: put B consume Key 3, OP: delete 11

24 Cross-Referencing Logs Log entry: Key OP Applied Prev Key 2, OP: delete Key 1, OP: put A append Key 2, OP: put B consume Key 2, OP: put C Key 3, OP: delete 11

25 Cross-Referencing Logs Log entry: Key OP Applied Prev Key 2, OP: delete Key 1, OP: put A append Key 2, OP: put B consume Key 2, OP: put C Key 3, OP: delete 11

26 Cross-Referencing Logs Log entry: Key OP Applied Prev Key 2, OP: delete Key 1, OP: put A append Key 2, OP: put B consume Key 2, OP: put C Key 3, OP: delete 12

27 Cross-Referencing Logs Log entry: Key OP Applied Prev Key 2, OP: delete Key 1, OP: put A append Key 2, OP: put B consume Key 2, OP: put C Key 3, OP: delete Applied, Pr: 13

28 Cross-Referencing Logs Log entry: Key OP Applied Prev Key 2, OP: delete Key 1, OP: put A append Key 2, OP: put B consume Key 2, OP: put C Key 3, OP: delete Applied, Pr: 14

29 Cross-Referencing Logs Log entry: Key OP Applied Prev Key 2, OP: delete Key 1, OP: put A append Key 2, OP: put B consume Key 2, OP: put C Key 3, OP: delete Applied, Pr: 15

30 Cross-Referencing Logs Log entry: Key OP Applied Prev Key 2, OP: delete Key 1, OP: put A append Key 2, OP: put B consume Key 2, OP: put C Key 3, OP: delete Applied, Pr: 16

31 Cross-Referencing Logs Log entry: Key OP Applied Prev Key 2, OP: delete Key 1, OP: put A append Key 2, OP: put B Applied, Pr: consume Key 2, OP: put C Key 3, OP: delete Applied, Pr: 17

32 Cross-Referencing Logs Log entry: Key OP Applied Prev Key 2, OP: delete Applied, Pr: Key 1, OP: put A append Key 2, OP: put B Applied, Pr: consume Key 2, OP: put C Key 3, OP: delete Applied, Pr: 18

33 Cross-Referencing Logs Log entry: Key OP Applied Prev Key 2, OP: delete Applied, Pr: Key 1, OP: put A append Key 2, OP: put B Applied, Pr: consume Key 2, OP: put C Applied, Pr: Key 3, OP: delete Applied, Pr: 19

34 Outline Cross-Referencing Logs: Getting the best of DRAM and NVM Bullet: Fast and persistent key-value store 20

35 Outline Cross-Referencing Logs: Getting the best of DRAM and NVM Bullet: Fast and persistent key-value store Basic design Optimizations 20

36 Outline Cross-Referencing Logs: Getting the best of DRAM and NVM Bullet: Fast and persistent key-value store Basic design Optimizations 21

37 Bullet K-V Store Volatile store (frontend) caches Persistent store (backend) NVM 22

38 Bullet K-V Store Volatile store (frontend) caches Persistent store (backend) NVM 23

39 Bullet K-V Store Volatile store (frontend) Persistent store (backend) caches 24

40 Experimental Setup 16 CPU cores, 512 GB DRAM Intel s NVM emulation platform Zipfian distribution of keys (YCSB) Measuring 99%ile latency Comparing against HiKV 25

41 Raw Hash Table Performance 26

42 Bullet K-V Store Volatile store (frontend) Persistent store (backend) caches 27

43 Bullet K-V Store Volatile store (frontend) Persistent store (backend) X-Ref Logs 28

44 Bullet K-V Store Volatile store (frontend) Writers Persistent store (backend) X-Ref Logs 29

45 Bullet K-V Store Volatile store (frontend) Writers Gleaners Persistent store (backend) X-Ref Logs 29

46 Latency in ns Bullet Performance Read-only volatile phash Throughput in ops/sec 30

47 Latency in ns Bullet Performance Read-only volatile phash hikv-ht Throughput in ops/sec 31

48 Latency in ns Bullet Performance Read-only volatile phash hikv-ht bullet-st Throughput in ops/sec 32

49 Latency in ns Bullet Performance % writes volatile phash hikv-ht bullet-st Throughput in ops/sec 33

50 Outline Cross-Referencing Logs: Getting the best of DRAM and NVM Bullet: Fast and persistent key-value store Basic design Optimizations 34

51 Optimization: Dynamic Worker Threads Volatile store (frontend) Writers Gleaners Persistent store (backend) X-Ref Logs 35

52 Optimization: Dynamic Worker Threads Volatile store (frontend) Writers Gleaners Persistent store (backend) X-Ref Logs 36

53 Read-Heavy Workload Volatile store (frontend) Writers Gleaners Persistent store (backend) X-Ref Logs 37

54 Write-Heavy Workload Volatile store (frontend) Writers Gleaners Persistent store (backend) X-Ref Logs 38

55 Optimization: Overwriting Log entry: Key OP Applied Prev Key 2, OP: delete Key 1, OP: put A append Key 2, OP: put B consume Key 2, OP: put C Key 3, OP: delete Applied, Pr: 39

56 Optimization: Overwriting Log entry: Key OP Applied Prev Key 2, OP: delete Key 1, OP: put A append Key 2, OP: put B consume Key 2, OP: put C Key 3, OP: delete Applied, Pr: 40

57 Optimization: Overwriting Log entry: Key OP Applied Prev Key 2, OP: delete Key 1, OP: put A append Key 2, OP: put B consume Key 2, OP: put C Key 3, OP: delete Applied, Pr: 41

58 Optimization: Overwriting Log entry: Key OP Applied Prev Key 2, OP: delete Key 1, OP: put A append Key 2, OP: put B consume Key 2, OP: put C Key 3, OP: delete Applied, Pr: 42

59 Optimization: Overwriting Log entry: Key OP Applied Prev append IGNORE Key 2, OP: delete Key 1, OP: put A IGNORE Key 2, OP: put B consume Key 2, OP: put C Key 3, OP: delete Applied, Pr: 43

60 Optimization: Overwriting Volatile store (frontend) Writers Gleaners Persistent store (backend) X-Ref Logs 44

61 Latency in ns Bullet Performance Read-only volatile phash hikv-ht bullet-st Throughput in ops/sec 45

62 Latency in ns Bullet Performance Read-only volatile phash hikv-ht bullet-st bullet-dyn-ovr Throughput in ops/sec 46

63 Latency in ns Bullet Performance % writes volatile phash hikv-ht bullet-st bullet-dyn-ovr Throughput in ops/sec 47

64 Latency in ns Bullet Performance % writes volatile phash hikv-ht bullet-st bullet-dyn-ovr Throughput in ops/sec 48

65 Summary DRAM cache for NVM DRAM frontend, NVM backend Novel caching scheme: Cross-referencing logs 49

66 Summary DRAM cache for NVM DRAM frontend, NVM backend Novel caching scheme: Cross-referencing logs Bullet K-V store Almost identical hash tables in DRAM and NVM DRAM performance for reads Substantial latency improvements for writes 49

67 Summary DRAM cache for NVM DRAM frontend, NVM backend Novel caching scheme: Cross-referencing logs Bullet K-V store Almost identical hash tables in DRAM and NVM DRAM performance for reads Substantial latency improvements for writes More details in the paper! 49

68 Backup Slides 50

69 Optimization Summary Dynamic worker threads Overwrites Non-blocking reads (see paper) Backend access off critical path (see paper) 51

70 Bullet Performance Read-only 52

71 Bullet Performance 2% writes 53

72 Bullet Performance 15% writes 54

73 Bullet Performance 50% writes 55

74 Bullet Performance (end-to-end) Read-only 56

75 Bullet Performance (end-to-end) 2% writes 57

76 Bullet Performance (end-to-end) 15% writes 58

77 Bullet Performance (end-to-end) 50% writes 59

78 Latency Distribution Reads 60

79 Latency Distribution Writes 61

80 Log Size Senzitivity 62

81 Experimental Setup 16 CPU cores 512 GB DRAM NVM simulated using DRAM 384 GB (out of 512GB) 300ns load (2x DRAM) 100ns persist barrier Same throughput DRAM cache fits all data Zipfian distribution of keys (YCSB) 50M K/V pairs, 16 B keys, 100 B values Measuring 99%ile latency 63

82 Dynamic Worker Thread Switching 64

83 Cross-Referencing Logs Persistence: Circular buffers in NVM Low latency Fast appends (3 persist barriers) Scalability Multiple logs, stall only when all full Coarse-grained synchronization Epoch-based space reclamation Consistency Cross-log references 65

84 Cross-Referencing Logs Summary Persistent circular buffers (in NVM) Cross-log pointers for consistency Coarse-grained synchronization Fast appends Epoch-based space reclamation 66

85 Latency in ns Read-only volatile phash hikv-ht bullet-st bullet-dyn bullet-dyn-ovr Throughput in ops/sec 67

Big and Fast. Anti-Caching in OLTP Systems. Justin DeBrabant

Big and Fast. Anti-Caching in OLTP Systems. Justin DeBrabant Big and Fast Anti-Caching in OLTP Systems Justin DeBrabant Online Transaction Processing transaction-oriented small footprint write-intensive 2 A bit of history 3 OLTP Through the Years relational model