Rocksteady: Fast Migration for Low-Latency In-memory Storage. Chinmay Kulkarni, Aniraj Kesavan, Tian Zhang, Robert Ricci, Ryan Stutsman

Size: px

Start display at page:

Download "Rocksteady: Fast Migration for Low-Latency In-memory Storage. Chinmay Kulkarni, Aniraj Kesavan, Tian Zhang, Robert Ricci, Ryan Stutsman"

Sheena Gilmore
6 years ago
Views:

1 Rocksteady: Fast Migration for Low-Latency In-memory Storage Chinmay Kulkarni, niraj Kesavan, Tian Zhang, Robert Ricci, Ryan Stutsman 1

2 Introduction Distributed low-latency in-memory key-value stores are emerging Predictable response times ~10 µs median, ~60 µs 99.9 th -tile Problem: Must migrate data between servers Minimize performance impact of migration go slow? Quickly respond to hot spots, skew shifts, load spikes go fast? Solution: Fast data migration with low impact Early ownership transfer of data, leverage workload skew Low priority, parallel and adaptive migration Result: Migration protocol for RMCloud in-memory key-value store Migrates 256 G in 6 minutes, 99.9 th -tile latency less than 250 µs Median latency recovers from 40 µs to 20 µs in 14 s 2

3 Why Migrate Data? Client 1 Client 2 multiget( ) multiget( C D ) 6 Million C D Server 1 Server 2 No Locality Fanout=7 Poor spatial locality High multiget() fan-out More RPCs 3

4 Migrate To Improve Spatial Locality Client 1 Client 2 multiget( ) multiget( C D ) 6 Million C D Server 1 Server 2 C No Locality Fanout=7 4

5 Spatial Locality Improves Throughput Client 1 Client 2 multiget( ) multiget( C D ) 25 Million 6 Million C D Server 1 Server 2 Full Locality Fanout=1 No Locality Fanout=7 etter spatial locality Fewer RPCs Higher throughput enefits multiget(), range scans 5

6 The RMCloud Key-Value Store Client Client Client Client Coordinator Data Center Fabric ll Data in RM Kernel ypass/ DPDK 10 µs reads Master Master Master Master 6

7 The RMCloud Key-Value Store Client Client Client Client Write RPC Coordinator Data Center Fabric Master Master Master Master 7

8 The RMCloud Key-Value Store Client Client Client Client Write RPC Coordinator Data Center Fabric 1x in DRM 3x on Disk Master Master Master Master 8

9 Fault-tolerance & Recovery In RMCloud Client Client Client Client Coordinator Data Center Fabric Master Master Master Master 9

10 Fault-tolerance & Recovery In RMCloud Client Client Client Client Coordinator Data Center Fabric 2 seconds to recover Master Master Master Master 10

11 Performance Goals For Migration Maintain low access latency 10 µsec median latency System extremely sensitive Tail latency matters at scale Even more sensitive Migrate data fast Workloads dynamic Respond quickly Growing DRM storage: 512 G per server Slow data migration Entire day to scale cluster 11

12 Rocksteady Overview: Early Ownership Transfer Problem: Loaded source can bottleneck migration Solution: Instantly shift ownership and all load to target Client 1 Client 2 Client 3 Client 4 Reads and Writes Source Server Target Server 12

13 Rocksteady Overview: Early Ownership Transfer Problem: Loaded source can bottleneck migration Solution: Instantly shift ownership and all load to target Client 1 Client 2 Client 3 Client 4 Instantly Redirected ll future operations serviced at Target Source Server Target Server Creates headroom to speed migration 13

14 Rocksteady Overview: Leverage Skew Problem: Data has not arrived at source yet Solution: On demand migration of unavailable data Client 1 Client 2 Client 3 Client 4 Read On-demand Pull Source Server Target Server 14

15 Rocksteady Overview: Leverage Skew Problem: Data has not arrived at source yet Solution: On demand migration of unavailable data Client 1 Client 2 Client 3 Client 4 Read Hot keys move early Source Server Target Server Median Latency recovers to 20 µs in 14 s 15

16 Rocksteady Overview: daptive and Parallel Problem: Old single-threaded protocol limited to 130 M/s Solution: Pipelined and parallel at source and target Client 1 Client 2 Client 3 Client 4 On-demand Pull Source Server Parallel Pulls Target Server 16

17 Rocksteady Overview: daptive and Parallel Problem: Old single-threaded protocol limited to 130 M/s Solution: Pipelined and parallel at source and target Client 1 Client 2 Client 3 Client 4 On-demand Pull Target Driven Yields to On-demand Pulls Source Server Parallel Pulls Target Server Moves 758 M/s 17

18 Rocksteady Overview: Eliminate Sync Replication Problem: Synchronous replication bottleneck at target Solution: Safely defer replication until after migration Client 1 Client 2 Client 3 Client 4 On-demand Pull Replication Source Server Parallel Pulls Target Server 18

19 Rocksteady Overview: Eliminate Sync Replication Problem: Synchronous replication bottleneck at target Solution: Safely defer replication until after migration Client 1 Client 2 Client 3 Client 4 Replication Source Server Target Server 19

20 Rocksteady: Putting it all together Instantaneous ownership transfer Immediate load reduction at overloaded source Creates headroom for migration work Leverage skew to rapidly migrate hot data Target comes up to speed with little data movement daptive parallel, pipelined at source and target ll cores avoid stalls, but yield to client-facing operations Safely defer replication at target Eliminates replication bottleneck and contention 20

21 Rocksteady Instantaneous ownership transfer Leverage skew to rapidly migrate hot data daptive parallel, pipelined at source and target Safely defer synchronous replication at target 21

22 Evaluation Setup Client YCS- (95/5) Skew=0.99 Client YCS- (95/5) Skew=0.99 Client YCS- (95/5) Skew=0.99 Client YCS- (95/5) Skew= Million Records 45 G Source Server Target Server 22

23 Evaluation Setup Client YCS- (95/5) Skew=0.99 Client YCS- (95/5) Skew=0.99 Client YCS- (95/5) Skew=0.99 Client YCS- (95/5) Skew= Million Records 22.5 G 150 Million Records 22.5 G 150 Million Records 22.5 G Target Server 23

24 Instantaneous Ownership Transfer Source CPU Utilization 80% 25% Created 55% Source CPU Headroom efore Ownership Transfer Immediately fter Transfer efore migration: Source over-loaded, Target under-loaded Ownership transfer creates Source headroom for migration 24

25 Rocksteady Instantaneous ownership transfer Leverage skew to rapidly migrate hot data daptive parallel, pipelined at source and target Safely defer synchronous replication at target 25

26 Leverage Skew To Move Hot Data 99.9th Latency Median Latency efore Migration: 240µs 245µs 75µs 155µs 28µs 17µs Median=10 µs 99.9 th = 60 µs Uniform (Low) Skew=0.99 Skew=1.5 (High) fter ownership transfer, hot keys pulled on-demand More skew Median restored faster (migrate fewer hot keys) 26

27 Rocksteady Instantaneous ownership transfer Leverage skew to rapidly migrate hot data daptive parallel, pipelined at source and target Safely defer synchronous replication at target 27

28 Parallel, Pipelined, & daptive Pulls Target Hash Table Per-Core uffers Dispatch Core NIC Worker Cores Polling replay replay read() Migration Manager Pull uffers pulling Target driven, migration manager Co-partitioned hash tables, pull from partitions in parallel Replay pulled data into per-core buffers 28

29 Parallel, Pipelined, & daptive Pulls Source Hash Table Copy ddresses Dispatch Core NIC Polling read() pull(11) Gather List pull(17) Gather List Stateless passive Source Granular 20 K pulls 29

30 Parallel, Pipelined, & daptive Pulls Redirect any idle CPU for migration Migration yields to regular requests, on-demand pulls 30

31 Rocksteady Instantaneous ownership transfer Leverage skew to rapidly migrate hot data daptive parallel, pipelined at source and target Safely defer synchronous replication at target 31

32 Naïve Fault Tolerance During Migration Each server has a recovery log distributed across the cluster Source Target C Source Recovery Log C C C Target Recovery Log 32

33 Naïve Fault Tolerance During Migration Migrated data needs to be triplicated to target s recovery log Source Target C Source Recovery Log C C C Target Recovery Log 33

34 Naïve Fault Tolerance During Migration Migrated data needs to be triplicated to target s recovery log Source Target C Source Recovery Log C C C Target Recovery Log 34

35 Synchronous Replication ottlenecks Migration Synchronous replication hits migration speed by 34% Source C Target Source Recovery Log C C C Target Recovery Log 35

36 Rocksteady: Safely Defer Replication t The Target Replicate at Target only after all data has been moved over Source C Target C Source Recovery Log C C C Target Recovery Log 36

37 Writes/Mutations Served y Target Mutations have to be replicated by the target Write Source C Target C Source Recovery Log C C C Target Recovery Log C C C 37

38 Crash Safety During Migration Need both Source and Target recovery log for data recovery Initial table state on Source recovery log Writes/Mutations on Target recovery log Transfer ownership back to Source in case of crash Migration cancelled Recovery involves both recovery logs Source takes a dependency on Target recovery log at migration start Stored reliably at the cluster coordinator Identifies position after which mutations present 38

39 If The Source Crashes During Migration Recover Source, recover from Target recovery log Source C Target C Source Recovery Log C C C Target Recovery Log C C C 39

40 If The Target Crashes During Migration Recover from Source and Target recovery log, recover Target Source C Target C Source Recovery Log C C C Target Recovery Log C C C 40

41 Crash Safety During Migration Need both Source and Target recovery log for data recovery Initial table state on Source recovery log Writes/Mutations on Target recovery log Safely Transfer Ownership t Migration Start Transfer ownership back to Source in case of crash Migration cancelled Recovery involves both recovery logs Safely Delay Replication Till ll Data Has een Moved Source takes a dependency on Target recovery log at migration start Stored reliably at the cluster coordinator Identifies position after which mutations present 41

42 Performance of Rocksteady YCS-, 300 Million objects (30 key, 100 value), migrate half Median ccess Latency (µs) Rocksteady Source Keeps Ownership + Sync Replication Time Since Experiment Start (Seconds) 42

43 Performance of Rocksteady YCS-, 300 Million objects (30 key, 100 value), migrate half Median latency better after ~14 seconds Median ccess Latency (µs) Rocksteady Source Keeps Ownership + Sync Replication Time Since Experiment Start (Seconds) 43

44 Performance of Rocksteady YCS-, 300 Million objects (30 key, 100 value), migrate half Median ccess Latency (µs) Median latency better after ~14 seconds 28% faster migration Rocksteady Source Keeps Ownership + Sync Replication Time Since Experiment Start (Seconds) 44

45 Related Work Dynamo: Pre-partition hash keys Spanner: pplications given control over locality (Directories) FaRM and DrTM: Re-use in-memory redundancy for migration Squall: Reconfiguration protocol for H-Store Early ownership transfer Paced background migration Fully partitioned, serial execution, no synchronization Each migration pull stalls execution Synchronous replication at the target 45

46 Conclusion Distributed low-latency in-memory key-value stores are emerging Predictable response times ~10 µs median, ~60 µs 99.9 th -tile Problem: Must migrate data between servers Minimize performance impact of migration go slow? Quickly respond to hot spots, skew shifts, load spikes go fast? Solution: Fast data migration with low impact Leverage skew: Transfer ownership before data, move hot data first Low priority, parallel and adaptive migration Result: Migration protocol for RMCloud in-memory key-value store Migrates at 758 Mps with 99.9 th -tile latency < 250 µs Source Code: 46

47 Slides 47

48 Rocksteady Tail Latency reakdown Rocksteady 48

49 Rocksteady Tail Latency reakdown Rocksteady Disabling parallel pulls brings tail latency down to 160 µsec 49

50 Rocksteady Tail Latency reakdown Rocksteady Disabling parallel pulls brings tail latency down to 160 µsec Synchronous on-demand pulls further brings tail latency down to 135 µsec 50

Rocksteady: Fast Migration for Low-latency In-memory Storage

Rocksteady: Fast Migration for Low-latency In-memory Storage Chinmay Kulkarni, Aniraj Kesavan, Tian Zhang, Robert Ricci, and Ryan Stutsman University of Utah ABSTRACT Scalable in-memory key-value stores