All about Eve: Execute-Verify Replication for Multi-Core Servers

1 All about Eve: Execute-Verify Replication for Multi-Core Servers. Manos Kapritsos, Yang Wang, Vivien Quéma, Allen Clement, Lorenzo Alvisi, Mike Dahlin

2 Dependability and multi-core. Example services: databases, key-value stores, coordination & locking services, file servers.

3 Dependability Multi-core

4 How do we build dependable multithreaded services? Answer: State Machine Replication

5 STATE MACHINE REPLICATION (SMR). Ingredients: a service. Implement the service as a deterministic state machine, replicate it, and provide all replicas with the same input. Guarantee: all correct replicas will produce the same output.
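A minimal Python sketch of the SMR guarantee, assuming a toy key-value store as the service; `KVStateMachine` and `run_replica` are hypothetical names for illustration, not part of Eve. Three replicas fed the same request sequence end with identical outputs and identical state, because execution depends only on the state and the input.

```python
from dataclasses import dataclass, field


@dataclass
class KVStateMachine:
    """Toy deterministic state machine: its output depends only on its state and input."""
    store: dict = field(default_factory=dict)

    def apply(self, request):
        op, key, *rest = request
        if op == "put":
            self.store[key] = rest[0]
            return "ok"
        if op == "get":
            return self.store.get(key)
        raise ValueError(f"unknown op {op!r}")


def run_replica(requests):
    """Give one replica the (already agreed) input sequence; return its outputs and state."""
    sm = KVStateMachine()
    outputs = [sm.apply(r) for r in requests]
    return outputs, sm.store


requests = [("put", "x", 1), ("get", "x"), ("put", "x", 2), ("get", "x")]
replicas = [run_replica(requests) for _ in range(3)]

# Same input + deterministic execution => identical outputs and identical states.
assert all(r == replicas[0] for r in replicas)
print(replicas[0][0])  # ['ok', 1, 'ok', 2]
```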

6 SMR IMPLEMENTATION. The servers first agree on their input, then each server executes it.

7 How do we build dependable multithreaded services? Maybe use deterministic multithreading? Nope. It won't support modern replication protocols.

8 How do we build dependable multithreaded services? We need both performance and dependability.

9 Eve: state machine replication with multithreaded execution

10 Outline: Motivation, Insight, Mechanisms, Architecture, Evaluation

11 SMR requires replica convergence. Agree, then Execute: the agree-execute architecture enforces sequential execution.
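Why agree-execute insists on agreeing first can be seen with two conflicting requests. This is a sketch assuming requests are blind writes `(key, value)` to a dict; `execute_sequentially` is a hypothetical helper. Replicas that apply the same requests in different orders diverge, while a single agreed order executed sequentially keeps them identical.

```python
def execute_sequentially(ordered_requests):
    """Apply blind writes (key, value) one at a time, in the given order."""
    state = {}
    for key, value in ordered_requests:
        state[key] = value  # each request observes every earlier request's effect
    return state


a = ("x", 1)
b = ("x", 2)

# Without agreement, two replicas may receive the same two requests in different orders.
replica1 = execute_sequentially([a, b])
replica2 = execute_sequentially([b, a])
print(replica1, replica2)  # {'x': 2} {'x': 1}: the replicas diverge

# Agree-execute: agreement fixes one order, and every replica executes it sequentially.
agreed_order = [a, b]
assert execute_sequentially(agreed_order) == execute_sequentially(agreed_order)
```

The price of this design is the sequential loop itself: a multithreaded replica cannot run the loop body in parallel without risking a different effective order.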

12 EXECUTE-VERIFY. First execute (multithreaded and without agreeing on the order), then verify that the replicas agree on the outcome.

13 ON CONVERGENCE. Each server sends a token summarizing its execution to the verifier. Do the tokens match? YES: every server commits.

14 ON DIVERGENCE. The tokens do not match (NO), so every server repairs. Repair: roll back and re-execute sequentially.
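The convergence and divergence cases above can be sketched as one execute-verify round. This is a simplified single-process simulation, not Eve's implementation: multithreaded interleaving is modeled by shuffling each replica's execution order, the token is assumed to be a SHA-256 hash of the state, and all replicas are assumed to start from the same committed state. `token`, `execute`, and `execute_verify` are hypothetical names.

```python
import hashlib
import json
import random


def token(state):
    """Summarize a replica's state as a hash; stands in for Eve's token."""
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()


def execute(state, batch, order):
    """Apply the batch's writes in `order`; returns a new dict, leaving `state` untouched."""
    new_state = dict(state)
    for i in order:
        key, value = batch[i]
        new_state[key] = value
    return new_state


def execute_verify(states, batch, rng):
    """One round over replicas that all start from the same committed `states`."""
    # Execute: each replica may interleave the batch differently (simulated by shuffling).
    results = []
    for s in states:
        order = list(range(len(batch)))
        rng.shuffle(order)
        results.append(execute(s, batch, order))
    # Verify: commit only if every replica produced the same token.
    if len({token(r) for r in results}) == 1:
        return results, "commit"
    # Repair: roll back to the pre-batch states and re-execute sequentially.
    in_order = list(range(len(batch)))
    return [execute(s, batch, in_order) for s in states], "repair"


rng = random.Random(0)
start = [{} for _ in range(3)]
_, outcome = execute_verify(start, [("x", 1), ("y", 2)], rng)
print(outcome)  # commit: writes to distinct keys commute

states, outcome = execute_verify(start, [("x", 1), ("x", 2)], rng)
assert len({token(s) for s in states}) == 1  # converged, whether by commit or repair
```

Here rollback is free because `execute` never mutates its input; a real server needs an explicit mechanism for that.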

15 Outline: Motivation, Insight, Mechanisms, Architecture, Evaluation

16 Eve's logic at a glance: if (converged) commit (frequent) else repair divergence (uncommon). Make divergence uncommon. Detect divergence efficiently. Repair divergence efficiently.

17 MAKING DIVERGENCE UNCOMMON. Idea: identify commutative requests. Mixer: group together commutative requests and execute the requests within a group in parallel. The mixer is a hint, not an oracle: if it groups requests that do not commute, replicas may diverge, and the divergence is then detected and repaired.
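A mixer of this kind can be sketched as follows, assuming each request declares the keys it reads and writes. The conflict rule (one request writes something the other reads or writes) and the greedy "extend the current group or start a new one" policy are illustrative choices, not necessarily Eve's; `Request`, `mix`, and `run` are hypothetical names. Groups execute one after another, and requests inside a group run on a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    name: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


def conflicts(a, b):
    """Requests conflict if either writes something the other reads or writes."""
    return bool(a.writes & (b.reads | b.writes)) or bool(b.writes & a.reads)


def mix(batch):
    """Split a batch into consecutive groups whose members are pairwise non-conflicting."""
    groups = []
    for req in batch:
        if groups and not any(conflicts(req, other) for other in groups[-1]):
            groups[-1].append(req)   # commutes with the current group: join it
        else:
            groups.append([req])     # conflict: start a new group after the current one
    return groups


def run(groups, execute_one, threads=4):
    """Groups run one after another; requests inside a group run in parallel."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for group in groups:
            list(pool.map(execute_one, group))  # wait for the whole group to finish


batch = [
    Request("put a", writes=frozenset({"a"})),
    Request("put b", writes=frozenset({"b"})),
    Request("get a", reads=frozenset({"a"})),
    Request("get c", reads=frozenset({"c"})),
]
print([[r.name for r in g] for g in mix(batch)])
# [['put a', 'put b'], ['get a', 'get c']]

kv = {}
run(mix(batch), lambda r: kv.update({k: r.name for k in r.writes}))
assert kv == {"a": "put a", "b": "put b"}
```

Because `mix` is deterministic, every replica that receives the same batch forms the same groups; correctness still rests on the declared read/write sets, which is why the mixer is only a hint.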

18 EXAMPLE: TPC-W MIXER. Frequent transactions of the TPC-W browsing workload:
Transaction | Read tables | Write tables
getbestsellers | item, author, order_line | (none)
docart | item | shopping_cart_line, shopping_cart
dobuyconfirm | customer, address | order_line, item, cc_xacts, shopping_cart_line
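Applying a table-level conflict rule to these read/write sets shows which transaction pairs a mixer could run in parallel. This sketch assumes that getbestsellers writes no tables, that conflicts are judged at whole-table granularity, and that two transactions conflict when either writes a table the other reads or writes; the real TPC-W mixer may be finer-grained. `TABLES` and `conflicts` are hypothetical names.

```python
# Read/write tables per transaction, as listed on the TPC-W mixer slide.
# Assumption: getbestsellers writes no tables.
TABLES = {
    "getbestsellers": ({"item", "author", "order_line"}, set()),
    "docart": ({"item"}, {"shopping_cart_line", "shopping_cart"}),
    "dobuyconfirm": ({"customer", "address"},
                     {"order_line", "item", "cc_xacts", "shopping_cart_line"}),
}


def conflicts(t1, t2):
    """Table-granularity check: conflict if either writes a table the other reads or writes."""
    r1, w1 = TABLES[t1]
    r2, w2 = TABLES[t2]
    return bool(w1 & (r2 | w2)) or bool(w2 & r1)


names = list(TABLES)
for i, a in enumerate(names):
    for b in names[i:]:
        verdict = "conflict" if conflicts(a, b) else "parallel"
        print(f"{a:15} {b:15} {verdict}")
```

Under these assumptions, two getbestsellers requests, or a getbestsellers and a docart, may share a group, while any pair involving dobuyconfirm, and two docart requests, would be serialized.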

19 EFFICIENT DIVERGENCE DETECTION. To decide "if (converged) commit else repair divergence", replicas need to compare their application states and responses frequently. The token is computed from a Merkle tree over the application state.
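A Merkle tree makes frequent comparison affordable: two replicas compare a single root hash, and changing one object re-hashes only the path from its leaf to the root. This is a minimal sketch assuming objects are serialized to bytes at fixed leaf positions; `MerkleTree` is a hypothetical class, not Eve's data structure.

```python
import hashlib


def _h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()


class MerkleTree:
    """Array-backed binary Merkle tree: node k has children 2k and 2k + 1, root at 1."""

    def __init__(self, objects):
        self.n = 1
        while self.n < len(objects):
            self.n *= 2                      # pad leaf count to a power of two
        self.nodes = [b""] * (2 * self.n)
        for i in range(self.n):
            data = objects[i] if i < len(objects) else b""
            self.nodes[self.n + i] = _h(b"\x00", data)    # leaf hash
        for k in range(self.n - 1, 0, -1):
            self.nodes[k] = _h(b"\x01", self.nodes[2 * k], self.nodes[2 * k + 1])

    def root(self) -> bytes:
        return self.nodes[1]

    def update(self, i, data):
        """Change object i and re-hash only its path to the root: O(log n) hashes."""
        k = self.n + i
        self.nodes[k] = _h(b"\x00", data)
        k //= 2
        while k >= 1:
            self.nodes[k] = _h(b"\x01", self.nodes[2 * k], self.nodes[2 * k + 1])
            k //= 2


objs = [b"alice:10", b"bob:20", b"carol:30"]
t1, t2 = MerkleTree(objs), MerkleTree(objs)
assert t1.root() == t2.root()          # same state -> same root
t2.update(1, b"bob:25")
assert t1.root() != t2.root()          # one differing object changes the root
t1.update(1, b"bob:25")
assert t1.root() == t2.root()          # converged again
```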

20 GROWING DETERMINISTIC MERKLE TREES. Idea: postpone adding objects to the Merkle tree until token generation, and ensure that all replicas add objects in the same order. Requests are ordered (requestid), and each request runs on a single thread (objectseqnumber), so the pair (requestid, objectseqnumber) is unique and sortable. Optimization: leverage the deterministic order of references.
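The ordering argument can be checked in a few lines. In this sketch, objects created during a batch show up in a different interleaving on each replica (simulated by shuffling with different seeds), but sorting by `(requestid, objectseqnumber)` at token generation yields the same insertion order everywhere. `created_objects` and `insertion_order` are hypothetical helpers.

```python
import random


def created_objects(requests, rng):
    """Objects created during a batch, in the (nondeterministic) order they appear.

    Each request runs on a single thread, so its k-th new object always gets
    objectseqnumber k; only the interleaving across requests varies (simulated by a shuffle).
    """
    log = [((request_id, seq), name)
           for request_id, names in requests
           for seq, name in enumerate(names)]
    rng.shuffle(log)
    return log


def insertion_order(log):
    """At token generation, add objects sorted by (requestid, objectseqnumber)."""
    return [name for key, name in sorted(log, key=lambda entry: entry[0])]


requests = [(7, ["cart-1", "line-1"]), (3, ["order-9"]), (5, ["line-2", "line-3"])]
replica_a = insertion_order(created_objects(requests, random.Random(1)))
replica_b = insertion_order(created_objects(requests, random.Random(2)))
assert replica_a == replica_b == ["order-9", "line-2", "line-3", "cart-1", "line-1"]
```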

21 EFFICIENT DIVERGENCE REPAIR. To repair divergence (the "else" branch of "if (converged) commit else repair divergence"), replicas need to roll back the application state after every divergence. Copy-on-write makes this rollback cheap.
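Copy-on-write rollback can be sketched at object granularity: the first time an object is written after a checkpoint, its old value is saved; commit discards the saved copies, and rollback restores only the objects actually touched. This assumes values are replaced rather than mutated in place (so keeping a reference is as good as a copy); `CowState` is a hypothetical class, not Eve's mechanism.

```python
_MISSING = object()  # marks keys that did not exist at the checkpoint


class CowState:
    """Key-value state that saves an object's old value on its first write after a checkpoint."""

    def __init__(self):
        self.data = {}
        self.saved = {}  # key -> value as of the last checkpoint (only for written keys)

    def write(self, key, value):
        if key not in self.saved:                     # first write since checkpoint
            self.saved[key] = self.data.get(key, _MISSING)
        self.data[key] = value

    def commit(self):
        """Tokens matched: keep the new values and start a new checkpoint."""
        self.saved.clear()

    def rollback(self):
        """Tokens diverged: restore only the objects written since the checkpoint."""
        for key, old in self.saved.items():
            if old is _MISSING:
                del self.data[key]
            else:
                self.data[key] = old
        self.saved.clear()


s = CowState()
s.write("x", 1)
s.commit()
s.write("x", 2)
s.write("y", 3)
s.rollback()
assert s.data == {"x": 1}
```

The cost of a rollback is proportional to the number of objects modified in the batch, not to the size of the whole application state.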

22 if (converged) commit else repair divergence. Make divergence uncommon: Mixer. Detect divergence efficiently: Merkle tree. Repair divergence efficiently: copy-on-write.

23 Outline: Motivation, Insight, Mechanisms, Architecture, Evaluation

24 Dependability (independent execution) plus performance (non-deterministic order of requests): replication of multithreaded services. Bonus: mask concurrency bugs.

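25 MASKING CONCURRENCY BUGS. Each server sends its token to the verifier. Because replicas execute independently, a concurrency bug that manifests on only one server produces a token that does not match the others.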
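One way a verifier could turn matching tokens into a decision is sketched below. The rule is an assumption for illustration, not Eve's verification protocol: a token reported by at least `quorum` replicas is accepted, replicas with a different token must repair, and if no token reaches the quorum every replica repairs. `check_tokens` and `quorum` are hypothetical names.

```python
from collections import Counter


def check_tokens(tokens, quorum):
    """tokens: replica id -> token. Returns (agreed token, replicas that must repair).

    Hypothetical rule: a token reported by at least `quorum` replicas is accepted;
    otherwise nothing is agreed and every replica repairs.
    """
    token, count = Counter(tokens.values()).most_common(1)[0]
    if count < quorum:
        return None, sorted(tokens)
    return token, sorted(r for r, t in tokens.items() if t != token)


# A concurrency bug that fires on only one replica changes only that replica's token.
print(check_tokens({"r1": "abc", "r2": "abc", "r3": "xyz"}, quorum=2))
# ('abc', ['r3'])
```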

26 EXECUTE-VERIFY: AN ARCHITECTURAL CHANGE. It applies across failure models (crash failures, arbitrary failures) and timing models (synchronous, asynchronous).

27 CONFIGURATIONS. Asynchronous BFT: separate execution and verification stages; tolerates arbitrary faults. Synchronous primary-backup: a primary and a backup; tolerates omission faults.

28 EVALUATION. What is the performance benefit of Eve compared to traditional SMR systems? How does the quality of the mixer affect Eve's performance?

29 EXPERIMENTAL SETUP. Emulab testbed deployment. Execution replicas: 6 cores. Applications: H2 Database Engine (TPC-W benchmark); key-value store (microbenchmarks).

30 Application: H2 Database Engine. Workload: TPC-W (browsing). Figure: throughput (requests/sec) vs. # execution threads for Unreplicated, Eve (primary-backup), Eve (BFT), and Traditional SMR; speedup annotations 6.5x and 7.5x.

31 IMPACT OF THE MIXER. Application: key-value store. The number of key-value pairs determines the available parallelism. Mixer quality: false conflicts (non-conflicting requests misclassified as conflicting) reduce parallelism; undetected conflicts (conflicting requests misclassified as non-conflicting) can introduce divergence.
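The two kinds of mixer error can be counted directly by comparing a mixer's verdict with the true conflict relation over every pair of requests in a batch. This sketch assumes key-value requests of the form `(op, key)` with integer keys, where two requests truly conflict if they share a key and at least one is a put; `bucketed_mixer` and `write_only_mixer` are hypothetical imprecise mixers built to produce each error type.

```python
from itertools import combinations


def true_conflict(a, b):
    """Requests are (op, key); they conflict if they share a key and at least one is a put."""
    return a[1] == b[1] and "put" in (a[0], b[0])


def bucketed_mixer(buckets):
    """Hypothetical coarse mixer: compares key buckets instead of keys (integer keys)."""
    def says_conflict(a, b):
        return a[1] % buckets == b[1] % buckets and "put" in (a[0], b[0])
    return says_conflict


def write_only_mixer(a, b):
    """Hypothetical careless mixer: only flags put/put pairs, missing put/get conflicts."""
    return a[1] == b[1] and a[0] == b[0] == "put"


def mixer_errors(batch, says_conflict):
    """Count (false conflicts, undetected conflicts) over all request pairs in a batch."""
    false_conflicts = undetected = 0
    for a, b in combinations(batch, 2):
        truth, guess = true_conflict(a, b), says_conflict(a, b)
        false_conflicts += guess and not truth   # lost parallelism
        undetected += truth and not guess        # possible divergence
    return false_conflicts, undetected


batch = [("put", 1), ("get", 1), ("put", 5), ("get", 2)]
print(mixer_errors(batch, bucketed_mixer(4)))  # (2, 0)
print(mixer_errors(batch, write_only_mixer))   # (0, 1)
```

The coarse mixer only loses parallelism, since keys 1 and 5 share a bucket; the careless one misses the put/get pair on key 1, which is the kind of error that can lead to divergence and rollback.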

32 FALSE CONFLICTS REDUCE THE AVAILABLE PARALLELISM. Figure: throughput (requests/sec) vs. false conflicts (%) for key-value stores of two different sizes, compared with traditional SMR (sequential).

33 UNDETECTED CONFLICTS CAUSE DIVERGENCE AND ROLLBACKS. Figure: throughput (requests/sec) vs. undetected conflicts (%, log scale) for key-value stores of three different sizes, compared with traditional SMR (sequential).

34 TPC-W EXPERIMENTS: NO ROLLBACKS OBSERVED. Figure: throughput (requests/sec) vs. # execution threads for Unreplicated, Eve (primary-backup), Eve (BFT), and Traditional SMR; speedup annotations 6.5x and 7.5x.

35 CONCLUSION. Replication and multithreading are not mutually exclusive. Redesign replication: from agree-execute to execute-verify.
