Tolerating Latency in Replicated State Machines through Client Speculation April 22, 2009 1, James Cowling 2, Edmund B. Nightingale 3, Peter M. Chen 1, Jason Flinn 1, Barbara Liskov 2 University of Michigan 1, MIT CSAIL 2, Microsoft Research 3
Simple Service Configuration 1 ++x x=1 2
Replicated State Machines (RSM) 2 ++x x=1 x=2 2 ++x x=1 x=2 Agree on request 2 ++x x=1 x=2 All non-faulty replies are identical 2 ++x x=1 x=2 3
RSMs have high latency 2 1. Need many replies 2. Agreement 3. Geographic Distribution 4
Hide the Latency Use speculative execution inside RSM Speculate before consensus is reached Without faults, any reply predicts consensus value Let client continue after receiving one reply 5
Overview Introduction Improving RSMs with speculation Application to PBFT Performance Conclusion 6
Speculative Execution in RSM Take Checkpoint Blocked Predict: 1 Speculate! Commit x=1 x=1 Continue processing while waiting 7
Critical path: first reply 1 1 Completion latency less relevant First reply latency sets critical path Speed Accuracy Other desirable properties Throughput Stability under contention Smaller number of replicas 8
Requests while speculative while!check_lottery(): submit_tps() buy_corvette() Predict win? = yes yes 1. Hold request Bad performance buy 2. Distributed commit/rollback State tracking complex win? What do we do with this? 9
Resolve speculations on the replicas while!check_lottery(): submit_tps() buy_corvette() Predict win? = yes win? = yes yes yes win? if win?=yes: buy Explicitly encode dependencies as predicates No special request handling needed Replicas need to log past replies Local decision at replicas matches client 10
Overview Introduction Improving RSMs with speculation Application to PBFT Performance Conclusion 11
Practical BFT-CS [Castro and Liskov 1999] client primary f=1 12
Additional Details Tentative execution PBFT/PBFT-CS complete in 4 phases Read-only optimization Accurate answer from backup replica Failure threshold Bound worst-case failure Correctness 13
Overview Introduction Improving RSMs with speculation Application to PBFT Performance Conclusion 14
Benchmarks Shared counter Simple checkpoint No computation NFS: Apache httpd build Complex checkpoint Significant computation 15
Topology 1. Primary-local 2. Primary-remote 3. Uniform Primary 2.5 or 15 ms 16
Base case: no replication 1. Primary-local 2. Primary-remote 3. Uniform 2.5 or 15 ms 17
Run Time (sec) Shared Counter 120 100 Primary-local topology 80 60 40 PBFT PBFT-CS No replication 20 0 0 5 10 15 Network Delay (ms) 18
Run Time (sec) Shared Counter 120 100 Primary-local topology 80 60 40 20 PBFT PBFT-CS No replication Zyzzyva [Kotla et al. 07] 0 0 5 10 15 Network Delay (ms) 19
Run Time (sec) Shared Counter 120 100 Uniform & Primary-remote topology 80 60 40 PBFT PBFT-CS No replication 20 0 0 5 10 15 Network Delay (ms) 20
Run Time (sec) Shared Counter 120 100 Uniform & Primary-remote topology 80 60 40 20 PBFT PBFT-CS No replication Zyzzyva 0 0 5 10 15 Network Delay (ms) 21
Run Time (min) NFS: Apache build 35 30 Primary-local topology 25 20 15 10 PBFT PBFT-CS No replication 5 0 0 5 10 15 Network Delay (ms) 22
Run Time (min) NFS: Apache build 35 30 Uniform topology 25 20 15 10 PBFT PBFT-CS No replication 5 0 0 5 10 15 Network Delay (ms) 23
Run Time (min) NFS: Apache build 35 30 Primary-remote topology 25 20 15 10 PBFT PBFT-CS No replication 5 0 0 5 10 15 Network Delay (ms) 24
Run Time (min) NFS: With Failure 35 30 Primary-local topology 25 20 15 10 5 PBFT PBFT-CS No replication PBFT-CS (1% fail) 0 0 5 10 15 Network Delay (ms) 25
KOps/sec Throughput (Shared Counter) 70 60 50 LAN topology 40 30 20 PBFT PBFT-CS Zyzzyva 10 0 1 10 100 Number of Clients 26
Conclusion Integrate client speculation within RSMs Predicated requests: performance without complexity Clients less sensitive to latency between replicas 5x speedup over non-speculative protocol Makes WAN deployments more practical 27