Replicated State Machine in Wide-area Networks
1 Replicated State Machine in Wide-area Networks. Yanhua Mao, CSE223A WI09.
2 Building a replicated state machine with consensus. A general approach to replicating stateful deterministic services with strong consistency, e.g., a fault-tolerant file system for LANs. The servers agree on the same sequence of commands to execute, using a consensus protocol (e.g., Paxos or Fast Paxos) to reach agreement.
3 Why wide-area? Reliability and availability: cope with colocation failures (fire, etc.). Consistency: coordinate wide-area applications such as Yahoo's ZooKeeper and Google's Chubby. Are existing protocols efficient in the wide area? If not, can we improve them?
4 More on wide-area. Wide-area is more complex than simply "not local area": it depends on the distribution of both servers and clients. LAN clients with LAN servers, (L, L), is a local-area system; WAN clients with LAN servers, (W, L), is a cluster-based system; WAN clients with WAN servers, (W, W), covers grids, SOA, and mobile systems.
5 Paxos vs. Fast Paxos in (W, L). Topology: all servers in the same LAN; clients talk to the servers via the WAN. Sources of asynchrony: the wide-area network and the operating system. Observation: delays have high variance, and the WAN delay dominates.
6 Consensus speed. Theoretically: count the number of message delays to reach consensus, assuming each message takes the same delay. Practically: message delay varies greatly; a WAN message delay may be hundreds of times larger than a LAN message delay.
7 Paxos analysis. In (W, L), the only wide-area delay for Paxos is the message from the client to the leader; the remaining protocol messages (1a, 1b, 2a) travel within the LAN among the servers.
8 Fast Paxos analysis. Quorum size is 2f+1: 2f+1 replicas must receive the command from the client to make progress. The wide-area delay (f=1) is therefore determined by the third-fastest message from the client to the replicas; the rest of the protocol runs inside the LAN.
9 Classic Paxos vs. Fast Paxos in (W, L). Learning delay: 3 vs. 2. Number of replicas: 2f+1 vs. 3f+1. Quorum size: f+1 vs. 2f+1. Collisions: no vs. yes. Question: is Fast Paxos always faster in practice in collision-free runs?
10 Analysis result. Assumptions: the WAN message delay dominates, and WAN message delays are IID, following a continuous distribution. Result: Classic Paxos has a fixed probability (60%) of being faster when f=1, and does even better with larger f. Intuition: the WAN dominates and has high variance.
11 Quick proof. Given a wide-area latency distribution, draw 5 variables from it: A < B < C < D (sorted) for Fast Paxos, and E for Paxos. E's rank relative to A-D gives 5 equally likely cases. Paxos is faster in the 3 cases where E < C, the third-fastest delay that Fast Paxos must wait for, so Paxos wins with probability 3/5.
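The 3-in-5 argument can be checked numerically. Below is a small Monte Carlo sketch (an illustration, not from the talk) that assumes exponentially distributed WAN delays and ignores the LAN legs entirely, per the WAN-dominates assumption; the result should be close to 0.6 for any continuous IID distribution:

```python
import random

def paxos_win_rate(trials=200_000, seed=1):
    """Estimate how often classic Paxos beats Fast Paxos in (W, L) with f=1."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        # Fast Paxos (f=1): 3f+1 = 4 replicas; progress needs the
        # third-fastest client->replica WAN delay (C in sorted A < B < C < D).
        fast = sorted(rng.expovariate(1.0) for _ in range(4))[2]
        # Classic Paxos: a single client->leader WAN delay (E).
        classic = rng.expovariate(1.0)
        wins += classic < fast
    return wins / trials

print(paxos_win_rate())  # close to 3/5 = 0.6
```

Swapping `expovariate` for any other continuous distribution leaves the estimate near 0.6, which is the point of the proof: only the rank of E matters, not the distribution's shape.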
12 Summary. Protocol performance is sensitive to network settings: Fast Paxos is better in (L, L), while Paxos is better in (W, L) and (W, W). How can we design a protocol that gets the best of both across a wide range of network settings? Run Paxos and Fast Paxos concurrently, and switch between them when network conditions change.
13 High-performance replicated state machine for (W, W) systems. Model: a small number n of sites inter-connected by an asynchronous wide-area network; up to f of the n servers can fail by crashing. System load: possibly variable and changing load from each site, either network-bound (large request size) or CPU-bound (small request size). Goals: low latency under low client load, high throughput under high client load.
14 Paxos in steady state. Client 0's request x goes to the leader P0, which proposes it; client 2's request z must first be forwarded by replica P2 to the leader. Pros: simple, low message complexity (3n-3), and low latency at the leader (2 steps). Cons: high latency at the non-leader replicas (4 steps), and not load balanced: the leader or the leader's links become the bottleneck.
15 Fast Paxos in steady state. Any replica proposes directly: P0 proposes x and P3 proposes z, each learned in 2 steps. Pros: load balanced, and low latency at all servers under no contention (2 steps). Cons: collisions add latency under contention, high message complexity (n²-1), and more replicas are needed (3f+1).
16 Paxos vs. Fast Paxos in a (W, W) system. WAN learning latency: Paxos 2 at the leader and 4 at non-leaders; Fast Paxos 2 (when there is no collision); desired new algorithm: 2. Collisions: no, yes, no. Number of replicas: 2f+1, 3f+1, 2f+1. WAN message complexity: Paxos 3n-3 at the leader and 3n-2 at non-leaders; Fast Paxos n²-1; new algorithm 3n-3. Load balancing: no, yes, yes. Neither algorithm is perfect, but Paxos is better. Question: can we achieve the best of both with a new algorithm?
17 Deriving Mencius from Paxos. Approach: a rotating leader, a variant of consensus, and optimizations. Advantages: safety is easy to ensure when the protocol is derived from an existing one, and the design is flexible: others may derive their own protocols the same way.
18 First step: rotating the leader. Each instance of consensus is assigned to a coordinator server, which is the default leader of that instance, using a simple well-known assignment (e.g., round-robin). A server proposes client requests immediately to the next available instance it coordinates: it does not have to wait for other servers, which reduces latency. A server only proposes client requests to instances it coordinates, so there is no contention.
19 Mencius Rule 1. A server p maintains its index I_p, the next consensus instance it coordinates. Rule 1: upon receiving a client request v, p immediately proposes v to instance I_p and updates its index accordingly.
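With round-robin assignment, Rule 1 reduces to simple index arithmetic. A minimal sketch (the class and names are illustrative, not the paper's code):

```python
class Coordinator:
    """Tracks I_p for server p under round-robin instance assignment:
    with n servers, server p coordinates instances p, p + n, p + 2n, ..."""

    def __init__(self, server_id, n_servers):
        self.id = server_id
        self.n = n_servers
        self.index = server_id   # I_p: the next instance p coordinates

    def suggest(self, request):
        # Rule 1: propose immediately to the next coordinated instance,
        # then advance the index to the following coordinated instance.
        instance = self.index
        self.index += self.n
        return instance, request
```

For n = 3, server 1 suggests its first two requests to instances 1 and 4, never contending with servers 0 and 2 for an instance.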
20 Benefits of rotating the leader. All servers can now propose requests directly, giving potentially low latency at all sites. Load is balanced across the servers, giving higher throughput under CPU-bound client load, and the communication pattern is balanced, giving higher throughput under network-bound load. In Paxos the leader handles propose, accept, and learn traffic for everyone while other servers only forward; in Mencius every server proposes, accepts, and learns.
21 Ensuring liveness when servers have different loads. Rule 1 only works well when all servers have the same load, but servers may observe different client loads. The fix: servers can skip their turns by proposing no-ops, so a lightly loaded server does not hold up the sequence.
22 Mencius Rule 2. Rule 2: if server p receives a propose message with some value v other than no-op for instance i, and i > I_p, then before accepting v and sending back an accept message, p updates its index to a new value I_p' greater than i and proposes no-ops for each instance in the range [I_p, I_p') that p coordinates.
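The index update in Rule 2 can be sketched as a pure function (an illustrative helper, not the paper's code): given p's current index, the number of servers, and a suggested instance i coordinated by another server, it returns p's new index and the coordinated instances p must fill with no-ops.

```python
def rule2_skip(index, n_servers, i):
    """Advance p's index past instance i, collecting the instances in
    [I_p, I_p') that p coordinates and must skip with no-ops."""
    skipped = []
    # i is coordinated by another server, so under round-robin assignment
    # the loop never lands exactly on i; it stops at the first coordinated
    # instance greater than i.
    while index < i:
        skipped.append(index)
        index += n_servers
    return index, skipped
```

With n = 3, a server at index 1 that sees a suggestion for instance 6 skips instances 1 and 4 and moves its index to 7, the next instance it coordinates beyond 6.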
23 Proposing no-ops is costly. Consider the case where only one server proposes requests: it takes 3n(n-1) messages to get one value v chosen, namely 3(n-1) for choosing v plus 3(n-1)² for the n-1 no-ops. Solution: use a simpler abstraction than consensus.
24 Simple consensus for efficiency. In simple consensus, the coordinator can propose either a client request or a no-op, while non-coordinators can only propose no-ops. Benefits: a no-op can be learned in one message delay if the coordinator skips (proposes a no-op), and no-ops are easy to piggyback onto other messages to improve efficiency, at essentially no extra cost.
25 Coordinated Paxos. Starting state: the coordinator is the default leader, and execution starts from a state as if phase 1 has already been run by the coordinator. Suggest: the coordinator proposes a request by running phase 2. Skip: the coordinator proposes a no-op, with fast learning for skips. Revoke: a replica runs the full phases 1 and 2 to propose a no-op on the coordinator's behalf; this is only needed when the coordinator is suspected to have failed.
26 Skips with simple consensus. When replica 0 suggests x and y while replica 2 is idle, replica 2 broadcasts skip messages so its instances are learned as no-ops, and the learned sequence becomes x, no-op, no-op, y. Pros: a simple and correct protocol. Cons: high message complexity, (n-1)(n+2).
27 Reducing message complexity. Idea: piggyback skips onto other messages. A skip can ride on the accept message that answers a suggest, and on future 2a (suggest) messages too, so skips no longer need to be broadcast separately.
28 Mencius Optimizations 1 & 2. When a server p receives a suggest message for instance i from server q, let r be a server other than p or q. Optimization 1: p does not send a separate skip message to q; instead, p uses the accept message that replies to the suggest to promise not to suggest any client request to instances smaller than i in the future. Optimization 2: p does not send a skip message to r immediately; instead, p waits for a future suggest message from p to r to indicate that p has promised not to suggest any client requests to instances smaller than i.
29 Gaps at idle replicas. A potentially unbounded number of requests can wait to be committed, but this only happens at idle replicas: once the replica sends requests, the pending skips are resolved. Idle replicas can also send skips periodically to accelerate resolution.
30 Mencius Accelerator 1. Accelerator 1: a server p propagates its deferred skip messages to a server r if the total number of outstanding skip messages to r exceeds some constant α, or a message has been deferred for more than some time τ.
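Accelerator 1 is a classic two-threshold flush policy. A minimal per-destination sketch (the class is illustrative; only the names α and τ come from the slide, and timestamps are injected so the policy is easy to test):

```python
class SkipBuffer:
    """Buffer of skip messages deferred for one destination r, flushed once
    more than alpha accumulate or the oldest waits longer than tau seconds."""

    def __init__(self, alpha=16, tau=0.05):
        self.alpha = alpha       # max outstanding skips before flushing
        self.tau = tau           # max deferral time, in seconds
        self.pending = []        # list of (instance, time enqueued)

    def defer(self, instance, now):
        self.pending.append((instance, now))

    def maybe_flush(self, now):
        """Return (and clear) the buffered skips if a threshold is exceeded,
        else return an empty list and keep deferring."""
        if not self.pending:
            return []
        overdue = now - self.pending[0][1] > self.tau
        if len(self.pending) > self.alpha or overdue:
            flushed, self.pending = [i for i, _ in self.pending], []
            return flushed
        return []
```

In a real server, `maybe_flush` would run on each timer tick and whenever a skip is deferred; flushed instances go out as explicit skip messages rather than waiting for a future suggest to carry them.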
31 Failures. Faulty processes cannot skip, which leaves a gap in the request sequence at the instances they coordinate, blocking commits past the gap.
32 Mencius Rule 3. Rule 3: let q be a server that another server p suspects has failed, and let C_q be the smallest instance coordinated by q and not yet learned by p. Then p revokes q for all instances in the range [C_q, I_p] that q coordinates.
33 Imperfect failure detectors. A false suspicion may result in a no-op being chosen even though the coordinator has not crashed and has proposed some value v. Solution: the coordinator re-suggests v when it learns the no-op. What if the false suspicion happens again? Keep trying; but ◊W is not enough to provide liveness, since one server can be permanently falsely suspected. So we require ◊P instead: eventually there are no false suspicions, hence ◊P guarantees eventual liveness.
34 Mencius Rule 4. Rule 4: if server p suggests a value v other than no-op to instance i, and p learns that no-op is chosen for i, then p suggests v again.
35 Dealing with failure. Revoke: propose no-ops on behalf of the faulty process. Problem: running the full three phases of Paxos per instance is costly. Solution: revoke in large blocks.
40 Mencius failure recovery. P2 may come back after failure recovery or a false suspicion. It finds the next available slot it coordinates and starts proposing requests to that slot; P0 and P1 then catch up with P2 by skipping their turns upon receiving the request from P2.
42 Mencius Optimization 3. Optimization 3: let q be a server that another server p suspects has failed, and let C_q be the smallest instance coordinated by q and not yet learned by p. For some constant β, if C_q < I_p + β, then p revokes q for all instances in the range [C_q, I_p + 2β] that q coordinates.
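Optimization 3 can be sketched as a function (an illustrative helper, not the paper's code) that decides when to revoke and which of q's round-robin instances fall in the revoked block:

```python
def revocation_targets(c_q, i_p, beta, q, n_servers):
    """Once q's smallest unlearned instance C_q comes within beta of p's
    index I_p, revoke q's instances up to I_p + 2*beta in one block,
    amortizing the cost of the full Paxos rounds a revocation requires."""
    if c_q >= i_p + beta:
        return []                # the gap is still comfortably large: wait
    # q's coordinated instances in [C_q, I_p + 2*beta] under round-robin
    return [i for i in range(c_q, i_p + 2 * beta + 1)
            if i % n_servers == q]
```

With n = 3, q = 2, C_q = 2, I_p = 10, and β = 3, the check 2 < 13 triggers and the block [2, 16] is revoked at once, covering q's instances 2, 5, 8, 11, and 14, so the next revocation is not needed until p's index grows by another β.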
43 Mencius summary. Rule 1: suggest a client request immediately to the next consensus instance the server coordinates. Rule 2: update the index and skip unused instances when accepting a value. Rule 3: revoke suspected servers. Rule 4: re-suggest a client request after a false suspicion.
44 Mencius summary, continued. Optimization 1: piggyback skips on accept messages. Optimization 2: piggyback skips on future suggest (2a) messages. Accelerator 1: bound how many skip messages are deferred, and for how long, before flushing them. Optimization 3: issue revocations in large blocks to amortize the cost.
45 Mencius vs. Paxos and Fast Paxos in a (W, W) system. WAN learning latency: Paxos 2 at the leader and 4 at non-leaders; Fast Paxos 2 (when there is no collision); Mencius 2 (subject to delayed commit, next slide). Collisions: no, yes, no. Number of replicas: 2f+1, 3f+1, 2f+1. WAN message complexity: Paxos 3n-3 at the leader and 3n-2 at non-leaders; Fast Paxos n²-1; Mencius 3n-3. Load balancing: no, yes, yes. Mencius achieves the best of both Paxos and Fast Paxos.
46 Delayed commit. P0 suggests x to instance 0 and learns it quickly, but cannot commit x until it also learns instance 1 (a no-op) and instance 2 (P2's y), so the commit is delayed by up to one extra round-trip. Out-of-order commit reduces the extra delay when requests are commutable, a benefit of using simple consensus.
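The stall comes from committing in instance order: a learned value cannot commit while a smaller instance is still a hole. A minimal sketch of the in-order commit loop (illustrative; it assumes learned values arrive in a dict keyed by instance number):

```python
def drain_commits(learned, next_commit):
    """Commit consecutively learned instances starting at next_commit and
    stop at the first hole. Returns (committed values, new next_commit)."""
    committed = []
    while next_commit in learned:
        committed.append(learned[next_commit])
        next_commit += 1
    return committed, next_commit
```

Learning x at instance 0 and y at instance 2 commits only x; once the no-op for instance 1 is learned, y drains too. Out-of-order commit relaxes this loop for commutable requests, which is what removes most of the delayed-commit penalty.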
47 Experimental setup. Mencius vs. Paxos on a clique topology of three sites, with 50 ms delay and 5-20 Mbps on each inter-site link. The application is a simple replicated register service with k total registers (Mencius-k) and a ρ-byte dummy payload: ρ = 4000 is network-bound, ρ = 0 is CPU-bound. An out-of-order commit option is available, and clients send requests at a fixed rate.
48 Latency vs. system load (network-bound load, 20 Mbps for all links; Mencius, Mencius-1024, Mencius-128, Mencius-16, and Paxos compared). Latency goes to infinity when reaching maximum throughput; Mencius's latency is as good as Paxos's under high contention; and enabling out-of-order commit reduces latency by up to 30%. Takeaway: Mencius has good latency.
49 Throughput (Mencius, Mencius-1024, Mencius-128, Mencius-16, and Paxos compared). Mencius improves throughput when network-bound (ρ = 4000), and by up to 50% when CPU-bound (ρ = 0); enabling out-of-order commit hurts throughput. Takeaway: Mencius has good throughput.
50 Throughput under failure (Paxos with a crashed follower, Paxos with a crashed leader, and Mencius compared). Mencius stalls temporarily when any server fails, caused by delayed commits, but its throughput remains higher than Paxos's. A second temporary stall, caused by duplicates, occurs when the leader fails, between the time the failure is suspected and the time it is reported.
51 Fault scalability. Star topology with a 10 Mbps link from each site to the central node, measured at 3, 5, and 7 sites under both network-bound (ρ = 4000) and CPU-bound (ρ = 0) load. Takeaway: Mencius scales better than Paxos.
52 More evaluation results. Mencius has lower latency at all servers under low contention, even without out-of-order commit. It also adapts to the available bandwidth: the adaptation happens automatically, making it easy to take advantage of all available bandwidth.
53 Summary. Mencius combines a rotating-leader Paxos, which makes safety easy to ensure and the design flexible, with simple consensus, the foundation of its efficiency. The result is high performance: high throughput under high load, low latency under low load, low latency under medium-to-high load when out-of-order commit is enabled, better load balancing, and better scalability.
54 References. Flavio Junqueira, Yanhua Mao, Keith Marzullo, "Classic Paxos vs. Fast Paxos: Caveat Emptor", HotDep 2007, Edinburgh, UK. Yanhua Mao, Flavio Junqueira, Keith Marzullo, "Mencius: Building Efficient Replicated State Machines for WANs", OSDI 2008, San Diego, CA, USA.
More informationCoordination 1. To do. Mutual exclusion Election algorithms Next time: Global state. q q q
Coordination 1 To do q q q Mutual exclusion Election algorithms Next time: Global state Coordination and agreement in US Congress 1798-2015 Process coordination How can processes coordinate their action?
More informationBe General and Don t Give Up Consistency in Geo- Replicated Transactional Systems
Be General and Don t Give Up Consistency in Geo- Replicated Transactional Systems Alexandru Turcu, Sebastiano Peluso, Roberto Palmieri and Binoy Ravindran Replicated Transactional Systems DATA CONSISTENCY
More informationParsimonious Asynchronous Byzantine-Fault-Tolerant Atomic Broadcast
Parsimonious Asynchronous Byzantine-Fault-Tolerant Atomic Broadcast HariGovind V. Ramasamy Christian Cachin August 19, 2005 Abstract Atomic broadcast is a communication primitive that allows a group of
More informationToday: Fault Tolerance. Fault Tolerance
Today: Fault Tolerance Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Paxos Failure recovery Checkpointing
More informationAsynchronous Reconfiguration for Paxos State Machines
Asynchronous Reconfiguration for Paxos State Machines Leander Jehl and Hein Meling Department of Electrical Engineering and Computer Science University of Stavanger, Norway Abstract. This paper addresses
More informationATOMIC COMMITMENT Or: How to Implement Distributed Transactions in Sharded Databases
ATOMIC COMMITMENT Or: How to Implement Distributed Transactions in Sharded Databases We talked about transactions and how to implement them in a single-node database. We ll now start looking into how to
More informationAgreement and Consensus. SWE 622, Spring 2017 Distributed Software Engineering
Agreement and Consensus SWE 622, Spring 2017 Distributed Software Engineering Today General agreement problems Fault tolerance limitations of 2PC 3PC Paxos + ZooKeeper 2 Midterm Recap 200 GMU SWE 622 Midterm
More informationCORFU: A Shared Log Design for Flash Clusters
CORFU: A Shared Log Design for Flash Clusters Authors: Mahesh Balakrishnan, Dahlia Malkhi, Vijayan Prabhakaran, Ted Wobber, Michael Wei, John D. Davis EECS 591 11/7/18 Presented by Evan Agattas and Fanzhong
More informationCS /15/16. Paul Krzyzanowski 1. Question 1. Distributed Systems 2016 Exam 2 Review. Question 3. Question 2. Question 5.
Question 1 What makes a message unstable? How does an unstable message become stable? Distributed Systems 2016 Exam 2 Review Paul Krzyzanowski Rutgers University Fall 2016 In virtual sychrony, a message
More informationPaxos Made Live. An Engineering Perspective. Authors: Tushar Chandra, Robert Griesemer, Joshua Redstone. Presented By: Dipendra Kumar Jha
Paxos Made Live An Engineering Perspective Authors: Tushar Chandra, Robert Griesemer, Joshua Redstone Presented By: Dipendra Kumar Jha Consensus Algorithms Consensus: process of agreeing on one result
More informationPriya Narasimhan. Assistant Professor of ECE and CS Carnegie Mellon University Pittsburgh, PA
OMG Real-Time and Distributed Object Computing Workshop, July 2002, Arlington, VA Providing Real-Time and Fault Tolerance for CORBA Applications Priya Narasimhan Assistant Professor of ECE and CS Carnegie
More informationINF-5360 Presentation
INF-5360 Presentation Optimistic Replication Ali Ahmad April 29, 2013 Structure of presentation Pessimistic and optimistic replication Elements of Optimistic replication Eventual consistency Scheduling
More informationJPaxos: State machine replication based on the Paxos protocol
JPaxos: State machine replication based on the Paxos protocol Jan Kończak 2, Nuno Santos 1, Tomasz Żurkowski 2, Paweł T. Wojciechowski 2, and André Schiper 1 1 EPFL, Switzerland 2 Poznań University of
More informationFault-Tolerant Distributed Services and Paxos"
Fault-Tolerant Distributed Services and Paxos" INF346, 2015 2015 P. Kuznetsov and M. Vukolic So far " " Shared memory synchronization:" Wait-freedom and linearizability" Consensus and universality " Fine-grained
More informationOptimizing GIS Services: Scalability & Performance. Firdaus Asri
Optimizing GIS Services: Scalability & Performance Firdaus Asri Define Performance Performance The speed at which a given operation occurs E.g. Request response time measured in seconds Scalability The
More informationBuilding Consistent Transactions with Inconsistent Replication
Building Consistent Transactions with Inconsistent Replication Irene Zhang, Naveen Kr. Sharma, Adriana Szekeres, Arvind Krishnamurthy, Dan R. K. Ports University of Washington Distributed storage systems
More informationDistributed Systems. Day 9: Replication [Part 1]
Distributed Systems Day 9: Replication [Part 1] Hash table k 0 v 0 k 1 v 1 k 2 v 2 k 3 v 3 ll Facebook Data Does your client know about all of F s servers? Security issues? Performance issues? How do clients
More informationLow-Latency Multi-Datacenter Databases using Replicated Commit
Low-Latency Multi-Datacenter Databases using Replicated Commit Hatem Mahmoud, Faisal Nawab, Alexander Pucher, Divyakant Agrawal, Amr El Abbadi UCSB Presented by Ashutosh Dhekne Main Contributions Reduce
More informationCanopus: A Scalable and Massively Parallel Consensus Protocol
Canopus: A Scalable and Massively Parallel Consensus Protocol (Extended report) Sajjad Rizvi, Bernard Wong, Srinivasan Keshav Cheriton School of Computer Science University of Waterloo, 200 University
More informationLecture XII: Replication
Lecture XII: Replication CMPT 401 Summer 2007 Dr. Alexandra Fedorova Replication 2 Why Replicate? (I) Fault-tolerance / High availability As long as one replica is up, the service is available Assume each
More informationZyzzyva. Speculative Byzantine Fault Tolerance. Ramakrishna Kotla. L. Alvisi, M. Dahlin, A. Clement, E. Wong University of Texas at Austin
Zyzzyva Speculative Byzantine Fault Tolerance Ramakrishna Kotla L. Alvisi, M. Dahlin, A. Clement, E. Wong University of Texas at Austin The Goal Transform high-performance service into high-performance
More informationDistributed Consensus: Making Impossible Possible
Distributed Consensus: Making Impossible Possible QCon London Tuesday 29/3/2016 Heidi Howard PhD Student @ University of Cambridge heidi.howard@cl.cam.ac.uk @heidiann360 What is Consensus? The process
More informationState Machine Replication is More Expensive than Consensus
State Machine Replication is More Expensive than Consensus Karolos Antoniadis EPFL Rachid Guerraoui EPFL Dahlia Malkhi VMware Research Group Dragos-Adrian Seredinschi EPFL Abstract Consensus and State
More informationApplications of Paxos Algorithm
Applications of Paxos Algorithm Gurkan Solmaz COP 6938 - Cloud Computing - Fall 2012 Department of Electrical Engineering and Computer Science University of Central Florida - Orlando, FL Oct 15, 2012 1
More informationDistributed DNS Name server Backed by Raft
Distributed DNS Name server Backed by Raft Emre Orbay Gabbi Fisher December 13, 2017 Abstract We implement a asynchronous distributed DNS server backed by Raft. Our DNS server system supports large cluster
More informationDistributed Systems. Lec 9: Distributed File Systems NFS, AFS. Slide acks: Dave Andersen
Distributed Systems Lec 9: Distributed File Systems NFS, AFS Slide acks: Dave Andersen (http://www.cs.cmu.edu/~dga/15-440/f10/lectures/08-distfs1.pdf) 1 VFS and FUSE Primer Some have asked for some background
More informationComputer Network Fundamentals Spring Week 3 MAC Layer Andreas Terzis
Computer Network Fundamentals Spring 2008 Week 3 MAC Layer Andreas Terzis Outline MAC Protocols MAC Protocol Examples Channel Partitioning TDMA/FDMA Token Ring Random Access Protocols Aloha and Slotted
More informationPaxos Replicated State Machines as the Basis of a High- Performance Data Store
Paxos Replicated State Machines as the Basis of a High- Performance Data Store William J. Bolosky, Dexter Bradshaw, Randolph B. Haagens, Norbert P. Kusters and Peng Li March 30, 2011 Q: How to build a
More informationarxiv: v2 [cs.dc] 12 Sep 2017
Efficient Synchronous Byzantine Consensus Ittai Abraham 1, Srinivas Devadas 2, Danny Dolev 3, Kartik Nayak 4, and Ling Ren 2 arxiv:1704.02397v2 [cs.dc] 12 Sep 2017 1 VMware Research iabraham@vmware.com
More informationThe Google File System
October 13, 2010 Based on: S. Ghemawat, H. Gobioff, and S.-T. Leung: The Google file system, in Proceedings ACM SOSP 2003, Lake George, NY, USA, October 2003. 1 Assumptions Interface Architecture Single
More informationErasure Coding in Object Stores: Challenges and Opportunities
Erasure Coding in Object Stores: Challenges and Opportunities Lewis Tseng Boston College July 2018, PODC Acknowledgements Nancy Lynch Muriel Medard Kishori Konwar Prakash Narayana Moorthy Viveck R. Cadambe
More informationDistributed Systems. Fault Tolerance. Paul Krzyzanowski
Distributed Systems Fault Tolerance Paul Krzyzanowski Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License. Faults Deviation from expected
More informationPRIMARY-BACKUP REPLICATION
PRIMARY-BACKUP REPLICATION Primary Backup George Porter Nov 14, 2018 ATTRIBUTION These slides are released under an Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) Creative Commons
More informationComparison of Different Implementations of Paxos for Distributed Database Usage. Proposal By. Hanzi Li Shuang Su Xiaoyu Li
Comparison of Different Implementations of Paxos for Distributed Database Usage Proposal By Hanzi Li Shuang Su Xiaoyu Li 12/08/2015 COEN 283 Santa Clara University 1 Abstract This study aims to simulate
More informationDISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN. Chapter 1. Introduction
DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 1 Introduction Modified by: Dr. Ramzi Saifan Definition of a Distributed System (1) A distributed
More informationTolerating Latency in Replicated State Machines through Client Speculation
Tolerating Latency in Replicated State Machines through Client Speculation April 22, 2009 1, James Cowling 2, Edmund B. Nightingale 3, Peter M. Chen 1, Jason Flinn 1, Barbara Liskov 2 University of Michigan
More information