There Is More Consensus in Egalitarian Parliaments

1 There Is More Consensus in Egalitarian Parliaments. Iulian Moraru, David Andersen, Michael Kaminsky. Carnegie Mellon University, Intel Labs.

2 Fault tolerance: redundancy.

3-8 State Machine Replication: execute the same commands in the same order. Paxos: no external failure detector required; fast fail-over (high availability).

9 Paxos is important in clusters (Chubby, Boxwood, SMARTER, ZooKeeper): synchronization, resource discovery, data replication. High throughput, high availability.

10-12 Paxos is important in the wide-area (Spanner, Megastore): bring data closer to clients for low latency; tolerate datacenter outages. [Map of replica sites with inter-site latencies of 130 ms and 85 ms.]

13-17 Paxos overview: an agreement protocol. Tolerates F failures with 2F+1 replicas (optimal). No external failure detector required. Replicas can fail by crashing (non-Byzantine). Asynchronous communication.

18-32 Using Paxos to order commands. [Animation: clients send commands A, B, C, D; replicas exchange vote/ack messages to place each command into a log slot.] Commands are chosen independently for each slot, and each slot takes at least 2 RTTs: 1. take ownership of the slot; 2. propose the command.
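
The 2-RTT cost becomes concrete in a toy, single-process simulation. This is a hedged sketch, not the talk's code: the acceptor type and proposeInSlot are my own illustrative names. Prepare claims the slot, Accept proposes the command, and any command an acceptor already accepted must be re-proposed.

```go
package main

import "fmt"

type acceptor struct {
	promised int    // highest ballot this acceptor has promised
	accepted int    // ballot at which a command was accepted, if any
	cmd      string // accepted command ("" = none)
}

// proposeInSlot walks the two RTTs named above for one slot:
// RTT 1 (Prepare) takes ownership, RTT 2 (Accept) proposes the command.
func proposeInSlot(ballot int, cmd string, accs []*acceptor) (string, bool) {
	promises, priorBallot, prior := 0, -1, ""
	for _, a := range accs { // RTT 1: take ownership of the slot
		if ballot > a.promised {
			a.promised = ballot
			promises++
			if a.cmd != "" && a.accepted > priorBallot {
				priorBallot, prior = a.accepted, a.cmd // must re-propose this
			}
		}
	}
	if promises <= len(accs)/2 {
		return "", false // slot contended; retry with a higher ballot
	}
	if prior != "" {
		cmd = prior
	}
	acks := 0
	for _, a := range accs { // RTT 2: propose the command
		if ballot >= a.promised {
			a.accepted, a.cmd = ballot, cmd
			acks++
		}
	}
	return cmd, acks > len(accs)/2
}

func main() {
	accs := []*acceptor{{}, {}, {}}
	fmt.Println(proposeInSlot(1, "A", accs)) // A true, after 2 RTTs
}
```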

33-41 Multi-Paxos: a stable leader owns all slots up front and commits commands A, B, C, D in 1 RTT each. The leader becomes a bottleneck for both performance and availability.
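
For contrast with the 2-RTT flow above, a minimal sketch of the Multi-Paxos steady state (names are illustrative assumptions): because the leader took ownership of all slots once, each command needs only one Accept round, which is also why every command funnels through one node.

```go
package main

import "fmt"

type replica struct{ log []string }

// leaderCommit shows the steady state sketched above: ownership of all
// slots was taken up front, so each command needs only one Accept round.
func leaderCommit(cmd string, leader *replica, followers []*replica) bool {
	leader.log = append(leader.log, cmd)
	acks := 1 // the leader counts itself
	for _, f := range followers {
		f.log = append(f.log, cmd) // followers ack the leader's Accept
		acks++
	}
	return acks > (len(followers)+1)/2 // committed after 1 RTT
}

func main() {
	leader, followers := &replica{}, []*replica{{}, {}}
	for _, c := range []string{"A", "B", "C", "D"} {
		fmt.Println(c, leaderCommit(c, leader, followers))
	}
	fmt.Println(leader.log) // [A B C D]: the same order everywhere
}
```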

42-50 Can we have it all? High throughput and low latency; constant availability; load distributed evenly across all replicas; use the fastest replicas; use the closest (lowest-latency) replicas. [Build: Paxos and Multi-Paxos are checked against these goals; Egalitarian Paxos (EPaxos) aims to meet all of them.]

51-58 EPaxos is all about ordering. Previous strategies: contend for slots (Paxos); one replica decides (Multi-Paxos, Fast Paxos, Generalized Paxos); take turns round-robin (Mencius).
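
The round-robin strategy can be stated in one line: Mencius pre-partitions slot ownership by replica index. A trivial sketch (the helper name is mine):

```go
package main

import "fmt"

// ownerOf returns which of n replicas may propose in a given slot.
func ownerOf(slot, n int) int { return slot % n }

func main() {
	for slot := 0; slot < 6; slot++ {
		fmt.Printf("slot %d -> replica R%d\n", slot, ownerOf(slot, 3))
	}
}
```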

59-71 EPaxos intuition. [Animation: commands A, B, C, D are proposed concurrently at different replicas; after the replicas exchange ordering information, each arrives at the same order A, B, C, D.] Load balance: every replica is a leader. EPaxos can choose any quorum for each command.

72-87 EPaxos commit protocol (replicas R1-R5). [Message diagram:] R1 proposes A: PreAccept(A) goes to a quorum, the replies report no new dependencies, and R1 commits A after 1 RTT. R5 then proposes an interfering command B: a PreAccept(B) reply records the dependency {A} that R5 did not know about, so R5 runs an Accept(B, {A}) round and, on the ACKs, commits B with dependencies {A} after 2 RTTs. Finally R1 proposes C: the PreAccept(C) replies extend its dependencies from {A} to {A, B}, and C is committed with dependencies {A, B}.

88 Order only interfering commands. 1 RTT: commands that are non-concurrent OR non-interfering. 2 RTTs: commands that are concurrent AND interfering.
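
This rule is exactly the command leader's decision after collecting PreAccept replies: if every reply reports the same dependencies, commit right away (fast path, 1 RTT); otherwise take the union of the dependencies and run an Accept round first (slow path, 2 RTTs). A hedged sketch under assumed names; dependency sets are represented as string slices.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// canon produces a canonical form of a dependency set for comparison.
func canon(deps []string) string {
	d := append([]string(nil), deps...)
	sort.Strings(d)
	return strings.Join(d, ",")
}

// commitPath decides the commit path from the PreAccept replies
// (assumes at least one reply was received).
func commitPath(replies [][]string) (deps []string, rtts int) {
	union := map[string]bool{}
	same := true
	for _, r := range replies {
		if canon(r) != canon(replies[0]) {
			same = false // replies disagree: a concurrent interfering command
		}
		for _, d := range r {
			union[d] = true
		}
	}
	for d := range union {
		deps = append(deps, d)
	}
	sort.Strings(deps)
	if same {
		return deps, 1 // fast path: commit immediately
	}
	return deps, 2 // slow path: Accept round with the union, then commit
}

func main() {
	fmt.Println(commitPath([][]string{{"A"}, {"A"}})) // [A] 1
	fmt.Println(commitPath([][]string{{}, {"A"}}))    // [A] 2
}
```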

89-92 Interference is application-specific. KV store: infer it from the operation's key. Google App Engine: programmer-specified. Relational databases: most transactions are simple and can be analyzed; the few remaining transactions interfere with everything.
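
One way to realize the KV-store case above: two operations interfere iff they touch the same key and at least one of them is a write, so reads of the same key commute. A sketch; the operation type is my assumption.

```go
package main

import "fmt"

type Op struct {
	Key   string
	Write bool
}

// interfere implements the key-based check sketched above.
func interfere(a, b Op) bool {
	return a.Key == b.Key && (a.Write || b.Write)
}

func main() {
	put := Op{Key: "x", Write: true}
	get := Op{Key: "x"}
	other := Op{Key: "y", Write: true}
	fmt.Println(interfere(put, get))   // true: same key, one write
	fmt.Println(interfere(get, get))   // false: two reads commute
	fmt.Println(interfere(put, other)) // false: different keys
}
```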

93-98 Execution. [Diagram: committed commands A-E form a dependency graph; cycles are grouped into strongly-connected components, which are executed in dependency order, e.g., 1: E, 2: D, 3: B, C, A.] Within a component, commands run in approximate sequence-number order (a Lamport clock).
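
Below is a hedged, self-contained sketch of this execution step (type and function names are my assumptions): Tarjan's algorithm finds the strongly-connected components of the committed dependency graph and emits each component only after everything it depends on, so appending components as they close yields a valid execution order; inside a component, sequence numbers break the tie.

```go
package main

import (
	"fmt"
	"sort"
)

// Cmd is an assumed representation of a committed command: its Lamport
// sequence number and the interfering commands it depends on.
type Cmd struct {
	Seq  int
	Deps []string
}

// executionOrder runs Tarjan's SCC algorithm over the dependency graph
// and returns commands in an order that is safe to execute.
func executionOrder(cmds map[string]*Cmd) []string {
	index, low := map[string]int{}, map[string]int{}
	onStack := map[string]bool{}
	var stack, order []string
	next := 0
	var visit func(v string)
	visit = func(v string) {
		index[v], low[v] = next, next
		next++
		stack = append(stack, v)
		onStack[v] = true
		for _, w := range cmds[v].Deps { // assumes all deps are committed
			if _, seen := index[w]; !seen {
				visit(w)
				if low[w] < low[v] {
					low[v] = low[w]
				}
			} else if onStack[w] && index[w] < low[v] {
				low[v] = index[w]
			}
		}
		if low[v] == index[v] { // v closes a strongly-connected component
			var scc []string
			for {
				w := stack[len(stack)-1]
				stack = stack[:len(stack)-1]
				onStack[w] = false
				scc = append(scc, w)
				if w == v {
					break
				}
			}
			// Within the component, order by sequence number.
			sort.Slice(scc, func(i, j int) bool { return cmds[scc[i]].Seq < cmds[scc[j]].Seq })
			order = append(order, scc...)
		}
	}
	keys := make([]string, 0, len(cmds))
	for k := range cmds {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic starting order
	for _, v := range keys {
		if _, seen := index[v]; !seen {
			visit(v)
		}
	}
	return order
}

func main() {
	// A<->B<->C form a cycle; D depends on the cycle; B depends on E.
	cmds := map[string]*Cmd{
		"A": {Seq: 3, Deps: []string{"B"}},
		"B": {Seq: 1, Deps: []string{"C", "E"}},
		"C": {Seq: 2, Deps: []string{"A"}},
		"D": {Seq: 4, Deps: []string{"A"}},
		"E": {Seq: 0},
	}
	fmt.Println(executionOrder(cmds)) // [E B C A D]
}
```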

99-105 EPaxos properties. Linearizability: if A~B (A and B interfere) and A committed before B was proposed, then A will be executed before B. Fast-path quorum: F + ⌊(F+1)/2⌋, which is optimal for 3 and 5 replicas and smaller than the Fast/Generalized Paxos fast quorum by one replica.
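
As a quick arithmetic check on the quorum claim (a sketch; the helper names are mine): with N = 2F+1 replicas, the slow path needs a simple majority and the fast path needs F + ⌊(F+1)/2⌋ replicas.

```go
package main

import "fmt"

// Quorum sizes implied by the property above, with N = 2F+1 replicas.
func slowQuorum(f int) int { return f + 1 }
func fastQuorum(f int) int { return f + (f+1)/2 } // integer division = floor

func main() {
	for _, f := range []int{1, 2} {
		fmt.Printf("N=%d: slow=%d, fast=%d\n", 2*f+1, slowQuorum(f), fastQuorum(f))
	}
	// N=3: slow=2, fast=2 and N=5: slow=3, fast=3, i.e., a bare majority:
	// one replica fewer than Fast/Generalized Paxos need on their fast path.
}
```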

106 Results.

107-120 Optimal wide-area commit latency. [Bar chart: median commit latency (ms) measured from five sites (CA, VA, OR, JP, EU) for EPaxos, Mencius, Generalized Paxos, and Multi-Paxos with the leader in CA; inter-site round-trip times range from 85 ms to 300 ms.] EPaxos achieves optimal commit latency in the wide area for 3 and 5 replicas.

121-122 Higher and more stable throughput. [Bar charts: operations/sec with 3 and 5 replicas, comparing Multi-Paxos, Mencius, and EPaxos at 0%, 2%, and 100% command interference; a second pair of charts repeats the comparison when one replica is slow.]

123-127 EPaxos: higher throughput with batching. [Chart: latency (ms, log scale) vs. throughput (ops/sec) with 5 ms batching in the local area, for Multi-Paxos and for EPaxos at 0% and 100% interference.]

128-130 Constant availability. [Chart: commit throughput (ops/sec) over time for EPaxos, Mencius, and Multi-Paxos; replica failures barely perturb EPaxos (a few delayed commits), while a Multi-Paxos leader failure stalls commits until fail-over.]

131-138 EPaxos insights. Order commands explicitly: high throughput and stability. Optimize only the delays that matter (clients co-located with the closest replica) and use smaller quorums: low latency.

139 Formal proof, TLA+ spec, open source release.
