There Is More Consensus in Egalitarian Parliaments
Transcription
Iulian Moraru, David Andersen, Michael Kaminsky
Carnegie Mellon University and Intel Labs
Fault tolerance
- Redundancy
State Machine Replication
- Execute the same commands in the same order
- Paxos
  - No external failure detector required
  - Fast fail-over (high availability)
Paxos is important in clusters
- Chubby, Boxwood, SMARTER, ZooKeeper
- Synchronization, resource discovery, data replication
- High throughput, high availability
Paxos is important in the wide-area
- Spanner, Megastore
- Bring data closer to clients: low latency
- Tolerate datacenter outages
- (Figure: map with example inter-site round-trip times of 130 ms and 85 ms)
Paxos overview
- Agreement protocol
- Tolerates F failures with 2F+1 replicas (optimal)
- No external failure detector required
- Replicas can fail by crashing (non-Byzantine)
- Asynchronous communication
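The 2F+1 bound rests on quorum intersection: any two majorities of 2F+1 replicas share at least one replica, which is what lets Paxos make progress with F crashed replicas and no failure detector. A minimal sketch (not from the talk) that checks this exhaustively for five replicas:

```python
from itertools import combinations

def majority(n: int) -> int:
    """Smallest quorum size such that any two quorums intersect."""
    return n // 2 + 1

# With N = 2F+1 replicas, a majority has F+1 members, so the system
# still has a live quorum after F crashes, and any two quorums overlap.
n, f = 5, 2
assert majority(n) == f + 1

replicas = range(n)
for q1 in combinations(replicas, majority(n)):
    for q2 in combinations(replicas, majority(n)):
        assert set(q1) & set(q2)  # every pair of quorums intersects
```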
Using Paxos to order commands
- (Figure: replicas vote on and acknowledge commands A, B, C, D for individual log slots)
- Choose commands independently for each slot
- At least 2 RTTs per slot:
  1. Take ownership of a slot
  2. Propose command
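The two round trips above correspond to Paxos's two phases for a single slot: a prepare round to take ownership, then an accept round to propose the command. A schematic, single-ballot sketch (class and method names are illustrative, not from the talk):

```python
class Acceptor:
    def __init__(self):
        self.promised = 0      # highest ballot promised so far
        self.accepted = None   # (ballot, command) or None

    def prepare(self, ballot):
        """RTT 1: proposer takes ownership of the slot."""
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, self.accepted

    def accept(self, ballot, cmd):
        """RTT 2: proposer proposes the command."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, cmd)
            return True
        return False

# A proposer needs a majority in each phase, so committing one slot
# costs at least two round trips.
acceptors = [Acceptor() for _ in range(3)]
oks = [a.prepare(1)[0] for a in acceptors]
assert sum(oks) >= 2                       # phase 1: majority promised
acks = [a.accept(1, "A") for a in acceptors]
assert sum(acks) >= 2                      # phase 2: "A" is chosen
```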
Multi-Paxos
- (Figure: a single stable leader assigns commands A, B, C, D to consecutive log slots)
- 1 RTT to commit
- The leader is a bottleneck for performance and availability
Can we have it all?
- High throughput, low latency
- Constant availability
- Distribute load evenly across all replicas
- Use fastest replicas
- Use closest (lowest-latency) replicas
- Paxos and Multi-Paxos each achieve only some of these; Egalitarian Paxos (EPaxos) targets all of them
EPaxos is all about ordering
- Previous strategies:
  - Contend for slots: Paxos
  - One replica decides: Multi-Paxos, Fast Paxos, Generalized Paxos
  - Take turns round-robin: Mencius
EPaxos intuition
- (Figure: commands A, B, C, D are proposed at different replicas; after each replica commits, all replicas converge on the order A B C D)
- Load balance (every replica is a leader)
- EPaxos can choose any quorum for each command
EPaxos commit protocol
- (Figure: replicas R1..R5. R1 sends PreAccept(A) to a quorum; the replies match, so R1 commits A with no dependencies after one round trip. R5 then sends PreAccept(B); one replica replies with the dependency set {A}, so R5 runs an Accept(B, {A}) round, collects ACKs, and commits B with dependencies {A}. R1 later sends PreAccept(C), learns the dependencies {A, B}, and commits C with {A, B}.)
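The message flow above can be condensed into the command leader's side of the commit path. This is a deliberately simplified sketch (dependency sets only; real EPaxos also tracks sequence numbers, ballots, and failure recovery, and "interferes" here is just a same-key check on hypothetical (key, name) commands):

```python
class Replica:
    def __init__(self):
        self.deps = {}                     # command -> its dependency set

    def interferes(self, a, b):
        return a[0] == b[0]                # illustrative: same key

    def pre_accept(self, cmd, deps):
        # Reply with the proposed deps merged with locally seen
        # interfering commands.
        local = {c for c in self.deps if self.interferes(c, cmd)}
        merged = deps | local
        self.deps[cmd] = merged
        return merged

    def accept(self, cmd, deps):
        self.deps[cmd] = set(deps)

def commit(cmd, leader, quorum):
    """Command leader's commit path (simplified)."""
    deps = {c for c in leader.deps if leader.interferes(c, cmd)}
    leader.deps[cmd] = deps
    replies = [r.pre_accept(cmd, deps) for r in quorum]  # PreAccept round
    if all(rep == deps for rep in replies):
        return deps, 1                     # fast path: commit after 1 RTT
    deps = deps.union(*replies)            # merge reported dependencies
    for r in quorum:
        r.accept(cmd, deps)                # slow path: extra Accept round
    leader.deps[cmd] = deps
    return deps, 2

r = [Replica() for _ in range(5)]
# R1 commits A via {R2, R3}: nothing interferes yet, so the fast path.
deps_a, rtts_a = commit(("x", "A"), r[0], [r[1], r[2]])
assert deps_a == set() and rtts_a == 1
# R5 proposes interfering B via {R3, R4}; R3 has seen A but R5 has not,
# so the replies disagree and B takes the slow path with deps {A}.
deps_b, rtts_b = commit(("x", "B"), r[4], [r[2], r[3]])
assert deps_b == {("x", "A")} and rtts_b == 2
```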
Order only interfering commands
- 1 RTT: commands that are non-concurrent OR non-interfering
- 2 RTTs: commands that are concurrent AND interfering
Interference is application-specific
- KV store: infer from the operation key
- Google App Engine: programmer-specified
- Relational databases: most transactions are simple and can be analyzed; the few remaining transactions interfere with everything
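For a key-value store, the interference check reduces to a key comparison plus read/write semantics: two operations commute unless they touch the same key and at least one writes. A hypothetical sketch (operation encoding is illustrative):

```python
def interferes(op1, op2):
    """Two KV operations interfere iff they touch the same key
    and at least one of them is a write ("put")."""
    verb1, key1 = op1
    verb2, key2 = op2
    return key1 == key2 and "put" in (verb1, verb2)

assert interferes(("put", "x"), ("get", "x"))      # write/read, same key
assert not interferes(("put", "x"), ("put", "y"))  # different keys
assert not interferes(("get", "x"), ("get", "x"))  # two reads commute
```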
Execution
- (Figure: dependency graph over commands A..E; B, C, and A form a strongly-connected component. Execution order: 1. E, 2. D, 3. B, C, A.)
- Approximate sequence-number order inside a component (Lamport clock)
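The execution step can be sketched with a standard SCC decomposition: build the graph from committed dependencies, collapse strongly-connected components, execute components dependencies-first, and break ties inside a component by sequence number. A simplified sketch (the graph and sequence numbers below are illustrative; edges point from a command to its dependencies):

```python
def sccs(graph):
    """Tarjan's algorithm. Components are emitted dependencies-first
    (a component only after everything it can reach)."""
    index, low, on_stack, stack, out = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v); on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:             # v is the root of a component
            comp = []
            while True:
                w = stack.pop(); on_stack.discard(w); comp.append(w)
                if w == v:
                    break
            out.append(comp)

    for v in graph:
        if v not in index:
            visit(v)
    return out

def execute(deps, seq):
    order = []
    for comp in sccs(deps):
        # Inside an SCC, approximate order by (Lamport) sequence number.
        order.extend(sorted(comp, key=lambda c: seq[c]))
    return order

# A, B, C form a cycle; the cycle depends on D, which depends on E.
deps = {"A": ["B", "D"], "B": ["C"], "C": ["A"], "D": ["E"], "E": []}
seq = {"B": 1, "C": 2, "A": 3, "D": 4, "E": 5}
assert execute(deps, seq) == ["E", "D", "B", "C", "A"]
```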
EPaxos properties
- Linearizability: if A~B (A and B interfere) and A was committed before B was proposed, then A will be executed before B
- Fast-path quorum: F + ceil(F/2) replicas
  - Optimal for 3 and 5 replicas
  - Smaller than the fast quorums of Fast / Generalized Paxos by 1
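The quorum sizes can be tabulated directly. Reading the slide's "F + F/2" as rounding up (i.e. F + ceil(F/2), equivalently F + floor((F+1)/2)), a quick check shows why the fast path is optimal at 3 and 5 replicas: there the fast quorum already equals a plain majority, the minimum any protocol needs.

```python
def epaxos_fast_quorum(n):
    """Fast-path quorum size (command leader included) for n = 2F+1."""
    f = (n - 1) // 2            # failures tolerated
    return f + (f + 1) // 2     # F + ceil(F/2)

def classic_quorum(n):
    return n // 2 + 1           # plain majority (slow path)

for n, expected in [(3, 2), (5, 3), (7, 5)]:
    assert epaxos_fast_quorum(n) == expected

# At 3 and 5 replicas the fast quorum is just a majority: optimal.
assert epaxos_fast_quorum(3) == classic_quorum(3)
assert epaxos_fast_quorum(5) == classic_quorum(5)
```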
Results
Optimal wide-area commit latency
- (Figure: median commit latency [ms] at five sites, CA, VA, OR, JP, and EU, for EPaxos, Mencius, Generalized Paxos, and Multi-Paxos with its leader in CA; inter-site round-trip times shown range from 85 ms to 300 ms)
- EPaxos: optimal commit latency in the wide area for 3 and 5 replicas
Higher + more stable throughput
- (Figure: operations/sec with 3 and 5 replicas for Multi-Paxos, Mencius, and EPaxos at 0%, 2%, and 100% command interference; a second pair of plots repeats the comparison when one replica is slow)
EPaxos: higher throughput w/ batching
- (Figure: latency [ms, log scale] versus throughput [ops/sec] with 5 ms batching in the local area, for Multi-Paxos and for EPaxos at 0% and 100% interference)
Constant availability
- (Figure: commit throughput [ops/sec] over time across two replica failures and a leader failure, for EPaxos with delayed commits marked, Mencius, and Multi-Paxos)
EPaxos insights
- Order commands explicitly
- Optimize only the delays that matter (clients co-located w/ the closest replica)
- Smaller quorums
- Result: high throughput, stability, low latency
Formal proof, TLA+ spec, open-source release
CORFU: A Shared Log Design for Flash Clusters Authors: Mahesh Balakrishnan, Dahlia Malkhi, Vijayan Prabhakaran, Ted Wobber, Michael Wei, John D. Davis EECS 591 11/7/18 Presented by Evan Agattas and Fanzhong
More informationLarge-Scale Key-Value Stores Eventual Consistency Marco Serafini
Large-Scale Key-Value Stores Eventual Consistency Marco Serafini COMPSCI 590S Lecture 13 Goals of Key-Value Stores Export simple API put(key, value) get(key) Simpler and faster than a DBMS Less complexity,
More informationEECS 498 Introduction to Distributed Systems
EECS 498 Introduction to Distributed Systems Fall 2017 Harsha V. Madhyastha Dynamo Recap Consistent hashing 1-hop DHT enabled by gossip Execution of reads and writes Coordinated by first available successor
More informationCS6450: Distributed Systems Lecture 15. Ryan Stutsman
Strong Consistency CS6450: Distributed Systems Lecture 15 Ryan Stutsman Material taken/derived from Princeton COS-418 materials created by Michael Freedman and Kyle Jamieson at Princeton University. Licensed
More informationConsensus a classic problem. Consensus, impossibility results and Paxos. Distributed Consensus. Asynchronous networks.
Consensus, impossibility results and Paxos Ken Birman Consensus a classic problem Consensus abstraction underlies many distributed systems and protocols N processes They start execution with inputs {0,1}
More informationDistributed Systems. 19. Fault Tolerance Paul Krzyzanowski. Rutgers University. Fall 2013
Distributed Systems 19. Fault Tolerance Paul Krzyzanowski Rutgers University Fall 2013 November 27, 2013 2013 Paul Krzyzanowski 1 Faults Deviation from expected behavior Due to a variety of factors: Hardware
More informationCoordinating distributed systems part II. Marko Vukolić Distributed Systems and Cloud Computing
Coordinating distributed systems part II Marko Vukolić Distributed Systems and Cloud Computing Last Time Coordinating distributed systems part I Zookeeper At the heart of Zookeeper is the ZAB atomic broadcast
More informationBuilding Consistent Transactions with Inconsistent Replication
Building Consistent Transactions with Inconsistent Replication Irene Zhang Naveen Kr. Sharma Adriana Szekeres Arvind Krishnamurthy Dan R. K. Ports University of Washington {iyzhang, naveenks, aaasz, arvind,
More informationToday s Papers. Google Chubby. Distributed Consensus. EECS 262a Advanced Topics in Computer Systems Lecture 24. Paxos/Megastore November 24 th, 2014
EECS 262a Advanced Topics in Computer Systems Lecture 24 Paxos/Megastore November 24 th, 2014 John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley Today s Papers
More informationCS6450: Distributed Systems Lecture 11. Ryan Stutsman
Strong Consistency CS6450: Distributed Systems Lecture 11 Ryan Stutsman Material taken/derived from Princeton COS-418 materials created by Michael Freedman and Kyle Jamieson at Princeton University. Licensed
More informationRecap. CSE 486/586 Distributed Systems Paxos. Paxos. Brief History. Brief History. Brief History C 1
Recap Distributed Systems Steve Ko Computer Sciences and Engineering University at Buffalo Facebook photo storage CDN (hot), Haystack (warm), & f4 (very warm) Haystack RAID-6, per stripe: 10 data disks,
More informationTIBCO StreamBase 10 Distributed Computing and High Availability. November 2017
TIBCO StreamBase 10 Distributed Computing and High Availability November 2017 Distributed Computing Distributed Computing location transparent objects and method invocation allowing transparent horizontal
More informationStrong Consistency & CAP Theorem
Strong Consistency & CAP Theorem CS 240: Computing Systems and Concurrency Lecture 15 Marco Canini Credits: Michael Freedman and Kyle Jamieson developed much of the original material. Consistency models
More informationTransactions. CS 475, Spring 2018 Concurrent & Distributed Systems
Transactions CS 475, Spring 2018 Concurrent & Distributed Systems Review: Transactions boolean transfermoney(person from, Person to, float amount){ if(from.balance >= amount) { from.balance = from.balance
More informationRecall our 2PC commit problem. Recall our 2PC commit problem. Doing failover correctly isn t easy. Consensus I. FLP Impossibility, Paxos
Consensus I Recall our 2PC commit problem FLP Impossibility, Paxos Client C 1 C à TC: go! COS 418: Distributed Systems Lecture 7 Michael Freedman Bank A B 2 TC à A, B: prepare! 3 A, B à P: yes or no 4
More informationMaking Fast Consensus Generally Faster
Making Fast Consensus Generally Faster [Technical Report] Sebastiano Peluso Virginia Tech peluso@vt.edu Alexandru Turcu Virginia Tech talex@vt.edu Roberto Palmieri Virginia Tech robertop@vt.edu Giuliano
More informationQuiz I Solutions MASSACHUSETTS INSTITUTE OF TECHNOLOGY Spring Department of Electrical Engineering and Computer Science
Department of Electrical Engineering and Computer Science MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.824 Spring 2014 Quiz I Solutions 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 26-30 31-35 36-40 41-45 46-50
More informationZyzzyva. Speculative Byzantine Fault Tolerance. Ramakrishna Kotla. L. Alvisi, M. Dahlin, A. Clement, E. Wong University of Texas at Austin
Zyzzyva Speculative Byzantine Fault Tolerance Ramakrishna Kotla L. Alvisi, M. Dahlin, A. Clement, E. Wong University of Texas at Austin The Goal Transform high-performance service into high-performance
More informationCockroachDB on DC/OS. Ben Darnell, CTO, Cockroach Labs
CockroachDB on DC/OS Ben Darnell, CTO, Cockroach Labs Agenda A cloud-native database CockroachDB on DC/OS Why CockroachDB Demo! Cloud-Native Database What is Cloud-Native? Horizontally scalable Individual
More informationReplication. Feb 10, 2016 CPSC 416
Replication Feb 10, 2016 CPSC 416 How d we get here? Failures & single systems; fault tolerance techniques added redundancy (ECC memory, RAID, etc.) Conceptually, ECC & RAID both put a master in front
More informationDistributed Systems. Day 9: Replication [Part 1]
Distributed Systems Day 9: Replication [Part 1] Hash table k 0 v 0 k 1 v 1 k 2 v 2 k 3 v 3 ll Facebook Data Does your client know about all of F s servers? Security issues? Performance issues? How do clients
More informationAn Algorithm for Replicated Objects with Efficient Reads (Extended Abstract)
An Algorithm for Replicated Objects with Efficient Reads (Extended Abstract) Tushar D. Chandra Google Mountain View, CA, USA Vassos Hadzilacos University of Toronto Toronto, ON, Canada Sam Toueg University
More informationPaxos. Sistemi Distribuiti Laurea magistrale in ingegneria informatica A.A Leonardo Querzoni. giovedì 19 aprile 12
Sistemi Distribuiti Laurea magistrale in ingegneria informatica A.A. 2011-2012 Leonardo Querzoni The Paxos family of algorithms was introduced in 1999 to provide a viable solution to consensus in asynchronous
More informationConsensus, impossibility results and Paxos. Ken Birman
Consensus, impossibility results and Paxos Ken Birman Consensus a classic problem Consensus abstraction underlies many distributed systems and protocols N processes They start execution with inputs {0,1}
More informationWPaxos: Ruling the Archipelago with Fast Consensus
WPaxos: Ruling the Archipelago with Fast Consensus Ailidani Ailijiang, Aleksey Charapko, Murat Demirbas and Tevfik Kosar University at Buffalo, SUNY Email: {ailidani,acharapk,demirbas,tkosar}@buffalo.edu
More informationCausal Consistency and Two-Phase Commit
Causal Consistency and Two-Phase Commit CS 240: Computing Systems and Concurrency Lecture 16 Marco Canini Credits: Michael Freedman and Kyle Jamieson developed much of the original material. Consistency
More informationCanopus: A Scalable and Massively Parallel Consensus Protocol
Canopus: A Scalable and Massively Parallel Consensus Protocol (Extended report) Sajjad Rizvi, Bernard Wong, Srinivasan Keshav Cheriton School of Computer Science University of Waterloo, 200 University
More informationTwo phase commit protocol. Two phase commit protocol. Recall: Linearizability (Strong Consistency) Consensus
Recall: Linearizability (Strong Consistency) Consensus COS 518: Advanced Computer Systems Lecture 4 Provide behavior of a single copy of object: Read should urn the most recent write Subsequent reads should
More informationRecovering from a Crash. Three-Phase Commit
Recovering from a Crash If INIT : abort locally and inform coordinator If Ready, contact another process Q and examine Q s state Lecture 18, page 23 Three-Phase Commit Two phase commit: problem if coordinator
More information