Replications and Consensus


CPSC 426/526 Replications and Consensus Ennan Zhai Computer Science Department Yale University

Recall: Lectures 8 and 9
In Lectures 8 and 9, we learned:
- Cloud storage and data processing
- File system: the Google File System (GFS)
- MapReduce
- BigTable
[Diagram: Google applications (e.g., Gmail) run on MapReduce and BigTable, which in turn run on the Google File System (GFS)]

Lecture Roadmap Consistency Issues Consistency Models Two-Phase Commit Consensus Case Study: Paxos

Replication Technique
Distributed systems replicate data across multiple servers:
- Replication provides fault tolerance if servers fail
- Allowing clients to access different servers can increase scalability (max throughput)
- What is the problem?
[Diagram: a client in Beijing and a client in DC access Replica1, Replica2, and Replica3, hosted on Server1, Server2, and Server3]

Consistency Problem
Initially X=0 on all three replicas (Server1, Server2, Server3). The client in Beijing issues W(X,1), which reaches Server1 first. When the client in DC then issues R(X), does it return 1 or 0? Both outcomes are possible:
- If the write has already propagated to every replica (X=1 everywhere), the DC client reads R(X)=1
- If Server2 and Server3 still hold X=0, the DC client reads R(X)=0
This ambiguity is the consistency problem.
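To make the failure mode concrete, here is a toy simulation (my own sketch, not from the lecture) of asynchronous replication: the write lands on Server1 immediately and propagates to the other replicas after a delay, so a read served by a lagging replica can still return 0.

```python
import random
import threading
import time

class Replica:
    def __init__(self, name):
        self.name = name
        self.x = 0  # all replicas start with X=0

replicas = [Replica(f"Server{i}") for i in (1, 2, 3)]

def write_async(value):
    """W(X,value): apply on Server1 now, propagate to the others later."""
    replicas[0].x = value
    def propagate():
        time.sleep(random.uniform(0.1, 0.5))  # replication lag
        for r in replicas[1:]:
            r.x = value
    threading.Thread(target=propagate).start()

write_async(1)                    # client in Beijing: W(X,1)
reader = random.choice(replicas)  # client in DC is routed to some replica
print(reader.name, "returns X =", reader.x)  # may print 1 or 0
```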

Lecture Roadmap (next: Consistency Models) Consistency Issues Consistency Models Two-Phase Commit Consensus Case Study: Paxos

Consistency Models
A consistency model specifies a contract between the programmer and the system: the system guarantees that if the programmer follows the rules, the data will be consistent. A consistency model thus defines the degree of consistency that must be maintained for shared data. If a system supports a stronger consistency model, the weaker models are automatically supported, but stronger consistency models sacrifice more availability and fault tolerance.


Consistency Models (from strongest to weakest):
- Strict consistency
- Strong consistency (linearizability)
- Sequential consistency
- Causal consistency (weaker)
- Eventual consistency (weaker)
These models describe when and how different nodes in a distributed system view the order of operations.

Why do we have so many consistency models? They serve different application scenarios, which balance the trade-off between consistency, availability, and fault tolerance differently.

Strict Consistency
The strongest consistency model we will consider:
- Any read on a data item X returns the value of the most recent write on X
- Needs an absolute global time: "most recent" must be unambiguous, and corresponds to when the operation was issued
- Impossible to implement in the real world (network delays)
[Timeline: Server1 writes W(x,a); Server2 reads 0=R(x); Server3 later reads a=R(x)]

Strong Consistency
Provides the behavior of a single copy of the object:
- A read returns the most recent write
- Subsequent reads return the same value, until the next write
Telephone intuition:
1. Alice updates her Facebook post
2. Alice calls Bob on the phone: "Check my Facebook post!"
3. Bob reads Alice's wall, sees her post

Strong Consistency?
[Timeline: a client issues write(a,1) to Server1 and receives success; a second client then issues read(a) to Server3 and receives 1]
The phone call ensures a happens-before relationship between the write and the read, even though the communication is out-of-band.

Cool idea: delay responding to writes/ops until they are committed. But this alone is buggy: should read(a) return 1 or 0? It isn't sufficient to return Server3's local value, because Server3 does not know precisely when the op is globally committed. Instead, we need to actually order the read operation.

Strong Consistency!!!
[Timeline: write(a,1) returns success only after the write completes on all servers; read(a) then returns 1]
- Order all operations via (1) a leader and (2) agreement

Strong Consistency = Linearizability
Linearizability:
- All servers execute all ops in some identical sequential order
- The global ordering preserves each client's own local ordering
- The global ordering preserves a real-time guarantee: all operations receive a global timestamp via a synchronized clock, and if TS(x) < TS(y), then OP(x) precedes OP(y) in the sequence
- Once a write completes, all later reads should return the value of that write or the value of a later write
- Once a read returns a particular value, all later reads should return that value or the value of a later write

Intuition: Real-time ordering
[Timeline: write(a,1) is committed on the servers; a later read(a) returns 1]
Once a write completes, all later reads should return the value of that write or the value of a later write; once a read returns a particular value, all later reads should return that value or the value of a later write. (A toy check of these two rules follows.)
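The following toy checker is my own construction, not from the slides: assuming every operation has completed and carries a single global timestamp (so operations are totally ordered and non-overlapping), it verifies the two rules above on one register by checking that each read returns the value of the most recent completed write.

```python
def satisfies_realtime_rules(history):
    """history: list of (ts, kind, value) with kind in {"W", "R"},
    one completed operation per entry, timestamps all distinct."""
    latest = None  # value of the most recent completed write
    for ts, kind, value in sorted(history):
        if kind == "W":
            latest = value
        elif latest is not None and value != latest:
            return False  # a later read returned a stale value
    return True

ok  = [(1, "W", "a"), (2, "R", "a"), (3, "W", "b"), (4, "R", "b")]
bad = [(1, "W", "a"), (2, "R", "a"), (3, "W", "b"), (4, "R", "a")]
print(satisfies_realtime_rules(ok), satisfies_realtime_rules(bad))  # True False
```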

Weaker: Sequential Consistency
Sequential consistency = linearizability minus the real-time ordering requirement. Of linearizability's guarantees, sequential consistency keeps the first two and drops the third:
- All servers execute all ops in some identical sequential order
- The global ordering preserves each client's own local (program) ordering
- (Dropped) The global ordering preserves the real-time guarantee via synchronized global timestamps
Definition: all (read/write) operations on the data store are executed in some sequential order, and the operations of each individual process appear in this sequence in program order. With concurrent ops, reordering is acceptable, but all servers must see the same order: linearizability cares about real time; sequential consistency cares about program order.

Sequential Consistency
[Timeline: one client issues write(a,1); another client's read(a) returns 0]
In this example, the system orders read(a) before write(a,1); this is acceptable under sequential consistency, which does not require real-time ordering.

Implementing Sequential Consistency
Nodes use vector clocks to determine whether two events have a happens-before relationship:
- If timestamp(a) < timestamp(b), then a → b
- If the ops are concurrent (there exist i, j with a[i] < b[i] and a[j] > b[j]), hosts can order ops a and b arbitrarily, but consistently
[Diagram: four operations OP1-OP4, with OP2 and OP3 concurrent]

Building Block: Vector Clock
- Initially, all clocks are zero
- Each time a process experiences an internal event, it increments its own logical clock in the vector by one
- Each time a process sends a message, it increments its own clock and sends a copy of its whole vector
- Each time a process receives a message, it increments its own logical clock by one and updates each element in its vector to max(own, received)

Building Block: Vector Clock
[Worked example: processes A, B, and C all start at zero. C's first event brings its clock to C:1; C then messages B, whose clock becomes (B:1, C:1); B messages A, giving A (A:1, B:2, C:1); further exchanges among the three continue the same way, and by the end of the run A's clock is (A:4, B:5, C:5)]
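Here is a minimal vector clock implementation following the four rules above (a sketch; the class layout, process names, and dict representation are my own choices):

```python
class VectorClock:
    def __init__(self, me, processes):
        self.me = me
        self.clock = {p: 0 for p in processes}  # initially all zero

    def tick(self):        # internal event: increment own entry
        self.clock[self.me] += 1

    def send(self):        # increment own clock, ship a copy of the vector
        self.tick()
        return dict(self.clock)

    def receive(self, other):  # increment own clock, then pairwise max
        self.tick()
        for p, t in other.items():
            self.clock[p] = max(self.clock[p], t)

def happens_before(a, b):  # a -> b iff a <= b componentwise and a != b
    return all(a[p] <= b[p] for p in a) and a != b

def concurrent(a, b):
    return not happens_before(a, b) and not happens_before(b, a)

A = VectorClock("A", "ABC")
B = VectorClock("B", "ABC")
msg = B.send()      # B's clock becomes {A:0, B:1, C:0}
A.receive(msg)      # A ticks, then merges B's vector
print(A.clock)      # {'A': 1, 'B': 1, 'C': 0}
```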

Implementing Sequential Consistency
In the diagram above, OP1 happens before OP2 and OP3, both of which happen before OP4; OP2 and OP3 are concurrent. Valid executions:
- Host1: OP 1, 2, 3, 4 and Host2: OP 1, 2, 3, 4
- Host1: OP 1, 3, 2, 4 and Host2: OP 1, 3, 2, 4
Invalid execution (the hosts disagree on the order of the concurrent ops):
- Host1: OP 1, 2, 3, 4 but Host2: OP 1, 3, 2, 4

Sequential Consistency
[Timeline: Server1 writes W(x,a); Server2 writes W(x,b); Server3 and Server4 each read b=R(x) and then a=R(x)]
Is this valid sequential consistency? It is, because Server3 and Server4 agree on the order of the ops.

Sequential Consistency
[Timeline: Server1 writes W(x,a); Server2 writes W(x,b); Server3 reads b=R(x) then a=R(x), while Server4 reads a=R(x) then b=R(x)]
Is this valid sequential consistency? No, because Server3 and Server4 do not agree on the order of the ops. In practice, it does not matter when events took place on different machines, as long as the servers agree on the order. This observation leads to causal consistency.

Sequential Consistency
[Timeline: Server1 writes W(x,a); Server2 writes W(x,b); Server3 and Server4 each read b=R(x) and then a=R(x)]
A valid sequential consistency.

Sequential Consistency
[Timeline: a variation of the previous example in which W(x,a) is issued at a different point, so that the agreed order of the reads contradicts a process's own program order]
A valid sequential consistency or not? No, because it does not preserve local ordering.

Causal Consistency
Causal consistency is one of the weaker consistency models:
- Causally related writes must be seen by all processes in the same order
- Concurrent writes may be seen in different orders on different machines

Causal Consistency
[Timeline: Server1 writes W(x,a); Server2 reads a=R(x) and then writes W(x,b), so W(x,a) causally precedes W(x,b); Server3 reads b=R(x) then a=R(x), while Server4 reads a=R(x) then b=R(x)]
Not valid: causally related writes must be seen by all processes in the same order, but Server3 and Server4 see them in different orders.

Causal Consistency
[Timeline: Server1 writes W(x,a) and Server2 writes W(x,b) independently, so the writes are concurrent; Server3 reads b=R(x) then a=R(x), while Server4 reads a=R(x) then b=R(x)]
Valid: concurrent writes may be seen in different orders on different machines.

Eventual Consistency
Eventual consistency:
- Achieves high availability
- If no new updates are made to a given data item, eventually all accesses to that item will return the last updated value
Eventual consistency is commonly used:
- Git repos, iPhone sync
- Dropbox and Amazon Dynamo
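One common way to converge replicas eventually is last-writer-wins anti-entropy: replicas periodically exchange state and keep the newer write for each key. The sketch below is my own simplification using a per-key timestamp; real systems differ (Dynamo, for instance, uses vector clocks and application-level reconciliation rather than plain timestamps).

```python
def merge(replica_a, replica_b):
    """Each replica maps key -> (timestamp, value); keep the newer write.
    After merging, both replicas hold the same state for every key."""
    for key in set(replica_a) | set(replica_b):
        a = replica_a.get(key, (0, None))
        b = replica_b.get(key, (0, None))
        winner = max(a, b)  # compare by timestamp (last writer wins)
        replica_a[key] = replica_b[key] = winner

r1 = {"x": (2, "new")}
r2 = {"x": (1, "old"), "y": (1, "only-here")}
merge(r1, r2)
print(r1 == r2, r1["x"])  # True (2, 'new') -- the replicas have converged
```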

Lecture Roadmap (next: Two-Phase Commit) Consistency Issues Consistency Models Two-Phase Commit Consensus Case Study: Paxos

Two-Phase Commit
Goal: reliably agree to commit or abort a collection of sub-transactions, with all operations coordinated by a single master node, despite:
- Concurrent machines
- Failure and recovery of machines
This achieves strong consistency!

Intuitive Example
You want to organize an outing with 3 friends at 6pm on Tuesday, and you will go out only if all friends can make it. What do you do?
- Call each of them and ask if they can do 6pm Tuesday (voting phase)
- If all can do Tuesday, call each friend back to ACK (commit)
- If one cannot do Tuesday, call the others to cancel (abort)
This is exactly how two-phase commit works.

Two-Phase Commit Protocol
Phase 1: Voting phase
- The coordinator asks every participant: commit?
- Each participant votes; a single no response means that we will have to abort
[Diagram: the coordinator sends "commit?" to four participants; each replies "yes"]

Two-Phase Commit Protocol
Phase 2: Commit phase
- Send the result of the vote to every participant (send abort if any participant voted no in Phase 1)
- Collect committed acknowledgements from every participant
[Diagram: the coordinator sends "commit" to the four participants; each replies "ack"]

Two-Phase Commit Protocol
Two-phase commit assumes a fail-recover model:
- Any failed system will eventually recover
- A recovered system cannot change its mind: if a node agreed to commit and then crashed, it must be willing and able to commit upon recovery
What if the leader (coordinator) fails?
- We lose availability: the system is no longer live
A coordinator sketch follows.
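Putting the two phases together, here is a minimal sketch of the coordinator's logic. This is my own illustration: in-process method calls stand in for RPCs, and the durable logging of each step (essential under the fail-recover model) is omitted.

```python
def two_phase_commit(participants, txn):
    # Phase 1: voting -- a single "no" (or, in practice, a timeout)
    # forces an abort
    votes = [p.prepare(txn) for p in participants]
    decision = "commit" if all(votes) else "abort"

    # Phase 2: commit/abort -- broadcast the decision, collect acks
    for p in participants:
        p.finish(txn, decision)
    return decision

class Participant:
    def prepare(self, txn):
        # Vote yes only if we can durably commit later, even after a crash
        return True

    def finish(self, txn, decision):
        print(f"participant: {decision} {txn}")  # stands in for the ack

print(two_phase_commit([Participant(), Participant()], "txn-1"))  # commit
```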

Lecture Roadmap (next: Consensus) Consistency Issues Consistency Models Two-Phase Commit Consensus Case Study: Paxos

Consensus / Agreement Problem
Definition: a general agreement about something; an idea or opinion that is shared by all the people in a group.
Given a set of processes, each with an initial value:
- Termination: all non-faulty processes eventually decide on a value
- Agreement: all processes that decide do so on the same value
- Validity: the value that is decided must have been proposed by some process

Consensus / Agreement Problem
Goal: N processes want to agree on a value.
Correctness (safety):
- All N nodes agree on the same value
- The agreed value was proposed by some node
Fault tolerance:
- If at most F faults occur in a window, consensus is eventually reached
- Liveness is not guaranteed: with more than F faults, there may be no consensus
- Given a target F, what is N? It depends on the fault model: crash faults need N = 2F+1; Byzantine faults need N = 3F+1 (see the helper below)
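The fault-model arithmetic from the slide, as a trivial helper (the naming is my own):

```python
def min_cluster_size(f, byzantine=False):
    """Nodes needed to tolerate f faults and still reach consensus."""
    return 3 * f + 1 if byzantine else 2 * f + 1

print(min_cluster_size(1))                  # 3 (crash faults)
print(min_cluster_size(1, byzantine=True))  # 4 (Byzantine faults)
```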

Lecture Roadmap (next: Case Study: Paxos) Consistency Issues Consistency Models Two-Phase Commit Consensus Case Study: Paxos

Paxos

Paxos
Safety:
- Only a single value is chosen
- Only a proposed value can be chosen
- Only chosen values are learned by processes
Liveness (if fewer than half of the processes fail):
- Some proposed value is eventually chosen
- If a value is chosen, a process eventually learns it

Paxos: Three conceptual roles
- Proposers: propose values
- Acceptors: accept values; a value is chosen once a majority accept it
- Learners: learn the outcome (the chosen value)
In reality, a process can play any or all roles.
Ordering: a proposal is a tuple [proposal #, value] = [n, v]
- Proposal numbers are strictly increasing and globally unique
- Globally unique? Trick: set the low-order bits to the proposer's ID (sketched below)
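The uniqueness trick as code (the packing constant is my own choice): each proposer draws from a disjoint sequence, so numbers are strictly increasing per proposer and never collide across proposers.

```python
MAX_PROPOSERS = 1 << 16  # upper bound on proposer IDs

def next_proposal_number(counter, proposer_id):
    # High bits: per-proposer counter; low bits: the proposer's ID
    return counter * MAX_PROPOSERS + proposer_id

assert next_proposal_number(2, 5) != next_proposal_number(2, 8)  # unique
assert next_proposal_number(3, 1) > next_proposal_number(2, 9)   # increasing
```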

Paxos (example run)
[Message trace among Proposer A, Proposer B, Acceptors A/B/C, and a Learner:]
1. Proposer A sends Prepare request [n=2, v=5]; Acceptors A and B record it and reply with Prepare responses [no previous]
2. Proposer B sends Prepare request [n=4, v=8]; Acceptor C records it and replies with a Prepare response [no previous]
3. Acceptors A and B also answer Proposer B's higher-numbered prepare, reporting the earlier proposal [n=2, v=5] they had seen; since no value has yet been accepted by a majority, Proposer B proceeds with its own value
4. Proposer B sends Accept request [n=4, v=8]; Acceptors A, B, and C all accept [n=4, v=8]
5. The Learner learns the chosen value [v=8]
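The acceptor behavior implied by this trace, as a minimal single-decree sketch. The message formats and names are my own; networking and failures are omitted, and a full proposer would also adopt the value of the highest previously accepted proposal reported in the promises.

```python
class Acceptor:
    def __init__(self):
        self.promised = 0      # highest proposal number promised
        self.accepted = None   # (n, v) of the highest accepted proposal

    def prepare(self, n):
        """Phase 1: promise n if it is the highest number seen; report any
        previously accepted proposal so the proposer can adopt its value."""
        if n > self.promised:
            self.promised = n
            return ("promise", self.accepted)  # None means "no previous"
        return ("reject", None)

    def accept(self, n, v):
        """Phase 2: accept unless we promised a higher-numbered proposal."""
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, v)
            return ("accepted", (n, v))
        return ("reject", None)

acceptors = [Acceptor() for _ in range(3)]
# Proposer B proceeds with [n=4, v=8] after a majority of promises:
promises = [a.prepare(4) for a in acceptors]
if sum(r == "promise" for r, _ in promises) > len(acceptors) // 2:
    results = [a.accept(4, 8) for a in acceptors]
    print("chosen:", results[0][1])  # chosen: (4, 8)
```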

Paxos + Two-Phase Commit
Use Paxos for view changes:
- If anybody notices the current master unavailable, or one or more replicas unavailable, propose a view change
- Paxos establishes the new group: the value agreed upon = <2PC Master, {2PC Replicas}>
Use two-phase commit for the actual data:
- Writes go to the master for two-phase commit
- Reads go to acceptors and/or the master

Distributed System Basics are done! Topics covered so far: UseNet [~1980], PageRank, P2P lookup service, attacks in P2P systems, P2P computing, trust/reputation, the Google File System & Xen, MapReduce, BigTable, firewalls and NATs, cloud computing.

Next Lecture
In Lecture 11, I will cover:
- Preserving privacy in the cloud
- Basic cryptographic knowledge
- CryptDB