Distributed Systems 11. Consensus. Paul Krzyzanowski

Distributed Systems 11. Consensus Paul Krzyzanowski pxk@cs.rutgers.edu 1

Consensus Goal Allow a group of processes to agree on a result All processes must agree on the same value The value must be one that was submitted by at least one process (the consensus algorithm cannot just make up a value) 2

We saw this! Mutual exclusion Election algorithms Other uses: Manage group membership Distributed commit Synchronize state General problem: Get unanimous agreement on a given value 3

Achieving consensus is easy! Designate a system-wide coordinator to determine the outcome BUT this assumes there are no failures, or that we are willing to wait indefinitely for recovery Examples: the 2-phase commit protocol (indefinite wait), and all of the mutual exclusion algorithms we studied 4

Faults Three categories: transient faults, intermittent faults, permanent faults Any fault may be fail-silent (fail-stop) or Byzantine Synchronous system vs. asynchronous system Synchronous = the system responds to a message in a bounded time E.g., IP packet versus serial port transmission 5

Agreement in faulty systems Two-army problem: good processors, faulty communication lines The goal is a coordinated attack, but acknowledging acknowledgements never terminates: the infinite acknowledgement problem 6

Agreement in faulty systems Byzantine Generals problem: reliable communication lines, faulty processors n generals head different divisions; m generals are traitors and are trying to prevent the others from reaching agreement 4 generals agree to attack, 4 generals agree to retreat, and 1 traitor tells the 1st group that he'll attack and tells the 2nd group that he'll retreat can the loyal generals reach agreement? 7

Agreement in faulty systems Byzantine Generals problem Solutions require: (3m+1) participants for m traitors (2m+1 loyal generals) m+1 rounds of message exchanges O(m²) messages Costly solution! Variation: use signed messages Messages from loyal generals cannot be forged/altered Traitors can still lie Consensus can be achieved with (m+2) loyal generals 8

Agreement in faulty systems It is impossible to guarantee consensus with asynchronous faulty processes (the FLP impossibility result) There is no way to check whether a process has failed or is alive but not communicating (or not communicating quickly enough) We have to live with this 9

Consensus Goal Create a fault-tolerant consensus algorithm that does not block as long as a majority of processes are working Goal: agree on one result among a group of participants Processors may fail (some may need stable storage) Messages may be lost, out of order, or duplicated If delivered, messages are not corrupted The algorithm needs 2P+1 processors to survive the simultaneous failure of P processors 10
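
A quick way to check the arithmetic behind the 2P+1 requirement, as a minimal Python sketch (the function names are illustrative, not from the lecture):

    # A majority quorum of n processors has floor(n/2)+1 members, so the
    # group can lose up to floor((n-1)/2) processors and still form a quorum.
    def quorum_size(n):
        return n // 2 + 1

    def failures_tolerated(n):
        return (n - 1) // 2

    for n in (3, 5, 7):
        print(n, "processors: quorum =", quorum_size(n),
              "tolerates", failures_tolerated(n), "failures")
    # 2P+1 processors -> quorum of P+1 -> survives P simultaneous failures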

Paxos 11

Paxos Fault-tolerant distributed consensus algorithm Goal: all machines agree on a proposed value The value can be associated with an event Paxos ensures that no other machine associates the value with another event 12

Paxos players Client: makes a request Proposers: get a request from a client and run the protocol Leader: elected coordinator among the proposers Acceptors: multiple processes that remember the state of the protocol Quorum = any majority of acceptors Learners: when agreement has been reached by the acceptors, a Learner executes the request and/or sends a response back to the client 13

Paxos in action Goal: have all acceptors agree to a value v associated with a proposal (Diagram: Client, Proposer/Leader, Quorum of acceptors, Learner) Paxos nodes: one machine may serve several roles 14

Paxos in action: Phase 0 Client sends a request to a proposer 15

Paxos in action: Phase 1a Proposer (leader) creates a proposal #N (N acts like a Lamport timestamp), where N is greater than any previous proposal number used by this proposer, and sends the proposal to a quorum of acceptors 16
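
One common way to get proposal numbers that grow monotonically per proposer and are unique across proposers is to pair a local counter with the proposer's ID. A minimal sketch (the representation is an assumption for illustration; it is not prescribed by the slides):

    # Proposal numbers as (counter, proposer_id) pairs: tuples compare
    # lexicographically, so a higher counter always wins and ties between
    # proposers are broken by ID. The proposer bumps its counter past the
    # largest number it has seen before trying again.
    class ProposalNumber:
        def __init__(self, proposer_id):
            self.proposer_id = proposer_id
            self.counter = 0

        def next(self, highest_seen=(0, 0)):
            self.counter = max(self.counter, highest_seen[0]) + 1
            return (self.counter, self.proposer_id)

    gen = ProposalNumber(proposer_id=7)
    print(gen.next())                       # (1, 7)
    print(gen.next(highest_seen=(5, 2)))    # (6, 7): jumps past a rival proposal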

Paxos in action: Phase 1b Acceptor: if the proposer's proposal # N > any previous proposal, reply with Promise(N): the highest past proposal # and a promise to ignore all IDs < N; else ignore it (or return a NACK the proposal is rejected) The promise may contain the previously accepted N and its value 17

Paxos in action: Phase 2a Proposer: if the proposer receives enough promises, set a value v for the proposal (v can be max(previous values)+1) and send an Accept Request(N, v) to the quorum with the chosen value 18

Paxos in action: Phase 2b Acceptor: if the promise still holds, register the value v and send an Accepted message to the Proposer and every Learner; else ignore the message (or send a NACK) 19
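
The acceptor's side of phases 1b and 2b fits in a few lines. A minimal single-decree sketch, assuming proposal numbers are comparable values and ignoring the write to stable storage a real acceptor would need (names are illustrative):

    class Acceptor:
        def __init__(self):
            self.promised_n = None   # highest proposal number promised
            self.accepted_n = None   # proposal number of the accepted value
            self.accepted_v = None   # the accepted value itself

        def on_prepare(self, n):
            # Phase 1b: promise to ignore proposals < n and report any
            # previously accepted proposal so its value can be re-proposed
            if self.promised_n is None or n > self.promised_n:
                self.promised_n = n
                return ("promise", n, self.accepted_n, self.accepted_v)
            return ("nack", self.promised_n)

        def on_accept(self, n, v):
            # Phase 2b: accept only if the earlier promise still holds
            if self.promised_n is None or n >= self.promised_n:
                self.promised_n = n
                self.accepted_n = n
                self.accepted_v = v
                return ("accepted", n, v)
            return ("nack", self.promised_n)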

Paxos in action: Phase 3 Learner: respond to the client and/or take action on the request 20

Paxos: Keep trying A proposal may fail because an acceptor has made a new promise to ignore all proposals less than some value > N, because there are conflicting Prepare messages from different proposers, or because a proposer does not receive a quorum of responses: either promises (phase 1b) or accepts (phase 2b) The algorithm then has to be restarted with a higher proposal # 21
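
Putting the proposer's side together with the retry rule, a sketch of one round against a list of acceptors (it reuses the Acceptor class sketched above; the quorum test and retry loop follow the slides, everything else is illustrative):

    def run_round(acceptors, n, my_value):
        # Phase 1a/1b: send Prepare(n) and collect promises
        replies = [a.on_prepare(n) for a in acceptors]
        promises = [r for r in replies if r[0] == "promise"]
        if len(promises) <= len(acceptors) // 2:
            return None                       # no quorum of promises
        # If any acceptor already accepted a value, propose that value
        prior = [(pn, pv) for _, _, pn, pv in promises if pn is not None]
        value = max(prior)[1] if prior else my_value
        # Phase 2a/2b: send Accept Request(n, value) and count accepts
        accepts = [a.on_accept(n, value) for a in acceptors]
        if sum(r[0] == "accepted" for r in accepts) > len(acceptors) // 2:
            return value                      # chosen: learners may act on it
        return None                           # rejected somewhere: try again

    acceptors = [Acceptor() for _ in range(5)]
    n, chosen = 1, None
    while chosen is None:
        chosen = run_round(acceptors, n, "event-42")
        n += 1                                # keep trying with a higher proposal #
    print("chosen:", chosen)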

Paxos summary Paxos allows us to ensure consistent (total) ordering over a set of events in a group of machines Events = commands, actions, state updates Each machine will have the latest state or a previous version of the state 22

Paxos summary To make a change to the system Tell the proposer (leader) the event/command you want to add Note: these requests may occur concurrently The proposer picks its next highest event ID and asks all the acceptors to reserve that event ID If any acceptor sees that it is not the latest event ID, it rejects the proposal; the proposer will have to try again with another event ID When a majority of acceptors accept the proposal, accepted events are sent to the learners, which can act on them (e.g., update system state) Fault tolerant: need 2k+1 servers to tolerate k failures 23

Replicated state machines We want high scalability and high availability Achieve them via redundancy High availability means replicated functioning components take the place of ones that stop working Active-passive: replicated components are standing by Active-active: replicated components are working Model the system as a state machine and replicate it Input to a specific state produces deterministic output and a transition to a new state To ensure correct execution & high availability Each process must see the same inputs Obtain consensus at each state transition 24
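
The state-machine model is easy to make concrete: if every replica starts in the same state and applies the same agreed-upon commands in the same order, all replicas end up identical. A minimal sketch (the key/value command set is made up for illustration):

    class KVStateMachine:
        # Deterministic: the same inputs in the same order always
        # produce the same state on every replica.
        def __init__(self):
            self.state = {}

        def apply(self, command):
            op, key, value = command
            if op == "put":
                self.state[key] = value
            elif op == "delete":
                self.state.pop(key, None)

    # The consensus layer's only job is to hand every replica this same
    # log in this same order; each replica applies it independently.
    agreed_log = [("put", "x", 1), ("put", "y", 2), ("delete", "x", None)]

    replicas = [KVStateMachine() for _ in range(3)]
    for command in agreed_log:
        for replica in replicas:
            replica.apply(command)

    assert all(r.state == replicas[0].state for r in replicas)
    print(replicas[0].state)   # {'y': 2} on every replica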

Leasing versus Locking Common approach: get a lock for exclusive access to a resource But: locks are not fault-tolerant It's safer to use a lock that expires instead Lease = lock with a time limit Examples: three-phase commit vs. two-phase commit, remote objects in .NET or Java Trade-off: a long lease with the possibility of a long wait after a failure, or short leases that need to be renewed frequently 25
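
A lease is just a lock that the holder loses automatically once its time limit passes, so a crashed holder cannot block everyone forever. A minimal single-process sketch (illustrative only; a fault-tolerant lease service would itself be replicated, as the next slide discusses):

    import time

    class Lease:
        def __init__(self, duration):
            self.duration = duration       # seconds the lease stays valid
            self.holder = None
            self.expires_at = 0.0

        def acquire(self, who, now=None):
            now = time.monotonic() if now is None else now
            # Free, or the previous holder's time limit has passed:
            # no recovery protocol is needed after a holder crashes.
            if self.holder is None or now >= self.expires_at:
                self.holder, self.expires_at = who, now + self.duration
                return True
            return self.holder == who     # already held by this caller

        def renew(self, who, now=None):
            now = time.monotonic() if now is None else now
            if self.holder == who and now < self.expires_at:
                self.expires_at = now + self.duration
                return True
            return False

    lease = Lease(duration=10.0)
    print(lease.acquire("A", now=0.0))     # True: A holds the lease
    print(lease.acquire("B", now=5.0))     # False: still held by A
    print(lease.acquire("B", now=12.0))    # True: A's lease expired on its own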

Hierarchical Leases For fault tolerance, leases should be granted by consensus Consensus isn't super-efficient Compromise: use consensus as an election algorithm to elect a coordinator The coordinator is granted a lease on a large set of resources The coordinator hands out sub-leases on those resources When the coordinator's lease expires, the consensus algorithm is run again 26
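
A sketch of the compromise: the elected coordinator holds one consensus-granted lease and hands out sub-leases on individual resources. Capping every sub-lease at the coordinator's own expiry is my assumption about how to keep the hierarchy safe; it is not stated on the slide, and the names are illustrative:

    class Coordinator:
        def __init__(self, master_lease_expires_at):
            # master lease granted by the consensus/election round
            self.master_expires_at = master_lease_expires_at
            self.sub_leases = {}            # resource -> (holder, expires_at)

        def grant_sub_lease(self, resource, who, duration, now):
            # Assumption: a sub-lease never outlives the coordinator's own
            # lease, so when that lease lapses every sub-lease lapses too.
            expires_at = min(now + duration, self.master_expires_at)
            self.sub_leases[resource] = (who, expires_at)
            return expires_at

    coord = Coordinator(master_lease_expires_at=60.0)
    print(coord.grant_sub_lease("file-7", "A", duration=30.0, now=10.0))  # 40.0
    print(coord.grant_sub_lease("file-8", "B", duration=30.0, now=50.0))  # 60.0 (capped)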

The End 27