Distributed algorithms

Similar documents
Distributed Algorithms

Distributed Algorithms Failure detection and Consensus. Ludovic Henrio CNRS - projet SCALE

Distributed Algorithms Models

Distributed systems. Consensus

Distributed Algorithms Benoît Garbinato

Coordination and Agreement

Distributed systems. Lecture 6: distributed transactions, elections, consensus and replication. Malte Schwarzkopf

R. Guerraoui Distributed Programming Laboratory lpdwww.epfl.ch

Distributed Systems (ICE 601) Fault Tolerance

Coordination and Agreement

CSE 5306 Distributed Systems

Part 5: Total Order Broadcast

Introduction to Distributed Systems Seif Haridi

CSE 5306 Distributed Systems. Fault Tolerance

Distributed Systems. coordination Johan Montelius ID2201. Distributed Systems ID2201

Two-Phase Atomic Commitment Protocol in Asynchronous Distributed Systems with Crash Failure

CS505: Distributed Systems

Consensus, impossibility results and Paxos. Ken Birman

Consensus a classic problem. Consensus, impossibility results and Paxos. Distributed Consensus. Asynchronous networks.

CprE Fault Tolerance. Dr. Yong Guan. Department of Electrical and Computer Engineering & Information Assurance Center Iowa State University

CS505: Distributed Systems

Ruminations on Domain-Based Reliable Broadcast

Failures, Elections, and Raft

C 1. Recap. CSE 486/586 Distributed Systems Failure Detectors. Today s Question. Two Different System Models. Why, What, and How.

Today: Fault Tolerance

Self Stabilization. CS553 Distributed Algorithms Prof. Ajay Kshemkalyani. by Islam Ismailov & Mohamed M. Ali

Fault Tolerance. Distributed Software Systems. Definitions

System Models. 2.1 Introduction 2.2 Architectural Models 2.3 Fundamental Models. Nicola Dragoni Embedded Systems Engineering DTU Informatics

Parallel and Distributed Systems. Programming Models. Why Parallel or Distributed Computing? What is a parallel computer?

Semi-Passive Replication in the Presence of Byzantine Faults

CSE 486/586 Distributed Systems

Failure Tolerance. Distributed Systems Santa Clara University

C 1. Today s Question. CSE 486/586 Distributed Systems Failure Detectors. Two Different System Models. Failure Model. Why, What, and How

Specifying and Proving Broadcast Properties with TLA

Coordination and Agreement

Consensus and related problems

Today: Fault Tolerance. Fault Tolerance

Intuitive distributed algorithms. with F#

Distributed systems. Total Order Broadcast

Distributed Algorithms Reliable Broadcast

Research Report. (Im)Possibilities of Predicate Detection in Crash-Affected Systems. RZ 3361 (# 93407) 20/08/2001 Computer Science 27 pages

Consensus Problem. Pradipta De

MENCIUS: BUILDING EFFICIENT

CSE 486/586 Distributed Systems

Distributed Systems COMP 212. Revision 2 Othon Michail

Distributed Systems. Day 13: Distributed Transaction. To Be or Not to Be Distributed.. Transactions

Replicated State Machine in Wide-area Networks

Fault Tolerance Part II. CS403/534 Distributed Systems Erkay Savas Sabanci University

Consensus in the Presence of Partial Synchrony

Last time. Distributed systems Lecture 6: Elections, distributed transactions, and replication. DrRobert N. M. Watson

Distributed Algorithms. Partha Sarathi Mandal Department of Mathematics IIT Guwahati

Fault Tolerance. Basic Concepts

Distributed Systems. Characteristics of Distributed Systems. Lecture Notes 1 Basic Concepts. Operating Systems. Anand Tripathi

Distributed Systems. Characteristics of Distributed Systems. Characteristics of Distributed Systems. Goals in Distributed System Designs

Fault Tolerance. Distributed Systems. September 2002

Topics in Reliable Distributed Systems

Distributed Computing. CS439: Principles of Computer Systems November 19, 2018

Coordination 1. To do. Mutual exclusion Election algorithms Next time: Global state. q q q

Parsimonious Asynchronous Byzantine-Fault-Tolerant Atomic Broadcast

Synchrony Weakened by Message Adversaries vs Asynchrony Enriched with Failure Detectors. Michel Raynal, Julien Stainer

Lecture XII: Replication

Distributed Computing. CS439: Principles of Computer Systems November 20, 2017

Clock Synchronization. Synchronization. Clock Synchronization Algorithms. Physical Clock Synchronization. Tanenbaum Chapter 6 plus additional papers

Lecture 7: Logical Time

Concepts. Techniques for masking faults. Failure Masking by Redundancy. CIS 505: Software Systems Lecture Note on Consensus

A Case Study of Agreement Problems in Distributed Systems : Non-Blocking Atomic Commitment

Dep. Systems Requirements

To do. Consensus and related problems. q Failure. q Raft

Distributed Systems (5DV147)

Arvind Krishnamurthy Fall Collection of individual computing devices/processes that can communicate with each other

Distributed Systems COMP 212. Lecture 19 Othon Michail

7 Fault Tolerant Distributed Transactions Commit protocols

Distributed Systems. 09. State Machine Replication & Virtual Synchrony. Paul Krzyzanowski. Rutgers University. Fall Paul Krzyzanowski

Paxos. Sistemi Distribuiti Laurea magistrale in ingegneria informatica A.A Leonardo Querzoni. giovedì 19 aprile 12

Distributed Systems 8L for Part IB

Data Consistency and Blockchain. Bei Chun Zhou (BlockChainZ)

Point-Set Topology for Impossibility Results in Distributed Computing. Thomas Nowak

Practical Byzantine Fault Tolerance

DISTRIBUTED SYSTEMS [COMP9243] Lecture 5: Synchronisation and Coordination (Part 2) TRANSACTION EXAMPLES TRANSACTIONS.

DISTRIBUTED SYSTEMS [COMP9243] Lecture 5: Synchronisation and Coordination (Part 2) TRANSACTION EXAMPLES TRANSACTIONS.

A General Characterization of Indulgence

Distributed System. Gang Wu. Spring,2018

Distributed Synchronization. EECS 591 Farnam Jahanian University of Michigan

Agreement in Distributed Systems CS 188 Distributed Systems February 19, 2015

Distributed Consensus Protocols

Byzantine Failures. Nikola Knezevic. knl

Coordinating distributed systems part II. Marko Vukolić Distributed Systems and Cloud Computing

6.852: Distributed Algorithms Fall, Class 21

Distributed Systems. Multicast and Agreement

CS 425 / ECE 428 Distributed Systems Fall 2017

Chapter 8 Fault Tolerance

CS457 Transport Protocols. CS 457 Fall 2014

CS455: Introduction to Distributed Systems [Spring 2018] Dept. Of Computer Science, Colorado State University

Network Protocols. Sarah Diesburg Operating Systems CS 3430

INSTITUTE OF AERONAUTICAL ENGINEERING (Autonomous) Dundigal, Hyderabad

Applications of Paxos Algorithm

Global atomicity. Such distributed atomicity is called global atomicity A protocol designed to enforce global atomicity is called commit protocol

Transactions. CS 475, Spring 2018 Concurrent & Distributed Systems

Distributed Deadlock

Adapting Commit Protocols for Large-Scale and Dynamic Distributed Applications

Transcription:

Distributed algorithms Prof R. Guerraoui lpdwww.epfl.ch Exam: Written Reference: Book - Springer Verlag http://lpd.epfl.ch/site/education/da - Introduction to Reliable (and Secure) Distributed Programming - R. Guerraoui 1

Algorithms (History) M. Al-Khawarizmi ~9th century: inventor of the zero, the decimal system, Arithmetic and Algebra A. Turing: all machines are equal 2

What is an algorithm? An ordered set of elementary instructions All execute on the same Turing machine Complexity measures the number of instructions (variables) 3

Distributed algorithms E. Dijkstra (concurrent os)~60 s L. Lamport: a distributed system is one that stops your application because a machine you have never heard from crashed ~70 s J. Gray (transactions) ~70 s N. Lynch (consensus) ~80 s Birman, Schneider, Toueg Cornell (this course) ~90 s 4

In short We study algorithms for distributed systems A new way of thinking about algorithms and their complexity Whereas a centralized algorithm is the soul of a computer, a distributed algorithm is the soul of a society of computers 5

Important This course is complementary to the course (concurrent algorithms) We study here message passing based algorithms whereas the other course focuses on shared memory based algorithms 6

Overview (1) Why? Motivation (2) Where? Between the network and the application (3) How? (3.1) Specifications, (3.2) assumptions, and (3.3) algorithms 7

A distributed system B A C 8

Clients-server Client A Client B Server 9

Multiple servers (genuine distribution) Server B Server A Server C 10

Applications Traffic control Finances: e-transactions, e- banking, stock-exchange Reservation systems Pretty much everything on the cloud 11

The optimistic view Concurrency => speed (loadbalancing) Partial failures => highavailability 12

The pessimistic view Concurrency (interleaving) => incorrectness Partial failures => incorrectness 13

Distributed algorithms (Today: Google) Hundreds of thousands of machines connected A Google job involves 2000 machines 10 machines go down per day 14

Overview (1) Why? Motivation (2) Where? Between the network and the application (3) How? (3.1) Specifications, (3.2) assumptions, and (3.3) algorithms 15

Distributed systems 16

Distributed systems The application needs underlying services for distributed interaction The network is not enough Reliability guarantees (e.g., TCP) are only offered for communication among pairs of processes, i.e., one-to-one communication (client-server) 17

Content of this course Reliable broadcast Causal order broadcast Shared memory Consensus Total order broadcast Atomic commit Leader election Terminating reliable broadcast 18

Reliable distributed services Example 1: reliable broadcast Ensure that a message sent to a group of processes is received (delivered) by all or none Example 2: atomic commit Ensure that the processes reach a common decision on whether to commit or abort a transaction 19

Underlying services (1): processes (abstracting computers) (2): channels (abstracting networks) (3): failure detectors (abstracting time) 20

Processes The distributed system is made of a finite set of processes: each process models a sequential program Processes are denoted by p1,..pn or p, q, r Processes have unique identities and know each other Every pair of processes is connected by a link through which the processes exchange messages 21

Processes A process executes a step at every tick of its local clock: a step consists of A local computation (local event) and message exchanges with other processes (global event) NB. One message is delivered from/sent to a process per step 22

Processes The program of a process is made of a finite set of modules (or components) organized as a software stack Modules within the same process interact by exchanging events upon event < Event1, att1, att2,..> do // something trigger < Event2, att1, att2,..> 23

Modules of a process indication request request request (deliver) indication (deliver) indication (deliver) 24

Overview (1) Why? Motivation (2) Where? Between the network and the application (3) How? (3.1) Specifications, (3.2) assumptions, and (3.3) algorithms 25

Approach Specifications: What is the service? i.e., the problem ~ liveness + safety Assumptions: What is the model, i.e., the power of the adversary? Algorithms: How do we implement the service? Where are the bugs (proof)? What cost? 26

Overview (1) Why? Motivation (2) Where? Between the network and the application (3) How? (3.1) Specifications, (3.2) assumptions, and (3.3) algorithms 27

Liveness and safety Safety is a property which states that nothing bad should happen Liveness is a property which states that something good should happen Any specification can be expressed in terms of liveness and safety properties (Lamport and Schneider) 28

Liveness and safety Example: Tell the truth Having to say something is liveness Not lying is safety 29

Specifications Example 1: reliable broadcast Ensure that a message sent to a group of processes is received by all or none Example 2: atomic commit Ensure that the processes reach a common decision on whether to commit or abort a transaction 30

Overview (1) Why? Motivation (2) Where? Between the network and the application (3) How? (3.1) Specifications, (3.2) assumptions, and (3.3) algorithms 31

Overview (1) Why? Motivation (2) Where? Between the network and the application (3) How? (3.1) Specifications, (3.2) assumptions, and (3.3) algorithms 3.2.1 Assumptions on processes and channels 3.2.2 Failure detection 32

Processes A process either executes the algorithm assigned to it (steps) or fails Two kinds of failures are mainly considered: Omissions: the process omits to send messages it is supposed to send (distracted) Arbitrary: the process sends messages it is not supposed to send (malicious or Byzantine) Many models are in between 33

Processes Crash-stop: a more specific case of omissions A process that omits a message to a process, omits all subsequent messages to all processes (permanent distraction): it crashes 34

Processes By default, we shall assume a crashstop model throughout this course; that is, unless specified otherwise: processes fail only by crashing (no recovery) A correct process is a process that does not fail (that does not crash) 35

Processes/Channels Processes communicate by message passing through communication channels Messages are uniquely identified and the message identifier includes the sender s identifier 36

Fair-loss links FL1. Fair-loss: If a message is sent infinitely often by pi to pj, and neither pi or pj crashes, then m is delivered infinitely often by pj FL2. Finite duplication: If a message m is sent a finite number of times by pi to pj, m is delivered a finite number of times by pj FL3. No creation: No message is delivered unless it was sent 37

Stubborn links SL1. Stubborn delivery: if a process pi sends a message m to a correct process pj, and pi does not crash, then pj delivers m an infinite number of times SL2. No creation: No message is delivered unless it was sent 38

Algorithm (sl) Implements: StubbornLinks (sp2p). Uses: FairLossLinks (flp2p). upon event < sp2psend, dest, m> do while (true) do trigger < flp2psend, dest, m>; upon event < flp2pdeliver, src, m> do trigger < sp2pdeliver, src, m>; 39

Reliable (Perfect) links Properties PL1. Validity: If pi and pj are correct, then every message sent by pi to pj is eventually delivered by pj PL2. No duplication: No message is delivered (to a process) more than once PL3. No creation: No message is delivered unless it was sent 40

Algorithm (pl) Implements: PerfectLinks (pp2p). Uses: StubbornLinks (sp2p). upon event < Init> do delivered := ; upon event < pp2psend, dest, m> do trigger < sp2psend, dest, m>; upon event < sp2pdeliver, src, m> do if m delivered then trigger < pp2pdeliver, src, m>; add m to delivered; 41

Reliable links We shall assume reliable links (also called perfect) throughout this course (unless specified otherwise) Roughly speaking, reliable links ensure that messages exchanged between correct processes are not lost 42

Reliable FIFO links Ensures properties PL1 to PL3 of perfect links FIFO. The messages are delivered in the same order they were sent. 43

Algorithm (fl1) Implements: Reliable FIFO links (fp2p). Uses: Reliable links (pp2p). Relies on acknowledgements messages. Acknowledgements are control messages. 44

Algorithm (fl1) upon event <init> do nb_acks[*] := 0 nb_sent[*] := 0 upon event <fp2psend, dest, m> do wait nb_acks[dest] = nb_sent[dest] nb_sent[dest] := nb_sent[dest]+1 trigger <p2psend, dest, m> 45

Algorithm (fl1) upon event <pp2pdeliver, src, m> do trigger <pp2psend, src, ack> trigger <fp2pdeliver, src, m> upon event <pp2pdeliver, src, ack> do nb_ack[src] := nb_ack[src]+1 46

Algorithm (fl2) Implements: Reliable FIFO links (fp2p). Uses: Reliable links (pp2p). Relies on sequence numbers attached to each message. upon event <init> do seq_nb[*] := 0 next[*] := 0 47

Algorithm (fl2) upon event <fp2psend, dest, m> do fifo_m := ( seq_nb[dest], m ) trigger <pp2psend, dest, fifo_m)> seq_nb[dest] := seq_nb[dest]+1 upon event <pp2pdeliver, src, (sn,m)> do wait next[src] = sn trigger <fp2pdeliver, src, m> next[src] := next[src]+1 48

(fl1) vs. (fl2) (fl1) uses 2 messages per applicative message. (fl1) artificially limits bandwidth if latency is high. (fl2) increases the size of messages. Sequence numbers in (fl2) have an unbounded size. 49

Algorithm (fl3) Implements: Reliable FIFO links (fp2p). Uses: Reliable links (pp2p). Combines acknowledgements and sequence numbers mechanisms. An acknowledgement is sent every ack_int messages received. The sequence numbers are reset when they reach ack_int x win_size. The sender has to block at the right moment. 50

Algorithm (fl3) upon event <init> do seq_nb[*] := 0 next[*] := 0 ack_nb[*] := 0 51

Algorithm (fl3) upon event <fp2psend, dest, m> do wait ack_nb[dest] x ack_int > seq_nb[dest] win_size x ack_int fifo_m := ( seq_nb[dest] mod (win_size x ack_int), m ) trigger <pp2psend, dest, fifo_m)> seq_nb[dest] := seq_nb[dest]+1 52

Algorithm (fl3) upon event <pp2pdeliver, src, (sn,m)> do wait next[src] = sn if (sn+1) mod ack_int = 0 trigger <pp2psend, src, ack> next[src] := (next[src]+1) mod (win_size x ack_int) trigger <fp2pdeliver, src, m> upon event <pp2pdeliver, src, ack> do ack_nb[src] := ack_nb[src]+1 53

Overview (1) Why? Motivation (2) Where? Between the network and the application (3) How? (3.1) Specifications, (3.2) assumptions, and (3.3) algorithms 3.2.1 Processes and links 3.2.2 Failure Detection 54

Failure Detection A failure detector is a distributed oracle that provides processes with suspicions about crashed processes It is implemented using (i.e., it encapsulates) timing assumptions According to the timing assumptions, the suspicions can be accurate or not 55

Failure Detection A failure detector module is defined by events and properties Events Indication: <crash, p> Properties: Completeness Accuracy 56

Failure Detection Perfect: Strong Completeness: Eventually, every process that crashes is permanently suspected by every correct process Strong Accuracy: No process is suspected before it crashes Eventually Perfect: Strong Completeness Eventual Strong Accuracy: Eventually, no correct process is ever suspected 57

Failure detection Implementation: (1) Processes periodically send heartbeat messages (2) A process sets a timeout based on worst case round trip of a message exchange (3) A process suspects another process if it timeouts that process (4) A process that delivers a message from a suspected process revises its suspicion and doubles its time-out 58

Timing assumptions Synchronous: Processing: the time it takes for a process to execute a step is bounded and known Delays: there is a known upper bound limit on the time it takes for a message to be received Clocks: the drift between a local clock and the global real time clock is bounded and known Eventually Synchronous: the timing assumptions hold eventually Asynchronous: no assumption 59

Overview (1) Why? Motivation (2) Where? Between the network and the application (3) How? (3.1) Specifications, (3.2) assumptions, and (3.3) algorithms 60

Algorithms modules of a process indication request request request (deliver) indication (deliver) indication (deliver) 61

Algorithms p1 p2 m1 m3 m2 p3 62

Algorithms p1 m1 p2 m2 p3 crash 63

For every abstraction (A) We assume a crash-stop system with a perfect failure detector (failstop) We give algorithms (B) We try to make a weaker assumption We revisit the algorithms 64

Content of this course Reliable broadcast Causal order broadcast Shared memory Consensus Total order broadcast Atomic commit Leader election Terminating reliable broadcast. 65