Coordination 2

Today:
- Group communication
- Basic, reliable and ordered multicast

How can processes agree on an action or a value?

Modes of communication

- Unicast: 1 → 1, point to point
- Anycast: 1 → nearest of a set (introduced with IPv6; used with BGP)
- Broadcast: 1 → all processes in a system
- Multicast: 1 → many; group communication

Group communication

An abstraction over a network or an equivalent overlay network. The central concept is the group:
- Processes may join and leave
- A message to the group is sent to all members (multicast) with certain guarantees

An important building block for:
- Reliable information dissemination
- Collaborative applications (games included)
- A range of fault-tolerance strategies, including consistent update of replicated data
- System monitoring and management

Group communication design issues

- Closed vs. open: in a closed group, only members can multicast to it
- Peer vs. hierarchical: in a P2P group, each member communicates with the group directly; in a hierarchical group, communication goes through a coordinator
- Group creation/destruction and membership management:
  - Providing an interface for group membership changes
  - Detecting participant failures and notifying all members of changes
  - Performing group address translation, if hierarchical

These challenges limit the scalability of group communication.

Group communication: system model

- Processes may be in multiple groups
- The operation multicast(m, g) sends message m to all processes in group g
- Every message carries the unique identifier of the sender and the unique destination group identifier
- deliver(m) delivers a message sent to the calling process
- Deliver ≠ receive: a received message is not always handed immediately to the application layer of the receiving process

Sending, receiving and delivering

Multicast receiver algorithms decide when to deliver a message to the process/application. A received message may be:
- Delivered immediately: put on a delivery queue that the process reads
- Placed on a hold-back queue: e.g., to wait for an earlier message
- Rejected/discarded: e.g., a duplicate, or an earlier message that is no longer needed

[Figure: incoming messages pass through the receiver, where each is either discarded, placed on the hold-back queue, or moved to the delivery queue for delivery.]

A useful, basic multicast

A correct process will eventually deliver the message, as long as the multicaster does not crash. This is beyond IP multicast. We call the primitives B-multicast and B-deliver.

A straightforward way to implement it uses a reliable one-to-one send operation:
- To B-multicast(g, m): for each process p in g, send(p, m)
- On receive(m) at p: B-deliver(m) at p

Potential ack-implosion: acks from the reliable sends may arrive close together; an overloaded multicast process will start to drop them, leading to retransmissions and yet more acks.
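As a sketch, the two primitives above fit in a few lines of Python. Here `send` is a stand-in for a reliable one-to-one send; in this toy demo it simply appends to an in-memory inbox.

```python
# Sketch of B-multicast over a reliable one-to-one send primitive.
# 'send' is a stand-in: here it just appends to an in-memory inbox.

def b_multicast(group, m, send):
    """B-multicast(g, m): send m to every process p in group g."""
    for p in group:
        send(p, m)

def b_deliver(m, inbox):
    """On receive(m) at p: B-deliver(m), i.e. hand m to the application."""
    inbox.append(m)

# Toy demo: three processes, each with an inbox standing in for its
# application layer.
inboxes = {"p1": [], "p2": [], "p3": []}
b_multicast(inboxes, "hello", lambda p, m: b_deliver(m, inboxes[p]))
```

Note that nothing here protects against the sender crashing mid-loop: some inboxes would get the message and others would not, which is exactly the gap reliable multicast closes.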

Reliable multicast

Define reliable multicast with R-multicast and R-deliver. It must satisfy:
- Integrity: a correct process p in group g delivers m at most once
- Validity: if a correct process multicasts m, then it will eventually deliver m
- Agreement: if a correct process delivers m, then all other correct processes in the group will eventually deliver m

Agreement is related to atomicity: all or nothing. It does not hold for B-multicast, which is based on reliable one-to-one sends, so some processes may deliver a message while others do not.

Validity only guarantees liveness for the sender?! Yes, but validity + agreement = overall liveness: if one process eventually delivers a message m, then, by agreement, all correct processes eventually deliver m.

Reliable multicast: building on B-multicast

On initialization:
    Received := {}
For process p to R-multicast message m to group g:
    B-multicast(g, m)    // p is included as a destination
On B-deliver(m) at process q:
    if m ∉ Received then
        Received := Received ∪ {m}
        if q ≠ p then B-multicast(g, m)
        R-deliver m
    end

Some observations:
- Since messages may arrive more than once, detect and discard duplicates
- Correct but clearly inefficient (IP multicast can help)
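The algorithm above can be sketched in Python as follows. Class and method names are illustrative; the `network` dict stands in for the wire, and each process echoes the first copy of a message it sees so that if any correct process delivers m, every correct process eventually does.

```python
# Sketch of R-multicast built on B-multicast (names are illustrative).
# Each process re-multicasts the first copy of a message it sees.

class RProcess:
    def __init__(self, pid, network):
        self.pid = pid
        self.network = network      # pid -> RProcess; stands in for the wire
        self.received = set()       # duplicate detection
        self.delivered = []

    def b_multicast(self, group, m):
        for pid in group:
            self.network[pid].on_b_deliver(self.pid, group, m)

    def r_multicast(self, group, m):
        self.b_multicast(group, m)  # the sender is included in the group

    def on_b_deliver(self, sender, group, m):
        if m not in self.received:
            self.received.add(m)
            if sender != self.pid:  # echo so all others also see m
                self.b_multicast(group, m)
            self.delivered.append(m)  # R-deliver

# Demo: p1 reliably multicasts one message to a three-member group.
network = {}
group = ["p1", "p2", "p3"]
for pid in group:
    network[pid] = RProcess(pid, network)
network["p1"].r_multicast(group, "m1")
```

The duplicate check in `on_b_deliver` is what keeps the echoing from looping forever: each message triggers at most one re-multicast per process.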

Uniform properties and agreement

Agreement so far refers only to correct (never-failing) processes. A uniform property holds whether or not processes are correct.

Uniform agreement: if a process, correct or failed, delivers m, then all correct processes in g eventually deliver m.

On B-deliver(m) at process q:
    if m ∉ Received then
        Received := Received ∪ {m}
        if q ≠ p then B-multicast(g, m)
        R-deliver m    ← crash here!
    end

Even if the process crashes right after R-delivering m, since it first B-multicast the message, all correct processes will eventually deliver it.

Uniform properties and agreement

Consider a minor change to the previous code:

On B-deliver(m) at process q:
    if m ∉ Received then
        Received := Received ∪ {m}
        R-deliver m    ← crash here!
        if q ≠ p then B-multicast(g, m)
    end

Not the same! If the process crashes after R-delivering m but before re-multicasting it, other processes may never see m. This matters if a process can take an action that produces an observable inconsistency before it crashes.

As there is a uniform version of agreement, there are uniform versions of the validity, integrity and ordering properties.


Ordering multicast

The basic algorithm delivers messages in arbitrary order, which is not satisfactory for many applications. Common ordering requirements:
- FIFO: messages from the same source are delivered in the order sent (a partial ordering)
- Causal: causally related messages arrive in order; causal implies FIFO, and is a partial ordering as well
- Total: consistent ordering everywhere; no particular order is required, but it must be the same at every process
- Hybrids: FIFO-total and causal-total orderings, among others
- Reliable + totally ordered: atomic multicast

FIFO ordering multicast

If a correct process issues multicast(g, m) and then multicast(g, m′), every correct process that delivers m′ delivers m before m′.

Idea of the algorithm: keep track of the last message received from each sender; when a message arrives, hold it back unless it is the next message you are supposed to get.

FIFO ordering multicast

To implement it (FO-multicast and FO-deliver), use:
- S_p^g: count of how many messages p has sent to g
- R_q^g: sequence number of the latest message delivered from q that was sent to group g

For p to FO-multicast a message to group g:
- Piggyback the value S_p^g onto the message
- B-multicast the message to g and increment S_p^g by 1

Upon receipt of a message from q with sequence number S, the receiver checks whether S = R_q^g + 1. If so, this is the next expected message: FO-deliver it and set R_q^g := S. If S > R_q^g + 1, place the message in the hold-back queue.
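The receiving side of this scheme can be sketched as below. The class and field names are illustrative; `expected[q]` plays the role of R_q^g + 1.

```python
# Sketch of the receiving side of FIFO ordering: per-sender sequence
# numbers plus a hold-back queue (names are illustrative).
from collections import defaultdict

class FifoReceiver:
    def __init__(self):
        self.expected = defaultdict(lambda: 1)   # R_q^g + 1, per sender q
        self.holdback = defaultdict(dict)        # sender -> {seq: msg}
        self.delivered = []

    def on_receive(self, sender, seq, m):
        self.holdback[sender][seq] = m
        # FO-deliver every message that is now in sequence for this sender.
        while self.expected[sender] in self.holdback[sender]:
            s = self.expected[sender]
            self.delivered.append(self.holdback[sender].pop(s))
            self.expected[sender] = s + 1

# Demo: p's messages 1 and 2 arrive out of order but are delivered in order.
r = FifoReceiver()
r.on_receive("p", 2, "second")   # held back: not the next expected message
r.on_receive("p", 1, "first")    # unblocks both
```

The loop after each receipt matters: delivering one message may make the next held-back message deliverable too.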

Total ordering multicast

If a correct process delivers m before m′, every correct process that delivers m′ delivers m before m′.

Basic approach to implement it: assign totally ordered identifiers to multicast messages. The delivery algorithm is similar to FIFO, but uses group-specific sequence numbers instead of process-specific ones.

Two main methods to assign identifiers:
- A sequencer process assigns them
- All processes collectively agree on the assignment of sequence numbers

Example use: totally ordered multicast

Goal: guarantee that concurrent updates on a replicated database are seen in the same order everywhere.
- P1 adds $100 to an account (initial value: $1000)
- P2 increments the account by 1%
- There are two replicas

Update 1: +$100. Update 2: +1%. One replica may perform update 1 before update 2, the other update 2 before update 1. In the absence of proper synchronization, replica #1 ends with $1111 while replica #2 ends with $1110.

Implementing total ordering: sequencer

To multicast m to group g, a process:
- Attaches a unique id, id(m), to it
- Sends it to sequencer(g) and the other members of g

Sequencer(g):
- Maintains a group-specific sequence number S_g
- Uses it to assign increasing sequence numbers to the messages it B-delivers

A message remains in the hold-back queue until it can be TO-delivered according to its assigned sequence number.

Implementing total ordering: sequencer

Algorithm for group member p:
    On initialization: r_g := 0
    To TO-multicast m to g:
        B-multicast(g ∪ {sequencer(g)}, <m, i>)
    On B-deliver(<m, i>) with g = group(m):
        place <m, i> in the hold-back queue
    On B-deliver(m_order = <"order", i, S>) with g = group(m_order):
        wait until <m, i> is in the hold-back queue and S = r_g
        TO-deliver m
        r_g := S + 1

Algorithm for the sequencer of g:
    On initialization: s_g := 0
    On B-deliver(<m, i>) with g = group(m):
        B-multicast(g, <"order", i, s_g>)
        s_g := s_g + 1
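A minimal sketch of this algorithm for a single group (class names and message shapes are illustrative): members hold each message back until the sequencer's "order" message assigns it the next expected group-wide sequence number.

```python
# Sketch of sequencer-based total ordering (single group, illustrative).

class Sequencer:
    def __init__(self):
        self.s_g = 0
    def on_message(self, msg_id):
        order = ("order", msg_id, self.s_g)  # B-multicast this to g
        self.s_g += 1
        return order

class Member:
    def __init__(self):
        self.r_g = 0        # next sequence number to TO-deliver
        self.holdback = {}  # msg_id -> message body
        self.orders = {}    # seq -> msg_id
        self.delivered = []

    def on_message(self, msg_id, m):
        self.holdback[msg_id] = m
        self._try_deliver()

    def on_order(self, order):
        _, msg_id, seq = order
        self.orders[seq] = msg_id
        self._try_deliver()

    def _try_deliver(self):
        while self.r_g in self.orders and self.orders[self.r_g] in self.holdback:
            msg_id = self.orders.pop(self.r_g)
            self.delivered.append(self.holdback.pop(msg_id))
            self.r_g += 1

# Demo: two members see messages and order messages in different network
# orders, but both TO-deliver in the sequencer's order.
seq, a, b = Sequencer(), Member(), Member()
oa, ob = seq.on_message("ma"), seq.on_message("mb")
a.on_message("ma", "M-a"); a.on_order(oa); a.on_order(ob); a.on_message("mb", "M-b")
b.on_order(ob); b.on_message("mb", "M-b"); b.on_message("ma", "M-a"); b.on_order(oa)
```

Delivery needs both pieces, the message body and its order message, which is why `_try_deliver` is called from both handlers.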

Implementing total ordering: sequencer

Obviously, the sequencer can become a bottleneck, and it is a critical point of failure. Variants:
- Multiple sequencers to deal with failures: e.g., Kaashoek et al. 1989* put a message in the hold-back queue of f + 1 nodes before it is delivered, to ensure resilience to f failures
- Token-based sequencer (if there's only one process sending totally ordered multicasts, give it the token!)
- Fully distributed

*M. Frans Kaashoek et al., An efficient reliable broadcast protocol, SIGOPS 23(4), Oct 1989

Total ordering: distributed

Basic idea: every process wanting to multicast a message acts as a sequencer. To assign a sequence number, it polls everyone for a proposed sequence number:
1. Message: the B-multicast of the message itself is the poll
2. Proposed seq: each member replies with a proposed sequence number
3. Agreed seq: the sender picks the largest proposed number and lets everyone know what it is

Remember: multiple processes may be trying to multicast a message at once.

Total ordering: distributed

For process p to multicast a message m to g:
(1) B-multicast <m, i> to g (i is a unique id for m)
(3) Collect all proposed sequence numbers, select the largest, a, as the agreed sequence number, and B-multicast <i, a> to g

Each process q:
(2) Replies to p with a proposed agreed sequence number for the message, P_q^g := max(A_q^g, P_q^g) + 1; provisionally assigns this proposed number to the message and places it in the hold-back queue
(3) On receiving <i, a>, sets A_q^g := max(A_q^g, a), attaches a to the message, and reorders the hold-back queue accordingly
(3) When the message at the front of the queue has been assigned its agreed sequence number, moves it to the delivery queue

* Based on Birman, Joseph, Reliable communication in the presence of failures, ACM TOCS 5(1), 1987
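The receiver side of this scheme can be sketched as follows (names are illustrative). Note one simplification: the full protocol breaks ties between equal proposals by appending process ids to the sequence numbers; this sketch assumes no ties.

```python
# Sketch of the receiver side of the distributed (ISIS-style) scheme.
# Each receiver proposes max(A, P) + 1; the sender picks the largest
# proposal as the agreed number. Ties are ignored here (see lead-in).

class IsisReceiver:
    def __init__(self):
        self.A = 0          # A_q^g: largest agreed seq number seen
        self.P = 0          # P_q^g: largest seq number proposed so far
        self.holdback = {}  # msg_id -> [seq, is_final, message]
        self.delivered = []

    def on_message(self, msg_id, m):     # step (2): propose and hold back
        self.P = max(self.A, self.P) + 1
        self.holdback[msg_id] = [self.P, False, m]
        return self.P                    # proposal sent back to the sender

    def on_agreed(self, msg_id, a):      # step (3): attach agreed number
        self.A = max(self.A, a)
        self.holdback[msg_id][0:2] = [a, True]
        # Deliver from the front of the queue while the smallest sequence
        # number in the hold-back queue is final (agreed).
        while self.holdback:
            front = min(self.holdback, key=lambda k: self.holdback[k][0])
            seq, final, m = self.holdback[front]
            if not final:
                break
            self.delivered.append(m)
            del self.holdback[front]

def isis_multicast(msg_id, m, receivers):
    # Sender side, steps (1) and (3): collect proposals, pick the maximum.
    agreed = max(q.on_message(msg_id, m) for q in receivers)
    for q in receivers:
        q.on_agreed(msg_id, agreed)

# Demo: two messages multicast to two receivers; both deliver in the
# same (agreed) order.
rs = [IsisReceiver(), IsisReceiver()]
isis_multicast("m1", "first", rs)
isis_multicast("m2", "second", rs)
```

The key safety point is in `on_agreed`: a message is delivered only once it sits at the front of the hold-back queue with a *final* sequence number, since a still-provisional message ahead of it could yet receive a smaller agreed number.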

Causally ordered multicast

If multicast(g, m) → multicast(g, m′), based only on messages exchanged by processes in g, every correct process that delivers m′ delivers m before m′.

In other words, a message m′ is delivered only if all causally preceding messages have already been delivered; until then, keep it in a hold-back queue.

Causally ordered multicasting

Clocks are adjusted only when sending/receiving. Consider two processes, with p_i sending a message m to p_j:
- p_i increments V_i[i] only when sending a message
- p_j adjusts V_j when receiving a message
- p_j postpones delivery of m until:
  - ts(m)[i] = V_j[i] + 1: m is the next message p_j was expecting from p_i
  - ts(m)[k] ≤ V_j[k] for all k ≠ i: p_j has seen all messages p_i had seen when it sent m

Causally ordered multicasting

Suppose p_j receives m from p_i with timestamp ts(m). p_j postpones delivery of m until:
- ts(m)[i] = VC_j[i] + 1
- ts(m)[k] ≤ VC_j[k] for all k ≠ i

[Figure: p_0 multicasts m with ts = [1,0,0]; p_1 delivers it and multicasts m′ with ts = [1,1,0] (V_1 = [1,1,0]). p_2 receives m′ first, but ts(m′)[0] = 1 > V_2[0] = 0, so m′ is delayed; once m arrives and is delivered (V_2 = [1,0,0]), m′ becomes deliverable and V_2 = [1,1,0].]
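The delivery test above can be sketched directly in Python (class name illustrative, n processes). On delivery only the sender's vector entry advances, matching the rule that clocks are incremented only on sends.

```python
# Sketch of vector-clock causal delivery. A message from p_i is delivered
# at p_j only when it is the next one expected from p_i and p_j has
# already seen everything p_i had seen when it sent the message.

class CausalReceiver:
    def __init__(self, n):
        self.V = [0] * n        # this process's vector clock
        self.holdback = []      # list of (i, ts, m) awaiting delivery
        self.delivered = []

    def _deliverable(self, i, ts):
        return ts[i] == self.V[i] + 1 and all(
            ts[k] <= self.V[k] for k in range(len(ts)) if k != i)

    def on_receive(self, i, ts, m):
        self.holdback.append((i, ts, m))
        progress = True
        while progress:         # deliver everything that has become ready
            progress = False
            for entry in list(self.holdback):
                i2, ts2, m2 = entry
                if self._deliverable(i2, ts2):
                    self.holdback.remove(entry)
                    self.delivered.append(m2)
                    self.V[i2] += 1   # only the sender's entry advances
                    progress = True

# Demo replaying the figure: p_2 receives p_1's message (ts = [1,1,0])
# before p_0's (ts = [1,0,0]); it delays the first until the second arrives.
p2 = CausalReceiver(3)
p2.on_receive(1, [1, 1, 0], "m-prime")  # delayed: ts[0] = 1 > V_2[0] = 0
p2.on_receive(0, [1, 0, 0], "m")        # unblocks both, in causal order
```

Delivering one message can unblock others already in the hold-back queue, hence the loop after each receipt.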

Reliable and in order

- Note that FIFO and causal are partial orderings; causal ⇒ FIFO
- Total ordering allows message delivery to be ordered arbitrarily, as long as the order is the same for all processes
- None of these orderings says anything about reliability: if you never deliver m, you do not violate the ordering
- We can define hybrids: FIFO-total, causal-total, reliable FIFO, reliable causal, and a reliable totally ordered multicast (sometimes called "atomic multicast")