Distributed systems Lecture 5: Consistent cuts, process groups, and mutual exclusion. Dr Robert N. M. Watson


Last time

Saw that physical time can't be kept exactly in sync; instead we use logical clocks to track ordering between events:
- Defined a -> b to mean a happens-before b
- Easy inside a single process; use causal ordering (send -> receive) to extend the relation across processes
- If send_i(m1) -> send_j(m2) then deliver_k(m1) -> deliver_k(m2)

Lamport clocks, L(e): an integer
- Increment to (max of (sender, receiver)) + 1 on receipt
- But given L(a) < L(b), we know nothing about the order of a and b

Vector clocks: a list of Lamport clocks, one per process
- Element V_i[j] captures the number of events at P_j observed by P_i
- Crucially: if V(a) < V(b), we can infer that a -> b, and if V(a) ~ V(b), we can infer that a ~ b
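The vector-clock comparison rules above can be sketched in Python. This is a minimal illustration, not part of the lecture; the helper names `vc_less` and `relation` are my own:

```python
def vc_less(a, b):
    """a < b iff every entry of a is <= the matching entry of b,
    and at least one entry is strictly less."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def relation(a, b):
    """Classify two events by their vector timestamps."""
    if vc_less(a, b):
        return "a -> b"   # a happens-before b
    if vc_less(b, a):
        return "b -> a"   # b happens-before a
    return "a ~ b"        # incomparable timestamps: concurrent events
```

Unlike Lamport clocks, this comparison is only a partial order, which is exactly why incomparable timestamps let us conclude concurrency.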

Vector Clocks: Example

[Diagram, physical time running left to right: P1 has events a (1,0,0) and b (2,0,0), where b sends m1 to P2; P2's receive event c gets (2,1,0) and its next event d gets (2,2,0), where d sends m2 to P3; P3 has events e (0,0,1) and f (2,2,2), where f receives m2.]

- When P2 receives m1, it merges in the entries from P1's clock: choose the maximum value in each position
- Similarly, when P3 receives m2, it merges in P2's clock; this incorporates the changes from P1 that P2 already saw
- Vector clocks explicitly track the transitive causal order: f's timestamp captures the history of a, b, c & d

Consistent Global State

- We have the notion of a happens-before b (a -> b), or a is concurrent with b (a ~ b)
- What about instantaneous system-wide state? (distributed debugging, GC, deadlock detection, ...)
- Chandy/Lamport introduced consistent cuts:
  - draw a (possibly wiggly) line across all processes
  - this is a consistent cut if the set of events on the left-hand side is closed under the happens-before relationship, i.e. if the cut includes event x, then it also includes all events e which happened before x
- In practical terms, this means every delivered message included in the cut was also sent within the cut
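The merge rule on receipt can be written out directly. A sketch under the slide's assumptions (the function name `on_receive` is mine); the receive itself is an event, so the receiver's own entry is incremented after the entrywise maximum:

```python
def on_receive(local, i, msg_clock):
    """Update P_i's vector clock on receiving a message:
    take the entrywise maximum of the two clocks, then count
    the receive event itself in P_i's own entry."""
    merged = [max(a, b) for a, b in zip(local, msg_clock)]
    merged[i] += 1
    return merged
```

Replaying the slide's example: P2 (index 1) receiving m1 stamped (2,0,0) moves from (0,0,0) to (2,1,0), and P3 (index 2) receiving m2 stamped (2,2,0) moves from (0,0,1) to (2,2,2), matching events c and f.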

Consistent Cuts: Example

[Diagram, physical time running left to right: events a-d on P1, e-h on P2, i-l on P3, with messages exchanged between the processes and several candidate cuts drawn across the three timelines.]

- Vertical cuts are always consistent (due to the way we draw these diagrams), but some curves are OK too: providing we don't include any receive events without their corresponding send events
- The intuition is that a consistent cut could have occurred during execution (depending on scheduling, etc.)

Observing Consistent Cuts

Chandy/Lamport Snapshot Algorithm (1985): a distributed algorithm for generating a snapshot of relevant system-wide state (e.g. all memory, locks held, ...)
- Based on flooding a special marker message M to all processes; the causal order of the flood defines the cut
- If P_i receives M from P_j and has yet to snapshot:
  - it pauses all communication, takes a local snapshot & sets C_ij to {}
  - then sends M to all other processes P_k and starts recording C_ik = { set of all post-local-snapshot messages received from P_k }
- If P_i receives M from some P_k after taking its snapshot:
  - it stops recording C_ik, and saves it alongside the local snapshot
- The global snapshot comprises all local snapshots & the C_ij
- Assumes reliable, in-order messages, & no failures
- Fear not! This is not examinable.
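The closure condition on cuts can be checked mechanically for the message edges alone. A sketch, not from the lecture; the event names and the (send, receive) pair representation are assumptions for illustration:

```python
def is_consistent(cut, messages):
    """cut: set of event names on the left of the line.
    messages: list of (send_event, receive_event) pairs.
    A cut is consistent only if it never contains a receive
    event without the corresponding send event."""
    return all(send in cut
               for send, recv in messages
               if recv in cut)
```

For example, with messages = [("b", "c")], the cut {"a", "c"} is inconsistent (it contains the receive c but not the send b), while {"a", "b", "c"} is consistent.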

Process Groups

- Often useful to build distributed systems around the notion of a process group
  - Set of processes on some number of machines
  - Possible to multicast messages to all members
  - Allows fault-tolerant systems even if some processes fail
- Membership can be fixed or dynamic
  - if dynamic, have explicit join() and leave() primitives
- Groups can be open or closed:
  - Closed groups only allow messages from members
- Internally can be structured (e.g. a coordinator and a set of slaves), or symmetric (peer-to-peer)
  - A coordinator makes e.g. concurrent join/leave easier
  - but may require extra work to elect the coordinator
- When we use multicast in the distributed-systems context, we mean something stronger than conventional network multicasting using datagrams; be careful not to confuse them.

Group Communication: Assumptions

- Assume we have the ability to send a message to multiple (or all) members of a group
  - Don't care if true multicast (single packet sent, received by multiple recipients) or netcast (send a set of messages, one to each recipient)
- Assume also that message delivery is reliable, and that messages arrive in bounded time
  - But they may take different amounts of time to reach different recipients
- Assume (for now) that processes don't crash
- What delivery orderings can we enforce?

FIFO Ordering

[Diagram, physical time running left to right: P1 multicasts m1 and later m3; P2 multicasts m2 and m4; delivery of m1 at one receiver is delayed past m3.]

- With FIFO ordering, messages from a particular process P_i must be received at all other processes P_j in the order they were sent
  - e.g. in the above, everyone must see m1 before m3 (the ordering of m2 and m4 is not constrained)
- Seems easy, but it is not trivial in the case of delays / retransmissions
  - e.g. what if message m1 to P2 takes a very long time? Hence receivers may need to buffer messages to ensure the order

Receiving versus Delivering

- Group communication middleware provides extra features above basic communication
  - e.g. providing reliability and/or ordering guarantees on top of IP multicast or netcast
- Assume the OS provides a receive() primitive: returns with a packet when one arrives on the wire
- Received messages are either delivered or held back:
  - delivered means inserted into the delivery queue
  - held back means inserted into the hold-back queue
  - held-back messages are delivered later as the result of the receipt of another message

Implementing FIFO Ordering

    receive(M from Pi) {
        s = SeqNo(M);
        if (s == Sji + 1) {       // deliverable: add M to delivery queue
            deliver(M);
            s = flush(HBQ);       // deliver any now-consecutive held-back messages
            Sji = s;
        } else {
            holdback(M);          // can't deliver yet: add to hold-back queue
        }
    }

Messages in the delivery queue are consumed by the application; held-back messages move to the delivery queue once their predecessors have been delivered.

- Each process P_i maintains a message sequence number (SeqNo) S_i
- Every message sent by P_i includes S_i, incremented after each send (not including retransmissions!)
- P_j maintains S_ji: the SeqNo of the last delivered message from P_i
- If it receives a message from P_i with SeqNo != (S_ji + 1), it holds the message back
- When it receives a message with SeqNo == (S_ji + 1), it delivers it, also delivers any consecutive messages in the hold-back queue, and updates S_ji

Stronger Orderings

- Can also implement FIFO ordering just by using a reliable FIFO transport like TCP/IP ;-)
- But the general receive-versus-deliver model also allows us to provide stronger orderings:
  - Causal ordering: if event multicast(g, m1) -> multicast(g, m2), then all processes will see m1 before m2
  - Total ordering: if any process delivers a message m1 before m2, then all processes will deliver m1 before m2
- Causal ordering implies FIFO ordering, since any two multicasts by the same process are related by happens-before
- Total ordering (as defined) does not imply FIFO (or causal) ordering; it just says that all processes must agree
  - In reality we often want FIFO-total ordering (which combines the two)
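The pseudocode above can be rendered as runnable Python. This is my own sketch of the slide's algorithm (class and attribute names are hypothetical); a receiver would keep one such channel per sender P_i:

```python
class FifoChannel:
    """Per-sender FIFO delivery with a hold-back queue. Each message
    carries the sender's sequence number (retransmissions excluded)."""

    def __init__(self):
        self.last = 0        # S_ji: SeqNo of last delivered message
        self.holdback = {}   # SeqNo -> message (hold-back queue)
        self.delivered = []  # delivery queue, consumed by the application

    def receive(self, seqno, msg):
        if seqno == self.last + 1:
            self.delivered.append(msg)
            self.last = seqno
            # flush: deliver any now-consecutive held-back messages
            while self.last + 1 in self.holdback:
                self.delivered.append(self.holdback.pop(self.last + 1))
                self.last += 1
        else:
            self.holdback[seqno] = msg   # can't deliver yet
```

If message 2 arrives before message 1, it sits in the hold-back queue; the arrival of message 1 delivers both, in order.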

Causal Ordering

[Diagram: the same processes and messages as the FIFO example, but now P2 receives m2 before m1.]

- Same example as previously, but now causal ordering means that (a) everyone must see m1 before m3 (as with FIFO), and (b) everyone must see m1 before m2 (due to happens-before)
- Is this OK? No! m1 -> m2, but P2 sees m2 before m1
- To be correct, we must hold back (delay) delivery of m2 at P2
- But how do we know this?

Implementing Causal Ordering

- Turns out this is pretty easy!
- Start with the receive algorithm for FIFO multicast and replace sequence numbers with vector clocks

[Diagram: P1 multicasts m1 with timestamp (1,0,0); P3 receives it and then multicasts m2 with (1,0,1); P2 receives m2 before m1; a later message m3 carries (2,0,1).]

- At P2, m2's timestamp (1,0,1) is not the next expected message from P3: the entry for P1 records an event P2 has not yet seen, so P2 must hold back m2
- Once P2 receives m1, it can deliver m1 and then m2
- Need some care with dynamic groups: must encode a variable-length vector clock, typically using positional notation, and deal with joins and leaves
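The deliverability test sketched on this slide can be written as a predicate. A sketch under the slide's assumptions (the function name is mine); it checks that the message is the next one from its sender and that the receiver has already seen everything else the sender had seen:

```python
def causally_deliverable(msg_vc, sender, local_vc):
    """Deliver a message from `sender` (a process index) only if:
    (1) it is the next message from that sender, and
    (2) every other entry of its timestamp is covered by ours."""
    next_from_sender = msg_vc[sender] == local_vc[sender] + 1
    deps_satisfied = all(msg_vc[k] <= local_vc[k]
                         for k in range(len(local_vc)) if k != sender)
    return next_from_sender and deps_satisfied
```

Replaying the slide's example at P2 (local clock (0,0,0)): m2 from P3 with (1,0,1) fails the test because of P1's entry, so it is held back; m1 from P1 with (1,0,0) passes, and after delivering it m2 passes too.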

Total Ordering

- Sometimes we want all processes to see exactly the same, FIFO, sequence of messages
  - particularly for state machine replication (see later)
- One way is to have a "can send" token:
  - Token passed round-robin between processes
  - Only the process with the token can send (if it wants to)
- Or use a dedicated sequencer process
  - Other processes ask it for a global sequence number (GSN), and then send with this in the packet
  - Use the FIFO ordering algorithm, but on GSNs
- Can also build non-FIFO total-order multicast by having processes generate GSNs themselves and resolving ties

Ordering and Asynchrony

- FIFO ordering allows quite a lot of asynchrony
  - e.g. any process can delay sending a message until it has a batch (to improve performance), or can just tolerate variable and/or long delays
- Causal ordering also allows some asynchrony
  - But must be careful that queues don't grow too large!
- Traditional total-order multicast is not so good:
  - Since every message delivery transitively depends on every other one, a delay holds up the entire system
  - Instead tend toward an (almost) synchronous model, but this performs poorly, particularly over the wide area ;-)
- Some clever work on virtual synchrony (for the interested)
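The sequencer approach can be sketched very compactly. This is an illustration only (names are mine, and real systems must also handle sequencer failure): senders fetch a GSN, stamp their multicast with it, and every receiver reuses the FIFO hold-back algorithm keyed on GSNs instead of per-sender sequence numbers:

```python
import itertools

class Sequencer:
    """Sketch of a dedicated sequencer process: hands out global
    sequence numbers (GSNs) on request. Receivers then deliver
    messages in GSN order using the FIFO hold-back algorithm,
    treating the whole group as a single logical sender."""

    def __init__(self):
        self._next = itertools.count(1)   # GSNs start at 1

    def next_gsn(self):
        return next(self._next)
```

Because every message in the group carries a GSN from one counter, all receivers agree on a single delivery sequence, which is exactly the total-order property.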

Distributed Mutual Exclusion

- In the first part of the course, we saw the need to coordinate concurrent processes / threads
- In particular, we considered how to ensure mutual exclusion: allow only 1 thread into a critical section
- A variety of schemes are possible: test-and-set locks; semaphores; event counts and sequencers; monitors; and active objects
- But most of these ultimately rely on hardware support (atomic operations, or disabling interrupts, ...), which is not available across an entire distributed system
- Assuming we have some shared distributed resources, how can we provide mutual exclusion in this case?

Solution #1: Central Lock Server

[Diagram, physical time running left to right: P1 sends lock(L) to C and receives grant(L); P2's lock(L) arrives while P1 executes its critical section; P1 sends unlock(L) and receives ack(L), after which C sends grant(L) to P2.]

- Nominate one process C as coordinator
- If P_i wants to enter the critical section, it simply sends a lock message to C, and waits for a reply
- If the resource is free, C replies to P_i with a grant message; otherwise C adds P_i to a wait queue
- When finished, P_i sends an unlock message to C
- C then sends a grant message to the first process in the wait queue
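The coordinator's state machine is small enough to sketch. A minimal single-threaded illustration of C's logic (class and method names are mine; real messaging, failure handling, and the ack are omitted):

```python
from collections import deque

class LockServer:
    """Sketch of coordinator C: grant the lock if free,
    otherwise queue the requester (FIFO wait queue)."""

    def __init__(self):
        self.holder = None        # process currently holding the lock
        self.waiting = deque()    # FIFO wait queue
        self.granted = []         # grant messages sent, for illustration

    def lock(self, p):
        if self.holder is None:
            self.holder = p
            self.granted.append(p)    # reply with grant
        else:
            self.waiting.append(p)    # queue the request

    def unlock(self, p):
        assert p == self.holder
        self.holder = self.waiting.popleft() if self.waiting else None
        if self.holder is not None:
            self.granted.append(self.holder)   # grant to next in queue
```

Because the wait queue is FIFO, grants come out in request-arrival order, which is what makes this scheme fair.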

Central Lock Server: Pros and Cons

- The central lock server has some good properties:
  - simple to understand and verify
  - live (providing delays are bounded, and there is no failure)
  - fair (if the queue is fair, e.g. FIFO), and easily supports priorities if we want them
  - decent performance: lock acquire takes one round-trip, and release is free with asynchronous messages
- But C can become a performance bottleneck
- and we can't distinguish a crash of C from a long wait
  - can add additional messages, at some cost

Solution #2: Token Passing

[Diagram: processes P0 through P5 arranged in a ring; the initial token is generated by P0 and passes clockwise around the ring; if e.g. P4 wants to enter the CS, it holds onto the token for the duration.]

- Arrange the processes in a logical ring; this avoids a central bottleneck
- Each process knows its predecessor & successor
- A single token passes continuously around the ring
- A process can only enter the critical section when it possesses the token; it passes the token on when finished (or if it doesn't need to enter the CS)
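The ring mechanics can be sketched in a few lines. An illustration only (names are mine; message passing, crash detection, and token regeneration are out of scope):

```python
class TokenRing:
    """Sketch of token-passing mutual exclusion on a logical ring
    of n processes, numbered 0..n-1. P0 generates the initial token;
    it moves clockwise to each process's successor."""

    def __init__(self, n):
        self.n = n
        self.token_at = 0   # initial token generated by P0

    def pass_token(self):
        self.token_at = (self.token_at + 1) % self.n

    def can_enter(self, p):
        # only the token holder may enter its critical section
        return p == self.token_at
```

Mutual exclusion holds by construction: there is exactly one token, so `can_enter` is true for at most one process at a time.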

Token Passing: Pros and Cons

- Several advantages:
  - Simple to understand: only 1 process ever has the token => mutual exclusion guaranteed by construction
  - No central server bottleneck
  - Liveness guaranteed (in the absence of failure)
  - So-so performance (between 0 and N messages until a waiting process enters, 1 message to leave)
- But:
  - Doesn't guarantee fairness (FIFO order)
  - If a process crashes, must repair the ring (route around it)
  - And worse: may need to regenerate the token (tricky!)
  - And constant network traffic: an advantage???

Solution #3: Totally-Ordered Multicast

- Scheme due to Ricart & Agrawala (1981)
- Consider N processes, where each process maintains a local variable state, which is one of { FREE, WANT, HELD }
- To obtain the lock, a process P_i sets state := WANT, and then multicasts a lock request to all other processes
- When a process P_j receives a request from P_i:
  - If P_j's local state is FREE, then P_j replies immediately with OK
  - If P_j's local state is HELD, P_j queues the request to reply later
- A requesting process P_i waits for OK from N-1 processes
  - Once received, it sets state := HELD, and enters the critical section
  - Once done, it sets state := FREE, & replies to any queued requests
- What about concurrent requests?
  - By concurrent we mean: P_i is already in the WANT state when it receives a request from P_j

Handling Concurrent Requests

- Need to decide upon a total order:
  - Each process maintains a Lamport timestamp, T_i
  - Processes put the current T_i into the request message
  - This is insufficient on its own (recall that Lamport timestamps can be identical) => use the process id (or similar) to break ties
- Hence if a process P_j receives a request from P_i, and P_j has an outstanding request (i.e. P_j's local state is WANT):
  - If (T_j, P_j) < (T_i, P_i), then queue the request from P_i
  - Otherwise, reply with OK, and continue waiting
- Note that using the total order ensures correctness, but not fairness (i.e. no FIFO ordering)
  - Q: can we fix this by using vector clocks?

Totally-Ordered Multicast: Example

[Diagram: P1 (timestamp 17) and P2 (timestamp 9) each multicast a request to the other processes; P3 replies OK to both; P1 replies OK to P2 and P2 queues P1's request.]

- Imagine P1 and P2 simultaneously try to acquire the lock
- Both set state to WANT, and both send a multicast message
- Assume the timestamps are 17 (for P1) and 9 (for P2)
- P3 has no interest (its state is FREE), so it replies OK to both
- Since 9 < 17, P1 replies OK; P2 stays quiet & queues P1's request
- P2 enters the critical section and executes... and when done, replies to P1 (who can now enter the critical section)
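The decision P_j makes on receiving a request can be captured as a pure function. A sketch of the Ricart & Agrawala receive rule as described above (the function name and return strings are my own; Python's tuple comparison gives the (timestamp, pid) tie-break for free):

```python
def on_request(my_state, my_ts, my_pid, req_ts, req_pid):
    """P_j's response to a lock request from P_i carrying (req_ts, req_pid).
    Returns "OK" (reply now) or "QUEUE" (defer the reply)."""
    if my_state == "FREE":
        return "OK"
    if my_state == "HELD":
        return "QUEUE"
    # my_state == "WANT": compare (timestamp, pid) pairs; the earlier
    # request wins, with process id breaking timestamp ties
    return "QUEUE" if (my_ts, my_pid) < (req_ts, req_pid) else "OK"
```

Replaying the slide's example: P3 (FREE) answers OK to both; P1 (WANT, ts 17) answers OK to P2's ts-9 request; P2 (WANT, ts 9) queues P1's ts-17 request, so only P2 collects N-1 OKs and enters first.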

Additional Details

- A completely unstructured, decentralized solution... but:
  - Lots of messages (1 multicast + N-1 unicasts)
  - It is OK for the most recent holder to re-enter the CS without any messages
- Variant scheme (due to Lamport):
  - To enter, process P_i multicasts request(P_i, T_i) [same as before]
  - On receipt of a message, P_j replies with an ack(P_j, T_j)
  - Processes keep all requests and acks in an ordered queue
  - If process P_i sees that its request is earliest, it can enter the CS... and when done, multicasts a release(P_i, T_i) message
  - When P_j receives the release, it removes P_i's request from its queue
  - If P_j's request is now earliest in the queue, it can enter the CS
- Note that both Ricart & Agrawala's and Lamport's schemes have N points of failure: doomed if any process dies :-(

Next time

- Distributed transactions
- Commit protocol examples
- Leader election and distributed consensus
- Replication of data and of services