Distributed Synchronization: outline


Introduction
Causality and time
  o Lamport timestamps
  o Vector timestamps
  o Causal communication
Snapshots

This presentation is based on the book Distributed Operating Systems & Algorithms by Randy Chow and Theodore Johnson.

Distributed systems

A distributed system is a collection of independent computational nodes, communicating over a network, that is abstracted as a single coherent system
  o Grid computing
  o Cloud computing (infrastructure as a service, software as a service)
  o Peer-to-peer computing
  o Sensor networks
A distributed operating system allows sharing of resources and coordination of distributed computation in a transparent manner

[Figure: (a), (b) a distributed system; (c) a multiprocessor]

Distributed synchronization

Underlies distributed operating systems and algorithms
Processes communicate via message passing (no shared memory)
Inter-node coordination in distributed systems is challenged by
  o Lack of global state
  o Lack of global clock
  o Communication links may fail
  o Messages may be delayed or lost
  o Nodes may fail
Distributed synchronization supports correct coordination in distributed systems
  o Shared-memory-based locks and semaphores can no longer be used

Distributed computation model

Events
  o Sending a message
  o Receiving a message
  o Timeout, internal interrupt
Processors send control messages to each other
  o send(destination, action; parameters)
Processes may declare that they are waiting for events:
  o Wait for A_1, A_2, ..., A_n
      A_1(source; parameters)
          code to handle A_1
      ...
      A_n(source; parameters)
          code to handle A_n

Causality and events ordering

A distributed system has neither a global state nor a global clock, so no global order on all events can be determined
Each processor knows the total order of the events occurring in it
There is a causal relation between the sending of a message and its receipt
Lamport's happened-before relation $<_H$:
  1. $e_1 <_p e_2 \Rightarrow e_1 <_H e_2$ (events within the same processor are ordered)
  2. $e_1 <_m e_2 \Rightarrow e_1 <_H e_2$ (each message m is sent before it is received)
  3. $e_1 <_H e_2 \wedge e_2 <_H e_3 \Rightarrow e_1 <_H e_3$ (transitivity)

Leslie Lamport (1978): Time, clocks, and the ordering of events in a distributed system

Causality and events ordering (cont'd)

[Figure: space-time diagram of processes P1, P2, P3 with events e1 to e8 and the messages between them]

$e_1 <_H e_7$? Yes.
$e_1 <_H e_3$? Yes.
$e_1 <_H e_8$? Yes.
$e_5 <_H e_7$? No.

Lamport's timestamp algorithm

Initially my_ts = 0
Upon event e:
    if e is the receipt of message m
        my_ts = max(m.ts, my_ts)
    my_ts++
    e.ts = my_ts
    if e is the sending of message m
        m.ts = my_ts

To create a total order, ties are broken by process ID
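The timestamp rules above can be sketched in Python as follows. This is a minimal illustration, not the book's code: the class name `LamportClock` and its method names are assumptions made for the example, and an event's timestamp is returned as a `(timestamp, process_id)` pair so that ordinary tuple comparison breaks ties by process ID.

```python
class LamportClock:
    """Illustrative Lamport clock; names are hypothetical, not from the slides."""

    def __init__(self, pid):
        self.pid = pid   # process ID, used only to break ties
        self.ts = 0      # my_ts in the pseudocode

    def local_event(self):
        # Any event increments the local timestamp.
        self.ts += 1
        return (self.ts, self.pid)

    def send(self):
        # A send is an event; the new timestamp travels with the message (m.ts).
        self.ts += 1
        return self.ts

    def receive(self, msg_ts):
        # On receipt, first catch up with the sender, then count the event.
        self.ts = max(msg_ts, self.ts) + 1
        return (self.ts, self.pid)


p1, p2 = LamportClock(1), LamportClock(2)
m_ts = p1.send()              # P1 sends with timestamp 1
before = p2.local_event()     # unrelated event at P2, also timestamp 1
after = p2.receive(m_ts)      # receipt gets timestamp 2
```

Here the send event `(1, 1)` and the unrelated event `(1, 2)` share a timestamp and are ordered only by the process-ID tie-break, which shows why equal or ordered timestamps alone do not imply causality.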

Lamport's timestamps (cont'd)

[Figure: space-time diagram of P1, P2, P3 in which the events e1 to e8 are labeled with their Lamport timestamps (e.g. 1.1)]

Vector timestamps - motivation

Lamport's timestamps define a total order
  o $e_1 <_H e_2 \Rightarrow e_1.TS < e_2.TS$
  o However, the converse $e_1.TS < e_2.TS \Rightarrow e_1 <_H e_2$ does not hold in general (concurrent events are ordered arbitrarily)
Definition: message $m_1$ causally precedes message $m_2$ (written $m_1 <_c m_2$) if $s(m_1) <_H s(m_2)$ (sending $m_1$ happens before sending $m_2$)
Definition: a causality violation occurs if $m_1 <_c m_2$ but $r(m_2) <_p r(m_1)$. In other words, $m_1$ is sent before $m_2$, but processor p receives it after $m_2$.
Lamport's timestamps cannot detect (and hence cannot prevent) causality violations.

Causality violations - an example

[Figure: space-time diagram of P1, P2, P3 with messages M1, M2, M3. Labels: "Migrate O to P2", "Where is O?", "On P2", "Where is O?", "I don't know", "ERROR!"]

Causality violation between M1 and M3.

Vector timestamps

Initially my_vt = [0, ..., 0]
Upon event e:
    if e is the receipt of message m
        for i = 1 to M
            my_vt[i] = max(m.vt[i], my_vt[i])
    my_vt[self]++
    e.vt = my_vt
    if e is the sending of message m
        m.vt = my_vt

$e_1.VT \le_V e_2.VT$ if and only if $e_1.VT[i] \le e_2.VT[i]$ for every $i = 1, \dots, M$
$e_1.VT <_V e_2.VT$ if and only if $e_1.VT \le_V e_2.VT$ and $e_1.VT \ne e_2.VT$
For vector timestamps it does hold that $e_1.VT <_V e_2.VT \Leftrightarrow e_1 <_H e_2$
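A minimal Python sketch of the vector timestamp rules, under the assumption that processes are numbered from 0 and that the number of processes is fixed in advance. The class name `VectorClock` and its methods are illustrative names, not part of the slides.

```python
class VectorClock:
    """Illustrative vector clock for one process out of n (hypothetical names)."""

    def __init__(self, self_index, n):
        self.i = self_index   # index of this process (0-based here)
        self.vt = [0] * n     # my_vt in the pseudocode

    def local_event(self):
        # Every event increments this process's own component.
        self.vt[self.i] += 1
        return tuple(self.vt)

    def send(self):
        # A send is an event; the resulting vector is attached to the message.
        return self.local_event()

    def receive(self, msg_vt):
        # Component-wise maximum with the message's vector, then count the event.
        self.vt = [max(a, b) for a, b in zip(msg_vt, self.vt)]
        return self.local_event()


p1, p2 = VectorClock(0, 3), VectorClock(1, 3)
m_vt = p1.send()           # (1, 0, 0)
p2.local_event()           # (0, 1, 0), concurrent with the send
r_vt = p2.receive(m_vt)    # (1, 2, 0), causally after the send
```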

An example of a vector timestamp

[Figure: space-time diagram of four processes; the highlighted event e has e.vt = (3,6,4,2)]

Comparison of vector timestamps

[Figure: space-time diagram of four processes with three highlighted events e1, e2, e3]

e1.vt = (5,4,1,3)
e2.vt = (3,6,4,2)
e3.vt = (0,0,1,3)
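The three timestamps above can be compared mechanically. The sketch below implements the component-wise order on vector timestamps; the function names `vt_leq`, `vt_less` and `concurrent` are assumptions chosen for this example.

```python
def vt_leq(a, b):
    # a <=_V b: every component of a is at most the matching component of b.
    return all(x <= y for x, y in zip(a, b))


def vt_less(a, b):
    # a <_V b: a <=_V b and the two vectors differ.
    return vt_leq(a, b) and tuple(a) != tuple(b)


def concurrent(a, b):
    # Neither timestamp precedes the other, so the events are causally unrelated.
    return tuple(a) != tuple(b) and not vt_less(a, b) and not vt_less(b, a)


e1, e2, e3 = (5, 4, 1, 3), (3, 6, 4, 2), (0, 0, 1, 3)
print(vt_less(e3, e1))      # True: e3 happened before e1
print(concurrent(e1, e2))   # True: 5 > 3 but 4 < 6
print(concurrent(e3, e2))   # True: 0 < 3 but 3 > 2
```

So e3 causally precedes e1, while e2 is concurrent with both e1 and e3.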

VTs can be used to detect causality violations

[Figure: the object-migration example ("Migrate O to P2", "Where is O?", "On P2", "Where is O?", "I don't know", "ERROR!") with a vector timestamp on every event.
 Events at P1: (1,0,0), (2,0,1), (3,0,1)
 Events at P2: (3,1,3), (3,2,3), (3,3,3)
 Events at P3: (0,0,1), (3,0,2), (3,0,3), (3,2,4)]

Causality violation between M1 and M3.

Preventing causality violations

A processor cannot control the order in which it receives messages
But it may control the order in which they are delivered to applications

[Figure: protocol for FIFO message delivery (as in, e.g., TCP). Labels: Application, Deliver in FIFO order, Message passing, Assign timestamps]

Message arrival order: x1, x2, x4, x5, x3
FIFO ordering: Deliver x1, Deliver x2, Buffer x4, Buffer x5, Deliver x3, x4, x5
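The FIFO hold-back idea can be sketched for a single sender as follows. The sketch assumes that the sender numbers its messages 1, 2, 3, ... and that the sequence number plays the role of the timestamp; `FifoReceiver` and its fields are hypothetical names used only for illustration.

```python
class FifoReceiver:
    """Hold-back queue that delivers one sender's messages in sequence order."""

    def __init__(self):
        self.next_seq = 1   # sequence number the application expects next
        self.buffer = {}    # out-of-order messages, keyed by sequence number

    def receive(self, seq, payload):
        # Buffer the arrival, then deliver every message that is now in order.
        self.buffer[seq] = payload
        delivered = []
        while self.next_seq in self.buffer:
            delivered.append(self.buffer.pop(self.next_seq))
            self.next_seq += 1
        return delivered


receiver = FifoReceiver()
arrivals = [(1, "x1"), (2, "x2"), (4, "x4"), (5, "x5"), (3, "x3")]
for seq, payload in arrivals:
    print(payload, "->", receiver.receive(seq, payload))
```

With the arrival order x1, x2, x4, x5, x3, the receiver delivers x1 and x2 immediately, buffers x4 and x5, and delivers x3, x4, x5 together once x3 arrives.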

Preventing causality violations (cont'd)

Senders attach a timestamp to each message
Destination delays the delivery of out-of-order messages
Hold back a message m until we are assured that no message m' with $m' <_H m$ will be delivered
  o For every other process p, maintain the earliest timestamp of a message that may still be delivered from p
  o Do not deliver a message if an earlier message may still be delivered from another process

Algorithm for preventing causality violations

earliest[1..M] initially [ <1,0,...,0>, <0,1,...,0>, ..., <0,0,...,1> ]
    // for each processor p, the earliest timestamp with which a message from p may still be delivered
blocked[1..M] initially [ {}, ..., {} ]
    // for each process p, the messages from p that were received but were not delivered yet

Upon the receipt of message m from processor p:
    delivery_list = {}
        // list of messages to be delivered as a result of m's receipt
    if blocked[p] is empty
        earliest[p] = m.timestamp
        // if there are no blocked messages from p, update earliest[p]
    add m to the tail of blocked[p]
        // m is now the most recent message from p not yet delivered
    while (∃k such that blocked[k] is non-empty AND ∀i ∈ {1,...,M} (except k and self): not_earlier(earliest[i], earliest[k]))
        // there is a process k with delayed messages for which it is now safe to deliver its earliest message
        remove the message at the head of blocked[k], put it in delivery_list
            // remove k's earliest message from the blocked queue, make sure it is delivered
        if blocked[k] is non-empty
            earliest[k] ← m'.timestamp, where m' is at the head of blocked[k]
            // if there are additional blocked messages of k, update earliest[k] to be the timestamp of the earliest such message
        else
            increment the k'th element of earliest[k]
            // otherwise the earliest message that may be delivered from k would have the previous timestamp with the k'th component incremented
    end while
    deliver the messages in delivery_list, in causal order
        // finally, deliver the set of messages that will not cause a causality violation (if there are any)

Execution of the algorithm as multicast

[Figure: space-time diagram of P1, P2, P3. m1 carries timestamp (0,1,0) and m2 carries timestamp (1,1,0); P3 receives m2 before m1]

State at P3           earliest[1]  earliest[2]  earliest[3]  blocked
Initially             (1,0,0)      (0,1,0)      (0,0,1)      -
Upon receipt of m2    (1,1,0)      (0,1,0)      (0,0,1)      m2
Upon receipt of m1    (1,1,0)      (0,1,0)      (0,0,1)      m2, m1
Deliver m1            (1,1,0)      (0,2,0)      (0,0,1)      m2
Deliver m2            (2,1,0)      (0,2,0)      (0,0,1)      -

Since the algorithm is interested only in the causal order of sending events, the vector timestamp is incremented only upon send events.
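A rough Python sketch of the receive handler for the causal-delivery algorithm, replaying the execution at P3. Several points are assumptions made for this example rather than details given in the slides: processes are indexed from 0 (so P1, P2, P3 are 0, 1, 2), `not_earlier(a, b)` is taken to mean "a is not strictly smaller than b in the vector order", and the names `CausalReceiver`, `vt_less` and `_deliverable` are hypothetical. Requires Python 3.8 or later.

```python
from collections import deque


def vt_less(a, b):
    # Strict vector order: a <= b component-wise and a != b.
    return all(x <= y for x, y in zip(a, b)) and list(a) != list(b)


class CausalReceiver:
    def __init__(self, self_id, n):
        self.self_id = self_id
        self.n = n
        # earliest[k]: earliest timestamp a message from k may still carry.
        self.earliest = [[1 if i == k else 0 for i in range(n)] for k in range(n)]
        # blocked[k]: messages received from k but not yet delivered.
        self.blocked = [deque() for _ in range(n)]

    def _deliverable(self):
        # Find a k whose head message cannot be preceded by any pending message.
        for k in range(self.n):
            if self.blocked[k] and all(
                not vt_less(self.earliest[i], self.earliest[k])
                for i in range(self.n)
                if i not in (k, self.self_id)
            ):
                return k
        return None

    def receive(self, sender, timestamp, payload):
        delivery_list = []
        if not self.blocked[sender]:
            self.earliest[sender] = list(timestamp)
        self.blocked[sender].append((list(timestamp), payload))
        while (k := self._deliverable()) is not None:
            _, body = self.blocked[k].popleft()
            delivery_list.append(body)
            if self.blocked[k]:
                self.earliest[k] = list(self.blocked[k][0][0])
            else:
                self.earliest[k][k] += 1
        return delivery_list


p3 = CausalReceiver(self_id=2, n=3)
print(p3.receive(0, (1, 1, 0), "m2"))   # m2 is held back
print(p3.receive(1, (0, 1, 0), "m1"))   # m1 arrives, both are delivered in causal order
```

After the second call the `earliest` array is `[[2, 1, 0], [0, 2, 0], [0, 0, 1]]`, matching the last row of the execution trace.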

Snapshots motivation - phantom deadlock

Assume we would like to implement a distributed deadlock-detector
We record process states to check if there is a waits-for cycle

[Figure: message diagram between processes P1, P2 and resources r1, r2 with the message labels request, OK and release and observation points 1 to 4; next to it, the observed waits-for graph and the actual waits-for graph over p1, p2, r1, r2]

What is a snapshot?

Global system state:
  o S = (s_1, s_2, ..., s_M): local processor states
  o The contents L_{i,j} = (m_1, m_2, ..., m_k) of each communication channel C_{i,j} (channels assumed to be FIFO). These are messages sent but not yet received
Global state must be consistent
  o If we observe in state s_i that p_i received message m from p_k, then in observation s_k, p_k must have sent m.
  o Each L_{i,j} must contain exactly the set of messages sent by p_i but not yet received by p_j, as reflected by s_i, s_j.
Snapshot state must be consistent: one that might have existed during the computation.
Observations must be mutually concurrent, that is, no observation causally precedes another observation (a consistent cut)

Snapshot algorithm - informal description

Upon joining the algorithm, a process records its local state
The process that initiates the snapshot sends snapshot tokens to its neighbors (before sending any other messages)
  o Neighbors send them to their neighbors (broadcast)
Upon receiving a snapshot token:
  o A process records its state prior to sending/receiving additional messages
  o It must then send tokens to all its other neighbors
How shall we record the sets L_{p,q}?
  o q receives a token from p and that is the first time q receives a token: L_{p,q} = { }
  o q receives a token from p but q received a token before: L_{p,q} = {all messages received by q from p since q received the first token}

Snapshot algorithm - data structures

The algorithm supports multiple ongoing snapshots, one per process
Different snapshots from the same process are distinguished by version number

Per-process variables
    integer my_version initially 0
        // the version number of my current snapshot
    integer current_snap[1..M] initially [0, ..., 0]
        // current_snap[r] contains the version number of the snapshot initiated by processor r
    integer tokens_received[1..M]
        // tokens_received[r] contains the number of tokens received for the snapshot initiated by processor r
    processor_state S[1..M]
        // S[r] contains the state recorded for the snapshot of processor r
    channel_state L[1..M][1..M]
        // L[r][q] records the state of the channel from q to the current processor for the snapshot initiated by processor r

Snapshot algorithm - pseudo-code

execute_snapshot()
    Wait for a snapshot request or a token

    Snapshot request:
        my_version++, current_snap[self] = my_version
        S[self] ← my_state
            // increment the version number of this snapshot, record my local state
        for each outgoing channel q, send(q, TOKEN; self, my_version)
        tokens_received[self] = 0
            // send snapshot token on all outgoing channels, initialize the number of received tokens for my snapshot to 0

    TOKEN(q; r, version):
            // upon receipt from q of TOKEN for snapshot (r, version)
        if current_snap[r] < version
                // this is the first token for snapshot (r, version)
            S[r] ← my_state
                // record local state for r's snapshot
            current_snap[r] = version
                // update version number of r's snapshot
            L[r][q] ← empty, send token(r, version) on each outgoing channel
                // the set of messages on the channel from q is empty; send token on all outgoing channels
            tokens_received[r] ← 1
                // initialize the number of received tokens to 1
        else
                // not the first token for (r, version)
            tokens_received[r]++
                // yet another token for snapshot (r, version)
            L[r][q] ← all messages received from q since first receiving token(r, version)
                // these messages are the state of the channel from q for snapshot (r, version)
        if tokens_received[r] = #incoming channels, local snapshot for (r, version) is finished
            // if all tokens of snapshot (r, version) arrived, the snapshot computation is over

Sample execution of the snapshot algorithm

A token passing system
Each processor receives, in its turn, a privilege token, uses it to perform privileged operations and then passes it on
Assume processor p did not receive the token for a long duration, so it invokes the snapshot algorithm to find out why

[Figure: processors p and q connected by communication channels]

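Sample execution of the snapshot algorithm (cont'd)

[Figure: four stages showing processors p and q and the position of the privilege token t]

1. p requests a snapshot: State(p) = {}
2. q sends the privilege token t: State(p) = {}
3. Snapshot token arrives at q: State(p) = {}, State(q) = {}, L_{p,q} = {}
4. Snapshot token arrives at p: State(p) = {}, L_{q,p} = {t}, State(q) = {}, L_{p,q} = {}

Observed state: neither p nor q holds the privilege token; t is in transit on the channel from q to p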
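The two-processor execution can be replayed with a small single-snapshot simulation. This is a simplified sketch, not the versioned multi-snapshot algorithm from the pseudo-code: it handles one snapshot only, models each FIFO channel as a `deque`, and all names (`Process`, `SNAPSHOT_TOKEN`, `connect`, `start_snapshot`) are assumptions introduced for the example.

```python
from collections import deque

SNAPSHOT_TOKEN = "SNAPSHOT_TOKEN"   # marker message, distinct from the privilege token "t"


class Process:
    def __init__(self, name, state=()):
        self.name = name
        self.state = set(state)     # application state, e.g. {"t"} when holding the privilege token
        self.inbox = {}             # sender name -> FIFO channel into this process
        self.peers = {}             # receiver name -> Process (outgoing channels)
        self.recorded_state = None  # local state recorded for the snapshot
        self.channel_state = {}     # sender name -> messages recorded as in transit
        self.recording = set()      # incoming channels whose token has not arrived yet

    def connect(self, other):
        self.peers[other.name] = other
        other.inbox[self.name] = deque()

    def send(self, dest, msg):
        self.peers[dest].inbox[self.name].append(msg)

    def start_snapshot(self):
        # Record local state, start recording every incoming channel, send tokens.
        self.recorded_state = set(self.state)
        self.recording = set(self.inbox)
        self.channel_state = {src: [] for src in self.inbox}
        for dest in self.peers:
            self.send(dest, SNAPSHOT_TOKEN)

    def receive(self, src):
        msg = self.inbox[src].popleft()
        if msg == SNAPSHOT_TOKEN:
            if self.recorded_state is None:
                self.start_snapshot()      # first token: join the snapshot
            self.recording.discard(src)    # the channel from src is now fully recorded
        else:
            if src in self.recording:
                self.channel_state[src].append(msg)
            self.state.add(msg)
        return msg


p, q = Process("p"), Process("q", state={"t"})
p.connect(q)
q.connect(p)

p.start_snapshot()      # 1. p requests a snapshot
q.state.remove("t")     # 2. q sends the privilege token
q.send("p", "t")
q.receive("p")          # 3. snapshot token arrives at q
p.receive("q")          # 4. p first receives t ...
p.receive("q")          #    ... and then q's snapshot token
print(p.recorded_state, q.recorded_state, p.channel_state, q.channel_state)
```

Because the channel from q to p is FIFO, p receives t before q's snapshot token, so t is recorded as the contents of that channel while both recorded local states are empty.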


More information

Clock and ordering. Yang Wang

Clock and ordering. Yang Wang Clock and ordering Yang Wang Review Happened- before relation Consistent global state Chandy Lamport protocol New problem Monitor node sometimes needs to observe other nodes events continuously Distributed

More information

Process Synchroniztion Mutual Exclusion & Election Algorithms

Process Synchroniztion Mutual Exclusion & Election Algorithms Process Synchroniztion Mutual Exclusion & Election Algorithms Paul Krzyzanowski Rutgers University November 2, 2017 1 Introduction Process synchronization is the set of techniques that are used to coordinate

More information

Distributed Synchronization. EECS 591 Farnam Jahanian University of Michigan

Distributed Synchronization. EECS 591 Farnam Jahanian University of Michigan Distributed Synchronization EECS 591 Farnam Jahanian University of Michigan Reading List Tanenbaum Chapter 5.1, 5.4 and 5.5 Clock Synchronization Distributed Election Mutual Exclusion Clock Synchronization

More information

Clock and Time. THOAI NAM Faculty of Information Technology HCMC University of Technology

Clock and Time. THOAI NAM Faculty of Information Technology HCMC University of Technology Clock and Time THOAI NAM Faculty of Information Technology HCMC University of Technology Using some slides of Prashant Shenoy, UMass Computer Science Chapter 3: Clock and Time Time ordering and clock synchronization

More information

Coordination and Agreement

Coordination and Agreement Coordination and Agreement Nicola Dragoni Embedded Systems Engineering DTU Informatics 1. Introduction 2. Distributed Mutual Exclusion 3. Elections 4. Multicast Communication 5. Consensus and related problems

More information

Distributed Systems. Rik Sarkar James Cheney Global State & Distributed Debugging February 3, 2014

Distributed Systems. Rik Sarkar James Cheney Global State & Distributed Debugging February 3, 2014 Distributed Systems Rik Sarkar James Cheney Global State & Distributed Debugging Global State: Consistent Cuts The global state is the combination of all process states and the states of the communication

More information

Verteilte Systeme/Distributed Systems Ch. 5: Various distributed algorithms

Verteilte Systeme/Distributed Systems Ch. 5: Various distributed algorithms Verteilte Systeme/Distributed Systems Ch. 5: Various distributed algorithms Holger Karl Computer Networks Group Universität Paderborn Goal of this chapter Apart from issues in distributed time and resulting

More information

Lecture 7: Logical Time

Lecture 7: Logical Time Lecture 7: Logical Time 1. Question from reviews a. In protocol, is in- order delivery assumption reasonable? i. TCP provides it b. In protocol, need all participants to be present i. Is this a reasonable

More information

Frequently asked questions from the previous class survey

Frequently asked questions from the previous class survey CS 455: INTRODUCTION TO DISTRIBUTED SYSTEMS [DISTRIBUTED COORDINATION/MUTUAL EXCLUSION] Shrideep Pallickara Computer Science Colorado State University L22.1 Frequently asked questions from the previous

More information

CS455: Introduction to Distributed Systems [Spring 2018] Dept. Of Computer Science, Colorado State University

CS455: Introduction to Distributed Systems [Spring 2018] Dept. Of Computer Science, Colorado State University Frequently asked questions from the previous class survey CS 455: INTRODUCTION TO DISTRIBUTED SYSTEMS [DISTRIBUTED COORDINATION/MUTUAL EXCLUSION] Shrideep Pallickara Computer Science Colorado State University

More information

CSE Traditional Operating Systems deal with typical system software designed to be:

CSE Traditional Operating Systems deal with typical system software designed to be: CSE 6431 Traditional Operating Systems deal with typical system software designed to be: general purpose running on single processor machines Advanced Operating Systems are designed for either a special

More information

Distributed Systems Multicast & Group Communication Services

Distributed Systems Multicast & Group Communication Services Distributed Systems 600.437 Multicast & Group Communication Services Department of Computer Science The Johns Hopkins University 1 Multicast & Group Communication Services Lecture 3 Guide to Reliable Distributed

More information

Lecture 6: Logical Time

Lecture 6: Logical Time Lecture 6: Logical Time 1. Question from reviews a. 2. Key problem: how do you keep track of the order of events. a. Examples: did a file get deleted before or after I ran that program? b. Did this computers

More information

CSE 5306 Distributed Systems

CSE 5306 Distributed Systems CSE 5306 Distributed Systems Synchronization Jia Rao http://ranger.uta.edu/~jrao/ 1 Synchronization An important issue in distributed system is how process cooperate and synchronize with one another Cooperation

More information

Parallel and Distributed Systems. Programming Models. Why Parallel or Distributed Computing? What is a parallel computer?

Parallel and Distributed Systems. Programming Models. Why Parallel or Distributed Computing? What is a parallel computer? Parallel and Distributed Systems Instructor: Sandhya Dwarkadas Department of Computer Science University of Rochester What is a parallel computer? A collection of processing elements that communicate and

More information

Following are a few basic questions that cover the essentials of OS:

Following are a few basic questions that cover the essentials of OS: Operating Systems Following are a few basic questions that cover the essentials of OS: 1. Explain the concept of Reentrancy. It is a useful, memory-saving technique for multiprogrammed timesharing systems.

More information

Chapter 6 Synchronization

Chapter 6 Synchronization DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 6 Synchronization Clock Synchronization Figure 6-1. When each machine has its own clock, an event

More information

2. Time and Global States Page 1. University of Freiburg, Germany Department of Computer Science. Distributed Systems

2. Time and Global States Page 1. University of Freiburg, Germany Department of Computer Science. Distributed Systems 2. Time and Global States Page 1 University of Freiburg, Germany Department of Computer Science Distributed Systems Chapter 3 Time and Global States Christian Schindelhauer 12. May 2014 2. Time and Global

More information

Distributed Mutual Exclusion

Distributed Mutual Exclusion Distributed Mutual Exclusion Introduction Ricart and Agrawala's algorithm Raymond's algorithm This presentation is based on the book: Distributed operating-systems & algorithms by Randy Chow and Theodore

More information

Distributed Systems Principles and Paradigms

Distributed Systems Principles and Paradigms Distributed Systems Principles and Paradigms Chapter 05 (version 16th May 2006) Maarten van Steen Vrije Universiteit Amsterdam, Faculty of Science Dept. Mathematics and Computer Science Room R4.20. Tel:

More information

Distributed Systems Question Bank UNIT 1 Chapter 1 1. Define distributed systems. What are the significant issues of the distributed systems?

Distributed Systems Question Bank UNIT 1 Chapter 1 1. Define distributed systems. What are the significant issues of the distributed systems? UNIT 1 Chapter 1 1. Define distributed systems. What are the significant issues of the distributed systems? 2. What are different application domains of distributed systems? Explain. 3. Discuss the different

More information

Distributed Systems Course. a.o. Univ.-Prof. Dr. Harald Kosch

Distributed Systems Course. a.o. Univ.-Prof. Dr. Harald Kosch Distributed Systems Course a.o. Univ.-Prof. Dr. Harald Kosch Topics Introduction Advantages, Disadvantages, Hard/Soft Concepts Network / Distributed Operating Systems Basics of Communication Models (Client/Server,

More information

Arranging lunch value of preserving the causal order. a: how about lunch? meet at 12? a: <receives b then c>: which is ok?

Arranging lunch value of preserving the causal order. a: how about lunch? meet at 12? a: <receives b then c>: which is ok? Lamport Clocks: First, questions about project 1: due date for the design document is Thursday. Can be less than a page what we re after is for you to tell us what you are planning to do, so that we can

More information

A subtle problem. An obvious problem. An obvious problem. An obvious problem. t ss

A subtle problem. An obvious problem. An obvious problem. An obvious problem. t ss A subtle problem An obvious problem when LC = t do S doesn t make sense for Lamport clocks! there is no guarantee that LC will ever be t S is anyway executed after LC = t Fixes: if e is internal/send and

More information

SYNCHRONIZATION. DISTRIBUTED SYSTEMS Principles and Paradigms. Second Edition. Chapter 6 ANDREW S. TANENBAUM MAARTEN VAN STEEN

SYNCHRONIZATION. DISTRIBUTED SYSTEMS Principles and Paradigms. Second Edition. Chapter 6 ANDREW S. TANENBAUM MAARTEN VAN STEEN DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN واحد نجف آباد Chapter 6 SYNCHRONIZATION Dr. Rastegari - Email: rastegari@iaun.ac.ir - Tel: +98331-2291111-2488

More information

Process Management And Synchronization

Process Management And Synchronization Process Management And Synchronization In a single processor multiprogramming system the processor switches between the various jobs until to finish the execution of all jobs. These jobs will share the

More information

Chapter 4: Global State and Snapshot Recording Algorithms

Chapter 4: Global State and Snapshot Recording Algorithms Chapter 4: Global State and Snapshot Recording Algorithms Ajay Kshemkalyani and Mukesh Singhal Distributed Computing: Principles, Algorithms, and Systems Cambridge University Press A. Kshemkalyani and

More information

CSE 5306 Distributed Systems. Synchronization

CSE 5306 Distributed Systems. Synchronization CSE 5306 Distributed Systems Synchronization 1 Synchronization An important issue in distributed system is how processes cooperate and synchronize with one another Cooperation is partially supported by

More information

Distributed Systems Exam 1 Review. Paul Krzyzanowski. Rutgers University. Fall 2016

Distributed Systems Exam 1 Review. Paul Krzyzanowski. Rutgers University. Fall 2016 Distributed Systems 2016 Exam 1 Review Paul Krzyzanowski Rutgers University Fall 2016 Question 1 Why does it not make sense to use TCP (Transmission Control Protocol) for the Network Time Protocol (NTP)?

More information

CSE 124: LAMPORT AND VECTOR CLOCKS. George Porter October 30, 2017

CSE 124: LAMPORT AND VECTOR CLOCKS. George Porter October 30, 2017 CSE 124: LAMPORT AND VECTOR CLOCKS George Porter October 30, 2017 ATTRIBUTION These slides are released under an Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) Creative Commons license

More information

Distributed Deadlocks. Prof. Ananthanarayana V.S. Dept. of Information Technology N.I.T.K., Surathkal

Distributed Deadlocks. Prof. Ananthanarayana V.S. Dept. of Information Technology N.I.T.K., Surathkal Distributed Deadlocks Prof. Ananthanarayana V.S. Dept. of Information Technology N.I.T.K., Surathkal Objectives of This Module In this module different kind of resources, different kind of resource request

More information

Group ID: 3 Page 1 of 15. Authors: Anne-Marie Bosneag Ioan Raicu Davis Ford Liqiang Zhang

Group ID: 3 Page 1 of 15. Authors: Anne-Marie Bosneag Ioan Raicu Davis Ford Liqiang Zhang Page 1 of 15 Causal Multicast over a Reliable Point-to-Point Protocol Authors: Anne-Marie Bosneag Ioan Raicu Davis Ford Liqiang Zhang Page 2 of 15 Table of Contents Cover Page 1 Table of Contents 2 1.0

More information

Self Stabilization. CS553 Distributed Algorithms Prof. Ajay Kshemkalyani. by Islam Ismailov & Mohamed M. Ali

Self Stabilization. CS553 Distributed Algorithms Prof. Ajay Kshemkalyani. by Islam Ismailov & Mohamed M. Ali Self Stabilization CS553 Distributed Algorithms Prof. Ajay Kshemkalyani by Islam Ismailov & Mohamed M. Ali Introduction There is a possibility for a distributed system to go into an illegitimate state,

More information

PROCESS SYNCHRONIZATION

PROCESS SYNCHRONIZATION DISTRIBUTED COMPUTER SYSTEMS PROCESS SYNCHRONIZATION Dr. Jack Lange Computer Science Department University of Pittsburgh Fall 2015 Process Synchronization Mutual Exclusion Algorithms Permission Based Centralized

More information

Mutual Exclusion. A Centralized Algorithm

Mutual Exclusion. A Centralized Algorithm Mutual Exclusion Processes in a distributed system may need to simultaneously access the same resource Mutual exclusion is required to prevent interference and ensure consistency We will study three algorithms

More information

Concurrency: Mutual Exclusion and Synchronization

Concurrency: Mutual Exclusion and Synchronization Concurrency: Mutual Exclusion and Synchronization 1 Needs of Processes Allocation of processor time Allocation and sharing resources Communication among processes Synchronization of multiple processes

More information

Time in Distributed Systems

Time in Distributed Systems Time Slides are a variant of slides of a set by Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13- 239227-5 Time in Distributed

More information

VALLIAMMAI ENGINEERING COLLEGE

VALLIAMMAI ENGINEERING COLLEGE VALLIAMMAI ENGINEERING COLLEGE SRM Nagar, Kattankulathur 603 203 DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING QUESTION BANK II SEMESTER CP7204 Advanced Operating Systems Regulation 2013 Academic Year

More information

CSE 486/586 Distributed Systems

CSE 486/586 Distributed Systems CSE 486/586 Distributed Systems Reliable Multicast (part 1) Slides by Steve Ko Computer Sciences and Engineering University at Buffalo CSE 486/586 Last Time Global state A union of all process states Consistent

More information

Three Models. 1. Time Order 2. Distributed Algorithms 3. Nature of Distributed Systems1. DEPT. OF Comp Sc. and Engg., IIT Delhi

Three Models. 1. Time Order 2. Distributed Algorithms 3. Nature of Distributed Systems1. DEPT. OF Comp Sc. and Engg., IIT Delhi DEPT. OF Comp Sc. and Engg., IIT Delhi Three Models 1. CSV888 - Distributed Systems 1. Time Order 2. Distributed Algorithms 3. Nature of Distributed Systems1 Index - Models to study [2] 1. LAN based systems

More information

Distributed Mutual Exclusion Algorithms

Distributed Mutual Exclusion Algorithms Chapter 9 Distributed Mutual Exclusion Algorithms 9.1 Introduction Mutual exclusion is a fundamental problem in distributed computing systems. Mutual exclusion ensures that concurrent access of processes

More information

IT 540 Operating Systems ECE519 Advanced Operating Systems

IT 540 Operating Systems ECE519 Advanced Operating Systems IT 540 Operating Systems ECE519 Advanced Operating Systems Prof. Dr. Hasan Hüseyin BALIK (5 th Week) (Advanced) Operating Systems 5. Concurrency: Mutual Exclusion and Synchronization 5. Outline Principles

More information

Time Synchronization and Logical Clocks

Time Synchronization and Logical Clocks Time Synchronization and Logical Clocks CS 240: Computing Systems and Concurrency Lecture 5 Marco Canini Credits: Michael Freedman and Kyle Jamieson developed much of the original material. Today 1. The

More information

Distributed Systems Exam 1 Review Paul Krzyzanowski. Rutgers University. Fall 2016

Distributed Systems Exam 1 Review Paul Krzyzanowski. Rutgers University. Fall 2016 Distributed Systems 2015 Exam 1 Review Paul Krzyzanowski Rutgers University Fall 2016 1 Question 1 Why did the use of reference counting for remote objects prove to be impractical? Explain. It s not fault

More information

Chapter 6 Synchronization (1)

Chapter 6 Synchronization (1) DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 6 Synchronization (1) With material from Ken Birman Tanenbaum & Van Steen, Distributed Systems:

More information

OUTLINE. Introduction Clock synchronization Logical clocks Global state Mutual exclusion Election algorithms Deadlocks in distributed systems

OUTLINE. Introduction Clock synchronization Logical clocks Global state Mutual exclusion Election algorithms Deadlocks in distributed systems Chapter 5 Synchronization OUTLINE Introduction Clock synchronization Logical clocks Global state Mutual exclusion Election algorithms Deadlocks in distributed systems Concurrent Processes Cooperating processes

More information

CS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University

CS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University CS 555: DISTRIBUTED SYSTEMS [LOGICAL CLOCKS] Shrideep Pallickara Computer Science Colorado State University Frequently asked questions from the previous class survey What happens in a cluster when 2 machines

More information

Distributed Databases Systems

Distributed Databases Systems Distributed Databases Systems Lecture No. 07 Concurrency Control Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology Jamshoro Outline

More information

Efficient and Secure Source Authentication for Multicast

Efficient and Secure Source Authentication for Multicast Efficient and Secure Source Authentication for Multicast Authors: Adrian Perrig, Ran Canetti Dawn Song J. D. Tygar Presenter: Nikhil Negandhi CSC774 Network Security Outline: Background Problem Related

More information

Networked Systems and Services, Fall 2018 Chapter 4. Jussi Kangasharju Markku Kojo Lea Kutvonen

Networked Systems and Services, Fall 2018 Chapter 4. Jussi Kangasharju Markku Kojo Lea Kutvonen Networked Systems and Services, Fall 2018 Chapter 4 Jussi Kangasharju Markku Kojo Lea Kutvonen Chapter Outline Overview of interprocess communication Remote invocations (RPC etc.) Persistence and synchronicity

More information

Symmetric Multiprocessors: Synchronization and Sequential Consistency

Symmetric Multiprocessors: Synchronization and Sequential Consistency Constructive Computer Architecture Symmetric Multiprocessors: Synchronization and Sequential Consistency Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology November

More information

Analysis of Distributed Snapshot Algorithms

Analysis of Distributed Snapshot Algorithms Analysis of Distributed Snapshot Algorithms arxiv:1601.08039v1 [cs.dc] 29 Jan 2016 Sharath Srivatsa sharath.srivatsa@iiitb.org September 15, 2018 Abstract Many problems in distributed systems can be cast

More information

CprE Fault Tolerance. Dr. Yong Guan. Department of Electrical and Computer Engineering & Information Assurance Center Iowa State University

CprE Fault Tolerance. Dr. Yong Guan. Department of Electrical and Computer Engineering & Information Assurance Center Iowa State University Fault Tolerance Dr. Yong Guan Department of Electrical and Computer Engineering & Information Assurance Center Iowa State University Outline for Today s Talk Basic Concepts Process Resilience Reliable

More information

Last Class: Clock Synchronization. Today: More Canonical Problems

Last Class: Clock Synchronization. Today: More Canonical Problems Last Class: Clock Synchronization Logical clocks Vector clocks Global state Lecture 11, page 1 Today: More Canonical Problems Distributed snapshot and termination detection Election algorithms Bully algorithm

More information

Distributed Mutual Exclusion

Distributed Mutual Exclusion Distributed Mutual Exclusion Mutual Exclusion Well-understood in shared memory systems Requirements: at most one process in critical section (safety) if more than one requesting process, someone enters

More information

wait with priority An enhanced version of the wait operation accepts an optional priority argument:

wait with priority An enhanced version of the wait operation accepts an optional priority argument: wait with priority An enhanced version of the wait operation accepts an optional priority argument: syntax: .wait the smaller the value of the parameter, the highest the priority

More information

WEEK 5 - APPLICATION OF PETRI NETS. 4.4 Producers-consumers problem with priority

WEEK 5 - APPLICATION OF PETRI NETS. 4.4 Producers-consumers problem with priority 4.4 Producers-consumers problem with priority The net shown in Fig. 27 represents a producers-consumers system with priority, i.e., consumer A has priority over consumer B in the sense that A can consume

More information

Event Ordering. Greg Bilodeau CS 5204 November 3, 2009

Event Ordering. Greg Bilodeau CS 5204 November 3, 2009 Greg Bilodeau CS 5204 November 3, 2009 Fault Tolerance How do we prepare for rollback and recovery in a distributed system? How do we ensure the proper processing order of communications between distributed

More information

Verteilte Systeme (Distributed Systems)

Verteilte Systeme (Distributed Systems) Verteilte Systeme (Distributed Systems) Karl M. Göschka Karl.Goeschka@tuwien.ac.at http://www.infosys.tuwien.ac.at/teaching/courses/ VerteilteSysteme/ Lecture 6: Clocks and Agreement Synchronization of

More information

Large Systems: Design + Implementation: Communication Coordination Replication. Image (c) Facebook

Large Systems: Design + Implementation: Communication Coordination Replication. Image (c) Facebook Large Systems: Design + Implementation: Image (c) Facebook Communication Coordination Replication Credits Slides largely based on Distributed Systems, 3rd Edition Maarten van Steen Andrew S. Tanenbaum

More information

Last Class: Synchronization

Last Class: Synchronization Last Class: Synchronization Synchronization primitives are required to ensure that only one thread executes in a critical section at a time. Concurrent programs Low-level atomic operations (hardware) load/store

More information

TIME ATTRIBUTION 11/4/2018. George Porter Nov 6 and 8, 2018

TIME ATTRIBUTION 11/4/2018. George Porter Nov 6 and 8, 2018 TIME George Porter Nov 6 and 8, 2018 ATTRIBUTION These slides are released under an Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) Creative Commons license These slides incorporate

More information

MODELS OF DISTRIBUTED SYSTEMS

MODELS OF DISTRIBUTED SYSTEMS Distributed Systems Fö 2/3-1 Distributed Systems Fö 2/3-2 MODELS OF DISTRIBUTED SYSTEMS Basic Elements 1. Architectural Models 2. Interaction Models Resources in a distributed system are shared between

More information

Resource Sharing & Management

Resource Sharing & Management Resource Sharing & Management P.C.P Bhatt P.C.P Bhatt OS/M6/V1/2004 1 Introduction Some of the resources connected to a computer system (image processing resource) may be expensive. These resources may

More information

Several of these problems are motivated by trying to use solutiions used in `centralized computing to distributed computing

Several of these problems are motivated by trying to use solutiions used in `centralized computing to distributed computing Studying Different Problems from Distributed Computing Several of these problems are motivated by trying to use solutiions used in `centralized computing to distributed computing Problem statement: Mutual

More information

1 Process Coordination

1 Process Coordination COMP 730 (242) Class Notes Section 5: Process Coordination 1 Process Coordination Process coordination consists of synchronization and mutual exclusion, which were discussed earlier. We will now study

More information

DISTRIBUTED MUTEX. EE324 Lecture 11

DISTRIBUTED MUTEX. EE324 Lecture 11 DISTRIBUTED MUTEX EE324 Lecture 11 Vector Clocks Vector clocks overcome the shortcoming of Lamport logical clocks L(e) < L(e ) does not imply e happened before e Goal Want ordering that matches causality

More information

CSE 486/586 Distributed Systems Reliable Multicast --- 1

CSE 486/586 Distributed Systems Reliable Multicast --- 1 Distributed Systems Reliable Multicast --- 1 Steve Ko Computer Sciences and Engineering University at Buffalo Last Time Global states A union of all process states Consistent global state vs. inconsistent

More information

CS 403/534 Distributed Systems Midterm April 29, 2004

CS 403/534 Distributed Systems Midterm April 29, 2004 CS 403/534 Distributed Systems Midterm April 9, 004 3 4 5 Total Name: ID: Notes: ) Please answer the questions in the provided space after each question. ) Duration is 0 minutes 3) Closed books and closed

More information

INSTITUTE OF AERONAUTICAL ENGINEERING (Autonomous) Dundigal, Hyderabad

INSTITUTE OF AERONAUTICAL ENGINEERING (Autonomous) Dundigal, Hyderabad Course Name Course Code Class Branch INSTITUTE OF AERONAUTICAL ENGINEERING (Autonomous) Dundigal, Hyderabad -500 043 COMPUTER SCIENCE AND ENGINEERING TUTORIAL QUESTION BANK 2015-2016 : DISTRIBUTED SYSTEMS

More information

Consistency in Distributed Systems

Consistency in Distributed Systems Consistency in Distributed Systems Recall the fundamental DS properties DS may be large in scale and widely distributed 1. concurrent execution of components 2. independent failure modes 3. transmission

More information

Do not start the test until instructed to do so!

Do not start the test until instructed to do so! CS 3204 Operating Systems Midterm (Abrams) Spring 2004 VIRG INIA POLYTECHNIC INSTITUTE AND STATE U T PRO SI M UNI VERSI TY Instructions: Do not start the test until instructed to do so! Print your name

More information

Distributed Systems COMP 212. Lecture 17 Othon Michail

Distributed Systems COMP 212. Lecture 17 Othon Michail Distributed Systems COMP 212 Lecture 17 Othon Michail Synchronisation 2/29 What Can Go Wrong Updating a replicated database: Customer (update 1) adds 100 to an account, bank employee (update 2) adds 1%

More information

Exercise Unit 2: Modeling Paradigms - RT-UML. UML: The Unified Modeling Language. Statecharts. RT-UML in AnyLogic

Exercise Unit 2: Modeling Paradigms - RT-UML. UML: The Unified Modeling Language. Statecharts. RT-UML in AnyLogic Exercise Unit 2: Modeling Paradigms - RT-UML UML: The Unified Modeling Language Statecharts RT-UML in AnyLogic Simulation and Modeling I Modeling with RT-UML 1 RT-UML: UML Unified Modeling Language a mix

More information

Chapter 5 Concurrency: Mutual Exclusion and Synchronization

Chapter 5 Concurrency: Mutual Exclusion and Synchronization Operating Systems: Internals and Design Principles Chapter 5 Concurrency: Mutual Exclusion and Synchronization Seventh Edition By William Stallings Designing correct routines for controlling concurrent

More information

02 - Distributed Systems

02 - Distributed Systems 02 - Distributed Systems Definition Coulouris 1 (Dis)advantages Coulouris 2 Challenges Saltzer_84.pdf Models Physical Architectural Fundamental 2/58 Definition Distributed Systems Distributed System is

More information

02 - Distributed Systems

02 - Distributed Systems 02 - Distributed Systems Definition Coulouris 1 (Dis)advantages Coulouris 2 Challenges Saltzer_84.pdf Models Physical Architectural Fundamental 2/60 Definition Distributed Systems Distributed System is

More information

Chapter 4: network layer. Network service model. Two key network-layer functions. Network layer. Input port functions. Router architecture overview

Chapter 4: network layer. Network service model. Two key network-layer functions. Network layer. Input port functions. Router architecture overview Chapter 4: chapter goals: understand principles behind services service models forwarding versus routing how a router works generalized forwarding instantiation, implementation in the Internet 4- Network

More information

THE TRANSPORT LAYER UNIT IV

THE TRANSPORT LAYER UNIT IV THE TRANSPORT LAYER UNIT IV The Transport Layer: The Transport Service, Elements of Transport Protocols, Congestion Control,The internet transport protocols: UDP, TCP, Performance problems in computer

More information

PART B UNIT II COMMUNICATION IN DISTRIBUTED SYSTEM PART A

PART B UNIT II COMMUNICATION IN DISTRIBUTED SYSTEM PART A CS6601 DISTRIBUTED SYSTEMS QUESTION BANK UNIT 1 INTRODUCTION 1. What is a distributed system? 2. Mention few examples of distributed systems. 3. Mention the trends in distributed systems. 4. What are backbones

More information

Synchronization Part 1. REK s adaptation of Claypool s adaptation of Tanenbaum s Distributed Systems Chapter 5

Synchronization Part 1. REK s adaptation of Claypool s adaptation of Tanenbaum s Distributed Systems Chapter 5 Synchronization Part 1 REK s adaptation of Claypool s adaptation of Tanenbaum s Distributed Systems Chapter 5 1 Outline! Clock Synchronization! Clock Synchronization Algorithms! Logical Clocks! Election

More information

Midterm Examination ECE 419S 2015: Distributed Systems Date: March 13th, 2015, 6-8 p.m.

Midterm Examination ECE 419S 2015: Distributed Systems Date: March 13th, 2015, 6-8 p.m. Midterm Examination ECE 419S 2015: Distributed Systems Date: March 13th, 2015, 6-8 p.m. Instructor: Cristiana Amza Department of Electrical and Computer Engineering University of Toronto Problem number

More information