Slides for Chapter 12: Coordination and Agreement


Transcription:

Slides for Chapter 12: Coordination and Agreement. From Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design.

Failure assumptions and failure detectors
- reliable communication channels
- process failures: crashes
- failure detector: an object/code in a process that detects failures of other processes
- unreliable failure detector: reports unsuspected or suspected (evidence of a possible failure); each process sends an "alive" message to everyone else, and a process is suspected if no "alive" message is received from it within a timeout; most practical systems
- reliable failure detector: reports unsuspected or failed; requires a synchronous system; few practical systems

Distributed mutual exclusion
- provide a critical region in a distributed environment, using message passing
- for example, locking files: the lockd daemon in UNIX (NFS is stateless, so there is no file locking at the NFS level)

Algorithms for mutual exclusion
- assumptions: N processes, processes don't fail, message delivery is reliable
- critical region interface: enter(), resourceAccesses(), exit()
- properties:
  - [ME1] safety: only one process at a time is in the critical region
  - [ME2] liveness: every request to enter or exit eventually succeeds
  - [ME3] happened-before ordering: the order in which processes enter is the same as the happened-before order of their requests
- performance evaluation:
  - overhead and bandwidth consumption: number of messages sent
  - client delay incurred by a process at entry and exit
  - throughput, measured by the synchronization delay: the delay between one process's exit and the next process's entry

Central server algorithm
- the server keeps track of a token: permission to enter the critical region
- a process requests the token from the server
- the server grants the token if it currently holds it
- a process can enter once it gets the token, otherwise it waits
- when done, a process sends a release message and exits
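The central server scheme above can be sketched in a few lines of Python. This is a hypothetical illustration, not code from the slides; the `LockServer` class and its method names are invented, and real messaging is reduced to method calls.

```python
# Sketch of the central-server mutual exclusion algorithm: one server
# holds the token, grants it to one requester at a time, and queues
# the rest until a release arrives.
from collections import deque

class LockServer:
    def __init__(self):
        self.token_free = True
        self.queue = deque()             # ids of waiting processes

    def request(self, pid):
        """Grant (return True) if the token is free, else queue the request."""
        if self.token_free:
            self.token_free = False
            return True                  # grant message
        self.queue.append(pid)
        return False                     # caller waits for a later grant

    def release(self):
        """Pass the token to the next waiting process, if any."""
        if self.queue:
            return self.queue.popleft()  # grant to the head of the queue
        self.token_free = True
        return None
```

Used in sequence, `request` and `release` reproduce the two-message entry and one-message exit counted in the performance discussion below.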

Central server algorithm: a server managing a mutual exclusion token for a set of processes. The server keeps a queue of requests: 1. request, 2. grant, 3. release.
- properties: safety (why?), liveness (why?); ordering is not guaranteed (why?) [FIFO ordering at the processes vs. at the server]
- performance:
  - enter overhead: two messages (request and grant)
  - enter delay: the time between request and grant
  - exit overhead: one message (release)
  - exit delay: none
  - synchronization delay: between release and grant
  - the centralized server is the bottleneck

Ring-based algorithm
- a ring of processes transferring a mutual exclusion token
- a logical ring, which may be unrelated to the physical configuration: p_i sends messages to p_(i+1) mod N
- when a process holds the token it can enter, otherwise it waits
- when a process releases the token (on exit), it sends the token to its neighbor

Token ring-based algorithm
- properties: safety (why?), liveness (why?); ordering is not guaranteed (why?)
- performance:
  - bandwidth consumption: the token keeps circulating
  - enter overhead: 0 to N messages
  - enter delay: delay for 0 to N messages
  - exit overhead: one message
  - exit delay: none
  - synchronization delay: delay for 1 to N messages

An algorithm using multicast and logical clocks
- multicast a request message for entry; enter only after all the other processes reply
- totally ordered timestamps: <T, p_i>
- each process keeps a state: RELEASED, HELD, or WANTED
- if all processes have state = RELEASED, all reply, and the requester can enter
- a process with state = HELD does not reply until it exits
- if more than one process has state = WANTED, the process with the lowest timestamp gets all N - 1 replies first
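The token-ring scheme above can be simulated directly. This is a minimal sketch (the `circulate` function and its parameters are invented for illustration): the token hops from p_i to p_(i+1) mod N, and a process enters its critical section only while holding the token.

```python
# Simulate a token circulating around an n-process ring. `wants` is the
# set of process ids that want the critical section; the function returns
# the order in which they enter.
def circulate(n, wants, rounds=1):
    entered = []
    token = 0                          # the token starts at process 0
    for _ in range(rounds * n):
        if token in wants and token not in entered:
            entered.append(token)      # hold the token -> enter, then exit
        token = (token + 1) % n        # release: forward the token to the neighbor
    return entered
```

For example, `circulate(5, {3, 1})` returns `[1, 3]`: entry order follows ring position, not request order, which is why the slides note that happened-before ordering is not guaranteed.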

Ricart and Agrawala's algorithm (multicast synchronization):

On initialization:
  state := RELEASED;
To enter the section:
  state := WANTED;
  multicast a request to all processes;   (request processing is deferred here)
  T := the request's timestamp;
  wait until (number of replies received = N - 1);
  state := HELD;
On receipt of a request <T_i, p_i> at p_j (i != j):
  if (state = HELD, or (state = WANTED and (T, p_j) < (T_i, p_i)))
  then queue the request from p_i without replying;
  else reply immediately to p_i;
  end if
To exit the critical section:
  state := RELEASED;
  reply to any queued requests;

Properties: safety (why?), liveness (why?), ordering (why?).
Performance:
- bandwidth consumption: no token keeps circulating
- entry overhead: 2(N - 1) messages, why? [with multicast support: 1 + (N - 1) = N]
- entry delay: the delay between the request and getting all replies
- exit overhead: 0 to N - 1 messages
- exit delay: none
- synchronization delay: delay for one message (the last reply, from the previous holder)

Maekawa's voting algorithm
Main idea: we don't actually need replies from all N - 1 other processes.
- consider processes p1, p2, and X: p1 needs replies from p1 and X, and p2 needs replies from p2 and X
- if X can only reply to (vote for) *one process at a time*, p1 and p2 cannot both hold a reply from X at the same time; mutual exclusion between p1 and p2 hinges on X
- processes in overlapping groups: the members in the overlap control the mutual exclusion
- mutual exclusion for all processes: a group for every process; if the pair of groups for p_i and p_j overlaps, then mutex(p_i, p_j); if every possible pair of groups overlaps, then mutex(every possible pair of processes), i.e., mutex(all processes)
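The heart of Ricart and Agrawala's algorithm is the reply rule: defer while HELD, or while WANTED with an older request of your own. A minimal sketch of just that decision (the `should_defer` function is invented for illustration; `(T, pid)` pairs use Python tuple comparison as the total order):

```python
# Decide whether p_j must queue an incoming request rather than reply.
# my_stamp and their_stamp are (T, pid) Lamport timestamp pairs; the pid
# breaks ties, so the order is total.
RELEASED, WANTED, HELD = "RELEASED", "WANTED", "HELD"

def should_defer(state, my_stamp, their_stamp):
    if state == HELD:
        return True                       # in the critical section: defer
    return state == WANTED and my_stamp < their_stamp   # my request is older
```

The tie-breaking pid is what guarantees that when several processes are WANTED, exactly one (the lowest timestamp) collects all N - 1 replies first.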

Groups and members: parameters and group membership
- a group for each process: N processes, N groups
- each group has K members: the number of processes asked for permission
- each process is in M groups (M > 1): this gives the overlap needed for mutual exclusion
- if the groups are disjoint (M = 1), there is no overlap and no mutual exclusion

Parameters: N = number of processes; K = voting group size; M = number of voting groups each process is in.
- optimal (smallest K, why?): K = M ~ sqrt(N), but such groups are non-trivial to construct
- approximation: K = 2*sqrt(N) - 1
  - put the process ids in a sqrt(N) by sqrt(N) table
  - the group for p_i is the union of the row and the column containing p_i
  - N groups; M = 2*sqrt(N) - 1

Group membership (grid construction): every pair of groups overlaps (any row intersects any column); there are N groups, each with K = 2*sqrt(N) - 1 members, and each process is in M = 2*sqrt(N) - 1 groups.

Group membership (alternative?): reuse one group for all the processes in a row, leaving only sqrt(N) unique groups. Every pair of groups still overlaps and each group still has K = 2*sqrt(N) - 1 members, but membership is unbalanced: M = sqrt(N) for some processes and M = 2*sqrt(N) for the rest. Undesirable, why?

Sketch of the voting algorithm (Figure 12.6: Maekawa's voting algorithm)
- a group of processes V_i for each process p_i; each pair of groups overlaps
- X in the groups of both p1 and p2 means the groups for that pair of processes overlap
- to enter the critical region, p_i sends REQUESTs to all processes in V_i and waits for REPLYs (votes) from all processes in V_i
- to exit the critical region, p_i sends RELEASEs to all processes in V_i
- three types of messages, not two as in the multicast algorithm
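The grid construction above is easy to check mechanically. A sketch (the `grid_groups` function is invented; it assumes N is a perfect square) that builds V_i as the union of p_i's row and column and verifies K = 2*sqrt(N) - 1 and pairwise overlap:

```python
# Build Maekawa voting groups from a sqrt(N) x sqrt(N) grid of process ids.
import math

def grid_groups(n):
    s = math.isqrt(n)
    assert s * s == n, "this sketch assumes N is a perfect square"
    groups = []
    for i in range(n):
        r, c = divmod(i, s)
        row = {r * s + k for k in range(s)}   # p_i's row
        col = {k * s + c for k in range(s)}   # p_i's column
        groups.append(row | col)              # V_i = row union column
    return groups

groups = grid_groups(9)
assert all(len(g) == 2 * 3 - 1 for g in groups)    # K = 2*sqrt(N) - 1 = 5
assert all(a & b for a in groups for b in groups)  # every pair overlaps
```

Overlap holds because any two groups share at least the cell where one group's row meets the other group's column.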
On initialization:
  state := RELEASED;
  voted := FALSE;
For p_i to enter the critical section:
  state := WANTED;
  multicast a request to all processes in V_i;
  wait until (number of replies received = K);
  state := HELD;
On receipt of a request from p_i at p_j:
  if (state = HELD or voted = TRUE)
  then queue the request from p_i without replying;
  else send a reply to p_i; voted := TRUE;
  end if
For p_i to exit the critical section:
  state := RELEASED;
  multicast a release to all processes in V_i;
On receipt of a release from p_i at p_j:
  if (queue of requests is non-empty)
  then remove the head of the queue (a request from p_k, say);
       send a reply to p_k; voted := TRUE;
  else voted := FALSE;
  end if

Deadlock?
- processes p1, p2, p3 with voting groups V1 = {p1, p2}, V2 = {p2, p3}, V3 = {p3, p1}
- deadlock: p1 has p1's reply and waits for p2's reply; p2 has p2's reply and waits for p3's reply; p3 has p3's reply and waits for p1's reply
- fix: timestamp the requests and grant votes according to the timestamp ordering

Properties
- safety: no process can reply/vote more than once at any time
- liveness: from the timestamp ordering
- ordering: as above
Performance (entry overhead, assuming K = sqrt(N)):
- sqrt(N) requests + sqrt(N) replies; 2*sqrt(N) < 2(N - 1) for N > 4
- exit overhead: sqrt(N) releases

Elections
- choosing a unique process for a particular role, for example the server in distributed mutual exclusion
- each process can call only one election; multiple concurrent elections can be called by different processes
- a participant is a process that engages in an election
- the process with the largest id wins
- each process p_i has a variable elected_i, initially undefined ("don't know")

Election properties:
- [E1] safety: elected_i of a participant process must be P_max (the elected process, with the largest id) or undefined
- [E2] liveness: all processes participate and eventually set elected_i to a defined value (or crash)
Performance: overhead (bandwidth consumption): number of messages; turnaround time: number of messages to complete an election.

Ring-based election algorithm
- a logical ring, which may be unrelated to the physical configuration; p_i sends messages to p_(i+1) mod N
- no failures; elect the coordinator with the largest id
- initially, every process is a non-participant
- any process can call an election: it marks itself as a participant, places its id in an election message, and sends the message to its neighbor
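The three-group deadlock above is a circular wait, which can be made concrete with a wait-for graph check. A sketch (the `has_cycle` function and the process labels are invented for illustration):

```python
# Detect a circular wait among Maekawa voters. `wait_for` maps each
# process to the process whose vote it is still waiting for.
def has_cycle(wait_for):
    for start in wait_for:
        seen, cur = set(), start
        while cur in wait_for:         # follow the wait-for chain
            if cur in seen:
                return True            # revisited a node: circular wait
            seen.add(cur)
            cur = wait_for[cur]
    return False

# p1 holds its own vote and waits for p2's; p2 waits for p3's; p3 for p1's.
assert has_cycle({"p1": "p2", "p2": "p3", "p3": "p1"})
# Once p3 yields its vote (timestamp ordering), the cycle is broken.
assert not has_cycle({"p1": "p2", "p2": "p3"})
```

Granting votes in timestamp order prevents this cycle from forming, because the oldest request is never the one made to wait.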

Ring-based election in progress. On receiving an election message carrying id:
- if id > myid: forward the message and mark myself as a participant
- if id < myid:
  - if I am a non-participant: replace id with myid, forward the message, and mark myself as a participant
  - if I am already a participant: stop forwarding (why? multiple concurrent elections)
- if id = myid: the coordinator is found; mark myself as a non-participant, set elected_i := id, and send an elected message carrying myid
On receiving an elected message carrying id:
- if id != myid: mark myself as a non-participant, set elected_i := id, and forward the message
- if id = myid: stop forwarding

(Figure: a ring-based election in progress. The election was started by process 17; the highest process identifier encountered so far is 24. Participant processes are shown darkened.)

Ring-based algorithm
- properties:
  - safety: only the process with the largest id can send an elected message
  - liveness: every process in the ring eventually participates in the election; extra elections are stopped
- performance:
  - one election, best case (when?): N election messages and N elected messages; turnaround: 2N messages
  - one election, worst case (when?): 2N - 1 election messages and N elected messages; turnaround: 3N - 1 messages
  - cannot tolerate failures, so not very practical

The bully algorithm
- processes can crash, and crashes can be detected by other processes; timeout T = 2*T_transmitting + T_processing
- each process knows all the other processes and can communicate with them
- messages: election, answer, coordinator
- a process starts an election when it detects that the coordinator has failed:
  - it sends an election message to all processes with higher ids (except the failed coordinator) and waits for answers
  - if no answers arrive within time T, it is the coordinator and sends a coordinator message (with its id) to all processes with lower ids
  - otherwise it waits for a coordinator message, and starts a new election if that times out
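The ring-based election rules above can be simulated end to end. A sketch (the `ring_election` function and the example id list are invented for illustration): the message carries the largest id seen so far and returns to the eventual coordinator within 2N - 1 hops.

```python
# Simulate one ring-based election. `ids` lists the process ids in ring
# order; `starter` is the process that calls the election. Returns the
# winner and the number of election-message hops.
def ring_election(ids, starter):
    n = len(ids)
    pos = ids.index(starter)
    carried, hops = starter, 0        # the message initially carries starter's id
    while True:
        pos = (pos + 1) % n
        hops += 1
        here = ids[pos]
        if here == carried:           # my own id came back: I am coordinator
            return here, hops
        if here > carried:            # replace id with myid and forward
            carried = here

winner, hops = ring_election([3, 17, 24, 1, 28, 15, 9, 4], 17)
assert winner == 28                   # the largest id wins
assert hops <= 2 * 8 - 1              # worst-case election-message bound
```

The elected message that announces the winner adds another N hops, giving the 3N - 1 worst-case turnaround noted above.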

(Figure: the bully algorithm; the election of coordinator p2 after the failure of p4 and then p3, shown in four stages with election, answer, timeout, and coordinator messages.)

On receiving an election message, a process:
- sends an answer message back
- starts an election if it hasn't already started one: it sends election messages to all higher-id processes (including the failed coordinator, which might be up again by now)
On receiving a coordinator message, a process sets elected_i to the new coordinator.
To become the coordinator, a process has to start an election. When a crashed process is replaced, the new process starts an election and can displace the current coordinator (hence "bully").

Properties:
- safety: a lower-id process always yields to a higher-id process. However, if a failed process is replaced during an election, the low-id processes might see two different coordinators: the newly elected coordinator and the new process. Why? Failure detection might be unreliable.
- liveness: all processes participate and know the coordinator at the end
Performance:
- best case (when?): overhead: N - 1 coordinator messages; turnaround delay: no election/answer messages
- worst case (when?): overhead: 1 + 2 + ... + (N - 2) + (N - 1) = (N - 1)(N - 2)/2 + (N - 1) election messages, 1 + ... + (N - 2) = (N - 1)(N - 2)/2 answer messages, and N - 1 coordinator messages; total: (N - 1)^2 + (N - 1) = N(N - 1) = O(N^2); turnaround delay: the delay of election and answer messages
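The bully algorithm's core decision, from one caller's point of view, can be sketched as follows (a hypothetical illustration, not the slides' code: the `bully_round` function assumes a synchronous round in which the caller already knows which higher-id processes answered within the timeout T).

```python
# One round of the bully algorithm as seen by the process `my_id`.
def bully_round(my_id, all_ids, alive):
    """Return ('coordinator', my_id) if no live higher-id process answers,
    else ('wait', answerers) while a higher-id process takes over."""
    higher = [p for p in all_ids if p > my_id]
    answerers = [p for p in higher if p in alive]
    if not answerers:                  # no answer within T: I am coordinator
        return ("coordinator", my_id)
    return ("wait", answerers)         # a higher-id process will bully me

# Coordinator 4 crashed; process 1 detects the failure and starts an election.
assert bully_round(1, [1, 2, 3, 4], alive={1, 2, 3}) == ("wait", [2, 3])
# Processes 2 and 3 also run elections; 3 hears no answer and wins.
assert bully_round(3, [1, 2, 3, 4], alive={1, 2, 3}) == ("coordinator", 3)
```

Chaining these rounds from the lowest-id detector upward reproduces the O(N^2) worst-case message count derived above.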