CS 455: INTRODUCTION TO DISTRIBUTED SYSTEMS [DISTRIBUTED MUTUAL EXCLUSION] Frequently asked questions from the previous class survey

Similar documents
CS455: Introduction to Distributed Systems [Spring 2018] Dept. Of Computer Science, Colorado State University

Frequently asked questions from the previous class survey

CS455: Introduction to Distributed Systems [Spring 2018] Dept. Of Computer Science, Colorado State University

CS455: Introduction to Distributed Systems [Spring 2018] Dept. Of Computer Science, Colorado State University

CS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University

CSE 486/586 Distributed Systems

Coordination 1. To do. Mutual exclusion Election algorithms Next time: Global state. q q q

Distributed Systems (5DV147)

Coordination and Agreement

Distributed Systems. coordination Johan Montelius ID2201. Distributed Systems ID2201

CS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University

CS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University

CS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University

Mutual Exclusion. A Centralized Algorithm

PROCESS SYNCHRONIZATION

Chapter 6 Synchronization

Homework #2 Nathan Balon CIS 578 October 31, 2004

Process Synchroniztion Mutual Exclusion & Election Algorithms

Advanced Topics in Distributed Systems. Dr. Ayman A. Abdel-Hamid. Computer Science Department Virginia Tech

Distributed Mutual Exclusion

Slides for Chapter 15: Coordination and Agreement

Clock Synchronization. Synchronization. Clock Synchronization Algorithms. Physical Clock Synchronization. Tanenbaum Chapter 6 plus additional papers

CS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University

A Two-Layer Hybrid Algorithm for Achieving Mutual Exclusion in Distributed Systems

Distributed Synchronization. EECS 591 Farnam Jahanian University of Michigan

CSE 5306 Distributed Systems. Synchronization

DISTRIBUTED MUTEX. EE324 Lecture 11

SYNCHRONIZATION. DISTRIBUTED SYSTEMS Principles and Paradigms. Second Edition. Chapter 6 ANDREW S. TANENBAUM MAARTEN VAN STEEN

Distributed Systems Coordination and Agreement

Chapter 6 Synchronization (2)

Distributed Systems (5DV147)

Several of these problems are motivated by trying to use solutiions used in `centralized computing to distributed computing

Recap: Consensus. CSE 486/586 Distributed Systems Mutual Exclusion. Why Mutual Exclusion? Why Mutual Exclusion? Mutexes. Mutual Exclusion C 1

Synchronization Part 2. REK s adaptation of Claypool s adaptation oftanenbaum s Distributed Systems Chapter 5 and Silberschatz Chapter 17

Distributed Systems COMP 212. Lecture 1 Othon Michail

CS455: Introduction to Distributed Systems [Spring 2018] Dept. Of Computer Science, Colorado State University

Distributed Mutual Exclusion Algorithms

Module 7 - Replication

CS370: Operating Systems [Spring 2017] Dept. Of Computer Science, Colorado State University

CSE 5306 Distributed Systems

Introduction to Distributed Systems. Fabienne Boyer, LIG,

殷亚凤. Synchronization. Distributed Systems [6]

CS370: Operating Systems [Spring 2016] Dept. Of Computer Science, Colorado State University

CS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University

Middleware and Interprocess Communication

Kingdom of Saudi Arabia Ministry of Higher Education College of Computer & Information Sciences Majmaah University. Course Profile

Slides for Chapter 12: Coordination and Agreement

Module 8 Fault Tolerance CS655! 8-1!

Frequently asked questions from the previous class survey

Large Systems: Design + Implementation: Communication Coordination Replication. Image (c) Facebook

CS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University

Frequently asked questions from the previous class survey

Synchronization. Chapter 5

Algorithms and protocols for distributed systems

CS370: System Architecture & Software [Fall 2014] Dept. Of Computer Science, Colorado State University

Distributed Systems. Wintersemester - Winter Term 2003

CS370: System Architecture & Software [Fall 2014] Dept. Of Computer Science, Colorado State University

Chapter 16: Distributed Synchronization

Distributed Operating Systems. Distributed Synchronization

Module 8 - Fault Tolerance

Distributed Systems COMP 212. Lecture 1 Othon Michail

Synchronization. Clock Synchronization

Operating Systems. Mutual Exclusion in Distributed Systems. Lecturer: William Fornaciari

Chapter 18: Distributed

MYE017 Distributed Systems. Kostas Magoutis

Last Class: Clock Synchronization. Today: More Canonical Problems

PRIMARY-BACKUP REPLICATION

Verteilte Systeme/Distributed Systems Ch. 5: Various distributed algorithms

CS370: System Architecture & Software [Fall 2014] Dept. Of Computer Science, Colorado State University

Distributed Systems Concepts Design 5th Edition Solutions

Distributed Systems Concepts Design 4th Edition

Synchronization Part II. CS403/534 Distributed Systems Erkay Savas Sabanci University

11/7/2018. Event Ordering. Module 18: Distributed Coordination. Distributed Mutual Exclusion (DME) Implementation of. DME: Centralized Approach

CS370: Operating Systems [Spring 2017] Dept. Of Computer Science, Colorado State University

Silberschatz and Galvin Chapter 18

CS370: Operating Systems [Fall 2018] Dept. Of Computer Science, Colorado State University

Distributed Systems Course. a.o. Univ.-Prof. Dr. Harald Kosch

Distributed Systems Exam 1 Review. Paul Krzyzanowski. Rutgers University. Fall 2016

Distributed Systems. Lec 9: Distributed File Systems NFS, AFS. Slide acks: Dave Andersen

Frequently asked questions from the previous class survey

CS370: Operating Systems [Fall 2018] Dept. Of Computer Science, Colorado State University

(Pessimistic) Timestamp Ordering. Rules for read and write Operations. Read Operations and Timestamps. Write Operations and Timestamps

Frequently asked questions from the previous class survey

Chapter 8 Fault Tolerance

Verteilte Systeme (Distributed Systems)

416 practice questions (PQs)

Time and Coordination in Distributed Systems. Operating Systems

Distributed Systems COMP 212. Lecture 1 Othon Michail

(Pessimistic) Timestamp Ordering

Chapter 8 Fault Tolerance

CMPSCI 677 Operating Systems Spring Lecture 14: March 9

Distributed Transaction Management 2003

CS455: Introduction to Distributed Systems [Spring 2019] Dept. Of Computer Science, Colorado State University

Synchronization Part 1. REK s adaptation of Claypool s adaptation of Tanenbaum s Distributed Systems Chapter 5

Event Ordering Silberschatz, Galvin and Gagne. Operating System Concepts

Distributed Systems - II

Frequently asked questions from the previous class survey

Distributed Systems COMP 212. Lecture 17 Othon Michail

CS370: System Architecture & Software [Fall 2014] Dept. Of Computer Science, Colorado State University

Distributed Systems Principles and Paradigms

Transcription:

CS 455: INTRODUCTION TO DISTRIBUTED SYSTEMS [DISTRIBUTED MUTUAL EXCLUSION] Shrideep Pallickara Computer Science Colorado State University L23.1 Frequently asked questions from the previous class survey Yes. But what really is a second? 1 second ==time for a cesium 133 atom to make 9,192,631,770 transitions Even though we are assuming processes do not fail, how would you cope with tokens that are lost in transit? L23.2 SLIDES CREATED BY: SHRIDEEP PALLICKARA L23.1

Topics covered in this lecture Distributed Mutual Exclusion Multicast & logical clocks [Agarwala & Ricart] Maekawa s voting based algorithm Election algorithms L23.3 Requirements for distributed mutual exclusion ME1: At most one process may execute in the critical section at a time Safety ME2: Requests to enter and exit the critical section eventually succeed Liveness: Freedom from deadlocks and starvation ME3: If one request happened-before another, then entry to the CS is granted in that order L23.4 SLIDES CREATED BY: SHRIDEEP PALLICKARA L23.2

Evaluation of the algorithms Bandwidth consumed Proportional to number of messages sent in each entry and exit operation Client delay incurred by process for each entry or exit operation Effect on throughput of the system Synchronization delay between one process exiting critical section and next process entering it Throughput is greater when synchronization delay is shorter L23.5 MUTUAL EXCLUSION USING MULTICAST & LOGICAL CLOCKS {{RICART & AGARWALA S ALGORITHM} L23.6 SLIDES CREATED BY: SHRIDEEP PALLICKARA L23.3

Agarwala & Ricart s algorithm using multicast and logical clocks Processes that require entry to a critical section multicast a request message Enter it only when all other processes have replied to request Process replies to a request are designed to ensure that ME1, ME2, and ME3 are met L23.7 The setting Processes p 1, p 2,, p N have distinct identifiers Processes have communication channels to each other Each process p i keeps a Lamport clock Messages requesting entry are of the form <T, p i > T is the sender s timestamp and p i is the sender s identifier L23.8 SLIDES CREATED BY: SHRIDEEP PALLICKARA L23.4

Each process records its state Released Outside the critical section Wanted Wanting entry into the critical section Held Being in the critical section L23.9 Entering the critical section [1/2] If a process requests entry and the state of all other processes is Released All processes respond immediately and the entry is granted If a process requests entry and some process is in the state Held That holding process will not reply to requests until it has finished with the critical section All other processes respond L23.10 SLIDES CREATED BY: SHRIDEEP PALLICKARA L23.5

Entering the critical section [2/2] If two or more processes request entry at the same time? Request with the lowest timestamp will be first to collect N-1 replies If the Lamport timestamps are the same? n Requests are ordered based on their identifiers When a process requests entry? Defers all processing requests from other processes until its own request has been sent L23.11 Multicast synchronization p 1 41 41 41 Reply Reply 34 34 p 3 Reply Initial Condition: p 3 not interested in entering critical section p 1 and p 2 request entry concurrently Timestamp of p 1 s request: 41 Timestamp of p 2 s request: 34 p 2 34 p 2 enters the critical section L23.12 SLIDES CREATED BY: SHRIDEEP PALLICKARA L23.6

Achieving the properties ME1, ME2 and ME3 If two processes p i and p j (i j) enter critical section at the same time? Both these processes would have replied to each other; but the pairs <T i, p i > are totally ordered n So it s impossible Requests to enter and exit the critical section eventually succeed because requests are served based on timestamps Satisfies ME2 and ME3 (order) L23.13 Evaluation of the algorithm Gaining entry takes 2(N-1) messages N-1 to multicast the request Followed by N-1 replies Expensive in terms of bandwidth utilization Synchronization delay Just one message transmission time n Previous algorithms incurred round-trip delays L23.14 SLIDES CREATED BY: SHRIDEEP PALLICKARA L23.7

Some observations [1/2] One of the problems with the central server algorithm was that it was a single point of failure Here, the single point of failure has been replaced by N points of failure If any process crashes, it will fail to respond to requests n This silence is interpreted (incorrectly) as a denial of permission n Blocks ALL subsequent processes from entering the critical section Solution: To have timeout mechanisms in place L23.15 Some observations [2/2] Another problem with the central server algorithm was that making it handle all requests can lead to a bottleneck In this setup all processes are involved in all decisions Improvements? Getting permission from everyone is an overkill All we need is to prevent two processes from entering the CS at the same time L23.16 SLIDES CREATED BY: SHRIDEEP PALLICKARA L23.8

MAEKAWA S VOTING ALGORITHM FOR DISTRIBUTED MUTUAL EXCLUSION L23.17 Maekawa s solution to distributed mutual exclusion In order for a process to enter a critical section it is not necessary for all peers to grant access Obtain permission from subsets of peers Subsets used by any two peers must overlap Candidate process must collect sufficient votes to enter critical section L23.18 SLIDES CREATED BY: SHRIDEEP PALLICKARA L23.9

How mutual exclusion is achieved Processes at the intersection of two sets of voters ensure this Cast votes for only one candidate L23.19 Voting sets There is a voting set V i associated with each process p i (i=1,2,, N) V i {p 1, p 2,..., p N } L23.20 SLIDES CREATED BY: SHRIDEEP PALLICKARA L23.10

Voting sets The sets V i are chosen such that, for all i, j = 1, 2,, N p i V i V i V j V i = K To be fair, each process has a voting set of the same size Each process p j is contained in M of the voting sets V i L23.21 The optimal solution to the Maekawa s algorithm K ~ M = K N Each process is in as many of the voting sets as there are elements in one of the sets L23.22 SLIDES CREATED BY: SHRIDEEP PALLICKARA L23.11

Calculation of voting sets Is not trivial As an approximation Place processes in a! by! matrix Voting set V i is the union of the row and column containing p i Voting set size is then ~2! L23.23 Maekawa s voting sets Example R 1 ={1, 2, 3} R 4 ={1, 4, 5} R 6 ={1, 6, 7} R 2 ={2, 4, 6} R 5 ={2, 5, 7} R 7 ={3, 4, 7} R 3 ={3, 5, 6} K=3 N=7 R 1 ={1, 2, 3, 4} R 5 ={1, 5, 6, 7} R 8 ={1, 8, 9, 10} R 11 ={1, 11, 12, 13} R 2 ={2, 5, 8, 11} R 6 ={2, 6, 9, 12} R 7 ={2, 7, 10, 13} R 10 ={3, 5, 10, 12} R 3 ={3, 6, 8, 13} R 9 ={3, 7, 9, 11} R 13 ={4, 5, 9, 13} R 4 ={4, 6, 10, 11} R 12 ={4, 7, 8, 12} K=4 N=13 Source: Maekawa, Mamoru, Arthur E. Oldehoft, and Rodney R. Oldehoft. Operating Systems Advanced Concepts., 1987. K=5 N=21 R 1 ={1, 2, 3, 4, 5} R 6 ={1, 6, 7, 9, 9} R 10 ={1, 10, 11, 12, 13} R 14 ={1, 14, 15, 16, 17} R 18 ={1, 18, 19, 20, 21} R 2 ={2, 6, 10, 14, 18} R 7 ={2, 7, 11, 15, 19} R 8 ={2, 8, 12, 16, 20} R 9 ={2, 9, 13, 17, 21} R 11 ={3, 6, 11, 17, 20} R 3 ={3, 7, 10, 16, 21} R 13 ={3, 8, 13, 15, 18} R 12 ={3, 9, 12, 14, 19} R 15 ={4, 6, 12, 15, 21} R 4 ={4, 7, 13, 14, 20} R 17 ={4, 8, 10, 17, 19} R 16 ={4, 9, 11, 16, 18} R 19 ={5, 6, 13, 16, 19} R 5 ={5, 7, 12, 17, 18} R 21 ={5, 8, 11, 14, 21} R 20 ={5, 9, 10, 15, 20} L23.24 SLIDES CREATED BY: SHRIDEEP PALLICKARA L23.12

Entering the critical section To obtain entry into the critical section, each p i sends request message to all K members of V i Including itself p i cannot enter critical section till it has received all K reply messages L23.25 The reply message When a process p j in V i receives p i s request message it sends a reply message immediately unless Its state is HELD It has replied (voted) since it last received a release message L23.26 SLIDES CREATED BY: SHRIDEEP PALLICKARA L23.13

The release message To leave the critical section, p i sends release message to all K members of V i (incl. itself) When a process receives a release message? Removes the head of its queue of outstanding requests and sends a reply (vote) in response to it L23.27 Satisfying the safety property If it were possible for p i and p j to enter the critical section at the same time, then Processes in V i V j would have voted for both p i and p j But a process can make at most one vote between successive receipts of a release message So it is impossible for p i and p j to both enter the critical section L23.28 SLIDES CREATED BY: SHRIDEEP PALLICKARA L23.14

But the basic algorithm is deadlock prone Consider three processes p 1, p 2, and p 3 with V 1 ={p 1, p 2 }, V 2 ={p 2, p 3 } and V 3 ={p 3, p 1 } If 3 processes concurrently request entry to the critical section it is possible for: p 1 to reply to itself and hold-off p 2 p 2 to reply to itself and hold-off p 3 p 3 to reply to itself and hold-off p 1 Each process receives one of two replies; none can proceed L23.29 Resolving the deadlock issue Processes queue requests in the happened-before order This also allows ME3 to be satisfied besides ME2 L23.30 SLIDES CREATED BY: SHRIDEEP PALLICKARA L23.15

Analyzing the performance of the algorithm Bandwidth utilization 2! messages per entry into the critical section! messages per exit Total of 3! is superior to 2(N-1) required by the previous algorithm (Ricart and Agarwala) n If N 3 Synchronization delay Round-trip time L23.31 ELECTION ALGORITHMS L23.32 SLIDES CREATED BY: SHRIDEEP PALLICKARA L23.16

Election algorithms Algorithm for choosing a unique process to play a particular role When an elected process wants to retire, another election is needed L23.33 Calling an election When a process calls an election it initiates a particular run of the election algorithm A given process does not call more than one election at a time With N processes there could be N concurrent elections At any point a process p i is either: A participant: Engaged in the election algorithm Non-participant: Not engaged in the election algorithm L23.34 SLIDES CREATED BY: SHRIDEEP PALLICKARA L23.17

The choice of the elected process must be unique Even in cases where several processes call the election simultaneously E.g., 2 processes see a coordinator has failed and they both call elections L23.35 The elected process is the one with the largest identifier The identifier is any value with the provision that the identifiers are unique and totally ordered E.g., electing process with the lowest computational load Use <1/load, i}> as the identifier Process i is used to order identifiers with same load L23.36 SLIDES CREATED BY: SHRIDEEP PALLICKARA L23.18

Managing the identity of the elected process Each process p i (i=1, 2,, N) has a variable elected i Contains identifier of the elected process When a process first becomes a participant in an election Set this variable to ^ indicating that it is undefined L23.37 Requirements for the election algorithm E1 (safety) Participant process has elected i = ^ or elected i =P n P is a non-crashed process at the end of run with the largest identifier E2 (liveness) All processes p i participate and eventually either set elected i ^ or crash L23.38 SLIDES CREATED BY: SHRIDEEP PALLICKARA L23.19

Measuring performance of election algorithms Network bandwidth utilization How many messages are sent? Turnaround time for the algorithm Number of message transmissions between the initiation and termination of a run L23.39 The contents of this slide set are based on the following references Distributed Systems: Concepts and Design. George Coulouris, Jean Dollimore, Tim Kindberg, Gordon Blair. 5th Edition. Addison Wesley. ISBN: 978-0132143011 [Chapter 15] Distributed Systems: Principles and Paradigms. Andrew S. Tanenbaum and Maarten Van Steen. 2nd Edition. Prentice Hall. ISBN: 0132392275/978-0132392273 [Chapter 6] L23.40 SLIDES CREATED BY: SHRIDEEP PALLICKARA L23.20