Distributed Systems. coordination Johan Montelius ID2201. Distributed Systems ID2201

Similar documents
Coordination 1. To do. Mutual exclusion Election algorithms Next time: Global state. q q q

CSE 486/586 Distributed Systems

Coordination and Agreement

Mutual Exclusion. A Centralized Algorithm

PROCESS SYNCHRONIZATION

CS455: Introduction to Distributed Systems [Spring 2018] Dept. Of Computer Science, Colorado State University

CS 455: INTRODUCTION TO DISTRIBUTED SYSTEMS [DISTRIBUTED MUTUAL EXCLUSION] Frequently asked questions from the previous class survey

Distributed Systems Coordination and Agreement

Distributed Systems (5DV147)

Homework #2 Nathan Balon CIS 578 October 31, 2004

Clock Synchronization. Synchronization. Clock Synchronization Algorithms. Physical Clock Synchronization. Tanenbaum Chapter 6 plus additional papers

CSE 5306 Distributed Systems. Synchronization

Distributed Synchronization. EECS 591 Farnam Jahanian University of Michigan

Last Class: Clock Synchronization. Today: More Canonical Problems

Process Synchroniztion Mutual Exclusion & Election Algorithms

CMPSCI 677 Operating Systems Spring Lecture 14: March 9

Slides for Chapter 15: Coordination and Agreement

Distributed Mutual Exclusion

Distributed Systems 8L for Part IB

Recap: Consensus. CSE 486/586 Distributed Systems Mutual Exclusion. Why Mutual Exclusion? Why Mutual Exclusion? Mutexes. Mutual Exclusion C 1

Verteilte Systeme/Distributed Systems Ch. 5: Various distributed algorithms

Distributed Systems Exam 1 Review. Paul Krzyzanowski. Rutgers University. Fall 2016

Synchronization Part 2. REK s adaptation of Claypool s adaptation oftanenbaum s Distributed Systems Chapter 5 and Silberschatz Chapter 17

Frequently asked questions from the previous class survey

Time and Coordination in Distributed Systems. Operating Systems

CS455: Introduction to Distributed Systems [Spring 2018] Dept. Of Computer Science, Colorado State University

Synchronization. Distributed Systems IT332

Algorithms and protocols for distributed systems

殷亚凤. Synchronization. Distributed Systems [6]


Distributed Operating Systems. Distributed Synchronization

DISTRIBUTED MUTEX. EE324 Lecture 11

June 22/24/ Gerd Liefländer System Architecture Group Universität Karlsruhe (TH), System Architecture Group

Several of these problems are motivated by trying to use solutiions used in `centralized computing to distributed computing

A Two-Layer Hybrid Algorithm for Achieving Mutual Exclusion in Distributed Systems

Distributed systems. Lecture 6: distributed transactions, elections, consensus and replication. Malte Schwarzkopf

Distributed Mutual Exclusion Algorithms

Verteilte Systeme (Distributed Systems)

Synchronization. Chapter 5

Advanced Topics in Distributed Systems. Dr. Ayman A. Abdel-Hamid. Computer Science Department Virginia Tech

CSE 5306 Distributed Systems

Exam 2 Review. Fall 2011

Chapter 6 Synchronization (2)

Distributed Systems. replication Johan Montelius ID2201. Distributed Systems ID2201

Failure Tolerance. Distributed Systems Santa Clara University

OUTLINE. Introduction Clock synchronization Logical clocks Global state Mutual exclusion Election algorithms Deadlocks in distributed systems

Slides for Chapter 12: Coordination and Agreement

Event Ordering Silberschatz, Galvin and Gagne. Operating System Concepts

Coordination and Agreement

Chapter 16: Distributed Synchronization

CS455: Introduction to Distributed Systems [Spring 2018] Dept. Of Computer Science, Colorado State University

Distributed Coordination! Distr. Systems: Fundamental Characteristics!

Chapter 18: Distributed

Lecture 2: Leader election algorithms.

Selected Questions. Exam 2 Fall 2006

Mutual Exclusion in DS

Coordination and Agreement

Last Time. 19: Distributed Coordination. Distributed Coordination. Recall. Event Ordering. Happens-before

Last Class: Clock Synchronization. Today: More Canonical Problems

Synchronization. Clock Synchronization

Chapter 6 Synchronization

Intuitive distributed algorithms. with F#

VALLIAMMAI ENGINEERING COLLEGE

Last Class: Naming. Today: Classical Problems in Distributed Systems. Naming. Time ordering and clock synchronization (today)

Last time. Distributed systems Lecture 6: Elections, distributed transactions, and replication. DrRobert N. M. Watson

Consensus in Distributed Systems. Jeff Chase Duke University

11/7/2018. Event Ordering. Module 18: Distributed Coordination. Distributed Mutual Exclusion (DME) Implementation of. DME: Centralized Approach

Distributed Systems. Lec 9: Distributed File Systems NFS, AFS. Slide acks: Dave Andersen

Distributed Systems. 7. Coordination and Agreement

CSE 380 Computer Operating Systems

An Efficient Approach of Election Algorithm in Distributed Systems

Distributed Systems Question Bank UNIT 1 Chapter 1 1. Define distributed systems. What are the significant issues of the distributed systems?

Synchronisation and Coordination (Part 2)

How Bitcoin achieves Decentralization. How Bitcoin achieves Decentralization

Synchronization Part II. CS403/534 Distributed Systems Erkay Savas Sabanci University

Distributed Systems COMP 212. Lecture 17 Othon Michail

Silberschatz and Galvin Chapter 18

Distributed Systems. Chapman & Hall/CRC. «H Taylor S* Francis Croup Boca Raton London New York

QUESTIONS Distributed Computing Systems. Prof. Ananthanarayana V.S. Dept. Of Information Technology N.I.T.K., Surathkal

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING UNIT-1

The UNIVERSITY of EDINBURGH. SCHOOL of INFORMATICS. CS4/MSc. Distributed Systems. Björn Franke. Room 2414

Distributed Systems. Multicast and Agreement

Agreement in Distributed Systems CS 188 Distributed Systems February 19, 2015

Distributed Systems Principles and Paradigms

Fault Tolerance. Distributed Systems. September 2002

Distributed Systems Exam 1 Review Paul Krzyzanowski. Rutgers University. Fall 2016

Distributed Transaction Management 2003

CS 417: the one-hour study guide for exam 2

Last Dme. Distributed systems Lecture 5: Consistent cuts, process groups, and mutual exclusion. Dr Robert N. M. Watson

Introduction to Distributed Systems Seif Haridi

Today: Fault Tolerance

Operating Systems. Mutual Exclusion in Distributed Systems. Lecturer: William Fornaciari

Distributed Systems Leader election & Failure detection

Exam 2 Review. October 29, Paul Krzyzanowski 1

Event Ordering. Greg Bilodeau CS 5204 November 3, 2009

Distributed Systems 11. Consensus. Paul Krzyzanowski

Fault Tolerance. Distributed Software Systems. Definitions

416 practice questions (PQs)

Design and Implementation of Distributed Algorithms for Their Application in Telematic Services. Use case: Shared Calendar

Coordination 2. Today. How can processes agree on an action or a value? l Group communication l Basic, reliable and l ordered multicast

Transcription:

Distributed Systems ID2201 coordination Johan Montelius 1

Coordination Coordinating several threads in one node is a problem, coordination in a network is of course worse: failure of nodes and networks no fixed coordinator no shared memory Coordination is often the problem of: deciding who is to decide knowing who is alive. 2

Fundamental models Interaction model: Is the system asynchronous or synchronous? Can we assume a node has crashed if it does not reply? Failure model: Will nodes crash? Will crash nodes return to life? Is crashing the only failure? 3

Failure detectors How do we detect that a process has crashed and how reliable can the result be? unreliable: result in unsuspected or suspected failure reliable: result in unsuspected or failed Reliable detectors are only possible in synchronous systems. 4

Distributed algorithms We will look at some distributed algorithms and consider: reliable systems: if nothing goes wrong unreliable systems: but nodes fail by crashing and this can be detected by reliable failure detectors 5

Three sides of the same coin Mutual exclusion Decide who is to enter a critical section. Leader election Decide who is to be the new leader. Atomic multicast Which messages, and in which order, should be deliverd. 6

Distributed mutual exclusion Requirements Safety: at most one process may be in critical section at a time Liveness: starvation free, deadlock free Ordering: allowed to enter in request happened-before order 7

Evaluation Number of messages needed. Client delay: worst, mean or, average time to enter critical section Synchronization delay: how long time between exit and enter. 8

Central service algorithm Requirements? safety liveness ordering req grant release queue 9

Ordering - what is a request A B Server 10

Performance messages enter: request, grant exit: release client delay enter: message round trip plus waiting in queue exit: constant (asynchronous message) synchronization delay round trip: release - grant 11

Failure What can happens if we allow nodes to fail? a client a client holding the token the server What if we have reliable failure detectors? Can we do with unreliable failure detectors? 12

Ring-based algorithm Requirements safety liveness ordering 13

Ring-based algorithm Performance messages client delay synchronization delay Failure the lost token 14

Distributed algorithm Send request to all peers. When all peers have acknowledged the request, enter the critical section. What could go wrong? 15

Distributed algorithm Break deadlock introduce priority Fairness Ricart and Agrawala 16

Ricart and Agrawala Enter: enter state waiting and broadcast a request {T,i} containing a Lamport time stamp T and process id I to all peers wait for replies from all peers enter state held Receiving a request {R,j}: if held or (waiting and {T,i} < {R,j}) then queue request, else reply ok Exit: reply to all queued requests 17

Ricart and Agrawala Requirements safety, liveness, ordering Efficiency messages client delay synchronization delay Failure not so good 18

Maekawa's voting Why have permission from all peers, it's sufficient to have votes from a subset S if no one can enter with the votes from the complement of S. The subset S is called a quorum. 19

Maekawa's voting Requirements safety liveness ordering 20

Maekawa's voting Efficiency messages: twice sqrt(n) client delay: round trip synchronization delay: one message Failure not that bad? 21

Election Many algorithms require a server but if no node is assigned to be the server or if the server crashes we need to find a new server. Assumptions: any node can call an election but it can only call one at a time a node is either participant or nonparticipant nodes have identifiers that are ordered 22

Election Requirements safety: a participant is either non-decided or decided with P, a unique non crashed node liveness: all nodes eventually participate and decide on a elected node Efficiency number of messages turnaround time: delay from call to close 23

Ring-based election 23 14 11 9 12 v-18 3 18 14 14 e-23 23 12 11 9 12 3 18 23 3 11 9 18 24

Ring-based election 14 v-23 23 12 v-23 14 3 23 12 3 11 9 18 v-18 14 23 11 12 v-23 3 v-18 9 23 14 18 12 3 v-23 v-18 11 9 18 11 9 18 25

Ring-based election Requirements safety liveness Efficiency messages: best case, worst case? turnaround: Failure hmm,... 26

The bully algorithm Nodes have identifiers and are ordered. Any node can reliably send messages to any other higher node. Nodes can crash (and remain dead) and this is reliably detected. Algorithm starts when a node detects that the coordinator has crashed. 27

The bully algorithm 28

The bully algorithm Requirements safety... hmm liveness Efficiency Messages: best case, worst case Turnaround: 29

Multicast communication Multicast: Sending a message to a specified group of n nodes. Reliable multicast: All nodes see the same messages. Atomic multicast: All nodes see the same messages in the same order. 30

Model group deliver deliver send receive receive 31

Requirements Integrity a process delivers a message at most once and only deliver messages that have been sent Validity if a process multicast m then it will also eventually deliver m Agreement if a process delivers m then all processes in the group eventually delivers m 32

Basic multicast To b-multicast a message m: send m to each process p If m is received: b-deliver m What was the problem? 33

Basic multicast crash b-multicast m receive m deliver m 34

Reliable multicast Can we implement reliable (atomic) multicast if the only thing we have is basic multicast? b-multicast m r-deliver m b-multicast m b-multicast m r-deliver m 35

Ordered multicast The problem with the reliable multicast is that multicast messages might arrive in different order at different nodes. Requirements: FIFO order: delivered in order as sent by the sender Causal order: delivered in order as happened before sent order Total order: delivered in same order by all processes 36

Sequencer m m-cast m message queue 37

Distributed - ISIS Multicast a message and request a sequence number. When receiving a message, propose a sequence number (including process id) and place in an ordered hold-back queue. After collecting all proposals, select the highest and multicast agreement. When receiving agreement tag message as agreed and reorder hold-back queue. If first message in queue is decided then deliver. 38

the hold-back queue deliver {m1, proposed <2,i>} {m2, agreed <3,e>} {m3, agreed <3,k>} What will the agreed sequence number be? What happened here? {m4, proposed <4,i>} {m5, proposed <5,i>} 39

Causal ordering How can we implement casual ordering? multicast vector clock holds number of multicast operations tag each multicast message with multicast clock hold b-delivered messages until clock of message is less (modulo sender) than own current message clock update own message clock Only multicasted messages are counted. 40

Summary Coordination in distributed systems is problematic. If we have a fixed set of nodes and can detect failures there are many solutions. Three sides of the same coin: mutual exclusion leader election atomic multicast 41