Last Class: Clock Synchronization. Today: More Canonical Problems

Similar documents
Last Class: Clock Synchronization. Today: More Canonical Problems

CMPSCI 677 Operating Systems Spring Lecture 14: March 9

Chapter 6 Synchronization

PROCESS SYNCHRONIZATION

Clock Synchronization. Synchronization. Clock Synchronization Algorithms. Physical Clock Synchronization. Tanenbaum Chapter 6 plus additional papers

Distributed Synchronization. EECS 591 Farnam Jahanian University of Michigan

Mutual Exclusion. A Centralized Algorithm

Chapter 6 Synchronization (2)

CSE 5306 Distributed Systems. Synchronization

SYNCHRONIZATION. DISTRIBUTED SYSTEMS Principles and Paradigms. Second Edition. Chapter 6 ANDREW S. TANENBAUM MAARTEN VAN STEEN

Last Class: Naming. Today: Classical Problems in Distributed Systems. Naming. Time ordering and clock synchronization (today)

Synchronization. Chapter 5

Process Synchroniztion Mutual Exclusion & Election Algorithms

Distributed Systems. coordination Johan Montelius ID2201. Distributed Systems ID2201

Synchronization. Distributed Systems IT332

殷亚凤. Synchronization. Distributed Systems [6]

Verteilte Systeme (Distributed Systems)

Clock and Time. THOAI NAM Faculty of Information Technology HCMC University of Technology

Coordination 1. To do. Mutual exclusion Election algorithms Next time: Global state. q q q

Distributed systems. Lecture 6: distributed transactions, elections, consensus and replication. Malte Schwarzkopf

Synchronization Part 2. REK s adaptation of Claypool s adaptation oftanenbaum s Distributed Systems Chapter 5 and Silberschatz Chapter 17

Failure Tolerance. Distributed Systems Santa Clara University

OUTLINE. Introduction Clock synchronization Logical clocks Global state Mutual exclusion Election algorithms Deadlocks in distributed systems

Synchronization. Clock Synchronization

Distributed Coordination 1/39

Synchronization (contd.)

CSE 5306 Distributed Systems

Distributed Coordination! Distr. Systems: Fundamental Characteristics!

Advanced Topics in Distributed Systems. Dr. Ayman A. Abdel-Hamid. Computer Science Department Virginia Tech

DISTRIBUTED MUTEX. EE324 Lecture 11

Synchronization Part II. CS403/534 Distributed Systems Erkay Savas Sabanci University

Distributed Systems Exam 1 Review. Paul Krzyzanowski. Rutgers University. Fall 2016

Distributed Operating Systems. Distributed Synchronization

CSE 486/586 Distributed Systems

Distributed Systems (5DV147)

Algorithms and protocols for distributed systems

Homework #2 Nathan Balon CIS 578 October 31, 2004

Arranging lunch value of preserving the causal order. a: how about lunch? meet at 12? a: <receives b then c>: which is ok?

Today: Fault Tolerance. Reliable One-One Communication

Coordination and Agreement

Last time. Distributed systems Lecture 6: Elections, distributed transactions, and replication. DrRobert N. M. Watson

Verteilte Systeme/Distributed Systems Ch. 5: Various distributed algorithms

Frequently asked questions from the previous class survey

SEGR 550 Distributed Computing. Final Exam, Fall 2011

CS455: Introduction to Distributed Systems [Spring 2018] Dept. Of Computer Science, Colorado State University

An Efficient Approach of Election Algorithm in Distributed Systems

Distributed Systems Principles and Paradigms

Lecture 2: Leader election algorithms.

Synchronisation in. Distributed Systems. Co-operation and Co-ordination in. Distributed Systems. Kinds of Synchronisation. Clock Synchronization

Coordination 2. Today. How can processes agree on an action or a value? l Group communication l Basic, reliable and l ordered multicast

Distributed Systems 8L for Part IB

Study of various Election algorithms on the basis of messagepassing

Several of these problems are motivated by trying to use solutiions used in `centralized computing to distributed computing

CS 455: INTRODUCTION TO DISTRIBUTED SYSTEMS [DISTRIBUTED MUTUAL EXCLUSION] Frequently asked questions from the previous class survey

June 22/24/ Gerd Liefländer System Architecture Group Universität Karlsruhe (TH), System Architecture Group

Today: Fault Tolerance. Fault Tolerance

CS455: Introduction to Distributed Systems [Spring 2018] Dept. Of Computer Science, Colorado State University

Last Class: Naming. DNS Implementation

Distributed Systems Coordination and Agreement

Important Lessons. A Distributed Algorithm (2) Today's Lecture - Replication

Last Time. 19: Distributed Coordination. Distributed Coordination. Recall. Event Ordering. Happens-before

Distributed Systems COMP 212. Revision 2 Othon Michail

Chapter 16: Distributed Synchronization

Operating Systems. Deadlocks

Recovering from a Crash. Three-Phase Commit

Today: Fault Tolerance

Chapter 18: Distributed

Intuitive distributed algorithms. with F#

CSE 486/586 Distributed Systems

Distributed Systems. Lec 9: Distributed File Systems NFS, AFS. Slide acks: Dave Andersen

Today: Fault Tolerance. Failure Masking by Redundancy

Distributed Systems Synchronization

Distributed Systems Exam 1 Review Paul Krzyzanowski. Rutgers University. Fall 2016

Agreement in Distributed Systems CS 188 Distributed Systems February 19, 2015

Leader Election Algorithms in Distributed Systems

Distributed Systems Question Bank UNIT 1 Chapter 1 1. Define distributed systems. What are the significant issues of the distributed systems?

CHAPTER 4: INTERPROCESS COMMUNICATION AND COORDINATION

Fault Tolerance Part II. CS403/534 Distributed Systems Erkay Savas Sabanci University

Mutual Exclusion in DS

Slides for Chapter 12: Coordination and Agreement

A Two-Layer Hybrid Algorithm for Achieving Mutual Exclusion in Distributed Systems

Event Ordering Silberschatz, Galvin and Gagne. Operating System Concepts

CSE 380 Computer Operating Systems

11/7/2018. Event Ordering. Module 18: Distributed Coordination. Distributed Mutual Exclusion (DME) Implementation of. DME: Centralized Approach

CprE Fault Tolerance. Dr. Yong Guan. Department of Electrical and Computer Engineering & Information Assurance Center Iowa State University

Lecture 6: Logical Time

Distributed Systems. Pre-Exam 1 Review. Paul Krzyzanowski. Rutgers University. Fall 2015

Operating Systems. Mutual Exclusion in Distributed Systems. Lecturer: William Fornaciari

Distributed Systems - II

CS /15/16. Paul Krzyzanowski 1. Question 1. Distributed Systems 2016 Exam 2 Review. Question 3. Question 2. Question 5.

Distributed Systems COMP 212. Lecture 19 Othon Michail

Synchronisation and Coordination (Part 2)

Distributed Algorithmic

Selected Questions. Exam 2 Fall 2006

CSE 486/586 Distributed Systems Reliable Multicast --- 1


Lecture 6: Multicast

Large Systems: Design + Implementation: Communication Coordination Replication. Image (c) Facebook

Today: Fault Tolerance. Replica Management

INSTITUTE OF AERONAUTICAL ENGINEERING (Autonomous) Dundigal, Hyderabad

Transcription:

Last Class: Clock Synchronization Logical clocks Vector clocks Global state Lecture 12, page 1 Today: More Canonical Problems Distributed snapshot and termination detection Election algorithms Bully algorithm Ring algorithm Distributed mutual exclusion Lecture 12, page 2 1

Causal Delivery Causally ordered multicasting If P j receives a message from P i Delay delivery of the message until Ts(m) [i] == VC j [i] + 1 (m is the next expected message from i) Ts(m) [k] <= VC j [k] (j has seen all messages seen by i before m) Lecture 12, page 3 Global State Global state of a distributed system Local state of each process Messages sent but not received (state of the queues) Many applications need to know the state of the system Failure recovery, distributed deadlock detection Problem: how can you figure out the state of a distributed system? Each process is independent No global clock or synchronization Distributed snapshot: a consistent global state Lecture 12, page 4 2

Global State (1) a) A consistent cut b) An inconsistent cut Lecture 12, page 5 Distributed Snapshot Algorithm Assume each process communicates with another process using unidirectional point-to-point channels (e.g, TCP connections) Any process can initiate the algorithm Checkpoint local state Send marker on every outgoing channel On receiving a marker Checkpoint state if first marker and send marker on outgoing channels, save messages on all other channels until: Subsequent marker on a channel: stop saving state for that channel Lecture 12, page 6 3

Distributed Snapshot A process finishes when It receives a marker on each incoming channel and processes them all State: local state plus state of all channels Send state to initiator Any process can initiate snapshot Multiple snapshots may be in progress Each is separate, and each is distinguished by tagging the marker with the initiator ID (and sequence number) A M M B C Lecture 12, page 7 Snapshot Algorithm Example a) Organization of a process and channels for a distributed snapshot Lecture 12, page 8 4

Snapshot Algorithm Example b) Process Q receives a marker for the first time and records its local state c) Q records all incoming message d) Q receives a marker for its incoming channel and finishes recording the state of the incoming channel Lecture 12, page 9 Termination Detection Detecting the end of a distributed computation Notation: let sender be predecessor, receiver be successor Two types of markers: Done and Continue After finishing its part of the snapshot, process Q sends a Done or a Continue to its predecessor Send a Done only when All of Q s successors send a Done Q has not received any message since it check-pointed its local state and received a marker on all incoming channels Else send a Continue Computation has terminated if the initiator receives Done messages from everyone Lecture 12, page 10 5

Election Algorithms Many distributed algorithms need one process to act as coordinator Doesn t matter which process does the job, just need to pick one Election algorithms: technique to pick a unique coordinator (aka leader election) Examples: take over the role of a failed process, pick a master in Berkeley clock synchronization algorithm Types of election algorithms: Bully and Ring algorithms Lecture 12, page 11 Bully Algorithm Each process has a unique numerical ID Processes know the Ids and address of every other process Communication is assumed reliable Key Idea: select process with highest ID Process initiates election if it just recovered from failure or if coordinator failed 3 message types: election, OK, I won Several processes can initiate an election simultaneously Need consistent result O(n 2 ) messages required with n processes Lecture 12, page 12 6

Bully Algorithm Details Any process P can initiate an election P sends Election messages to all process with higher Ids and awaits OK messages If no OK messages, P becomes coordinator and sends I won messages to all process with lower Ids If it receives an OK, it drops out and waits for an I won If a process receives an Election msg, it returns an OK and starts an election If a process receives a I won, it treats sender an coordinator Lecture 12, page 13 Bully Algorithm Example The bully election algorithm Process 4 holds an election Process 5 and 6 respond, telling 4 to stop Now 5 and 6 each hold an election Lecture 12, page 14 7

Bully Algorithm Example d) Process 6 tells 5 to stop e) Process 6 wins and tells everyone Lecture 12, page 15 Ring-based Election Processes have unique Ids and arranged in a logical ring Each process knows its neighbors Select process with highest ID Begin election if just recovered or coordinator has failed Send Election to closest downstream node that is alive Sequentially poll each successor until a live node is found Each process tags its ID on the message Initiator picks node with highest ID and sends a coordinator message Multiple elections can be in progress Wastes network bandwidth but does no harm Lecture 12, page 16 8

A Ring Algorithm Election algorithm using a ring. Lecture 12, page 17 Comparison Assume n processes and one election in progress Bully algorithm Worst case: initiator is node with lowest ID Triggers n-2 elections at higher ranked nodes: O(n 2 ) msgs Best case: immediate election: n-2 messages Ring 2 (n-1) messages always Lecture 12, page 18 9

Elections in Wireless Environments (1) Election algorithm in a wireless network, with node a as the source. (a) Initial network. (b) (e) The build-tree phase Lecture 12, page 19 Elections in Wireless Environments (2) Lecture 12, page 20 10

Elections in Large-Scale Systems (1) Requirements for superpeer selection: 1. Normal nodes should have low-latency access to superpeers. 2. Superpeers should be evenly distributed across the overlay network. 3. There should be a predefined portion of superpeers relative to the total number of nodes in the overlay network. 4. Each superpeer should not need to serve more than a fixed number of normal nodes. Lecture 12, page 21 Elections in Large-Scale Systems (2) Moving tokens in a two-dimensional space using repulsion forces. Lecture 12, page 22 11

Distributed Synchronization Distributed system with multiple processes may need to share data or access shared data structures Use critical sections with mutual exclusion Single process with multiple threads Semaphores, locks, monitors How do you do this for multiple processes in a distributed system? Processes may be running on different machines Solution: lock mechanism for a distributed environment Can be centralized or distributed Lecture 12, page 23 Centralized Mutual Exclusion Assume processes are numbered One process is elected coordinator (highest ID process) Every process needs to check with coordinator before entering the critical section To obtain exclusive access: send request, await reply To release: send release message Coordinator: Receive request: if available and queue empty, send grant; if not, queue request Receive release: remove next request from queue and send grant Lecture 12, page 24 12

Mutual Exclusion: A Centralized Algorithm a) Process 1 asks the coordinator for permission to enter a critical region. Permission is granted b) Process 2 then asks permission to enter the same critical region. The coordinator does not reply. c) When process 1 exits the critical region, it tells the coordinator, when then replies to 2 Lecture 12, page 25 Properties Simulates centralized lock using blocking calls Fair: requests are granted the lock in the order they were received Simple: three messages per use of a critical section (request, grant, release) Shortcomings: Single point of failure How do you detect a dead coordinator? A process can not distinguish between lock in use from a dead coordinator No response from coordinator in either case Performance bottleneck in large distributed systems Lecture 12, page 26 13