Last Class: Clock Synchronization. Today: More Canonical Problems

Last Class:
- Clock synchronization
- Logical clocks
- Vector clocks
- Global state
Lecture 11, page 1

Today: More Canonical Problems
- Distributed snapshot and termination detection
- Election algorithms
  - Bully algorithm
  - Ring algorithm

Global State
- Global state of a distributed system:
  - Local state of each process
  - Messages sent but not yet received (the state of the queues)
- Many applications need to know the state of the system
  - Failure recovery, distributed deadlock detection
- Problem: how can you figure out the state of a distributed system?
  - Each process is independent
  - No global clock or synchronization
- Distributed snapshot: a consistent global state

Distributed Snapshot Algorithm
- Assume each process communicates with another process using unidirectional point-to-point channels (e.g., TCP connections)
- Any process can initiate the algorithm
  - Checkpoint local state
  - Send a marker on every outgoing channel
- On receiving a marker
  - If it is the first marker: checkpoint local state, send a marker on every outgoing channel, and start saving the messages that arrive on all other incoming channels
  - If it is a subsequent marker on a channel: stop saving state for that channel
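The marker rules above can be sketched as a small single-threaded simulation. This is only an illustration under stated assumptions: channels are modelled as FIFO queues, the local state is an integer balance, and the names `Process`, `Network`, `transfer`, `deliver` and `run_example` are hypothetical, not part of the lecture.

```python
from collections import deque

MARKER = "MARKER"  # hypothetical marker token, not from the slides


class Process:
    def __init__(self, name, state):
        self.name = name
        self.state = state          # local state (here: an integer balance)
        self.incoming = []          # senders with a channel into this process
        self.outgoing = []          # receivers this process has a channel to
        self.recorded_state = None  # checkpoint of the local state
        self.channel_log = {}       # sender -> messages saved for that channel
        self.recording = set()      # incoming channels still being saved

    def finished(self):
        # Finished once checkpointed and a marker arrived on every incoming channel.
        return self.recorded_state is not None and not self.recording


class Network:
    def __init__(self):
        self.procs = {}
        self.channels = {}  # (src, dst) -> FIFO queue of in-flight messages

    def add(self, name, state):
        self.procs[name] = Process(name, state)

    def connect(self, src, dst):
        self.channels[(src, dst)] = deque()
        self.procs[src].outgoing.append(dst)
        self.procs[dst].incoming.append(src)

    def transfer(self, src, dst, amount):
        # Application message: move `amount` from src to dst.
        self.procs[src].state -= amount
        self.channels[(src, dst)].append(amount)

    def checkpoint(self, proc):
        # Record local state, start saving all incoming channels, send markers.
        proc.recorded_state = proc.state
        proc.recording = set(proc.incoming)
        proc.channel_log = {s: [] for s in proc.incoming}
        for dst in proc.outgoing:
            self.channels[(proc.name, dst)].append(MARKER)

    def deliver(self, src, dst):
        # Deliver the oldest in-flight message on channel src -> dst.
        msg = self.channels[(src, dst)].popleft()
        proc = self.procs[dst]
        if msg == MARKER:
            if proc.recorded_state is None:
                self.checkpoint(proc)    # first marker: checkpoint and forward
            proc.recording.discard(src)  # this channel's state is now complete
        else:
            proc.state += msg
            if src in proc.recording:
                proc.channel_log[src].append(msg)  # message was in flight


def run_example():
    net = Network()
    net.add("A", 100)
    net.add("B", 50)
    net.connect("A", "B")
    net.connect("B", "A")
    net.transfer("A", "B", 10)         # in flight when the snapshot starts
    net.transfer("B", "A", 5)          # in flight when the snapshot starts
    net.checkpoint(net.procs["A"])     # A initiates the snapshot
    net.deliver("A", "B")              # B receives 10
    net.deliver("A", "B")              # B receives the marker and checkpoints
    net.deliver("B", "A")              # A receives 5 and saves it as channel state
    net.deliver("B", "A")              # A receives the marker and finishes
    return net
```

In this run the recorded local states plus the saved channel contents add up to the initial total of 150, which is what makes the snapshot consistent even though no process stopped to take it.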

Distributed Snapshot
- A process finishes when
  - It has received a marker on each incoming channel and processed them all
  - Its recorded state is its local state plus the state of all its incoming channels
  - It sends this state to the initiator
- Any process can initiate a snapshot
  - Multiple snapshots may be in progress
  - Each is separate, and each is distinguished by tagging the marker with the initiator ID (and a sequence number)
[Figure: processes A, B and C exchanging markers (M)]

Snapshot Algorithm Example
a) Organization of a process and channels for a distributed snapshot

Snapshot Algorithm Example
b) Process Q receives a marker for the first time and records its local state
c) Q records all incoming messages
d) Q receives a marker on its incoming channel and finishes recording the state of that channel

Termination Detection
- Detecting the end of a distributed computation
- Notation: let the sender be the predecessor and the receiver be the successor
- Two types of markers: Done and Continue
- After finishing its part of the snapshot, process Q sends a Done or a Continue to its predecessor
- Send a Done only when
  - All of Q's successors have sent a Done
  - Q has not received any message between checkpointing its local state and receiving a marker on all incoming channels
- Otherwise send a Continue
- The computation has terminated if the initiator receives Done messages from everyone
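The Done/Continue rule can be written as a pair of small decision functions. This is a minimal sketch: the function names and the string constants are hypothetical, and the caller is assumed to track whether Q received any application message between its checkpoint and its last marker.

```python
DONE = "DONE"          # hypothetical constants for the two marker types
CONTINUE = "CONTINUE"


def reply_to_predecessor(successor_replies, received_during_snapshot):
    """Decide what process Q sends back to its predecessor.

    successor_replies: the Done/Continue replies Q got from its successors.
    received_during_snapshot: True if Q received any message between
        checkpointing its local state and receiving a marker on all
        incoming channels.
    """
    all_successors_done = all(r == DONE for r in successor_replies)
    if all_successors_done and not received_during_snapshot:
        return DONE
    return CONTINUE


def computation_terminated(replies_at_initiator):
    # The initiator declares termination only if everyone answered Done.
    return all(r == DONE for r in replies_at_initiator)
```

A process with no successors passes an empty list, so its reply depends only on whether it saw a message during the snapshot. When the initiator sees any Continue, the usual follow-up is to take another snapshot later and try again.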

Election Algorithms
- Many distributed algorithms need one process to act as coordinator
  - It doesn't matter which process does the job; we just need to pick one
- Election algorithms: techniques to pick a unique coordinator (aka leader election)
- Examples: take over the role of a failed process, pick a master in the Berkeley clock synchronization algorithm
- Types of election algorithms: Bully and Ring algorithms

Bully Algorithm
- Each process has a unique numerical ID
- Processes know the IDs and addresses of every other process
- Communication is assumed reliable
- Key idea: select the process with the highest ID
- A process initiates an election if it has just recovered from a failure or if the coordinator has failed
- 3 message types: Election, OK, I won
- Several processes can initiate an election simultaneously
  - Need a consistent result
- O(n^2) messages required with n processes

Bully Algorithm Details
- Any process P can initiate an election
- P sends Election messages to all processes with higher IDs and awaits OK messages
- If it receives no OK messages, P becomes coordinator and sends I won messages to all processes with lower IDs
- If it receives an OK, it drops out and waits for an I won
- If a process receives an Election message, it returns an OK and starts an election
- If a process receives an I won, it treats the sender as the coordinator

Bully Algorithm Example
The bully election algorithm:
a) Process 4 holds an election
b) Processes 5 and 6 respond, telling 4 to stop
c) Now 5 and 6 each hold an election
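The bully rules can be traced with a short sequential simulation. This is a sketch under assumptions, not a networked implementation: a crashed process is modelled as one that never answers, timeouts are not simulated, each process holds at most one election, and the function name `bully_election` and the message tuples are hypothetical. The usage below assumes eight processes numbered 0 to 7 with process 7 crashed; the slide text itself only names processes 4, 5 and 6.

```python
from collections import deque


def bully_election(all_ids, alive, initiator):
    """Simulate one bully election; return (coordinator, messages).

    all_ids: IDs of every process, live or crashed.
    alive: set of IDs that are currently up.
    initiator: the live process that starts the election.
    """
    assert initiator in alive
    messages = []                 # (type, sender, receiver) in send order
    started = set()               # processes that already held an election
    pending = deque([initiator])  # processes that still have to hold one
    coordinator = None
    while pending:
        p = pending.popleft()
        if p in started:
            continue
        started.add(p)
        got_ok = False
        for q in sorted(i for i in all_ids if i > p):
            messages.append(("ELECTION", p, q))
            if q in alive:
                # A live higher process answers OK and starts its own election.
                messages.append(("OK", q, p))
                got_ok = True
                pending.append(q)
        if not got_ok:
            # Nobody higher answered: p wins and announces it to all lower IDs.
            coordinator = p
            for q in sorted(i for i in all_ids if i < p):
                messages.append(("I_WON", p, q))
    return coordinator, messages


coordinator, messages = bully_election(range(8), {0, 1, 2, 3, 4, 5, 6}, 4)
print(coordinator, len(messages))
```

Process 4 is told to stop by 5 and 6, process 5 is told to stop by 6, and process 6 gets no answer from 7, so it announces itself as coordinator.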

Bully Algorithm Example
d) Process 6 tells 5 to stop
e) Process 6 wins and tells everyone

Ring-based Election
- Processes have unique IDs and are arranged in a logical ring
  - Each process knows its neighbors
  - Select the process with the highest ID
- Begin an election if just recovered or if the coordinator has failed
- Send Election to the closest downstream node that is alive
  - Sequentially poll each successor until a live node is found
- Each process tags its ID on the message
- The initiator picks the node with the highest ID and sends a coordinator message
- Multiple elections can be in progress
  - Wastes network bandwidth but does no harm
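The ring-based election can be sketched the same way. The assumptions here are that the ring order is known to everyone, that crashed nodes are simply skipped, and that the coordinator message makes one full trip around the live nodes, just like the Election message. The name `ring_election` is hypothetical.

```python
def ring_election(ring, alive, initiator):
    """Simulate one ring election; return (coordinator, collected_ids, messages).

    ring: list of process IDs in ring order.
    alive: set of IDs that are currently up.
    initiator: the live process that starts the election.
    """
    assert initiator in alive
    n = len(ring)

    def next_alive(i):
        # Poll successors in order until a live node is found.
        j = (i + 1) % n
        while ring[j] not in alive:
            j = (j + 1) % n
        return j

    start = ring.index(initiator)
    collected = [initiator]  # IDs tagged onto the Election message
    hops = 1                 # initiator sends Election to its live successor
    i = next_alive(start)
    while i != start:
        collected.append(ring[i])  # each process tags its own ID
        i = next_alive(i)
        hops += 1                  # and forwards the message downstream
    coordinator = max(collected)   # initiator picks the highest ID
    # Assumption: the coordinator message takes the same number of hops.
    return coordinator, collected, 2 * hops


coordinator, collected, count = ring_election(list(range(8)), {0, 1, 2, 3, 4, 5, 6}, 4)
print(coordinator, collected, count)
```

With eight processes and one crashed node, the Election message visits the seven live nodes and the coordinator message does the same, so the count comes out as twice the number of live processes.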

A Ring Algorithm
Election algorithm using a ring.

Comparison
- Assume n processes and one election in progress
- Bully algorithm
  - Worst case: the initiator is the node with the lowest ID
    - Triggers n-2 elections at higher-ranked nodes: O(n^2) messages
  - Best case: immediate election: n-2 messages
- Ring
  - 2(n-1) messages always
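The message counts in the comparison can be made concrete with a few closed-form helpers. The best-case and ring formulas are taken directly from the slide. The exact worst-case bully count is one possible accounting, not a figure from the lecture: it assumes the n processes include the crashed coordinator (the highest ID), that every live process holds exactly one election, and that Election messages sent to the crashed coordinator are counted. The function names are hypothetical.

```python
def bully_worst_case(n):
    """Messages when the lowest-ID process starts the election (assumed accounting)."""
    elections = n * (n - 1) // 2        # live process i sends to every higher ID
    oks = (n - 1) * (n - 2) // 2        # every live higher process answers OK
    announcements = n - 2               # winner tells all lower-ID processes
    return elections + oks + announcements


def bully_best_case(n):
    # From the slide: immediate election, n-2 messages.
    return n - 2


def ring_messages(n):
    # From the slide: 2(n-1) messages always.
    return 2 * (n - 1)


for n in (4, 8, 16):
    print(n, bully_worst_case(n), bully_best_case(n), ring_messages(n))
```

Under these assumptions the worst-case bully total simplifies to n^2 - n - 1, which grows quadratically, while the ring count grows linearly in n.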