Distributed Systems - I

Similar documents
Mutual Exclusion in DS

Event Ordering Silberschatz, Galvin and Gagne. Operating System Concepts

Distributed Systems - II

Operating Systems. Mutual Exclusion in Distributed Systems. Lecturer: William Fornaciari

Chapter 16: Distributed Synchronization

Distributed System Chapter 16 Issues in ch 17, ch 18

Chapter 18: Distributed

11/7/2018. Event Ordering. Module 18: Distributed Coordination. Distributed Mutual Exclusion (DME) Implementation of. DME: Centralized Approach

Module 16: Distributed System Structures. Operating System Concepts 8 th Edition,

CHAPTER 17 - NETWORK AMD DISTRIBUTED SYSTEMS

Module 15: Network Structures

Module 16: Distributed System Structures

Chapter 17: Distributed Systems (DS)

Silberschatz and Galvin Chapter 18

Lecture 11: Networks & Networking

Distributed Coordination! Distr. Systems: Fundamental Characteristics!

Midterm Exam. October 20th, Thursday NSC

Lecture 7: Logical Time

Modulo II Introdução Sistemas Distribuídos

Distributed Deadlock Detection

Clock Synchronization. Synchronization. Clock Synchronization Algorithms. Physical Clock Synchronization. Tanenbaum Chapter 6 plus additional papers

Distributed Systems. Before We Begin. Advantages. What is a Distributed System? CSE 120: Principles of Operating Systems. Lecture 13.

Final Review. Quiz-5 Solutions. Tevfik Koşar

Roadmap. Shared Variables: count=0, buffer[] Producer: Background. Consumer: while (1) { Race Condition. Race Condition.

Last Time. 19: Distributed Coordination. Distributed Coordination. Recall. Event Ordering. Happens-before

Exercise (could be a quiz) Solution. Concurrent Programming. Roadmap. Tevfik Koşar. CSE 421/521 - Operating Systems Fall Lecture - IV Threads

CSE 421/521 - Operating Systems Fall Lecture - XXV. Final Review. University at Buffalo

Shared Memory and Distributed Multiprocessing. Bhanu Kapoor, Ph.D. The Saylor Foundation

CSE 486/586 Distributed Systems

Several of these problems are motivated by trying to use solutiions used in `centralized computing to distributed computing

Coordination and Agreement

Concurrency: Mutual Exclusion and Synchronization

Event Ordering. Greg Bilodeau CS 5204 November 3, 2009

Distributed Systems - III

Heckaton. SQL Server's Memory Optimized OLTP Engine

Distributed Mutual Exclusion

Concurrency Control in Distributed Systems. ECE 677 University of Arizona

CSE 5306 Distributed Systems. Synchronization

CS455: Introduction to Distributed Systems [Spring 2018] Dept. Of Computer Science, Colorado State University

PROCESS SYNCHRONIZATION

! The Process Control Block (PCB) " is included in the context,

CSE 5306 Distributed Systems

Roadmap. Tevfik Koşar. CSE 421/521 - Operating Systems Fall Lecture - XI Deadlocks - II. University at Buffalo. Deadlocks. October 6 th, 2011

CSE 421/521 - Operating Systems Fall Lecture - XII Main Memory Management. Tevfik Koşar. University at Buffalo. October 18 th, 2012.

Frequently asked questions from the previous class survey

CS455: Introduction to Distributed Systems [Spring 2018] Dept. Of Computer Science, Colorado State University

Roadmap. Deadlock Prevention. Deadlock Prevention (Cont.) Deadlock Detection. Exercise. Tevfik Koşar. CSE 421/521 - Operating Systems Fall 2012

Advanced Databases Lecture 17- Distributed Databases (continued)

Implementation of Process Networks in Java

Multi-threaded programming in Java

Announcements. Office hours. Reading. W office hour will be not starting next week. Chapter 7 (this whole week) CMSC 412 S02 (lect 7)

Virtual Memory - II. Roadmap. Tevfik Koşar. CSE 421/521 - Operating Systems Fall Lecture - XVI. University at Buffalo.

Network Layer (4): ICMP

Synchronization. Clock Synchronization

Mobile and Heterogeneous databases Distributed Database System Transaction Management. A.R. Hurson Computer Science Missouri Science & Technology


殷亚凤. Synchronization. Distributed Systems [6]

Proseminar Distributed Systems Summer Semester Paxos algorithm. Stefan Resmerita

How Bitcoin achieves Decentralization. How Bitcoin achieves Decentralization

Algorithms and protocols for distributed systems

Data Link Layer: Overview, operations

FAULT TOLERANT SYSTEMS

Coordination 1. To do. Mutual exclusion Election algorithms Next time: Global state. q q q

Introduction Distributed Systems

Lecture 9: MIMD Architectures

Distributed Systems: Consistency and Replication

Process Synchronization - I

Programming Languages

Distributed Transaction Management. Distributed Database System

Chapter 3: Processes. Operating System Concepts 8 th Edition,

Lecture 7: Parallel Processing

Agenda Process Concept Process Scheduling Operations on Processes Interprocess Communication 3.2

The University of Texas at Arlington

Distributed Operating Systems. Distributed Synchronization

Distributed Algorithms. Partha Sarathi Mandal Department of Mathematics IIT Guwahati

Synchronization Part 1. REK s adaptation of Claypool s adaptation of Tanenbaum s Distributed Systems Chapter 5

Seminar on. A Coarse-Grain Parallel Formulation of Multilevel k-way Graph Partitioning Algorithm

10. Replication. CSEP 545 Transaction Processing Philip A. Bernstein Sameh Elnikety. Copyright 2012 Philip A. Bernstein

Distributed Databases Systems

The University of Texas at Arlington

Replication. Consistency models. Replica placement Distribution protocols

Atomic Transac1ons. Atomic Transactions. Q1: What if network fails before deposit? Q2: What if sequence is interrupted by another sequence?

Consistency and Replication. Some slides are from Prof. Jalal Y. Kawash at Univ. of Calgary

1. Introduction to Concurrent Programming

CS Reading Packet: "Transaction management, part 2"

CS 455: INTRODUCTION TO DISTRIBUTED SYSTEMS [DISTRIBUTED MUTUAL EXCLUSION] Frequently asked questions from the previous class survey

Student Name:.. Student ID... Course Code: CSC 227 Course Title: Semester: Fall Exercises Cover Sheet:

Process Synchroniztion Mutual Exclusion & Election Algorithms

Last Class: Naming. Today: Classical Problems in Distributed Systems. Naming. Time ordering and clock synchronization (today)

Distributed KIDS Labs 1

Some portions courtesy Srini Seshan or David Wetherall

Chapter 7. Multicores, Multiprocessors, and Clusters. Goal: connecting multiple computers to get higher performance

Chapter 6 Synchronization

Synchronization Part II. CS403/534 Distributed Systems Erkay Savas Sabanci University

DISTRIBUTED MUTEX. EE324 Lecture 11

ICT 6544 Distributed Systems Lecture 7

OS Design Approaches. Roadmap. OS Design Approaches. Tevfik Koşar. Operating System Design and Implementation

Distributed Synchronization. EECS 591 Farnam Jahanian University of Michigan

High Performance Computing Lecture 41. Matthew Jacob Indian Institute of Science

Configuring MST Using Cisco NX-OS

Transcription:

CSE 421/521 - Operating Systems Fall 2011 Lecture - XXIII Distributed Systems - I Tevfik Koşar University at Buffalo November 22 nd, 2011 1 Motivation Distributed system is collection of loosely coupled processors that do not share memory interconnected by a communications network Reasons for distributed systems Resource sharing sharing and printing files at remote sites processing information in a distributed database using remote specialized hardware devices Computation speedup load sharing Reliability detect and recover from site failure, function transfer, reintegrate failed site Communication message passing

Distributed-Operating Systems Users not aware of multiplicity of machines Access to remote resources similar to access to local resources Data Migration transfer data by transferring entire file, or transferring only those portions of the file necessary for the immediate task Computation Migration transfer the computation, rather than the data, across the system Distributed-Operating Systems (Cont.) Process Migration execute an entire process, or parts of it, at different sites Load balancing distribute processes across network to even the workload Computation speedup subprocesses can run concurrently on different sites Hardware preference process execution may require specialized processor Software preference required software may be available at only a particular site Data access run process remotely, rather than transfer all data locally

Network Topology Robustness in Distributed Systems Failure detection Reconfiguration

Failure Detection Detecting hardware failure is difficult To detect a link failure, a handshaking protocol can be used Assume Site A and Site B have established a link At fixed intervals, each site will exchange an I-am-up message indicating that they are up and running If Site A does not receive a message within the fixed interval, it assumes either (a) the other site is not up or (b) the message was lost Site A can now send an Are-you-up? message to Site B If Site A does not receive a reply, it can repeat the message or try an alternate route to Site B Failure Detection (cont) If Site A does not ultimately receive a reply from Site B, it concludes some type of failure has occurred Types of failures: - Site B is down - The direct link between A and B is down - The alternate link from A to B is down - The message has been lost However, Site A cannot determine exactly why the failure has occurred

Reconfiguration When Site A determines a failure has occurred, it must reconfigure the system: 1. If the link from A to B has failed, this must be broadcast to every site in the system 2. If a site has failed, every other site must also be notified indicating that the services offered by the failed site are no longer available When the link or the site becomes available again, this information must again be broadcast to all other sites Distributed Coordination Ordering events and achieving synchronization in centralized systems is easier. We can use common clock and memory What about distributed systems? No common clock or memory happened-before relationship provides partial ordering How to provide total ordering?

Event Ordering Happened-before relation (denoted by ) If A and B are events in the same process (assuming sequential processes), and A was executed before B, then A B If A is the event of sending a message by one process and B is the event of receiving that message by another process, then A B If A B and B C then A C If two events A and B are not related by the relation, then these events are executed concurrently. Relative Time for Three Concurrent Processes Which events are concurrent and which ones are ordered?

Exercise Which of the following event orderings are true? (a) p0 --> p3 : (b) p1 --> q3 : (c) q0 --> p3 : (d) r0 --> p4 : (e) p0 --> r4 : Which of the following statements are true? (a) (b) (c) (d) (e) p2 and q2 are concurrent processes. q1 and r1 are concurrent processes. p0 and q3 are concurrent processes. r0 and p0 are concurrent processes. r0 and p4 are concurrent processes. 13 Implementation of Associate a timestamp with each system event Require that for every pair of events A and B, if A B, then the timestamp of A is less than the timestamp of B Within each process Pi, define a logical clock The logical clock can be implemented as a simple counter that is incremented between any two successive events executed within a process Logical clock is monotonically increasing A process advances its logical clock when it receives a message whose timestamp is greater than the current value of its logical clock Assume A sends a message to B, LC 1 (A)=200, LC 2 (B)=195 --> LC 2 (B)=201 If the timestamps of two events A and B are the same, then the events are concurrent We may use the process identity numbers to break ties and to create a total ordering

Distributed Mutual Exclusion (DME) Assumptions The system consists of n processes; each process P i resides at a different processor Each process has a critical section that requires mutual exclusion Requirement If P i is executing in its critical section, then no other process P j is executing in its critical section We present two algorithms to ensure the mutual exclusion execution of processes in their critical sections DME: Centralized Approach One of the processes in the system is chosen to coordinate the entry to the critical section A process that wants to enter its critical section sends a request message to the coordinator The coordinator decides which process can enter the critical section next, and its sends that process a reply message When the process receives a reply message from the coordinator, it enters its critical section After exiting its critical section, the process sends a release message to the coordinator and proceeds with its execution This scheme requires three messages per critical-section entry: request reply release

DME: Fully Distributed Approach When process P i wants to enter its critical section, it generates a new timestamp, TS, and sends the message request (P i, TS) to all processes in the system When process P j receives a request message, it may reply immediately or it may defer sending a reply back When process P i receives a reply message from all other processes in the system, it can enter its critical section After exiting its critical section, the process sends reply messages to all its deferred requests DME: Fully Distributed Approach (Cont.) The decision whether process P j replies immediately to a request(p i, TS) message or defers its reply is based on three factors: If P j is in its critical section, then it defers its reply to P i If P j does not want to enter its critical section, then it sends a reply immediately to P i If P j wants to enter its critical section but has not yet entered it, then it compares its own request timestamp with the timestamp TS If its own request timestamp is greater than TS, then it sends a reply immediately to P i (P i asked first) Otherwise, the reply is deferred Example: P1 sends a request to P2 and P3 (timestamp=10) P3 sends a request to P1 and P2 (timestamp=4)

Undesirable Consequences The processes need to know the identity of all other processes in the system, which makes the dynamic addition and removal of processes more complex If one of the processes fails, then the entire scheme collapses This can be dealt with by continuously monitoring the state of all the processes in the system, and notifying all processes if a process fails