Distributed Systems. Big Topic: Synchronization. Lec 6: Local Synchronization Primitives. Slide acks: Jinyang Li, Dave Andersen, Randy Bryant

Slide sources: http://www.news.cs.nyu.edu/~jinyang/fa10/notes/ds-lec2.ppt, http://www.cs.cmu.edu/~dga/15-440/f12/lectures/05-concurrency.txt

Outline
  Example motivation: YFS
  Local synchronization primitives
    Locks
    Semaphores
    Condition variables
  Next time: distributed synchronization primitives

YFS Design (Reminder)
  [Figure: applications on client workstations call the file system interface (creat, read, write, ..) of the YFS servers (file service); the YFS servers use put/get on the extent servers (extent service, data) and acquire/release on the lock servers (lock service).]

YFS Design
  [Figure: file system interface (creat, read, write, ..); YFS servers use put/get on the extent servers and acquire/release on the lock servers.]
  Extent Service: stores the data (dir and file contents)
    replicated for fault tolerance
    incrementally scalable with more servers
  YFS Service: serves file system requests
    it's stateless (i.e., doesn't store data)
    uses extent service to store data
    incrementally scalable with more servers
  Lock Service: ensures consistent updates by multiple yfs servers
    replicated for fault tolerance

YFS Server Implements FS Logic
  Application program: creat("/d1/f1", 0777)
  [Figure: creat("/d1/f1") goes through the file system interface to a YFS server, which uses put/get on the extent servers and acquire/release on the lock servers.]
  YFS server:
    1. GET root directory's data from Extent Service
    2. Find Extent Server address for dir /d1 in root dir's data
    3. GET /d1's data from Extent Server
    4. Find Extent Server address of f1 in /d1's data
    5. If not exists:
         alloc a new data block for f1 from Extent Service
         add f1 to /d1's data, PUT modified /d1 to Extent Service

Concurrent Accesses Cause Inconsistency
  App 1: creat("/d1/f1", 0777)
  Server S1:
    GET /d1
    Find file f1 in /d1
    If not exists:
      PUT modified /d1
  App 2: creat("/d1/f2", 0777)
  Server S2:
    GET /d1
    Find file f2 in /d1
    If not exists:
      PUT modified /d1
  What is the final result of /d1? What should it be?
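The interleaving on this slide can be replayed deterministically. The sketch below is only an illustration: `extent`, `get`, and `put` are hypothetical in-memory stand-ins for the extent service, not the YFS API.

```python
# Hypothetical in-memory stand-in for the extent service.
extent = {"/d1": set()}

def get(name):
    """GET returns a private copy, as a server would receive over the network."""
    return set(extent[name])

def put(name, data):
    """PUT overwrites whatever is stored under the name."""
    extent[name] = set(data)

def interleaved_creates():
    # S1 and S2 both read /d1 before either writes it back.
    d1_at_s1 = get("/d1")
    d1_at_s2 = get("/d1")
    d1_at_s1.add("f1")      # S1: f1 does not exist, so add it
    d1_at_s2.add("f2")      # S2: f2 does not exist, so add it
    put("/d1", d1_at_s1)    # S1 writes a directory containing only f1
    put("/d1", d1_at_s2)    # S2 overwrites it; f1 is lost
    return get("/d1")

result = interleaved_creates()
```

With this ordering `/d1` ends up containing only `f2`, although both creates reported success; it should contain both files.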

Solution: Use a Lock Service to Synchronize Access
  App 1: creat("/d1/f1", 0777)
  Server S1:
    ACQUIRE("/d1")
    GET /d1
    Find file f1 in /d1
    If not exists:
      PUT modified /d1
    RELEASE("/d1")
  App 2: creat("/d1/f2", 0777)
  Server S2:
    ACQUIRE("/d1") // blocks
    GET /d1
    ...
    RELEASE("/d1")
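A minimal single-process sketch of the same idea, with threads playing the role of the two YFS servers. `LockService`, `extent`, and `creat` are hypothetical names chosen for illustration; the real lock service is a separate server reached over RPC.

```python
import threading
from collections import defaultdict

class LockService:
    """Hypothetical in-process stand-in for the lock service."""

    def __init__(self):
        self._guard = threading.Lock()             # protects the lock table
        self._locks = defaultdict(threading.Lock)  # one lock per name

    def _lock_for(self, name):
        with self._guard:
            return self._locks[name]

    def acquire(self, name):
        self._lock_for(name).acquire()  # blocks while another server holds it

    def release(self, name):
        self._lock_for(name).release()

extent = {"/d1": frozenset()}  # hypothetical extent store
locks = LockService()

def creat(dirname, fname):
    locks.acquire(dirname)
    try:
        entries = set(extent[dirname])            # GET
        if fname not in entries:
            entries.add(fname)
            extent[dirname] = frozenset(entries)  # PUT
    finally:
        locks.release(dirname)

def run_two_servers():
    servers = [threading.Thread(target=creat, args=("/d1", f))
               for f in ("f1", "f2")]
    for t in servers:
        t.start()
    for t in servers:
        t.join()
    return extent["/d1"]
```

Because the GET and the PUT of each create happen while holding the lock on `/d1`, the two updates are serialized and both files survive regardless of thread scheduling.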

Putting It Together
  [Figure: an application calls creat("/d1/f1") through the file system interface (creat, read, write, ..); the YFS server then performs the steps below.]
    1. acquire("/d1") on the lock server
    2. get("/d1") from the extent server
    3. put("/d1") to the extent server
    4. release("/d1") on the lock server

Another Problem: Naming
  [Figure: same flow as the previous slide: 1. acquire("/d1"), 2. get("/d1"), 3. put("/d1"), 4. release("/d1").]
  Any issues with this version?

Another Problem: Naming
  App 1: creat("/d0/d1/f1", 0777)
  Server S1:
    acquire("/d0/d1")
    GET /d0/d1
    Find file f1 in d1
    If not exists:
      PUT modified /d0/d1
    release("/d0/d1")
  App 2 (Server S2):
    rename("/d0", "/d2")
    rename("/d3", "/d0")
  The same problem occurs for reading/writing files, if we use their names when identifying them on the server.

Solution: Using GUIDs
  App 1: creat("/d0/d1/f1", 0777)
  Server S1:
    acquire(d1_id)
    d1 = GET d1_id
    Find file f1 in d1
    If not exists:
      PUT(d1_id, modified d1)
    release(d1_id)
  App 2 (Server S2):
    rename("/d0", "/d2")
    rename("/d3", "/d0")
  GUIDs are globally-unique, location-independent names
  GUID principle is pervasive in FSes and DSes!
    Think of inode numbers
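To see why an identifier survives the renames while a path does not, here is a small sketch. The data layout (directories as dicts from child name to GUID) and the helper names `new_dir`, `resolve`, and `rename` are assumptions made for illustration only.

```python
import itertools

_next_id = itertools.count(1)
extents = {}  # GUID -> directory contents (child name -> child GUID)

def new_dir():
    guid = next(_next_id)
    extents[guid] = {}
    return guid

def resolve(root_id, path):
    """Walk the path from the root and return the GUID it currently names."""
    guid = root_id
    for part in path.strip("/").split("/"):
        if part:
            guid = extents[guid][part]
    return guid

def rename(root_id, old, new):
    """Rename a top-level entry; the GUID it refers to is untouched."""
    root = extents[root_id]
    root[new] = root.pop(old)

root, d0, d1, d3 = new_dir(), new_dir(), new_dir(), new_dir()
extents[root]["d0"] = d0
extents[d0]["d1"] = d1
extents[root]["d3"] = d3

d1_id = resolve(root, "/d0/d1")  # S1 resolves the name once, then uses the GUID
rename(root, "d0", "d2")         # S2: rename("/d0", "/d2")
rename(root, "d3", "d0")         # S2: rename("/d3", "/d0")
```

After the renames, the path `/d0` names a different directory, but `d1_id` still identifies the directory S1 locked and read, which is now reachable as `/d2/d1`.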

(Parenthesis) HW 2-7: Build a Simplified YFS Version
  [Figure: App -> syscall -> FUSE (kernel) -> yfs servers (user-level); the yfs servers talk to an extent server and a lock server.]
  Single extent server to store data
  Communication using in-house RPC

HW 2: Lock Server
  Lock service consists of:
    Lock server: grants a lock to clients, one at a time
    Lock client: talks to the server to acquire/release locks
  Correctness:
    A lock is granted to at most one client at any time
  Additional requirement:
    acquire() at client does not return until lock is granted

HW 2 Steps
  Step 1: Implement lock server and client lock code
    Use in-house RPC mechanism:
      It's a bit more primitive than Thrift and the like (see next slide)
  Step 2: Implement at-most-once semantics
    Why? How do we do that?
  Due Sept 25 before midnight
    Absolutely no extensions!
    Lab is significantly more difficult than HW 1, so start working on it NOW!
    Work in layers, submit imperfect but on time!
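One common way to get at-most-once behaviour is for the server to remember the reply to each request and replay it when a retransmission arrives. The sketch below assumes each request carries a client id and a per-client sequence number; `AtMostOnceServer` and its fields are hypothetical and are not the lab's RPC library.

```python
class AtMostOnceServer:
    """Hypothetical wrapper that runs each (client, seq) request at most once."""

    def __init__(self, handler):
        self._handler = handler
        self._replies = {}  # (client_id, seq) -> cached reply

    def handle(self, client_id, seq, *args):
        key = (client_id, seq)
        if key not in self._replies:      # first arrival: execute the handler
            self._replies[key] = self._handler(*args)
        return self._replies[key]         # duplicate: replay the saved reply

granted = []  # records how many times the handler really ran

def acquire(lock_name):
    granted.append(lock_name)
    return "OK"

server = AtMostOnceServer(acquire)
first = server.handle("client-1", 7, "/d1")
retry = server.handle("client-1", 7, "/d1")  # retransmission of the same request
```

The retry returns the same reply without running `acquire` a second time. This sketch never forgets old replies; a real server needs some rule for discarding them.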

YFS's RPC library (End Parenthesis)
  [Figure: RPC call path between lock_client and lock_server]
  lock_client (rpcc): rpc call -> marshal args -> transmit RPC request -> wait -> receive RPC response -> unmarshal args -> rpc call return
  lock_server (rpcs): receive -> unmarshal args -> rpc handler -> work -> rpc handler return -> marshal args -> transmit
  Client side: cl->call(lock_protocol::acquire, x, ret)
  Server side: server.reg(lock_protocol::acquire, &ls, &lock_server::acquire)

Outline
  Local synchronization primitives
    Locks // already talked about these
    Semaphores
    Condition variables
  Next time: distributed synchronization
    Many of the primitives here distribute or build upon local primitives

Locks are great. Why others?
  Locks are very low level, and you often need more to accomplish a synchronization goal
  Hence, people have developed a variety of synchronization primitives that raise the level of abstraction

Semaphores
  Integer variable x that allows interaction via 2 operations:
    x.p(): while (x == 0) wait; --x
    x.v(): ++x
  Both operations are done atomically
    All steps take place without any intervening operations
  When do we use semaphores?
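As a concrete illustration, Python's standard `threading.Semaphore` offers the same two operations under different names: `acquire()` plays the role of `p()` and `release()` the role of `v()`. The `demo` function and its `log` list are illustrative only.

```python
import threading

def demo():
    x = threading.Semaphore(0)  # counter starts at 0
    log = []

    def waiter():
        x.acquire()             # p(): blocks while the counter is 0, then decrements
        log.append("woke")

    t = threading.Thread(target=waiter)
    t.start()
    log.append("signal")
    x.release()                 # v(): increments the counter and wakes a waiter
    t.join()
    return log
```

The waiter cannot get past `acquire()` until the main thread calls `release()`, so "signal" is always logged before "woke".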

Example: Thread-Safe FIFO Queue
  q.initialize(): initialize state
  q.insert(x): add item into queue
  q.remove(): block until queue not empty; return item at head of queue
  q.flush(): clear queue
  Assume that we already have a sequential queue implementation: sq
    But sq is not thread-safe!
  So, how do we make q thread-safe?

FIFO Queue with Mutexes
  q.initialize():
    q.sq = NewSQueue()
    q.mutex = 1
  q.insert(x):
    q.mutex.lock()
    q.sq.insert(x)
    q.mutex.unlock()
  q.remove():
    q.mutex.lock()
    x = q.sq.remove()
    q.mutex.unlock()
    return x
  q.flush():
    q.mutex.lock()
    q.sq.flush()
    q.mutex.unlock()
  Are we done?
    Nope: remove doesn't block when the buffer is empty

FIFO Queue with Semaphores
  Use semaphore to count number of elements in queue
  q.initialize():
    q.sq = NewSQueue()
    q.mutex = 1
    q.items = 0
  q.insert(x):
    q.mutex.lock()
    q.sq.insert(x)
    q.mutex.unlock()
    q.items.v()
  q.remove():
    q.items.p()
    // a q.flush() that runs here empties the queue before the remove below
    q.mutex.lock()
    x = q.sq.remove()
    q.mutex.unlock()
    return x
  q.flush():
    q.mutex.lock()
    q.sq.flush()
    q.items = 0
    q.mutex.unlock()
  Are we done?
    Nope: insert and remove alone work fine, but flush messes things up
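A minimal Python sketch of the part of this design that does work: insert and remove, with a semaphore counting items and a mutex protecting the sequential queue. `SemQueue` is a hypothetical name, `collections.deque` stands in for `sq`, and flush is deliberately left out because of the race described on the slide.

```python
import threading
from collections import deque

class SemQueue:
    def __init__(self):
        self._sq = deque()                    # sequential queue, not thread-safe by contract here
        self._mutex = threading.Lock()
        self._items = threading.Semaphore(0)  # number of elements in the queue

    def insert(self, x):
        with self._mutex:
            self._sq.append(x)
        self._items.release()                 # v(): one more item available

    def remove(self):
        self._items.acquire()                 # p(): block until an item exists
        with self._mutex:
            return self._sq.popleft()

def demo():
    q = SemQueue()
    out = []
    consumer = threading.Thread(
        target=lambda: out.extend(q.remove() for _ in range(3)))
    consumer.start()
    for x in (1, 2, 3):
        q.insert(x)
    consumer.join()
    return out
```

The consumer may start before anything has been inserted; it simply blocks in `remove()` until the producer catches up.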

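Fixing Race with Mutex?
  q.initialize():
    q.sq = NewSQueue()
    q.mutex = 1
    q.items = 0
  q.insert(x):
    q.mutex.lock()
    q.sq.insert(x)
    q.items.v()
    q.mutex.unlock()
  q.remove():
    q.mutex.lock()
    q.items.p()
    x = q.sq.remove()
    q.mutex.unlock()
    return x
  q.flush():
    q.mutex.lock()
    q.sq.flush()
    q.items = 0
    q.mutex.unlock()
  Are we done?
    Yes, from a correctness perspective
    Nope, from a liveness perspective -- deadlock
      remove on an empty queue blocks in p() while holding the mutex, so no insert can ever get in to wake it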
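The deadlock can be observed safely by giving both sides a timeout. This is only a demonstration harness: the timeouts and the `holding` event are assumptions added so the example terminates, and they are not part of the design on the slide.

```python
import threading

mutex = threading.Lock()
items = threading.Semaphore(0)   # queue is empty
holding = threading.Event()      # set once remove() holds the mutex

def remove(timeout):
    with mutex:
        holding.set()
        # p() while holding the mutex; returns False if it times out
        return items.acquire(timeout=timeout)

def insert(timeout):
    if not mutex.acquire(timeout=timeout):  # remove() still holds the mutex
        return False
    try:
        items.release()                     # v()
        return True
    finally:
        mutex.release()

def demo():
    results = {}
    t = threading.Thread(target=lambda: results.update(remove=remove(0.5)))
    t.start()
    holding.wait()                  # make sure remove() got the mutex first
    results["insert"] = insert(0.1)
    t.join()
    return results
```

Neither call succeeds: the insert cannot take the mutex, and the remove never sees an item. Without the timeouts both threads would wait forever.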

Condition Variables
  Condition variables provide a synchronization point, where one thread suspends until activated by another
  A condition variable is always associated with a mutex
  cvar.wait():
    Must be called after locking mutex
    Atomically: { release mutex & suspend operation }
    When resumed, lock the mutex (may have to wait for it)
  cvar.signal():
    If no thread suspended, then no-op
    Otherwise, wake up one suspended thread
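For a concrete feel, Python's `threading.Condition` follows the same contract: it is created around a lock, `wait()` must be called with that lock held and releases it while suspended, and `notify()` corresponds to `signal()`. The `state` dictionary and `log` list below are illustrative only.

```python
import threading

def demo():
    mutex = threading.Lock()
    cvar = threading.Condition(mutex)  # condition variable tied to the mutex
    state = {"ready": False}
    log = []

    def waiter():
        with mutex:                    # lock the mutex before waiting
            while not state["ready"]:
                cvar.wait()            # releases mutex, suspends, re-locks on wake-up
            log.append("saw ready")

    t = threading.Thread(target=waiter)
    t.start()
    with mutex:
        state["ready"] = True
        log.append("set ready")
        cvar.notify()                  # signal(): wake one suspended thread, if any
    t.join()
    return log
```

If the waiter has not yet suspended when `notify()` runs, the notification is a no-op, which is why the waiter checks `state["ready"]` itself rather than relying on the wake-up alone.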

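FIFO Queue with Condition Variable
  q.initialize():
    q.sq = NewSQueue()
    q.mutex = 1
    q.cvar = NewCond(q.mutex)
  q.insert(x):
    q.mutex.lock()
    q.sq.insert(x)
    q.cvar.signal()
    q.mutex.unlock()
  q.remove():
    q.mutex.lock()
    if q.sq.isempty():
      q.cvar.wait()      // atomic: releases the mutex and suspends
    // a q.flush() may run between the signal and this thread re-locking the mutex
    x = q.sq.remove()
    q.mutex.unlock()
    return x
  q.flush():
    q.mutex.lock()
    q.sq.flush()
    q.mutex.unlock()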

Thread-Safe FIFO Queue
  q.initialize():
    q.sq = NewSQueue()
    q.mutex = 1
    q.cvar = NewCond(q.mutex)
  q.insert(x):
    q.mutex.lock()
    q.sq.insert(x)
    q.cvar.signal()
    q.mutex.unlock()
  q.remove():
    q.mutex.lock()
    while q.sq.isempty():
      q.cvar.wait()
    x = q.sq.remove()
    q.mutex.unlock()
    return x
  q.flush():
    q.mutex.lock()
    q.sq.flush()
    q.mutex.unlock()
  Actually, one could build this using mutexes
    Build semaphore using mutex
    Build cond variable using semaphore + mutex
  But, boy, it's a mind-bender to do that
    That's why you want a higher level of abstraction
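The final design translates almost line for line into Python. This is a sketch under assumptions: `FifoQueue` is a hypothetical name, `collections.deque` stands in for the sequential queue `sq`, and `notify()` is Python's name for `signal()`.

```python
import threading
from collections import deque

class FifoQueue:
    def __init__(self):
        self._sq = deque()                        # sequential queue
        self._mutex = threading.Lock()
        self._cvar = threading.Condition(self._mutex)

    def insert(self, x):
        with self._mutex:
            self._sq.append(x)
            self._cvar.notify()                   # wake one waiting remove()

    def remove(self):
        with self._mutex:
            while not self._sq:                   # re-check after every wake-up
                self._cvar.wait()
            return self._sq.popleft()

    def flush(self):
        with self._mutex:
            self._sq.clear()

def demo():
    q = FifoQueue()
    out = []
    consumer = threading.Thread(
        target=lambda: out.extend(q.remove() for _ in range(5)))
    consumer.start()
    for x in range(5):
        q.insert(x)
    consumer.join()
    return out
```

The `while` loop is what makes flush safe: if the queue was emptied between the wake-up and the moment `remove()` re-locks the mutex, the thread goes back to waiting instead of removing from an empty queue.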

Synchronization in Distributed Systems
  As we've already seen in the YFS Lab, distributed systems have similar issues:
    Multiple processes on different machines share the same resource: the data (or a printer, or the user's screen, ...)
  Synchronization is even hairier in distributed systems, as you have to worry about failures, not just overlapping accesses
    E.g.: when you get a lock, you may not even know you've gotten it

Analogy: The Generals' Dilemma
  [Figure: two generals exchange messages by an unreliable medium (e.g., carrier pigeon). General 1: "Let's attack together at 0500pm". General 2: "OK". General 2 then wonders: "Hmm, should I attack? Did he get my OK?"]
  Same with distributed systems: machines need to agree on how to progress via an unreliable medium
  Things can get even messier if generals/machines can also fail (or even worse, go rogue)

Distributed Synchronization Mechanisms
  Logical clocks: clock synchronization is a real issue in distributed systems, hence they often maintain logical clocks, which count operations on the shared resource
  Consensus: multiple machines reach majority agreement over the operations they should perform and their ordering
  Data consistency protocols: replicas evolve their states in pre-defined ways so as to reach a common state despite different views of the input
  Distributed locking services: machines grab locks from a centralized, but still distributed, locking service, so as to coordinate their accesses to shared resources (e.g., files)
  Distributed transactions: an operation that involves multiple services either succeeds or fails at all of them
  We're going to look at all of these in the following lectures
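As a small taste of the first item, here is a sketch of one well-known logical clock, the Lamport clock. The slide does not say which scheme the course uses, so treat this as an assumed example; `LamportClock` and its method names are hypothetical.

```python
class LamportClock:
    """Counter that orders events without relying on synchronized wall clocks."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """A local operation happened: advance the counter."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event too; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, msg_time):
        """Jump past the sender's timestamp so the receive is ordered after the send."""
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
a.tick()                 # a's clock: 1
a.tick()                 # a's clock: 2
stamp = a.send()         # a's clock: 3, message carries 3
arrival = b.receive(stamp)  # b's clock: max(0, 3) + 1 = 4
```

Machine `b` had performed no operations of its own, yet its receive is stamped later than the send that caused it, which is the ordering property the counter is meant to provide.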

Next Time
  Clocks in distributed systems
    Clock synchronization problem
    Logical (protocol) clocks