Advanced Topic: Efficient Synchronization

Similar documents
Lecture 10: Multi-Object Synchronization

CS 153 Design of Operating Systems Winter 2016

CPSC/ECE 3220 Summer 2017 Exam 2

CS 571 Operating Systems. Midterm Review. Angelos Stavrou, George Mason University

Deadlock. Disclaimer: some slides are adopted from Dr. Kulkarni s and book authors slides with permission 1

CPSC/ECE 3220 Summer 2018 Exam 2 No Electronics.

The deadlock problem

Deadlock. Disclaimer: some slides are adopted from Dr. Kulkarni s and book authors slides with permission 1

Deadlock. Disclaimer: some slides are adopted from Dr. Kulkarni s and book authors slides with permission 1

Operating Systems: William Stallings. Starvation. Patricia Roy Manatee Community College, Venice, FL 2008, Prentice Hall

Lecture 9: Multiprocessor OSs & Synchronization. CSC 469H1F Fall 2006 Angela Demke Brown

Last Class: Monitors. Real-world Examples

Lecture #7: Implementing Mutual Exclusion

Chapter 6: Process Synchronization

Deadlocks: Detection & Avoidance

Multiprocessors and Locking

The Deadlock Problem (1)

MULTITHREADING AND SYNCHRONIZATION. CS124 Operating Systems Fall , Lecture 10

Advance Operating Systems (CS202) Locks Discussion

UNIT:2. Process Management

Review: Test-and-set spinlock

CPSC/ECE 3220 Fall 2017 Exam Give the definition (note: not the roles) for an operating system as stated in the textbook. (2 pts.

Operating Systems Antonio Vivace revision 4 Licensed under GPLv3

More on Synchronization and Deadlock

CSE 120 Principles of Operating Systems

Deadlock. Concurrency: Deadlock and Starvation. Reusable Resources

Last Class: Deadlocks. Today

What is the Race Condition? And what is its solution? What is a critical section? And what is the critical section problem?

Deadlocks. Copyright : University of Illinois CS 241 Staff 1

1 RCU. 2 Improving spinlock performance. 3 Kernel interface for sleeping locks. 4 Deadlock. 5 Transactions. 6 Scalable interface design

Last Class: Synchronization Problems. Need to hold multiple resources to perform task. CS377: Operating Systems. Real-world Examples

Readers-Writers Problem. Implementing shared locks. shared locks (continued) Review: Test-and-set spinlock. Review: Test-and-set on alpha

Chapter 6 Concurrency: Deadlock and Starvation

CSE 451: Operating Systems Winter Lecture 7 Synchronization. Steve Gribble. Synchronization. Threads cooperate in multithreaded programs

Lecture 18: Coherence and Synchronization. Topics: directory-based coherence protocols, synchronization primitives (Sections

EECS 482 Introduction to Operating Systems

Last Class: Synchronization Problems!

Chapter 5: Process Synchronization. Operating System Concepts Essentials 2 nd Edition

CSL373: Lecture 5 Deadlocks (no process runnable) + Scheduling (> 1 process runnable)

OPERATING SYSTEMS. Deadlocks

Lecture 21: Transactional Memory. Topics: Hardware TM basics, different implementations

Deadlocks. Frédéric Haziza Spring Department of Computer Systems Uppsala University

Today: Synchronization. Recap: Synchronization

Chapter 3: Deadlocks

CS162 Operating Systems and Systems Programming Midterm Review"

Concurrency: Deadlock and Starvation

Main Points of the Computer Organization and System Software Module

CSC Operating Systems Spring Lecture - XII Midterm Review. Tevfik Ko!ar. Louisiana State University. March 4 th, 2008.

Synchronization. CS61, Lecture 18. Prof. Stephen Chong November 3, 2011

Quiz Answers. CS 537 Lecture 9 Deadlock. What can go wrong? Readers and Writers Monitor Example

Concurrency: Deadlock and Starvation. Chapter 6

CS162, Spring 2004 Discussion #6 Amir Kamil UC Berkeley 2/26/04

10/17/ Gribble, Lazowska, Levy, Zahorjan 2. 10/17/ Gribble, Lazowska, Levy, Zahorjan 4

Deadlocks. Thomas Plagemann. With slides from C. Griwodz, K. Li, A. Tanenbaum and M. van Steen

CMPT 300 Introduction to Operating Systems

Advanced Synchronization and Deadlock

Semaphore. Originally called P() and V() wait (S) { while S <= 0 ; // no-op S--; } signal (S) { S++; }

CSE 451: Operating Systems Winter Lecture 7 Synchronization. Hank Levy 412 Sieg Hall

CSE 153 Design of Operating Systems

Operating Systems. Designed and Presented by Dr. Ayman Elshenawy Elsefy

Operating Systems. Operating Systems Sina Meraji U of T

OS Structure. User mode/ kernel mode (Dual-Mode) Memory protection, privileged instructions. Definition, examples, how it works?

Maximum CPU utilization obtained with multiprogramming. CPU I/O Burst Cycle Process execution consists of a cycle of CPU execution and I/O wait

250P: Computer Systems Architecture. Lecture 14: Synchronization. Anton Burtsev March, 2019

Process Synchronisation (contd.) Deadlock. Operating Systems. Spring CS5212

Synchronization. CISC3595/5595 Fall 2015 Fordham Univ.

Lecture: Consistency Models, TM

Operating Systems. Lecture3.2 - Deadlock Management. Golestan University. Hossein Momeni

System Model. Types of resources Reusable Resources Consumable Resources

What is Deadlock? Two or more entities need a resource to make progress, but will never get that resource. Examples from everyday life:

Lecture: Consistency Models, TM. Topics: consistency models, TM intro (Section 5.6)

Concurrency: Principles of Deadlock. Processes and resources. Concurrency and deadlocks. Operating Systems Fall Processes need resources to run

Chapter 6: Process Synchronization

Synchronization I. Jo, Heeseung

Bridge Crossing Example

Multiprocessor Systems. Chapter 8, 8.1

Suggested Solutions (Midterm Exam October 27, 2005)

Midterm Exam March 13, 2013 CS162 Operating Systems

COS 318: Operating Systems. Deadlocks. Jaswinder Pal Singh Computer Science Department Princeton University

Chapter 7: Deadlocks 1

Chapter 7: Deadlocks

Lesson 6: Process Synchronization

CS 318 Principles of Operating Systems

Module 6: Process Synchronization. Operating System Concepts with Java 8 th Edition

Last Class: CPU Scheduling! Adjusting Priorities in MLFQ!

Operating Systems: Quiz2 December 15, Class: No. Name:

Deadlock. Concepts to discuss. A System Model. Deadlock Characterization. Deadlock: Dining-Philosophers Example. Deadlock: Bridge Crossing Example

Midterm Exam Amy Murphy 19 March 2003

Process Management And Synchronization

MultiJav: A Distributed Shared Memory System Based on Multiple Java Virtual Machines. MultiJav: Introduction

CSE 410 Final Exam 6/09/09. Suppose we have a memory and a direct-mapped cache with the following characteristics.

Process & Thread Management II. Queues. Sleep() and Sleep Queues CIS 657

Process & Thread Management II CIS 657

CSCC 69H3 2/1/13. Today: Deadlock. Remember example from last week? Example of Deadlock. Example: dining philosophers: Not just an OS Problem!

OS Structure. User mode/ kernel mode. System call. Other concepts to know. Memory protection, privileged instructions

DEADLOCKS M O D E R N O P E R A T I N G S Y S T E M S C H A P T E R 6 S P R I N G

Deadlocks. Tore Larsen. With slides from T. Plagemann, C. Griwodz, K. Li, A. Tanenbaum and M. van Steen

Operating Systems. Synchronization

Interprocess Communication By: Kaushik Vaghani

CSE 4/521 Introduction to Operating Systems

Transcription:

Advanced Topic: Efficient Synchronization

Multi-Object Programs What happens when we try to synchronize across multiple objects in a large program? Each object with its own lock, condition variables Is locking modular? Performance Deadlock Eliminating locks

Synchronization Performance A program with lots of parallel threads can still have poor performance on a multiprocessor: Overhead of creating threads, if not needed Lock contention: only one thread at a time can hold a given lock Shared data protected by a lock may ping back and forth between cores False sharing: communication between cores even for data that is not shared

Topics Multiprocessor cache coherence MCS locks (if locks are mostly busy) RCU locks (if locks are mostly busy, and data is mostly read-only)

Why Cache?

Making things faster by adding closer memory

Multiprocessor Cache Coherence Scenario: Thread A modifies data inside a critical section and releases lock Thread B acquires lock and reads data Easy if all accesses go to main memory Thread A changes main memory; thread B reads it What if new data is cached at processor A? What if old data is cached at processor B

Write Back Cache Coherence Cache coherence = system behaves as if there is one copy of the data If data is only being read, any number of caches can have a copy If data is being modified, at most one cached copy On write: (get ownership) Invalidate all cached copies, before doing write Modified data stays in cache ( write back ) On read: Fetch value from owner or from memory

Snoop-based Cache Coherence Cache controller watches all traffic to memory subsystem either directly or via broadcast Decides how to modify local cache Works well with a bus Does NOT scale well with number of processors

Cache State Machine Read miss Read-Only Invalid Peer write Write miss Peer read Write hit Peer write Exclusive (writable)

Directory-Based Cache Coherence How do we know which cores have a location cached? Hardware keeps track of all cached copies On a read miss, if held exclusive, fetch latest copy and invalidate that copy On a write miss, invalidate all copies Read-modify-write instructions Fetch cache entry exclusive, prevent any other cache from reading the data until instruction completes

A Simple Critical Section // A counter protected by a spinlock Counter::Increment() { while (test_and_set(&lock)) ; value++; lock = FREE; memory_barrier(); }

A Simple Test of Cache Behavior Array of 1K counters, each protected by a separate spinlock Array small enough to fit in cache Test 1: one thread loops over array Test 2: two threads loop over different arrays Test 3: two threads loop over single array Test 4: two threads loop over alternate elements in single array

Results (64 core AMD Opteron) One thread, one array 51 cycles Two threads, two arrays 52 Two threads, one array 197 Two threads, odd/even 127

Reducing Lock Contention Fine-grained locking Partition object into subsets, each protected by its own lock Example: hash table buckets Per-processor data structures Partition object so that most/all accesses are made by one processor Example: per-processor heap Ownership/Staged architecture Only one thread at a time accesses shared data (CSP) Example: pipeline of threads

What If Locks are Still Mostly Busy? MCS Locks Optimize lock implementation for when lock is contended RCU (read-copy-update) Efficient readers/writers lock used in Linux kernel Readers proceed without first acquiring lock Writer ensures that readers are done Both rely on atomic read-modify-write instructions

The Problem with Test and Set Counter::Increment() { while (test_and_set(&lock)) ; value++; lock = FREE; memory_barrier(); } What happens if many processors try to acquire the lock at the same time? Hardware doesn t prioritize FREE

The Problem with Test and Test and Set Counter::Increment() { while (lock == BUSY test_and_set(&lock)) ; value++; lock = FREE; memory_barrier(); } What happens if many processors try to acquire the lock? Lock value pings between caches

Test (and Test) and Set Performance

Some Approaches Insert a delay in the spin loop Helps but acquire is slow when not much contention Spin adaptively No delay if few waiting Longer delay if many waiting Guess number of waiters by how long you wait MCS Create a linked list of waiters using compareandswap Spin on a per-processor location

Atomic CompareAndSwap Operates on a memory word Check that the value of the memory word hasn t changed from what you expect E.g., no other thread did compareandswap first If it has changed, return an error (and loop) If it has not changed, set the memory word to a new value

MCS Lock Maintain a list of threads waiting for the lock Front of list holds the lock MCSLock::tail is last thread in list New thread uses CompareAndSwap to add to the tail Lock is passed by setting next->needtowait = FALSE; Next thread spins while its needtowait is TRUE TCB { TCB *next; // next in line bool needtowait; } MCSLock { Queue *tail = NULL; // end of line }

MCS Lock Implementation MCSLock::acquire() { Queue oldtail = tail; } mytcb >next = NULL; while (!compareandswap(&tail, oldtail, &mytcb)) { oldtail = tail; } if (oldtail!= NULL) { mytcb >needtowait = TRUE; oldtail >next = mytcb; memory_barrier(); while (mytcb >needtowait) ; } MCSLock::release() { if (compareandswap(&tail, mytcb, NULL)) { // no more waiters return; } // someone else stuck a waiter on // spin until we know who they are while (mytcb >next == NULL) ; // wake them up mytcb >next >needtowait=false; }

MCS In Operation

Read-Copy-Update Goal: very fast reads to shared data Reads proceed without first acquiring a lock OK if write is (very) slow Restricted update Writer computes new version of data structure Publishes new version with a single atomic instruction Multiple concurrent versions Readers may see old or new version Integration with thread scheduler Guarantee all readers complete within grace period, and then garbage collect old version

Read-Copy-Update

Read-Copy-Update Implementation Readers disable interrupts on entry Guarantees they complete critical section in a timely fashion No read or write lock Writer Acquire write lock Compute new data structure Publish new version with atomic instruction Release write lock Wait for time slice on each CPU Only then, garbage collect old version of data structure

Deadlock Definition Resource: any (passive) thing needed by a thread to do its job (CPU, disk space, memory, lock) Preemptable: can be taken away by OS Non-preemptable: must leave with thread Starvation: thread waits indefinitely Deadlock: circular waiting for resources Deadlock => starvation, but not vice versa

Example: two locks Thread A Thread B lock1.acquire(); lock2.acquire(); lock2.release(); lock1.release(); lock2.acquire(); lock1.acquire(); lock1.release(); lock2.release();

Bidirectional Bounded Buffer Thread A Thread B buffer1.put(data); buffer1.put(data); buffer2.put(data); buffer2.put(data); buffer2.get(); buffer2.get(); buffer1.get(); buffer1.get(); Suppose buffer1 and buffer2 both start almost full.

Two locks and a condition variable Thread A lock1.acquire(); lock2.acquire(); while (need to wait) { condition.wait(lock2); } lock2.release(); lock1.release(); Thread B lock1.acquire(); lock2.acquire(); condition.signal(lock2); lock2.release(); lock1.release();

Yet another Example

Dining Lawyers Each lawyer needs two chopsticks to eat. Each grabs chopstick on the right first.

Necessary Conditions for Deadlock Limited access to resources If infinite resources, no deadlock! No preemption If resources are virtual, can break deadlock Multiple independent requests wait while holding Circular chain of requests

Question How does Dining Lawyers meet the necessary conditions for deadlock? Limited access to resources No preemption Multiple independent requests (wait while holding) Circular chain of requests How can we modify Dining Lawyers to prevent deadlock?

Preventing Deadlock Exploit or limit program behavior Limit program from doing anything that might lead to deadlock Predict the future If we know what program will do, we can tell if granting a resource might lead to deadlock Detect and recover If we can rollback a thread, we can fix a deadlock once it occurs

Exploit or Limit Behavior Provide enough resources How many chopsticks are enough? Eliminate wait while holding Release lock when calling out of module Telephone circuit setup Eliminate circular waiting Lock ordering: always acquire locks in a fixed order Example: move file from one directory to another

Example Thread 1 Thread 2 1. Acquire A 2. 3. Acquire C 4. 5. If (maybe) Wait for B 1. 2. Acquire B 3. 4. Wait for A How can we make sure to avoid deadlock?

Deadlock Dynamics Safe state: For any possible sequence of future resource requests, it is possible to eventually grant all requests May require waiting even when resources are available! Unsafe state: Some sequence of resource requests can result in deadlock Doomed state: All possible computations lead to deadlock

Possible System States

Question What are the doomed states for Dining Lawyers? What are the unsafe states? What are the safe states?

Communal Dining Lawyers n chopsticks in middle of table n lawyers, each can take one chopstick at a time What are the safe states? What are the unsafe states? What are the doomed states?

Communal Mutant Dining Lawyers N chopsticks in the middle of the table N lawyers, each takes one chopstick at a time Lawyers need k chopsticks to eat, k > 1 What are the safe states? What are the unsafe states? What are the doomed states?

Communal Mutant Absent-Minded Dining Lawyers N chopsticks in the middle of the table N lawyers, each takes one chopstick at a time Lawyers need k chopsticks to eat, k > 1 k larger if lawyer is talking on his/her cellphone What are the safe states? What are the unsafe states? What are the doomed states?

Banker s algorithm Predict the Future State maximum resource needs in advance Allocate resources dynamically when resource is needed -- wait if granting request would lead to deadlock Request can be granted if some sequential ordering of threads is deadlock free

Banker s Algorithm Grant request iff result is a safe state Sum of maximum resource needs of current threads can be greater than the total resources Provided there is some way for all the threads to finish without getting into deadlock Example: proceed iff total available resources - # allocated >= max remaining that might be needed by this thread in order to finish Guarantees this thread can finish

Detect and Repair Algorithm Scan wait for graph Detect cycles Fix cycles Proceed without the resource Requires robust exception handling code Roll back and retry Transaction: all operations are provisional until have all required resources to complete operation

Detecting Deadlock

Non-Blocking Synchronization Goal: data structures that can be read/modified without acquiring a lock No lock contention! No deadlock! General method using compareandswap Create copy of data structure Modify copy Swap in new version iff no one else has Restart if pointer has changed

Lock-Free Bounded Buffer tryget() { do { copy = ConsistentCopy(p); if (copy->front == copy->tail) return NULL; else { item = copy->buf[copy->front % MAX]; copy->front++; } while (compareandswap(&p, p, copy)); return item; }