Implementing Locks. Nima Honarmand (Based on slides by Prof. Andrea Arpaci-Dusseau)

Lock Implementation Goals

We evaluate lock implementations along the following lines:
- Correctness
  - Mutual exclusion: only one thread in the critical section at a time
  - Progress (deadlock-freedom): if there are several simultaneous requests, one must be allowed to proceed
  - Bounded wait (starvation-freedom): every waiting thread must eventually be allowed to enter
- Fairness: each thread waits for about the same amount of time; also, threads acquire locks in the same order as requested
- Performance: CPU time is used efficiently

Building Locks

- Locks are variables in shared memory
- Two main operations: acquire() and release(), also called lock() and unlock()
- To check if locked, read the variable and check its value
- To acquire, write the "locked" value to the variable
  - Should only do this if the lock is currently unlocked
  - If already locked, keep reading the value until an unlock is observed
- To release, write the "unlocked" value to the variable

First Implementation Attempt

Using normal load/store instructions:

    bool lock = false;   // shared variable

    void acquire(bool *lock) {
        while (*lock)
            ;  /* wait */
        *lock = true;
    }

    void release(bool *lock) {
        *lock = false;
    }

This does not work. Why? The final check of the while condition and the write to the lock should happen atomically, but here they are separate instructions. Two threads can both observe *lock == false, both exit the loop, and both set it to true, so both enter the critical section at once.

Solution: Use Atomic RMW Instructions

Atomic instructions guarantee atomicity: they perform a Read, Modify, and Write as one indivisible step (RMW). Many flavors exist in the real world:
- Test-and-Set
- Fetch-and-Add
- Compare-and-Swap (CAS)
- Load-Linked / Store-Conditional

Example: Test-and-Set

Semantics:

    // return what was pointed to by addr;
    // at the same time, atomically store newval into addr
    int TAS(int *addr, int newval) {
        int old = *addr;
        *addr = newval;
        return old;
    }

Implementation on x86:

    int TAS(volatile int *addr, int newval) {
        int result = newval;
        asm volatile("lock; xchg %0, %1"
                     : "+m" (*addr), "=r" (result)
                     : "1" (newval)
                     : "cc");
        return result;
    }

Lock Implementation with TAS

    typedef struct lock_t {
        int flag;
    } lock_t;

    void init(lock_t *lock) {
        lock->flag = ??;
    }

    void acquire(lock_t *lock) {
        while (????)
            ;  // spin-wait (do nothing)
    }

    void release(lock_t *lock) {
        lock->flag = ??;
    }

Lock Implementation with TAS

    typedef struct lock_t {
        int flag;   // 0 = unlocked, 1 = locked
    } lock_t;

    void init(lock_t *lock) {
        lock->flag = 0;
    }

    void acquire(lock_t *lock) {
        while (TAS(&lock->flag, 1) == 1)
            ;  // spin-wait (do nothing)
    }

    void release(lock_t *lock) {
        lock->flag = 0;
    }

Evaluating Our Spinlock

Lock implementation goals:
1) Mutual exclusion: only one thread in the critical section at a time
2) Progress (deadlock-free): if there are several simultaneous requests, one must be allowed to proceed
3) Bounded wait: every waiting thread must eventually be allowed to enter
4) Fairness: threads acquire the lock in the order of requesting
5) Performance: CPU time is used efficiently

Which ones are NOT satisfied by our lock implementation? 3, 4, and 5.

Our Spinlock is Unfair

[Timeline figure, 0 to 160: threads A and B alternate time slices; each repeatedly locks and unlocks while the other spends its entire slice spinning.]

The scheduler is independent of locks/unlocks, so nothing guarantees that a waiting thread ever wins the lock.

Fairness and Bounded Wait: Use Ticket Locks

Idea: reserve each thread's turn to use the lock; each thread spins until its turn comes up. Uses a new atomic primitive, fetch-and-add.

Semantics:

    int FAA(int *ptr) {
        int old = *ptr;
        *ptr = old + 1;
        return old;
    }

Implementation: let's use GCC's built-in atomic function this time around:

    __sync_fetch_and_add(ptr, 1)

- Acquire: grab a ticket using fetch-and-add, then spin while my ticket != turn
- Release: advance to the next turn

Ticket Lock Example

Initially, turn = ticket = 0.

- A lock():   gets ticket 0, turn == 0 already, so A runs
- B lock():   gets ticket 1, spins until turn == 1
- C lock():   gets ticket 2, spins until turn == 2
- A unlock(): turn++ (turn = 1); B runs
- A lock():   gets ticket 3, spins until turn == 3
- B unlock(): turn++ (turn = 2); C runs
- C unlock(): turn++ (turn = 3); A runs
- A unlock(): turn++ (turn = 4)
- C lock():   gets ticket 4; C runs

Ticket Lock Implementation

    typedef struct {
        int ticket;
        int turn;
    } lock_t;

    void lock_init(lock_t *lock) {
        lock->ticket = 0;
        lock->turn = 0;
    }

    void acquire(lock_t *lock) {
        int myturn = FAA(&lock->ticket);
        while (lock->turn != myturn)
            ;  // spin
    }

    void release(lock_t *lock) {
        lock->turn += 1;
    }

Busy-Waiting (Spinning) Performance

- Good when: many CPUs, and locks are held for a short time
  - Advantage: avoids the cost of a context switch
- Awful when: one CPU, or locks are held for a long time
  - Disadvantage: spinning is wasteful

CPU Scheduler Is Ignorant of Busy-Waiting Locks

[Timeline figure, 0 to 160: A holds the lock; B, C, and D each spin through entire time slices before A runs again and unlocks.]

The CPU scheduler may run B instead of A even though B is waiting for A.

Ticket Lock with yield()

    typedef struct {
        int ticket;
        int turn;
    } lock_t;

    void acquire(lock_t *lock) {
        int myturn = FAA(&lock->ticket);
        while (lock->turn != myturn)
            yield();   // give up the CPU instead of spinning
    }

    void release(lock_t *lock) {
        lock->turn += 1;
    }

Yielding instead of Spinning

[Timeline figure, 0 to 160, two rows. No yield: A, B, C, and D each burn full time slices spinning between A's lock and unlock. With yield: waiters give up the CPU immediately, so A finishes its critical section and B acquires the lock much sooner.]

Evaluating the Ticket Lock

Lock implementation goals:
1) Mutual exclusion: only one thread in the critical section at a time
2) Progress (deadlock-free): if there are several simultaneous requests, one must be allowed to proceed
3) Bounded wait: every waiting thread must eventually be allowed to enter
4) Fairness: threads acquire the lock in the order of requesting
5) Performance: CPU time is used efficiently

Which ones are NOT satisfied by our lock implementation? Only 5: even with yielding, there is too much overhead.

Spinning Performance

Wasted time:
- Without yield: O(threads × time_slice)
- With yield: O(threads × context_switch_time)

So even with yield, spinning is slow under high thread contention. Next improvement: instead of spinning, block and put the thread on a wait queue.

Blocking Locks

- acquire() removes waiting threads from the run queue using a special system call
  - Let's call it park(): removes the current thread from the run queue
- release() returns waiting threads to the run queue using a special system call
  - Let's call it unpark(tid): returns thread tid to the run queue
- The scheduler runs any thread that is ready
  - No time is wasted on waiting threads when the lock is not available
- Good separation of concerns: keep waiting threads on a wait queue instead of the scheduler's run queue
- Note: park() and unpark() are made-up syscalls inspired by the Solaris lwp_park() and lwp_unpark() system calls

Building a Blocking Lock

    typedef struct {
        int lock;
        int guard;
        queue_t q;
    } lock_t;

    void acquire(lock_t *l) {
        while (TAS(&l->guard, 1) == 1)
            ;  // spin to acquire guard
        if (l->lock) {
            queue_add(l->q, gettid());
            l->guard = 0;
            park();        // blocked
        } else {
            l->lock = 1;
            l->guard = 0;
        }
    }

    void release(lock_t *l) {
        while (TAS(&l->guard, 1) == 1)
            ;  // spin to acquire guard
        if (queue_empty(l->q))
            l->lock = 0;
        else
            unpark(queue_remove(l->q));
        l->guard = 0;
    }

Questions:
1) What is guard for?
2) Why is it okay to spin on guard?
3) In release(), why not set lock = 0 when unparking?
4) Is the code correct? Hint: there is a race condition.

Race Condition

Thread 1, in acquire(), reaches:

    if (l->lock) {
        queue_add(l->q, gettid());
        l->guard = 0;
        // <-- Thread 1 is preempted here, before calling park()

Thread 2 then runs release() to completion:

    while (TAS(&l->guard, 1) == 1);
    if (queue_empty(l->q))
        l->lock = 0;
    else
        unpark(queue_remove(l->q));   // unparks Thread 1, which has not parked yet
    l->guard = 0;

Thread 1 resumes:

    park();   // sleeps forever: the wakeup was already delivered

Problem: guard is not held when calling park(), so Thread 2 can call unpark() before Thread 1 calls park(), and the wakeup is lost.

Solving the Race Problem: Final Correct Lock

setpark() informs the OS of the thread's plan to park() itself. If an unpark() arrives between setpark() and park(), park() returns immediately (no blocking).

    typedef struct {
        int lock;
        int guard;
        queue_t q;
    } lock_t;

    void acquire(lock_t *l) {
        while (TAS(&l->guard, 1) == 1)
            ;  // spin to acquire guard
        if (l->lock) {
            queue_add(l->q, gettid());
            setpark();     // announce the upcoming park()
            l->guard = 0;
            park();        // blocked (returns immediately if already unparked)
        } else {
            l->lock = 1;
            l->guard = 0;
        }
    }

    void release(lock_t *l) {
        while (TAS(&l->guard, 1) == 1)
            ;  // spin to acquire guard
        if (queue_empty(l->q))
            l->lock = 0;
        else
            unpark(queue_remove(l->q));
        l->guard = 0;
    }

Different OS, Different Support

- park(), unpark(), and setpark() are inspired by Solaris
- Other OSes provide different mechanisms to support blocking synchronization
- E.g., Linux has a mechanism called futex
  - Two basic operations: wait and wakeup
  - It keeps the wait queue in the kernel
  - It renders guard and setpark() unnecessary
- Read more about futex in OSTEP (brief) and in an optional reading (detailed)

Spinning vs. Blocking

Each approach is better under different circumstances.

Uniprocessor:
- The waiting process is scheduled, and the process holding the lock can't be
- Therefore, the waiting process should always relinquish the processor
- Associate a queue of waiters with each lock (as in the previous implementation)

Multiprocessor:
- The waiting process is scheduled, and the process holding the lock might be running on another CPU
- Whether to spin or block depends on how long before the lock is released:
  - Lock is going to be released quickly: spin-wait
  - Lock released slowly: block

Two-Phase Locking

A hybrid approach that combines the best of spinning and blocking:
- Phase 1: spin for a short time, hoping the lock becomes available soon
- Phase 2: if the lock is not released after a short while, block

Question: how long to spin for? There is a nice theory (next slide) that is hard to implement in practice, so real implementations just spin for a few iterations.

Two-Phase Locking: Spin Time

Say the cost of a context switch is C cycles and the lock will become available after T cycles.

Algorithm: spin for C cycles before blocking. We can show this is a 2-approximation of the optimal solution. Two cases:
- T < C: optimal would spin for T cycles (cost = T); so do we (cost = T)
- T >= C: optimal would immediately block (cost = C); we spin for C cycles and then block (cost = C + C = 2C)

So our cost is at most twice that of the optimal algorithm.

Problems with implementing this theory:
1) It is difficult to know C (it is non-deterministic)
2) It needs a low-overhead, high-resolution timing mechanism to know when C cycles have passed