Concurrency: Locks. Announcements


CS 537 Introduction to Operating Systems
UNIVERSITY of WISCONSIN-MADISON, Computer Sciences Department
Concurrency: Locks
Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau

Questions answered in this lecture:
- Review threads and mutual exclusion for critical sections
- How can locks be used to protect shared data structures such as linked lists?
- Can locks be implemented by disabling interrupts?
- Can locks be implemented with loads and stores?
- Can locks be implemented with atomic hardware instructions?
- Are spinlocks a good idea?

Announcements
- P2: due this Friday, with an extension to Sunday evening
  - Test scripts and handin directories available
  - Purpose of the graph is to demonstrate the scheduler is working correctly
- 1st Exam: congratulations for completing!
  - Grades posted to Learn@UW; average around 80%
  - 90 and up: A; 85-90: AB; 80-85: B; 70-80: BC; 60-70: C; below 60: D
  - Return individual sheets in discussion section
  - Exam with answers will be posted to the course web page soon
- Read as we go along: Chapter 28

Review: Which registers store the same/different values across threads?

[Figure: CPU 1 and CPU 2, running thread 1 and thread 2 of the same
process. Each CPU has its own IP and SP; both PTBRs point to the same
page directory (PageDir B) in RAM, so both threads see the same CODE
and HEAP in virtual memory, but each has its own STACK 1 / STACK 2.]

All general-purpose registers are virtualized: each thread is given the
impression of its own copy.

Review: What is needed for CORRECTNESS?

    balance = balance + 1;

Instructions accessing shared memory must execute as an uninterruptible
group; this group of assembly instructions must be atomic:

    mov  0x123, %eax      # critical section
    add  $0x1,  %eax
    mov  %eax,  0x123

More generally: we need mutual exclusion for critical sections.
If process A is in critical section C, process B can't be
(okay if other processes do unrelated work).

Other Examples

Consider multi-threaded applications that do more than increment a
shared balance, e.g., a multi-threaded application with a shared
linked list, all concurrent:
- Thread A inserting element a
- Thread B inserting element b
- Thread C looking up element c

Shared Linked List

    typedef struct node_t {
        int key;
        struct node_t *next;
    } node_t;

    typedef struct list_t {
        node_t *head;
    } list_t;

    void List_Init(list_t *L) {
        L->head = NULL;
    }

    void List_Insert(list_t *L, int key) {
        node_t *new = malloc(sizeof(node_t));
        assert(new);
        new->key = key;
        new->next = L->head;
        L->head = new;
    }

    int List_Lookup(list_t *L, int key) {
        node_t *tmp = L->head;
        while (tmp) {
            if (tmp->key == key) return 1;
            tmp = tmp->next;
        }
        return 0;
    }

What can go wrong? Find a schedule that leads to a problem.

Linked-List Race

    Thread 1                  Thread 2
    new->key = key
    new->next = L->head
                              new->key = key
                              new->next = L->head
                              L->head = new
    L->head = new

Both new entries point to the old head. Only one entry (which one?)
can be the new head.

Resulting Linked List

    head -> T1's node -> old head -> n3 -> n4
    T2's node -> old head            [orphan node: unreachable from head]

Locking Linked Lists

(Same list code as above: node_t, list_t, List_Init, List_Insert,
List_Lookup.)

How to add locks?

Locking Linked Lists

How to add locks? One lock per list:

    typedef struct list_t {
        node_t *head;
        pthread_mutex_t lock;
    } list_t;

    void List_Init(list_t *L) {
        L->head = NULL;
        pthread_mutex_init(&L->lock, NULL);
    }

One lock per list: fine if we add to OTHER lists concurrently.

Locking Linked Lists: Approach #1

Consider everything a critical section:

    void List_Insert(list_t *L, int key) {
        pthread_mutex_lock(&L->lock);
        node_t *new = malloc(sizeof(node_t));
        assert(new);
        new->key = key;
        new->next = L->head;
        L->head = new;
        pthread_mutex_unlock(&L->lock);
    }

    int List_Lookup(list_t *L, int key) {
        pthread_mutex_lock(&L->lock);
        node_t *tmp = L->head;
        while (tmp) {
            if (tmp->key == key) {
                pthread_mutex_unlock(&L->lock);
                return 1;
            }
            tmp = tmp->next;
        }
        pthread_mutex_unlock(&L->lock);
        return 0;
    }

Can the critical section be smaller?

Locking Linked Lists: Approach #2

Make the critical section as small as possible; malloc() and node
initialization need no lock:

    void List_Insert(list_t *L, int key) {
        node_t *new = malloc(sizeof(node_t));
        assert(new);
        new->key = key;
        pthread_mutex_lock(&L->lock);
        new->next = L->head;
        L->head = new;
        pthread_mutex_unlock(&L->lock);
    }

(List_Lookup() still locks around the traversal.)

Locking Linked Lists: Approach #3

What about Lookup()? If there is no List_Delete(), locks are not needed
in Lookup(): a concurrent insert only swings L->head to a fully
initialized node, so a reader sees either the old list or the new one.

Implementing Synchronization

Build higher-level synchronization primitives in the OS: operations
that ensure correct ordering of instructions across threads.
Motivation: build them once and get them right.

    Monitors   Locks   Semaphores   Condition Variables
    ---------------------------------------------------
    Loads   Stores   Test&Set   Disable Interrupts

Lock Implementation Goals

Correctness
- Mutual exclusion: only one thread in critical section at a time
- Progress (deadlock-free): if several simultaneous requests, must
  allow one to proceed
- Bounded waiting (starvation-free): must eventually allow each
  waiting thread to enter

Fairness
- Each thread waits in some defined order

Performance
- CPU is not used unnecessarily (e.g., spinning)
- Fast to acquire lock if no contention with other threads

Implementing Synchronization

To implement, we need atomic operations.
Atomic operation: no other instructions can be interleaved.

Examples of atomic operations:
- Code between interrupts on uniprocessors
  (disable timer interrupts, don't do any I/O)
- Loads and stores of words (Load r1, B / Store r1, A)
- Special HW instructions: Test&Set, Compare&Swap

Implementing Locks: w/ Interrupts

Turn off interrupts for critical sections: this prevents the
dispatcher from running another thread, so code between interrupts
executes atomically.

    void acquire(lock_t *l) { disableInterrupts(); }
    void release(lock_t *l) { enableInterrupts(); }

Disadvantages?
- Only works on uniprocessors
- Process can keep control of the CPU for an arbitrary length of time
- Cannot perform other necessary work while interrupts are off

Implementing Locks: w/ Load+Store

Code uses a single shared lock variable:

    boolean lock = false;    // shared variable

    void acquire(boolean *lock) {
        while (*lock)
            ;   /* wait */
        *lock = true;
    }

    void release(boolean *lock) {
        *lock = false;
    }

Why doesn't this work? Find an example schedule that fails with
2 threads.

Race Condition with LOAD and STORE

*lock == 0 initially:

    Thread 1                  Thread 2
    while (*lock == 1) ;
                              while (*lock == 1) ;
                              *lock = 1
    *lock = 1

Both threads grab the lock!
Problem: testing the lock and setting the lock are not atomic.

Demo: main-thread-3.c
The critical section is not protected with this faulty lock
implementation.

Peterson's Algorithm

Assume only two threads (tid = 0, 1) and use just loads and stores:

    int turn = 0;                        // shared across threads, PER LOCK
    boolean lock[2] = {false, false};    // shared, PER LOCK

    void acquire() {
        lock[tid] = true;
        turn = 1 - tid;
        while (lock[1-tid] && turn == 1-tid)
            ;   /* wait */               // example of a spin-lock
    }

    void release() {
        lock[tid] = false;
    }

Different Cases: All Work

Only thread 0 wants the lock initially:

    Thread 0                            Thread 1
    lock[0] = true;
    turn = 1;
    while (lock[1] && turn == 1) ;      // falls through; in critical section
                                        lock[1] = true;
                                        turn = 0;
                                        while (lock[0] && turn == 0) ;  // spins
    lock[0] = false;
                                        while (lock[0] && turn == 0) ;  // now enters

Different Cases: All Work

Thread 0 and thread 1 both try to acquire the lock at the same time:

    Thread 0                            Thread 1
    lock[0] = true;
    turn = 1;
                                        lock[1] = true;
                                        turn = 0;
    while (lock[1] && turn == 1) ;      // turn == 0, so thread 0 enters
    // finish critical section
    lock[0] = false;
                                        while (lock[0] && turn == 0) ;  // now enters

Thread 0 and thread 1 both want the lock (another interleaving):

    Thread 0                            Thread 1
    lock[0] = true;
    turn = 1;
                                        lock[1] = true;
                                        turn = 0;
    while (lock[1] && turn == 1) ;      // turn == 0, so thread 0 enters
                                        while (lock[0] && turn == 0) ;  // spins

Different Cases: All Work

Thread 0 and thread 1 both want the lock; thread 1 sets turn last:

    Thread 0                            Thread 1
    lock[0] = true;
    turn = 1;
                                        lock[1] = true;
    while (lock[1] && turn == 1) ;      // spins
                                        turn = 0;
                                        while (lock[0] && turn == 0) ;  // spins
    while (lock[1] && turn == 1) ;      // turn == 0, so thread 0 enters

Peterson's Algorithm: Intuition

Mutual exclusion: enter the critical section if and only if
- the other thread does not want to enter, OR
- the other thread wants to enter, but it is your turn (only 1 turn)

Progress: both threads cannot wait forever at the while() loop.
It completes if the other thread does not want to enter, and the
other thread (matching turn) will eventually finish.

Bounded waiting (not shown in the examples): each thread waits at
most one critical section, because on release the turn is given to
the other thread.

Problem: doesn't work on modern hardware (caching and reordering do
not provide sequential consistency).

xchg: Atomic Exchange, or Test-and-Set

    // xchg(int *addr, int newval)
    // ATOMICALLY return what addr pointed to,
    // AND AT THE SAME TIME store newval into addr
    int xchg(int *addr, int newval) {
        int old = *addr;      // conceptual; must execute atomically
        *addr = newval;
        return old;
    }

Needs hardware support (x86, xv6-style):

    static inline uint
    xchg(volatile unsigned int *addr, unsigned int newval) {
        uint result;
        asm volatile("lock; xchgl %0, %1"
                     : "+m" (*addr), "=a" (result)
                     : "1" (newval)
                     : "cc");
        return result;
    }

LOCK Implementation with XCHG

Fill in the blanks:

    typedef struct lock_t {
        int flag;
    } lock_t;

    void init(lock_t *lock)    { lock->flag = ??; }
    void acquire(lock_t *lock) { ????;  // spin-wait (do nothing)
    }
    void release(lock_t *lock) { lock->flag = ??; }

XCHG Implementation

    typedef struct lock_t {
        int flag;
    } lock_t;

    void init(lock_t *lock) {
        lock->flag = 0;
    }

    void acquire(lock_t *lock) {
        while (xchg(&lock->flag, 1) == 1)
            ;   // spin-wait (do nothing)
    }

    void release(lock_t *lock) {
        lock->flag = 0;
    }

Example of a spin-lock.

DEMO: XCHG

Critical section protected with our lock implementation!
main-thread-5.c

Break

Other Atomic HW Instructions

    int CompareAndSwap(int *addr, int expected, int new) {
        int actual = *addr;      // conceptual; must execute atomically
        if (actual == expected)
            *addr = new;
        return actual;
    }

Example of a spin-lock (fill in the blanks):

    void acquire(lock_t *lock) {
        while (CompareAndSwap(&lock->flag, ?, ?) == ?)
            ;   // spin-wait (do nothing)
    }

Answer:

    void acquire(lock_t *lock) {
        while (CompareAndSwap(&lock->flag, 0, 1) == 1)
            ;   // spin-wait (do nothing)
    }

Lock Implementation Goals (Review)

Correctness
- Mutual exclusion: only one thread in critical section at a time
- Progress (deadlock-free): if several simultaneous requests, must
  allow one to proceed
- Bounded waiting (starvation-free): must eventually allow each
  waiting thread to enter

Fairness
- Each thread waits in some defined order

Performance
- CPU is not used unnecessarily

Basic Spinlocks are Unfair

[Timeline 0-160: thread A repeatedly unlocks and immediately re-locks
(unlock, lock, unlock, lock, ...) while thread B spins through every
interval and never acquires the lock.]

The scheduler is independent of locks/unlocks.

Fairness: Ticket Locks

Idea: reserve each thread's turn to use the lock;
each thread spins until its turn.

Uses a new atomic primitive, fetch-and-add:

    int FetchAndAdd(int *ptr) {
        int old = *ptr;      // conceptual; must execute atomically
        *ptr = old + 1;
        return old;
    }

Acquire: grab a ticket; spin while the thread's ticket != turn.
Release: advance to the next turn.

Ticket Lock Example (trace the ticket and turn values):

    A lock(); B lock(); C lock(); A unlock(); B runs; A lock();
    B unlock(); C runs; C unlock(); A runs; A unlock(); C lock();

    Ticket: 0 1 2 3 4 5 6 7    Turn: ?

Ticket Lock Example

    A lock():   gets ticket 0, spins until turn == 0 -> runs
    B lock():   gets ticket 1, spins until turn == 1
    C lock():   gets ticket 2, spins until turn == 2
    A unlock(): turn++ (turn = 1); B runs
    A lock():   gets ticket 3, spins until turn == 3
    B unlock(): turn++ (turn = 2); C runs
    C unlock(): turn++ (turn = 3); A runs
    A unlock(): turn++ (turn = 4)
    C lock():   gets ticket 4, runs

Ticket Lock Implementation

    typedef struct lock_t {
        int ticket;
        int turn;
    } lock_t;

    void lock_init(lock_t *lock) {
        lock->ticket = 0;
        lock->turn = 0;
    }

    void acquire(lock_t *lock) {
        int myturn = FAA(&lock->ticket);
        while (lock->turn != myturn)
            ;   // spin
    }

    void release(lock_t *lock) {
        FAA(&lock->turn);
    }

Ticket Lock (release variant)

    void release(lock_t *lock) {
        lock->turn++;   // FAA(&lock->turn) used in the textbook -> conservative
    }

Only the lock holder updates turn, so a plain increment can replace
FetchAndAdd() in release. Try this modification in the homework
simulations.

Spinlock Performance

Fast when:
- many CPUs
- locks held a short time
- advantage: avoids a context switch

Slow when:
- one CPU
- locks held a long time
- disadvantage: spinning is wasteful

CPU Scheduler is Ignorant

[Timeline 0-160: A holds the lock, then B, C, and D are each scheduled
and spin for a full time slice before A runs again and unlocks.]

The CPU scheduler may run B instead of A even though B is waiting
for A.

Ticket Lock with Yield()

    typedef struct lock_t {
        int ticket;
        int turn;
    } lock_t;

    void lock_init(lock_t *lock) {
        lock->ticket = 0;
        lock->turn = 0;
    }

    void acquire(lock_t *lock) {
        int myturn = FAA(&lock->ticket);
        while (lock->turn != myturn)
            yield();
    }

    void release(lock_t *lock) {
        FAA(&lock->turn);
    }

Remember: yield() voluntarily relinquishes the CPU for the remainder
of the timeslice, but the process remains READY.

Yield Instead of Spin

[Timeline 0-160. No yield: A locks, then B, C, D each spin a full time
slice before A runs again and unlocks. With yield: waiting threads
yield immediately, so A unlocks and B locks after only brief delays.]

Spinlock Performance Waste

- Without yield: O(threads * time_slice)
- With yield: O(threads * context_switch)

So even with yield, spinning is slow under high thread contention.
Next improvement: block and put the waiting thread on a wait queue
instead of spinning.