
CS 31: Introduction to Computer Systems, Lectures 22-23: Threads & Synchronization (April 16-18, 2019)

Making Programs Run Faster We all like how fast computers are. In the old days (1980s to 2005): algorithm too slow? Wait for the hardware to catch up. Modern CPUs exploit parallelism for speed: they execute multiple instructions at once and reorder instructions on the fly.

Processor Design Trends [Chart: transistor count (thousands), clock speed (MHz), power (W), and instruction-level parallelism (IPC) over time. From Herb Sutter, Dr. Dobb's Journal.]

The Multi-Core Era Today, we can't make a single core go much faster: limits on clock speed, heat, and energy consumption. Instead, we use the extra transistors to put multiple CPU cores on the chip. Exciting: the CPU is capable of doing a lot more! Problem: it's up to the programmer to take advantage of multiple cores, and humans are bad at thinking in parallel.

Parallel Abstraction To speed up a job, must divide it across multiple cores. A process contains both execution information and memory/resources. What if we want to separate the execution information to give us parallelism in our programs?

Which components of a process might we replicate to take advantage of multiple CPU cores? A. The entire address space (memory) B. Parts of the address space (memory) C. OS resources (open files, etc.) D. Execution state (PC, registers, etc.) E. More than one of these (which?)

Which components of a process might we replicate to take advantage of multiple CPU cores? A. The entire address space (no: memory is not duplicated) B. Parts of the address space (yes: the stack) C. OS resources (open files, etc.: not duplicated) D. Execution state (PC, registers, etc.: yes) E. More than one of these (B and D)

Threads Modern OSes separate the concepts of processes and threads. The process defines the address space and general process attributes (e.g., open files) The thread defines a sequential execution stream within a process (PC, SP, registers) A thread is bound to a single process Processes, however, can have multiple threads Each process has at least one thread (e.g. main)

Threads [Diagram: Process 1 with a single Thread 1 (PC1, SP1) and the usual memory layout: Text, Data, Heap, Stack 1, managed by the OS.] This is the picture we've been using all along: a process with a single thread, which has execution state (registers) and a stack.

Threads [Diagram: a second thread added to Process 1: Thread 1 (PC1, SP1, Stack 1) and Thread 2 (PC2, SP2, Stack 2), sharing Text, Data, and Heap.] We can add a thread to the process. New threads share all memory (the virtual address space) with the process's other threads; each new thread gets private registers and a local stack.

Threads [Diagram: a third thread (PC3, SP3, Stack 3) added to Process 1.] Note: they're all executing the same program (shared instructions in text), though they may be at different points in the code.

Why Use Threads? Separating threads and processes makes it easier to support parallel applications: creating multiple paths of execution does not require creating new processes (less state to store and initialize; hence the name "lightweight process"), and there is low-overhead sharing between threads in the same process (threads share page tables and access the same memory). Concurrency (multithreading) can be very useful.
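
As a concrete sketch (the slides name no specific API here; this assumes POSIX threads, one common implementation), creating and waiting for a second path of execution looks roughly like this. Compile with gcc -pthread.

    #include <pthread.h>
    #include <stdio.h>

    /* Each thread starts in a function with this signature. */
    void *worker(void *arg) {
        printf("hello from thread %d\n", *(int *)arg);
        return NULL;
    }

    int main(void) {
        pthread_t tid;
        int id = 1;
        /* Create a new thread in this process: it shares our memory,
         * and gets its own registers and stack. */
        pthread_create(&tid, NULL, worker, &id);
        pthread_join(tid, NULL);   /* wait for it to finish */
        return 0;
    }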

Concurrency? Several computations or threads of control are executing simultaneously, and potentially interacting with each other. We can multitask! Why does that help? Taking advantage of multiple CPUs/cores; overlapping I/O with computation; improving program structure.

Recall: Processes [Diagram: Processes 1 through n, each with its own Text, Data, and Stack, making system calls (fork, read, write) into the kernel, which handles system management, context switching, and scheduling.]

Scheduling Threads We have basically two options: 1. The kernel explicitly selects among threads in a process. 2. Hide threads from the kernel, and have a user-level scheduler inside each multi-threaded process. Why do we care? Think about the overhead of switching between threads. Who decides which thread in a process should go first? What about blocking system calls?

User-Level Threads [Diagram: each process's Text includes a thread context-switch + scheduler library, which divides the process's stack region among its threads. System calls (fork, read, write) go to the kernel, which does only process context switching, process scheduling, and system management.] Threads are invisible to the kernel.

Kernel-Level Threads [Diagram: each process has explicitly mapped regions for multiple stacks (Stack 1, Stack 2, ...), and the kernel does thread context switching, thread + process scheduling, and system management.] The kernel context switches over threads.

If you call thread_create() on a modern OS (Linux/Mac/Windows), which type of thread would you expect to receive? (Why? Which would you pick?) A. Kernel threads B. User threads C. Some other sort of threads

Kernel vs. User Threads Kernel-level threads: integrated with the OS (informed scheduling), but slower to create, manipulate, and synchronize, since that requires getting the OS involved, which means changing context (relatively expensive). User-level threads: faster to create, manipulate, and synchronize, but not integrated with the OS (uninformed scheduling); if one thread makes a blocking syscall, all of them get blocked, because the OS doesn't distinguish between them.

Threads & Sharing Code (text) is shared by all threads in a process. Global variables and static objects are shared: stored in the static data segment, accessible by any thread. Dynamic objects and other heap objects are shared: allocated from the heap with malloc/free or new/delete. Local variables should not be shared: they refer to data on the stack, and each thread has its own stack. Never pass/share/store a pointer to a local variable on another thread's stack!

Threads & Sharing Local variables should not be shared: they refer to data on the stack, and each thread has its own stack. Never pass/share/store a pointer to a local variable on another thread's stack. [Diagram: Thread 1's stack (functions A and B, with B's local Z) and Thread 2's stack (functions C and D); a shared int *x on the heap points at Z. Thread 2 can dereference x to access Z, but once function B returns, Z is gone.] The fix: put Z in shared data on the heap, where Thread 2 can safely dereference x to access it.
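
A minimal sketch of that bug and the heap-based fix (hypothetical function names, assuming POSIX threads; not from the slides):

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    void *worker(void *arg) {
        printf("%d\n", *(int *)arg);   /* dereferences the shared pointer */
        return NULL;
    }

    /* BROKEN: z lives on this function's stack, which is recycled as
     * soon as we return -- the worker may read garbage. */
    pthread_t spawn_broken(void) {
        pthread_t t;
        int z = 42;
        pthread_create(&t, NULL, worker, &z);
        return t;
    }

    /* FIXED: z lives on the shared heap, which any thread can safely
     * access until it is freed. */
    pthread_t spawn_fixed(void) {
        pthread_t t;
        int *z = malloc(sizeof(*z));
        *z = 42;
        pthread_create(&t, NULL, worker, z);
        return t;   /* caller (or worker) must eventually free z */
    }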

Thread-level Parallelism Speed up an application by assigning portions to CPUs/cores that process them in parallel. Requires: partitioning responsibilities (e.g., a parallel algorithm) and managing their interaction. Example: Game of Life (next lab). [Diagram: the grid computed by one core vs. split into three regions, one per core.]
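
For instance, a sketch of the partitioning step (a hypothetical helper, not the lab's required interface): divide the grid rows as evenly as possible, so that thread i works on rows [start, end).

    /* Divide nrows grid rows among nthreads workers as evenly as
     * possible; thread i processes rows [*start, *end). */
    void partition(int nrows, int nthreads, int i, int *start, int *end) {
        int base  = nrows / nthreads;   /* rows every thread gets */
        int extra = nrows % nthreads;   /* first `extra` threads get one more */
        *start = i * base + (i < extra ? i : extra);
        *end   = *start + base + (i < extra ? 1 : 0);
    }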

If one CPU core can run a program at a rate of X, how quickly will the program run on two cores? Why? A. Slower than one core (<X) B. The same speed (X) C. Faster than one core, but not double (X to 2X) D. Twice as fast (2X) E. More than twice as fast (>2X)


Parallel Speedup The performance benefit of parallel threads depends on many factors: algorithm divisibility, communication overhead, memory hierarchy and locality, implementation quality. For most programs, more threads mean more communication and diminishing returns.

Summary Physical limits to how much faster we can make a single core run. Use transistors to provide more cores, and parallelize applications to take advantage. OS abstraction: the thread, which shares most of the address space with other threads in the same process and gets a private execution context (registers) + stack. Coordinating threads is challenging!

Recap To speed up a job, we must divide it across multiple cores. Thread: abstraction for execution within a process. Threads share process memory. Threads may need to communicate to achieve their goal. Thread communication: to solve the task (e.g., neighboring GOL cells) and to prevent bad interactions (synchronization).

Synchronization Synchronize: to (arrange events to) happen at the same time (or ensure that they don't). Thread synchronization: when one thread has to wait for another; events in threads that occur at the same time. Uses of synchronization: prevent race conditions; wait for resources to become available.

Synchronization Example [Diagram: Game of Life grid computed by one core vs. split across four cores.] Coordination required: threads in different regions must work together to compute new values for boundary cells. Threads might not run at the same speed (depends on the OS scheduler). Can't let one region get too far ahead.

Thread Ordering (Why threads require care: humans aren't good at reasoning about this.) As a programmer you have no idea when threads will run; the OS schedules them, and the schedule will vary across runs. It might decide to context switch from one thread to another at any time. Your code must be prepared for this! Ask yourself: would something bad happen if we context switched here?

Example: The Credit/Debit Problem Say you have $1000 in your bank account. You deposit $100, and you also withdraw $100. How much should be in your account? What if your deposit and withdrawal occur at the same time, at different ATMs?

Credit/Debit Problem: Race Condition

    Thread T0:                 Thread T1:
    Credit(int a) {            Debit(int a) {
        int b;                     int b;
        b = ReadBalance();         b = ReadBalance();
        b = b + a;                 b = b - a;
        WriteBalance(b);           WriteBalance(b);
        PrintReceipt(b);           PrintReceipt(b);
    }                          }

One bad schedule: say T0 runs first and reads $1000 into b. Switch to T1: read $1000 into b, debit by $100, write $900. Switch back to T0: T0 still has its stale b = $1000, so it credits $100 and writes $1100. The bank gave you $100! What went wrong?

Critical Section Badness if a context switch occurs between ReadBalance() and WriteBalance()! Each function's read-modify-write of the balance is a critical section, and interrupting it midway is how the bank gave you $100.

To Avoid Race Conditions [Diagram: each thread's code with its critical section marked.] 1. Identify critical sections. 2. Use synchronization to enforce mutual exclusion: only one thread active in a critical section.

What Are Critical Sections? Sections of code executed by multiple threads that access shared variables, often making a local copy; places where the order of execution or thread interleaving will affect the outcome. They must run atomically with respect to each other. Atomicity: runs as an entire unit or not at all; cannot be divided into smaller parts.

Which code region is a critical section? (The choices A-E label regions of the code below.)

    Thread A:                  Thread B:
    main() {                   main() {
        int a, b;                  int a, b;
        a = getshared();           a = getshared();
        b = 10;                    b = 20;
        a = a + b;                 a = a - b;
        saveshared(a);             saveshared(a);
        a += 1;                    a += 1;
        return a;                  return a;
    }                          }

    shared memory: s = 40;

Which code region is a critical section? [Same code as above, with the answer highlighted: the read-modify-write of the shared value, from a = getshared() through saveshared(a), in each thread. The a += 1 and return a lines touch only locals.]

Which values might the shared s variable hold after both threads finish? [Same code as above; s starts at 40.] A. 30 B. 20 or 30 C. 20, 30, or 50 D. Another set of values

If A runs first: A reads s = 40, computes 40 + 10, and saves it, so s = 50.

B runs after A completes: B reads s = 50, computes 50 - 20, and saves it, so s = 30.

What about interleaving? If both threads read s = 40 before either saves, one update overwrites the other: s ends up 50 or 20, whichever thread saves last. So s may hold 20, 30, or 50.

Is there a race condition? Suppose count is a global variable and multiple threads increment it: count++;
A. Yes, there's a race condition (count++ is a critical section).
B. No, there's no race condition (count++ is not a critical section).
C. Cannot be determined

How about if the compiler implements it as:

    movl (%edx), %eax    // read count value
    addl $1, %eax        // modify value
    movl %eax, (%edx)    // write count

How about if the compiler implements it as:

    incl (%edx)          // increment value
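
A short demonstration (not from the slides; thread and iteration counts are arbitrary) that makes the count++ race observable with POSIX threads. Racy runs typically print less than the expected total:

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4
    #define NITERS   1000000

    long count = 0;                  /* shared global */

    void *inc(void *arg) {
        (void)arg;
        for (int i = 0; i < NITERS; i++)
            count++;                 /* unsynchronized read-modify-write */
        return NULL;
    }

    int main(void) {
        pthread_t t[NTHREADS];
        for (int i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, inc, NULL);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);
        printf("count = %ld\n", count);   /* expected 4000000 */
        return 0;
    }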

Four Rules for Mutual Exclusion 1. No two threads can be inside their critical sections at the same time. 2. No thread outside its critical section may prevent others from entering their critical sections. 3. No thread should have to wait forever to enter its critical section. (No starvation.) 4. No assumptions can be made about speeds or number of CPUs.

How to Achieve Mutual Exclusion? Each thread runs:

    < entry code >
    < critical section >
    < exit code >

Surround the critical section with entry/exit code. The entry code should act as a gate: if another thread is in the critical section, block; otherwise, allow the thread to proceed. The exit code should release other entry gates.

Possible Solution: Spin Lock?

    shared int lock = OPEN;

    T0:                            T1:
    while (lock == CLOSED);        while (lock == CLOSED);
    lock = CLOSED;                 lock = CLOSED;
    < critical section >           < critical section >
    lock = OPEN;                   lock = OPEN;

Lock indicates whether any thread is in the critical section. Note: the while loop has no body; it keeps checking the condition as quickly as possible until it becomes false. (It "spins".)

Is there a problem here? A: Yes, this is broken. B: No, this ought to work.

What if a context switch occurs between the while loop and lock = CLOSED? Both threads can observe lock == OPEN, both set it CLOSED, and both enter the critical section at once.

Possible Solution: Take Turns?

    shared int turn = T0;

    T0:                        T1:
    while (turn != T0);        while (turn != T1);
    < critical section >       < critical section >
    turn = T1;                 turn = T0;

Alternate which thread can enter the critical section. Is there a problem? A: Yes, this is broken. B: No, this ought to work.

It violates Rule #2: no thread outside its critical section may prevent others from entering their critical sections. If T0 never takes its turn again, T1 can never re-enter.

Possible Solution: State Intention?

    shared boolean flag[2] = {FALSE, FALSE};

    T0:                         T1:
    flag[T0] = TRUE;            flag[T1] = TRUE;
    while (flag[T1]);           while (flag[T0]);
    < critical section >        < critical section >
    flag[T0] = FALSE;           flag[T1] = FALSE;

Each thread states that it wants to enter the critical section. Is there a problem? A: Yes, this is broken. B: No, this ought to work.

What if the threads context switch between setting their own flag and checking the other's? Both flags are TRUE, so both threads spin forever, violating Rule #3: no thread should have to wait forever to enter its critical section.

Peterson's Solution

    shared int turn;
    shared boolean flag[2] = {FALSE, FALSE};

    T0:                                    T1:
    flag[T0] = TRUE;                       flag[T1] = TRUE;
    turn = T1;                             turn = T0;
    while (flag[T1] && turn == T1);        while (flag[T0] && turn == T0);
    < critical section >                   < critical section >
    flag[T0] = FALSE;                      flag[T1] = FALSE;

If there is competition, take turns; otherwise, enter. Is there a problem? A: Yes, this is broken. B: No, this ought to work.
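
As an aside (a sketch, not the slides' code): on modern hardware, the plain-variable version above is not actually safe, because compilers and CPUs may reorder the flag/turn accesses. A C11 rendering with sequentially consistent atomics preserves the ordering Peterson's algorithm depends on:

    #include <stdatomic.h>
    #include <stdbool.h>

    atomic_bool flag[2] = { false, false };  /* flag[i]: thread i wants in */
    atomic_int  turn;                        /* whose turn to defer to */

    void peterson_lock(int self) {           /* self is 0 or 1 */
        int other = 1 - self;
        atomic_store(&flag[self], true);     /* state intention */
        atomic_store(&turn, other);          /* defer to the other thread */
        while (atomic_load(&flag[other]) && atomic_load(&turn) == other)
            ;                                /* spin */
    }

    void peterson_unlock(int self) {
        atomic_store(&flag[self], false);
    }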

Spinlocks are Wasteful If a thread is spinning on a lock, it's using the CPU without making progress. On a single-core system, spinning prevents the lock holder from executing. On a multi-core system, it wastes core time when something else could be running. Ideal: if a thread can't enter the critical section, schedule something else and consider it blocked.

Atomicity How do we get away from having to know about all other interested threads? The implementation of acquiring/releasing the critical section must be atomic. An atomic operation is one that executes as though it could not be interrupted: code that executes all or nothing. How do we make these operations atomic? Hardware provides atomic instructions (e.g., test-and-set, compare-and-swap), which allow us to build the semaphore OS abstraction.
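
For example (a sketch assuming C11 atomics; not the slides' code), a spin lock whose entry gate is a single atomic test-and-set, closing the check-then-set gap that broke the earlier spin lock:

    #include <stdatomic.h>

    atomic_flag lock = ATOMIC_FLAG_INIT;

    void acquire(void) {
        /* Atomic test-and-set: returns the old value and sets the
         * flag in one indivisible step, so two threads can never
         * both see the lock open. */
        while (atomic_flag_test_and_set(&lock))
            ;   /* spin until the old value was clear */
    }

    void release(void) {
        atomic_flag_clear(&lock);
    }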

Semaphores Semaphore: OS synchronization variable. Has an integer value and a list of waiting threads. Works like a gate: if sem > 0, the gate is open and the value equals the number of threads that can enter; else, the gate is closed, possibly with waiting threads. [Diagram: threads entering the critical section as the semaphore counts down: sem = 3, 2, 1, 0.]

Semaphores Associated with each semaphore is a queue of waiting threads. When wait() is called by a thread: if the semaphore is open, the thread continues; if the semaphore is closed, the thread blocks on the queue. Then signal() opens the semaphore: if a thread is waiting on the queue, it is unblocked; if no threads are waiting, the signal is remembered for the next thread.

Semaphore Operations

    sem s = n;      // declare and initialize

    wait (sem s)    // executes atomically
        decrement s;
        if s < 0, block thread (and associate with s);

    signal (sem s)  // executes atomically
        increment s;
        if blocked threads, unblock (any) one of them;

Based on what you know about semaphores, should a process be able to check beforehand whether wait(s) will cause it to block? A. Yes, it should be able to check. B. No, it should not be able to check.

No other operations are allowed. In particular, the semaphore's value can't be tested! No thread can tell the value of s.

Mutual Exclusion with Semaphores

    sem mutex = 1;

    T0:                       T1:
    wait (mutex);             wait (mutex);
    < critical section >      < critical section >
    signal (mutex);           signal (mutex);

Use a mutex semaphore initialized to 1. Only one thread can enter the critical section at a time. Simple, and it works for any number of threads. Is there any busy-waiting?
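
A sketch of the same pattern with POSIX semaphores (sem_init/sem_wait/sem_post, e.g. on Linux), one real counterpart of the slides' wait/signal; the credit/debit split here is illustrative:

    #include <semaphore.h>
    #include <pthread.h>
    #include <stdio.h>

    sem_t mutex;                  /* counting semaphore used as a mutex */
    long balance = 1000;          /* shared state */

    void *credit(void *arg) {
        (void)arg;
        sem_wait(&mutex);         /* wait(): blocks rather than spins */
        balance += 100;           /* critical section */
        sem_post(&mutex);         /* signal(): reopen the gate */
        return NULL;
    }

    int main(void) {
        pthread_t t;
        sem_init(&mutex, 0, 1);   /* initial value 1 => mutual exclusion */
        pthread_create(&t, NULL, credit, NULL);
        sem_wait(&mutex);
        balance -= 100;           /* debit in main, also protected */
        sem_post(&mutex);
        pthread_join(t, NULL);
        printf("balance = %ld\n", balance);   /* always 1000 */
        sem_destroy(&mutex);
        return 0;
    }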

Locking Abstraction One way to implement critical sections is to lock the door on the way in and unlock it again on the way out. Locks typically export a nicer user-space interface on top of semaphores. A lock is an object in memory providing two operations: acquire()/lock() before entering the critical section, and release()/unlock() after leaving it. Threads pair calls to acquire() and release(); between them, the thread holds the lock. acquire() does not return until any previous holder releases. What can happen if the calls are not paired?
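
For instance (a sketch; the slide names no specific library), the pthreads lock interface pairs acquire/release as lock/unlock:

    #include <pthread.h>

    pthread_mutex_t l = PTHREAD_MUTEX_INITIALIZER;
    int s = 40;                     /* shared */

    void update(int delta) {
        pthread_mutex_lock(&l);     /* acquire(): blocks while another thread holds l */
        s = s + delta;              /* critical section */
        pthread_mutex_unlock(&l);   /* release(): pair every lock with an unlock */
    }

If the calls are not paired, for example if an early return skips the unlock, every later acquirer blocks forever.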

Using Locks

    Thread A:                  Thread B:
    main() {                   main() {
        int a, b;                  int a, b;
        acquire(l);                acquire(l);
        a = getshared();           a = getshared();
        b = 10;                    b = 20;
        a = a + b;                 a = a - b;
        saveshared(a);             saveshared(a);
        release(l);                release(l);
        return a;                  return a;
    }                          }

    shared memory: s = 40;    Lock l;

Walkthrough: initially nobody holds l. Thread A calls acquire(l) and enters its critical section (held by: Thread A). If Thread B calls acquire(l) now, the lock is already owned: B must wait! A saves s = 50 and releases l (held by: Nobody). B then acquires l, runs its critical section, saves s = 30, and releases l.

No matter how we order the threads or when we context switch, the result will always be 30, like we expected (and probably wanted).

Summary We have no idea when the OS will schedule or context switch our threads; code must be prepared for this, and it is tough to reason about. Threads often must synchronize to safely communicate and transfer data without races. Synchronization primitives help programmers: kernel-level semaphores limit the number of threads that can do something and provide atomicity; user-level locks, built upon semaphores, provide mutual exclusion (usually as part of the thread library).
