Chap. 6 Part 1. CIS*3090 Parallel Programming, Fall 2016


Chap 6: specific programming techniques
- Languages and libraries
  - Authors blur the distinction
- Languages: access parallel programming features explicitly, or implicitly (under the hood), simply by coding in the language
  - Java, Java threads, Go, Scala, Julia
  - Vs. C/C++: no HLL support for threads (until C++11)
  - CUDA, OpenCL for GPU co-processors
- Most languages are inherently serial
  - Parallel programming hasn't been important until recently!
  - Introduces many complications

Libraries
- Make inherently serial languages support parallel programming via calls to a library API
  - pthread.h, good example for C/C++
  - pilot.h, ditto for C & Fortran
    - Other Pilot ports: C++, Python
  - mpi.h, basis of Pilot
- Lots of parallel languages and libraries exist
  - Few have caught on

Starting with pthreads
- Rationale
  - Already exposed in OS course
  - Thread definition similar to Pilot's process definition, via work function
- All communication via shared memory
  - Nothing stops you from using message passing in shared memory as a sound IPC technique!
  - QNX (bought by RIM) has send/receive/reply

Compare/contrast pthreads API with Pilot's
- pthread_create
  - Like PI_CreateProcess + PI_StartAll
    - Thread is a candidate for execution immediately!
  - pthread_t is a handle similar to PI_PROCESS*
- Like Pilot, one thread function can serve for multiple threads
  - Distinguish instances via void* arg
  - Like Pilot's index & void* args
- 1st call to pthread_create converts main() from a process into a thread itself (PI_MAIN)
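
A minimal sketch of the pattern on this slide: one work function serves several threads, and each instance tells itself apart through the value packed into its void* arg. The names NUM_WORKERS, squares, work and run_workers are assumptions made for illustration; they are not from the slides or from Pilot.

```c
#include <pthread.h>
#include <stdint.h>

#define NUM_WORKERS 4               /* hypothetical worker count */

static int squares[NUM_WORKERS];    /* each thread writes only its own slot */

/* One work function for every thread; the void* arg carries the index. */
static void *work(void *arg)
{
    int index = (int)(intptr_t)arg;
    squares[index] = index * index;
    return NULL;
}

/* Starts the workers, waits for them, returns 0 if all were created. */
int run_workers(void)
{
    pthread_t handles[NUM_WORKERS];
    int started = 0;
    int rc = 0;

    for (int i = 0; i < NUM_WORKERS; i++) {
        /* A new thread may begin running before pthread_create returns. */
        if (pthread_create(&handles[i], NULL, work, (void *)(intptr_t)i) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(handles[i], NULL);
    return rc;
}
```

Call run_workers() from main() and build with the compiler's pthread option (for example -pthread with gcc). Passing the index by value avoids the classic mistake of handing every thread a pointer to the same loop variable.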

Bound vs. unbound threads
- May need to set thread attributes
- Bound: each thread gets its own core (provided #threads ≤ #cores)
  - This is Pthreads "system contention scope": every thread is an equal contender for CPU
- Unbound = "process contention scope"
  - Process's threads treated as a group (less CPU)
- Default may be OS-specific: 1 core:1 thread; 1:N; N:M
- Can also specify scheduling policy like FIFO or RR, and set thread priority
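
A hedged sketch of asking for a bound thread through the attribute object. Support varies by OS: a platform may accept only one of the two scopes and report ENOTSUP for the other, so the return codes are passed back rather than assumed. The function names idle_work and create_bound_thread are hypothetical.

```c
#include <errno.h>
#include <pthread.h>

static void *idle_work(void *arg)
{
    return arg;
}

/* Tries to create a thread in system contention scope ("bound").
 * Returns 0 on success, or the error number of the call that failed
 * (ENOTSUP if the platform does not offer this scope). */
int create_bound_thread(pthread_t *handle)
{
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    /* Without PTHREAD_EXPLICIT_SCHED the scheduling attributes set here
     * may be ignored in favour of the creating thread's own settings. */
    rc = pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);

    /* PTHREAD_SCOPE_SYSTEM: compete with every thread on the machine.
     * PTHREAD_SCOPE_PROCESS would request an unbound thread instead. */
    if (rc == 0)
        rc = pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
    if (rc == 0)
        rc = pthread_create(handle, &attr, idle_work, NULL);

    pthread_attr_destroy(&attr);
    return rc;
}
```

If the call succeeds the thread is joinable as usual; if it returns ENOTSUP, falling back to default attributes (passing NULL to pthread_create) is the simplest recovery.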

pthread_join ~ PI_StopMain
- Wait for thread exit and reap its status
  - Done by master, or any thread with a handle on the joinee thread
- Status is specified as a void*
  - Can pass a value cast to (void*)
  - If really passing a pointer, make sure it doesn't go out of scope when the thread exits!
    - Static storage address will still be valid
    - Pointer to a stack variable: dumb!
- PI_StopMain does a barrier with all processes
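
The two ways of returning a status can be sketched as follows. The first thread returns a small integer cast to void*; the second returns a real pointer, here to heap storage (an assumption for the example; static storage, as the slide says, would also remain valid). All function names are hypothetical.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Status passed as a value: the integer itself is cast to void*. */
static void *return_value(void *arg)
{
    (void)arg;
    return (void *)(intptr_t)42;
}

/* Status passed as a pointer: heap storage outlives the thread's stack.
 * Returning the address of a local variable here would dangle. */
static void *return_pointer(void *arg)
{
    (void)arg;
    int *result = malloc(sizeof *result);
    if (result != NULL)
        *result = 7;
    return result;
}

/* Joins a thread and decodes a by-value status; -1 on failure. */
int join_value(void)
{
    pthread_t t;
    void *status;
    if (pthread_create(&t, NULL, return_value, NULL) != 0)
        return -1;
    if (pthread_join(t, &status) != 0)
        return -1;
    return (int)(intptr_t)status;
}

/* Joins a thread and dereferences a by-pointer status; -1 on failure. */
int join_pointer(void)
{
    pthread_t t;
    void *status;
    if (pthread_create(&t, NULL, return_pointer, NULL) != 0)
        return -1;
    if (pthread_join(t, &status) != 0)
        return -1;
    int *result = status;
    int value = (result != NULL) ? *result : -1;
    free(result);               /* the joiner owns the storage now */
    return value;
}
```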

Detached threads
- Detached thread attribute: opposite of joinable
  - Left to finish independently
  - Can't return a status
- Also, pthread_detach() changes a joinable thread to detached
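
Both routes to a detached thread can be sketched like this. Since a detached thread cannot be joined, the example assumes a semaphore (finished, a hypothetical name) so the caller can still tell when the threads have run.

```c
#include <pthread.h>
#include <semaphore.h>

static sem_t finished;   /* hypothetical: lets the caller observe completion */

static void *background(void *arg)
{
    (void)arg;
    sem_post(&finished); /* a detached thread cannot hand back a status */
    return NULL;
}

/* Starts two detached threads and waits until both have run.
 * Returns 0 on success, -1 if setup or thread creation failed. */
int run_detached(void)
{
    pthread_attr_t attr;
    pthread_t first, second;

    if (sem_init(&finished, 0, 0) != 0)
        return -1;

    /* Way 1: create the thread already detached, via an attribute. */
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    int rc = pthread_create(&first, &attr, background, NULL);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return -1;

    /* Way 2: create joinable (the default), then detach it. */
    if (pthread_create(&second, NULL, background, NULL) != 0) {
        sem_wait(&finished);    /* still let the first thread finish */
        return -1;
    }
    pthread_detach(second);

    /* pthread_join is not allowed on either thread, so wait another way. */
    sem_wait(&finished);
    sem_wait(&finished);
    return 0;
}
```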

How main thread ends
- Can return or call exit()
  - Terminates process and any remaining running threads (including detached)
- Can call pthread_exit()
  - Leaves other threads running
  - Running unjoined and detached threads keep whole process alive
  - Likely not what you wanted

How spawned threads end
- Normal way: work function returns
  - return(status) or call pthread_exit()
  - Waits to be joined if joinable, or dies if detached
- Can also be cancelled by another thread
  - Tricky, could leave a mess (e.g., locked mutexes)
  - Complex, could be inside a system call (e.g., I/O), which may or may not be a cancellation point
  - Possible to define a cleanup function to be called upon cancellation
  - Best to stay away from this!

Inter-thread synchronization
- Mutex: lock, unlock, trylock
  - Should initialize to unlocked via PTHREAD_MUTEX_INITIALIZER
  - No fairness for multiple waiters
    - Not necessarily a FIFO queue (organize it yourself)
- (Counting) Semaphore:
  - Needs additional #include <semaphore.h>
  - init(value), wait, post (aka signal), getvalue, trywait
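
A small sketch using both primitives: a statically initialized mutex protects a shared counter, and a counting semaphore (initial value 0) lets the caller wait for one post per worker. THREADS, INCREMENTS and the function names are assumptions for the example.

```c
#include <pthread.h>
#include <semaphore.h>

#define THREADS 4            /* hypothetical sizes for the demo */
#define INCREMENTS 10000

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;
static sem_t finished;       /* counting semaphore, starts at 0 */

static void *add(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);   /* critical section begins */
        counter++;
        pthread_mutex_unlock(&counter_lock); /* critical section ends */
    }
    sem_post(&finished);     /* announce that this worker is done */
    return NULL;
}

/* Returns the final counter value, or -1 if the semaphore setup failed. */
long count_in_parallel(void)
{
    pthread_t t[THREADS];
    int started = 0;

    counter = 0;
    if (sem_init(&finished, 0, 0) != 0)
        return -1;

    while (started < THREADS &&
           pthread_create(&t[started], NULL, add, NULL) == 0)
        started++;

    for (int i = 0; i < started; i++)
        sem_wait(&finished); /* one wait per expected post */
    for (int i = 0; i < started; i++)
        pthread_join(t[i], NULL);

    sem_destroy(&finished);
    return counter;
}
```

Without the mutex the increments from different threads could interleave and the total would usually fall short of THREADS * INCREMENTS.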

Condition variables
- Solves the problem of holding onto a mutex while waiting for a (logical) condition to occur
  - I.e., waiting inside a critical section
- Associated with a mutex
- wait, timedwait, signal, broadcast (= signal all waiters)

Classic producer/consumer
- How can producer wait for room to open up in a full buffer without releasing the buffer's mutex lock?
  - Holding the lock prevents consumer from removing an item!
- One solution is for producer to give up the lock and check back after a while
  - But we don't want it busy-waiting, nor to keep waking up on a timer uselessly when the condition hasn't changed

The magic of cond_wait
- Waiters for a condition to change call cond_wait
  - Covertly gives up the associated mutex before blocking
  - When it returns, mutex already reacquired!
- Important: mutex lock is associated with some shared data structure (e.g., buffer)
  - Whoever is accessing the data structure needs to use the SAME mutex
  - Can be multiple condition variables associated with the same mutex

Condition variable: Basic discipline
- Waiter first acquires the associated mutex
  - Finds logical condition false, so calls cond_wait() to wait for condition to become true: blocks
- Any party that wants to signal that the condition is now (potentially) fulfilled calls cond_signal(): wakes up waiters
  - Signaler only needs to acquire mutex if messing with associated data structure

Main opportunity for failure
- Returning from cond_wait does NOT necessarily mean that the condition is true, despite having been signaled!!!
- Library is allowed to wake up multiple waiters from one cond_signal (sorry to say)
  - All have to contend for reacquiring the mutex
  - Only one at a time will succeed, and return from its cond_wait call
  - Another one will not return till the earlier one releases the mutex, by which time condition may have changed

What waiter must do
- So, as a cond_wait caller, upon waking:
  - You DO KNOW that you have exclusive use of the associated data structure
  - But you CAN'T ASSUME that the cond_signaled condition is (still) true
- Ergo, MUST recheck condition
  - Another woken waiter may have changed the condition (e.g., re-emptied or re-filled the buffer)
- Those who assume wakeup from cond_wait means condition good to go have buggy code!
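
The waiter's discipline can be sketched with a single flag. The shared state (ready, payload) and the function names are assumptions for illustration; the point is the while loop around pthread_cond_wait, which rechecks the condition after every wakeup.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready_cv = PTHREAD_COND_INITIALIZER;
static bool ready = false;   /* the logical condition, guarded by lock */
static int payload = 0;      /* shared data, guarded by the same lock */

static void *waiter(void *arg)
{
    int *out = arg;
    pthread_mutex_lock(&lock);
    while (!ready)                           /* while, not if: recheck on wakeup */
        pthread_cond_wait(&ready_cv, &lock); /* releases lock while blocked */
    *out = payload;                          /* lock is held again here */
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Starts a waiter, publishes a value, returns what the waiter saw. */
int signal_and_wait(void)
{
    pthread_t t;
    int seen = -1;

    ready = false;           /* safe: no other thread exists yet */
    if (pthread_create(&t, NULL, waiter, &seen) != 0)
        return -1;

    /* Change the data and signal inside the same critical section. */
    pthread_mutex_lock(&lock);
    payload = 99;
    ready = true;
    pthread_cond_signal(&ready_cv);
    pthread_mutex_unlock(&lock);

    pthread_join(t, NULL);
    return seen;
}
```

The sketch works in either order: if the signal happens before the waiter even locks the mutex, the waiter finds ready already true and never blocks.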

Unsafe condition variable use
- Fig 6.4: circular buffer, put/get indexes
  - Shared buffer protected by mutex lock
  - C.v. nonempty for producer to signal
- Inserting item and signaling non-empty condition must be within the same critical section!
  - If not, signal could be missed between waiter finding buffer is empty and waiting for it to fill
- Consider both 1) modifying the buffer and 2) signaling the change as part of the same locked transaction

Figure 6.4: Example of why a signaling thread needs to be protected by a mutex.
1. Consumer (right column) locks mutex
2. Consumer checks if buffer is empty (put==get), but before it can call cond_wait...
3. Producer (left column) inserts item and calls cond_signal
4. The signal is lost, because no one is waiting yet!
5. Now when consumer calls cond_wait, it will not wake up
Solution is for producer to lock the mutex around (before/after) insert and cond_signal. This forces the insert to run after cond_wait releases the mutex (arrow); then the signal will wake up the waiting consumer.
Copyright 2009 Pearson Education, Inc. Publishing as Pearson Addison-Wesley

Fig 6.5 needs repairs
- Meant to show that 3 critical sections (C/S) pertaining to the same buffer should use the same mutex; then every possible execution sequence is safe
- Even with multiple consumers, this works
  - Test of buffer-empty is in a while loop
  - Re-executes test any time cond_wait returns
  - So test and removal occur within the same C/S

Signaling thread: producer's critical section A (Fig 6.3's insert( ))
    lock(&mutex)
    /* insert item */
    pthread_cond_signal(&nonempty)
    unlock(&mutex)

Waiting thread: consumer's critical section B (Fig 6.3's remove( ); waiting for new items gives up mutex)
    lock(&mutex)
    while(put==get)
        pthread_cond_wait(&nonempty,&mutex)

Waiting thread: consumer's critical section C
    /* remove item */
    unlock(&mutex)

CASE 1: A B C (producer followed by consumer)
Signaling thread:
    lock(&mutex)
    /* insert item */
    pthread_cond_signal(&nonempty)
    unlock(&mutex)
Waiting thread:
    lock(&mutex)
    while(put==get)    /* false: has data */
        pthread_cond_wait(&nonempty,&mutex)
    /* remove item */
    unlock(&mutex)
No one is waiting for the signal, but it doesn't matter since consumer will find buffer non-empty.

CASE 2: B A C (consumer finds empty buffer)
Waiting thread:
    lock(&mutex)
    while(put==get)    /* true: no data */
        pthread_cond_wait(&nonempty,&mutex)
Signaling thread:
    lock(&mutex)
    /* insert item */
    pthread_cond_signal(&nonempty)
    unlock(&mutex)
Waiting thread (consumer is waiting and gets wakeup):
    /* remove item */
    unlock(&mutex)

CASE 3: B C A (consumer followed by producer)
Waiting thread:
    lock(&mutex)
    while(put==get)    /* false: has data */
        pthread_cond_wait(&nonempty,&mutex)
    /* remove item */
    unlock(&mutex)
Signaling thread:
    lock(&mutex)
    /* insert item */
    pthread_cond_signal(&nonempty)
    unlock(&mutex)
No one is waiting for the signal, but it doesn't matter since consumer will (later) find buffer non-empty.
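
The three cases can be tried out with a compilable version of the circular buffer. This is a sketch under assumptions, not the book's Fig 6.3: SIZE, ITEMS, the second condition variable nonfull, and the convention of leaving one slot unused to tell full from empty are all choices made for the example.

```c
#include <pthread.h>

#define SIZE 8       /* hypothetical buffer capacity (one slot stays unused) */
#define ITEMS 100    /* hypothetical number of items to pass through */

static int buffer[SIZE];
static int put = 0, get = 0;    /* buffer is empty when put == get */
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t nonempty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t nonfull = PTHREAD_COND_INITIALIZER;

/* Critical section A: insert and signal under the same lock. */
static void insert(int item)
{
    pthread_mutex_lock(&mutex);
    while ((put + 1) % SIZE == get)          /* full: wait for room */
        pthread_cond_wait(&nonfull, &mutex);
    buffer[put] = item;
    put = (put + 1) % SIZE;
    pthread_cond_signal(&nonempty);
    pthread_mutex_unlock(&mutex);
}

/* Critical sections B and C: test in a while loop, then remove. */
static int remove_item(void)
{
    pthread_mutex_lock(&mutex);
    while (put == get)                       /* empty: wait for data */
        pthread_cond_wait(&nonempty, &mutex);
    int item = buffer[get];
    get = (get + 1) % SIZE;
    pthread_cond_signal(&nonfull);
    pthread_mutex_unlock(&mutex);
    return item;
}

static void *producer(void *arg)
{
    (void)arg;
    for (int i = 1; i <= ITEMS; i++)
        insert(i);
    return NULL;
}

/* Runs one producer against the calling thread as consumer.
 * Returns the sum of all items received, or -1 on failure. */
long consume_all(void)
{
    pthread_t p;
    long sum = 0;
    if (pthread_create(&p, NULL, producer, NULL) != 0)
        return -1;
    for (int i = 0; i < ITEMS; i++)
        sum += remove_item();
    pthread_join(p, NULL);
    return sum;
}
```

Whichever of the interleavings A B C, B A C or B C A the scheduler happens to produce for a given item, the consumer receives every item exactly once because all three critical sections use the same mutex.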

Multiple Condition Vars. (p159)
See any problem with this?

    EatJuicyFruit()
    {
        pthread_mutex_lock(&lock);
        while( apples==0 || oranges==0 )
        {
            pthread_cond_wait( &more_apples, &lock );
            pthread_cond_wait( &more_oranges, &lock );
        }
        /* CRITICAL SECTION: eat both an apple and an orange */
        pthread_mutex_unlock(&lock);
    }

Solution
- Involves proper use of cond_wait
- Shows how tricky pthreads code is to write correctly!
  - Pilot: much less opportunity for deadlocks by comparison
- Also: message-passing handles both communication and synchronization!
  - Pthreads API only does inter-thread sync (you're using global variables for comm.)
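
One possible repair of EatJuicyFruit, offered as a sketch rather than as the book's solution: wait only on the condition variable whose resource is actually missing, and recheck both counts after every wakeup. The supplier functions AddApple and AddOrange and the eater thread wrapper are hypothetical additions so the example can run.

```c
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t more_apples = PTHREAD_COND_INITIALIZER;
static pthread_cond_t more_oranges = PTHREAD_COND_INITIALIZER;
static int apples = 0, oranges = 0;

void EatJuicyFruit(void)
{
    pthread_mutex_lock(&lock);
    while (apples == 0 || oranges == 0) {
        /* Wait only for what is missing, then loop to recheck both:
         * the other fruit may have been eaten while we were blocked. */
        if (apples == 0)
            pthread_cond_wait(&more_apples, &lock);
        else
            pthread_cond_wait(&more_oranges, &lock);
    }
    /* CRITICAL SECTION: eat both an apple and an orange */
    apples--;
    oranges--;
    pthread_mutex_unlock(&lock);
}

/* Hypothetical suppliers: update and signal inside one critical section. */
void AddApple(void)
{
    pthread_mutex_lock(&lock);
    apples++;
    pthread_cond_signal(&more_apples);
    pthread_mutex_unlock(&lock);
}

void AddOrange(void)
{
    pthread_mutex_lock(&lock);
    oranges++;
    pthread_cond_signal(&more_oranges);
    pthread_mutex_unlock(&lock);
}

/* Hypothetical thread wrapper so an eater can run concurrently. */
static void *eater(void *arg)
{
    (void)arg;
    EatJuicyFruit();
    return NULL;
}
```

In the original, a thread that wakes from the first wait goes straight into the second wait even when oranges are already available, so it can sleep through a state in which both fruits are present.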

Thread-specific data (TSD) (lots of typos)
- Variable that is global in scope (to all functions) but has a different value for each thread
- Identified by key (key_create func)
  - Each TS variable needs its own key
  - Use setspecific and getspecific with key:value pair
- Benefit: lower-level funcs can access these values without passing them down as args

Drawbacks to TSD
- Not terribly efficient since accessed via function call
  - Don't place in inner loops
- Can set up per-key destructor function
  - Useful for OOP (C++)
  - When thread exits, automatically called
- C++11 has TSD
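
A minimal TSD sketch: one key, created once, with free registered as the per-key destructor so each thread's value is released when that thread exits. The names id_key, current_id and tsd_roundtrip are assumptions for the example.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

static pthread_key_t id_key;                        /* one key per TS variable */
static pthread_once_t id_key_once = PTHREAD_ONCE_INIT;

static void make_key(void)
{
    /* free() runs automatically on each thread's non-NULL value at exit. */
    (void)pthread_key_create(&id_key, free);
}

/* Lower-level function: reads the per-thread value without any argument. */
static int current_id(void)
{
    int *id = pthread_getspecific(id_key);
    return (id != NULL) ? *id : -1;
}

static void *work(void *arg)
{
    pthread_once(&id_key_once, make_key);           /* create the key once */

    int *id = malloc(sizeof *id);
    if (id == NULL)
        return (void *)(intptr_t)-1;
    *id = (int)(intptr_t)arg;
    if (pthread_setspecific(id_key, id) != 0) {
        free(id);
        return (void *)(intptr_t)-1;
    }
    return (void *)(intptr_t)current_id();
}

/* Runs one thread with the given id and returns what current_id() saw. */
int tsd_roundtrip(int id)
{
    pthread_t t;
    void *status;
    if (pthread_create(&t, NULL, work, (void *)(intptr_t)id) != 0)
        return -1;
    if (pthread_join(t, &status) != 0)
        return -1;
    return (int)(intptr_t)status;
}
```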

Safety issues
- Deadlocks (familiar from CIS*3110)
- Lock hierarchies:
  - When a thread needs to acquire more than one mutex at a time
  - Make a rule that they be acquired in a consistent order (e.g., alphabetical by variable name)
  - Prevents circular wait
  - Unfortunately no easy way to enforce!
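
A sketch of a lock hierarchy in code. Instead of the slide's alphabetical rule, this example orders the two mutexes by address, a common substitute when the locks live inside data structures; the account type and all function names are hypothetical.

```c
#include <pthread.h>
#include <stdint.h>

#define ROUNDS 10000    /* hypothetical number of transfers per thread */

typedef struct {
    pthread_mutex_t lock;
    long balance;
} account_t;

static account_t x = { PTHREAD_MUTEX_INITIALIZER, 1000 };
static account_t y = { PTHREAD_MUTEX_INITIALIZER, 1000 };

/* Always lock the lower address first, whatever the argument order,
 * so two threads can never hold one lock each and wait for the other. */
static void lock_pair(account_t *a, account_t *b)
{
    if ((uintptr_t)a < (uintptr_t)b) {
        pthread_mutex_lock(&a->lock);
        pthread_mutex_lock(&b->lock);
    } else {
        pthread_mutex_lock(&b->lock);
        pthread_mutex_lock(&a->lock);
    }
}

void transfer(account_t *from, account_t *to, long amount)
{
    if (from == to)
        return;                 /* avoid locking the same mutex twice */
    lock_pair(from, to);
    from->balance -= amount;
    to->balance += amount;
    pthread_mutex_unlock(&from->lock);
    pthread_mutex_unlock(&to->lock);
}

static void *x_to_y(void *arg)
{
    (void)arg;
    for (int i = 0; i < ROUNDS; i++)
        transfer(&x, &y, 1);
    return NULL;
}

static void *y_to_x(void *arg)
{
    (void)arg;
    for (int i = 0; i < ROUNDS; i++)
        transfer(&y, &x, 1);
    return NULL;
}

/* Two threads transfer in opposite directions; returns 0 on success. */
int run_transfers(void)
{
    pthread_t a, b;
    if (pthread_create(&a, NULL, x_to_y, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, y_to_x, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}
```

If transfer() simply locked from and then to, the two threads above would request the mutexes in opposite orders and could deadlock. Nothing forces other code to go through lock_pair(), which is the enforcement problem the slide mentions.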

Monitors
- Very OO: encapsulates shared data with methods that manipulate it
  - Methods take care of acquiring needed mutexes, deal with cond. vars.
  - Prevents programmer logic errors leading to deadlock or violation of C/S by hiding mutexes/c.v.'s
- Not provided directly in pthread.h
  - Build yourself using cond. vars. (pattern in book)
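
A monitor-style sketch in C, not the pattern from the book: a countdown latch whose mutex and condition variable are touched only inside its own functions, so callers cannot forget to lock or to recheck the condition. The type latch_t and every function name are assumptions for illustration.

```c
#include <pthread.h>

#define MAX_ARRIVALS 8   /* hypothetical upper bound for the demo */

/* The "monitor": shared data plus the lock and c.v. that guard it. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t zero;
    int count;
} latch_t;

void latch_init(latch_t *l, int count)
{
    pthread_mutex_init(&l->lock, NULL);
    pthread_cond_init(&l->zero, NULL);
    l->count = count;
}

/* Monitor method: decrement, and wake all waiters on reaching zero. */
void latch_count_down(latch_t *l)
{
    pthread_mutex_lock(&l->lock);
    if (l->count > 0 && --l->count == 0)
        pthread_cond_broadcast(&l->zero);
    pthread_mutex_unlock(&l->lock);
}

/* Monitor method: block until the count is zero. */
void latch_wait(latch_t *l)
{
    pthread_mutex_lock(&l->lock);
    while (l->count > 0)
        pthread_cond_wait(&l->zero, &l->lock);
    pthread_mutex_unlock(&l->lock);
}

static void *arrive(void *arg)
{
    latch_count_down(arg);
    return NULL;
}

/* Starts n threads that each count down once; returns the count left
 * after the caller has been released (expected to be 0). */
int run_latch(int n)
{
    pthread_t t[MAX_ARRIVALS];
    latch_t l;
    int started = 0;

    if (n > MAX_ARRIVALS)
        n = MAX_ARRIVALS;
    latch_init(&l, n);

    while (started < n &&
           pthread_create(&t[started], NULL, arrive, &l) == 0)
        started++;
    for (int i = started; i < n; i++)
        latch_count_down(&l);   /* stand in for threads that failed to start */

    latch_wait(&l);
    for (int i = 0; i < started; i++)
        pthread_join(t[i], NULL);

    int remaining = l.count;    /* safe: no other thread is running */
    pthread_cond_destroy(&l.zero);
    pthread_mutex_destroy(&l.lock);
    return remaining;
}
```

C cannot hide the struct fields the way a class would, so the encapsulation here is by convention only.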

Next time
- Look at Successive Over-relaxation case study (p174-187) to prepare