Chap. 6 Part 1
CIS*3090 Parallel Programming, Fall 2016
Chap 6: specific programming techniques

Languages and libraries
- Authors blur the distinction
- Languages: access parallel programming features explicitly, or implicitly (under the hood) by simply coding in the language
  - Java (Java threads), Go, Scala, Julia
  - Vs. C/C++: no HLL support for threads (until C++11)
  - CUDA, OpenCL for GPU co-processor
- Most languages are inherently serial
  - Parallel programming wasn't important until recently!
  - Introduces many complications
Libraries
- Make inherently serial languages support parallel programming via calls to a library API
  - pthread.h: good example for C/C++
  - pilot.h: ditto for C & Fortran
    - Other Pilot ports: C++, Python
  - mpi.h: basis of Pilot
- Lots of parallel languages and libraries exist
  - Few have caught on
Starting with pthreads
- Rationale
  - Already exposed in OS course
  - Thread definition similar to Pilot's process definition, via work function
- All communication via shared memory
  - Nothing stopping you from using message passing in shared memory as a sound IPC technique!
  - QNX (bought by RIM) has send/receive/reply
Compare/contrast pthreads API with Pilot's
- pthread_create
  - Like PI_CreateProcess + PI_StartAll
  - Thread is a candidate for execution immediately!
  - pthread_t is a handle similar to PI_PROCESS*
- Like Pilot, one thread function can serve for multiple threads
  - Distinguish instances via void* arg
  - Like Pilot's index & void* args
- 1st call to pthread_create converts main() from process into a thread itself (PI_MAIN)
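A minimal sketch of one work function serving several threads, with the void* argument telling the instances apart. The names square_worker, run_square_workers and NUM_WORKERS are illustrative assumptions, not from the slides or from Pilot.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_WORKERS 4            /* hypothetical thread count */

static int squares[NUM_WORKERS]; /* one result slot per thread, so no locking needed */

/* One work function serves all threads; the void* arg tells instances apart. */
static void *square_worker(void *arg)
{
    int index = *(int *)arg;
    assert(index >= 0 && index < NUM_WORKERS);
    squares[index] = index * index;
    return NULL;
}

/* Starts the workers, waits for them, and returns the sum of their results
   (or -1 if a thread could not be created). */
int run_square_workers(void)
{
    pthread_t threads[NUM_WORKERS];
    int indexes[NUM_WORKERS];    /* must stay alive until every thread is joined */
    int created = 0;

    for (int i = 0; i < NUM_WORKERS; i++) {
        indexes[i] = i;
        /* The new thread may start running before pthread_create returns. */
        if (pthread_create(&threads[i], NULL, square_worker, &indexes[i]) != 0)
            break;
        created++;
    }

    for (int i = 0; i < created; i++)
        pthread_join(threads[i], NULL);

    if (created < NUM_WORKERS)
        return -1;

    int sum = 0;
    for (int i = 0; i < NUM_WORKERS; i++)
        sum += squares[i];
    return sum;
}
```

Each thread gets a pointer to its own element of indexes[] rather than a pointer to the loop variable, which would keep changing while the threads start up. Call run_square_workers() from main() and link with -pthread.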
Bound vs. unbound threads
- May need to set thread attributes
- Bound: each thread gets its own core (provided #threads ≤ #cores)
  - This is Pthreads "system contention scope": every thread is an equal contender for CPU
- Unbound = "process contention scope"
  - Process's threads treated as a group (less CPU)
- Default may be OS-specific: 1 core:1 thread; 1:N; N:M
- Can also specify scheduling policy like FIFO or RR, and set thread priority
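A small sketch of how the contention scope can be inspected and requested through a thread attribute object. The helper names default_contention_scope and request_system_scope are assumptions for illustration; which scopes are actually supported is OS-specific, so a request may fail (for example with ENOTSUP).

```c
#include <assert.h>
#include <pthread.h>

/* Stores the contention scope carried by a default-initialized attribute
   object in *scope. Returns 0 on success, an error number otherwise. */
int default_contention_scope(int *scope)
{
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_getscope(&attr, scope);
    pthread_attr_destroy(&attr);
    return rc;
}

/* Asks for "bound" threads (system contention scope) on an attribute object
   that the caller later passes to pthread_create. Returns 0 or an error number. */
int request_system_scope(pthread_attr_t *attr)
{
    assert(attr != NULL);
    return pthread_attr_setscope(attr, PTHREAD_SCOPE_SYSTEM);
}
```

The value written to *scope is either PTHREAD_SCOPE_SYSTEM or PTHREAD_SCOPE_PROCESS; checking the return code of the set call is the only reliable way to learn whether a platform honours the request.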
pthread_join ~ PI_StopMain
- Wait for thread exit and reap its status
  - Done by master, or any thread with a handle on the joinee thread
- Status is specified as a void*
  - Can pass a value cast to (void*)
  - If really passing a pointer, make sure it doesn't go out of scope when the thread exits!
    - Static storage address will still be valid
    - Pointer to stack variable: dumb!
- PI_StopMain does a barrier with all processes
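A sketch of the two ways of returning a status described above: a small integer cast to (void*), and a pointer to storage that outlives the thread (heap storage here, as an alternative to static storage). All function names are illustrative assumptions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns a small integer carried inside the void* status itself. */
static void *count_chars(void *arg)
{
    const char *s = arg;
    intptr_t n = 0;
    while (s[n] != '\0')
        n++;
    return (void *)n;
}

/* Returns a pointer to heap storage, which stays valid after the thread exits.
   Returning the address of a local (stack) variable here would be a bug. */
static void *make_square(void *arg)
{
    int value = *(int *)arg;
    int *out = malloc(sizeof *out);
    if (out != NULL)
        *out = value * value;
    return out;
}

/* Joins a thread and decodes an integer status; returns -1 on failure. */
long join_value_demo(const char *text)
{
    pthread_t t;
    void *status;
    assert(text != NULL);
    if (pthread_create(&t, NULL, count_chars, (void *)text) != 0)
        return -1;
    if (pthread_join(t, &status) != 0)
        return -1;
    return (long)(intptr_t)status;
}

/* Joins a thread and follows a pointer status; the joiner frees the storage. */
int join_pointer_demo(int n)
{
    pthread_t t;
    void *status;
    if (pthread_create(&t, NULL, make_square, &n) != 0)
        return -1;
    if (pthread_join(t, &status) != 0 || status == NULL)
        return -1;
    int result = *(int *)status;
    free(status);
    return result;
}
```

Passing &n to the thread is safe in join_pointer_demo only because that function does not return until the thread has been joined.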
Detached threads
- "Detached" thread attribute is the opposite of "joinable"
  - Left to finish independently
  - Can't return a status
- Also, pthread_detach() changes a joinable thread to detached
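A sketch of creating a thread that is detached from the start via its attribute object. Because nobody can join it, the example uses a flag guarded by a mutex and condition variable to learn that it ran; that handshake and all names are assumptions added for illustration.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  done_cv   = PTHREAD_COND_INITIALIZER;
static int done = 0;

/* Work function of the detached thread: its return value is discarded. */
static void *background_task(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&done_lock);
    done = 1;
    pthread_cond_signal(&done_cv);
    pthread_mutex_unlock(&done_lock);
    return NULL;
}

/* Creates a thread in the detached state. Returns 0 or an error number. */
int start_detached(void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    pthread_t t;   /* handle is not needed afterwards: no join is possible */
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (rc == 0)
        rc = pthread_create(&t, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return rc;
}

/* Starts the detached task and waits for its "done" flag. Returns 0 on success. */
int run_detached_demo(void)
{
    if (start_detached(background_task, NULL) != 0)
        return -1;
    pthread_mutex_lock(&done_lock);
    while (!done)
        pthread_cond_wait(&done_cv, &done_lock);
    assert(done == 1);
    pthread_mutex_unlock(&done_lock);
    return 0;
}
```

For a thread that was created joinable, calling pthread_detach() on its handle has the same effect after the fact.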
How main thread ends
- Can return or call exit()
  - Terminates process and any remaining running threads (including detached)
- Can call pthread_exit()
  - Leaves other threads running
  - Running unjoined and detached threads keep whole process alive
  - Likely not what you wanted
How spawned threads end
- Normal way: work function returns
  - return(status) or call pthread_exit()
  - Waits to be joined if joinable, or dies if detached
- Can also be cancelled by another thread
  - Tricky: could leave a mess (e.g., locked mutexes)
  - Complex: could be inside a system call (e.g., I/O), which may or may not be a cancellation point
  - Possible to define a cleanup function to be called upon cancellation
  - Best to stay away from this!
Inter-thread synchronization
- Mutex: lock, unlock, trylock
  - Should initialize to unlocked via PTHREAD_MUTEX_INITIALIZER
  - No fairness for multiple waiters
    - Not necessarily a FIFO queue (organize it yourself)
- (Counting) Semaphore:
  - Need additional #include <semaphore.h>
  - init(value), wait, post (aka signal), getvalue, trywait
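A sketch of the mutex calls listed above: a statically initialized mutex protecting a shared counter, plus a non-blocking trylock. The counter, thread count and function names are assumptions for illustration; semaphores are left out of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

#define NUM_THREADS 4        /* hypothetical values */
#define INCREMENTS  100000

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER; /* starts unlocked */
static long counter = 0;

/* Each thread adds INCREMENTS to the shared counter, one locked step at a time. */
static void *add_many(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;                         /* critical section */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs the threads and returns the final count, or -1 if creation failed. */
long run_counter_demo(void)
{
    pthread_t threads[NUM_THREADS];
    int created = 0;

    counter = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        if (pthread_create(&threads[i], NULL, add_many, NULL) != 0)
            break;
        created++;
    }
    for (int i = 0; i < created; i++)
        pthread_join(threads[i], NULL);
    return created == NUM_THREADS ? counter : -1;
}

/* trylock never blocks: it returns 0 if the lock was taken, EBUSY if it is held. */
int try_increment(void)
{
    int rc = pthread_mutex_trylock(&counter_lock);
    assert(rc == 0 || rc == EBUSY);
    if (rc == 0) {
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return rc;
}
```

Without the lock/unlock pair around counter++, the final count would usually fall short of NUM_THREADS * INCREMENTS because increments from different threads would overwrite each other.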
Condition variables
- Solve the problem of holding onto a mutex while waiting for a (logical) condition to occur
  - I.e., waiting inside a critical section
- Associated with a mutex
- wait, timedwait, signal, broadcast (= signal all waiters)
Classic producer/consumer
- How can the producer wait for room to open up in a full buffer without releasing the buffer's mutex lock?
  - Prevents consumer from removing an item!
- One solution is for the producer to give up the lock and check back after a while
  - But we don't want it busy-waiting, nor to keep waking up on a timer uselessly when the condition hasn't changed
The "magic" of cond_wait
- Waiters for the condition to change call cond_wait
  - Covertly gives up the associated mutex before blocking
  - When it returns, mutex already reacquired!
- Important: mutex lock is associated with some shared data structure (e.g., buffer)
  - Whoever is accessing the data structure needs to use the SAME mutex
  - Can be multiple condition variables associated with the same mutex
Condition variable: Basic discipline
- Waiter first acquires the associated mutex
  - Finds logical condition false, so calls cond_wait() to wait for the condition to become true: blocks
- Any party that wants to signal that the condition is now (potentially) fulfilled calls cond_signal(): wakes up waiters
  - Signaler only needs to acquire mutex if messing with associated data structure
Main opportunity for failure
- Returning from cond_wait does NOT necessarily mean that the condition is true, despite having been signaled!!!
- Library is allowed to wake up multiple waiters from one cond_signal (sorry to say)
  - All have to contend for reacquiring the mutex
  - Only one at a time will succeed, and return from its cond_wait call
  - Another one will not return till the earlier one releases the mutex, by which time the condition may have changed
What waiter must do
- So, as a cond_wait caller, upon waking:
  - You DO KNOW that you have exclusive use of the associated data structure
  - But you CAN'T ASSUME that the cond_signaled condition is (still) true
- Ergo, MUST recheck condition
  - Another woken waiter may have changed the condition (e.g., re-emptied or re-filled the buffer)
- Those who assume wakeup from cond_wait means "condition good to go" have buggy code!
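A sketch of the recheck discipline with several waiters competing for the same items. The waiter tests its condition in a while loop, so a wakeup that finds the items already taken simply goes back to waiting. The item counter and all names are assumptions for illustration.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_TAKERS 3   /* hypothetical number of waiting threads */

static pthread_mutex_t items_lock      = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  items_available = PTHREAD_COND_INITIALIZER;
static int items = 0;

/* Waiter: takes exactly one item, blocking while none are available. */
static void *take_item(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&items_lock);
    while (items == 0)                      /* recheck after EVERY wakeup */
        pthread_cond_wait(&items_available, &items_lock);
    assert(items > 0);                      /* holds only because of the loop */
    items--;
    pthread_mutex_unlock(&items_lock);
    return NULL;
}

/* Signaler: changes the data and announces it inside one critical section. */
static void add_items(int n)
{
    pthread_mutex_lock(&items_lock);
    items += n;
    pthread_cond_broadcast(&items_available);  /* may wake more takers than items */
    pthread_mutex_unlock(&items_lock);
}

/* Returns the number of items left over (expected to be 0), or -1 on failure. */
int run_takers_demo(void)
{
    pthread_t takers[NUM_TAKERS];
    int created = 0;

    for (int i = 0; i < NUM_TAKERS; i++) {
        if (pthread_create(&takers[i], NULL, take_item, NULL) != 0)
            break;
        created++;
    }
    for (int i = 0; i < created; i++)
        add_items(1);                       /* one item per taker that started */
    for (int i = 0; i < created; i++)
        pthread_join(takers[i], NULL);
    return created == NUM_TAKERS ? items : -1;
}
```

Replacing the while with an if would let a taker decrement items when another taker had already consumed the only item, which is exactly the bug described above.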
Unsafe condition variable use
- Fig 6.4: circular buffer, put/get indexes
  - Shared buffer protected by mutex "lock"
  - C.v. "nonempty" for producer to signal
- Inserting an item and signaling the non-empty condition must be within the same critical section!
  - If not, the signal could be missed between the waiter finding the buffer empty and waiting for it to fill
- Consider both 1) modifying the buffer and 2) signaling the change as part of the same locked "transaction"
Figure 6.4: Example of why a signaling thread needs to be protected by a mutex.
  pthread_mutex_lock(&lock); ...
1. Consumer (right column) locks mutex
2. Consumer checks if buffer empty (put==get), but before it can call cond_wait...
3. Producer (left column) inserts item and calls cond_signal
4. The signal is lost, because no one is waiting yet!
5. Now when consumer calls cond_wait, it will not wake up
Solution is for the producer to lock the mutex around (before/after) the insert and cond_signal. This forces the insert to run after cond_wait releases the mutex (arrow); then the signal will wake up the waiting consumer.
Copyright 2009 Pearson Education, Inc. Publishing as Pearson Addison-Wesley
Fig 6.5 needs repairs
- Meant to show that 3 critical sections (C/S) pertaining to the same buffer should use the same mutex; then every possible execution sequence is safe
- Even if multiple consumers, this works
  - Test of buffer-empty is in a while loop
  - Re-executes test anytime cond_wait returns
  - So test and removal occur within the same C/S
Signaling thread: Fig 6.3's insert()
  Producer's critical section A:
    lock(&mutex)
    /* insert item */
    pthread_cond_signal(&nonempty)
    unlock(&mutex)

Waiting thread: Fig 6.3's remove()
  Consumer's critical section B (waiting for new items gives up mutex):
    lock(&mutex)
    while(put==get)
      pthread_cond_wait(&nonempty,&mutex)
  Consumer's critical section C:
    /* remove item */
    unlock(&mutex)

CASE 1: A B C (producer followed by consumer)
  Producer (A):
    lock(&mutex)
    /* insert item */
    pthread_cond_signal(&nonempty)    <- no one waiting for signal, but doesn't matter since consumer will find buffer non-empty
    unlock(&mutex)
  Consumer (B, C):
    lock(&mutex)
    while(put==get)                   /* false: has data */
      pthread_cond_wait(&nonempty,&mutex)
    /* remove item */
    unlock(&mutex)
CASE 2: B A C (consumer finds empty buffer)
  Consumer (B):
    lock(&mutex)
    while(put==get)                   /* true: no data */
      pthread_cond_wait(&nonempty,&mutex)
  Producer (A):
    lock(&mutex)
    /* insert item */
    pthread_cond_signal(&nonempty)    <- consumer is waiting and gets wakeup
    unlock(&mutex)
  Consumer (C):
    /* remove item */
    unlock(&mutex)

CASE 3: B C A (consumer followed by producer)
  Consumer (B, C):
    lock(&mutex)
    while(put==get)                   /* false: has data */
      pthread_cond_wait(&nonempty,&mutex)
    /* remove item */
    unlock(&mutex)
  Producer (A):
    lock(&mutex)
    /* insert item */
    pthread_cond_signal(&nonempty)    <- no one waiting for signal, but doesn't matter since consumer will (later) find buffer non-empty
    unlock(&mutex)
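The three cases can be exercised with a runnable sketch of the circular buffer. This is not the book's Fig 6.3 code: the buffer size, the second condition variable nonfull (so the producer can also wait), and the names insert, remove_item and run_buffer_demo are assumptions for illustration.

```c
#include <assert.h>
#include <pthread.h>

#define SIZE      8      /* hypothetical capacity; one slot always stays unused */
#define NUM_ITEMS 100

static int buffer[SIZE];
static int put = 0, get = 0;                 /* put == get means "empty" */
static pthread_mutex_t lock     = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  nonempty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  nonfull  = PTHREAD_COND_INITIALIZER;  /* assumed addition */

/* Producer side: the insert and the signal share one critical section (A). */
static void insert(int item)
{
    pthread_mutex_lock(&lock);
    while ((put + 1) % SIZE == get)          /* full: wait for room */
        pthread_cond_wait(&nonfull, &lock);
    buffer[put] = item;
    put = (put + 1) % SIZE;
    pthread_cond_signal(&nonempty);
    pthread_mutex_unlock(&lock);
}

/* Consumer side: the test (B) and the removal (C) use the same mutex. */
static int remove_item(void)
{
    pthread_mutex_lock(&lock);
    while (put == get)                       /* empty: wait, then retest */
        pthread_cond_wait(&nonempty, &lock);
    assert(put != get);
    int item = buffer[get];
    get = (get + 1) % SIZE;
    pthread_cond_signal(&nonfull);
    pthread_mutex_unlock(&lock);
    return item;
}

static void *producer(void *arg)
{
    (void)arg;
    for (int i = 1; i <= NUM_ITEMS; i++)
        insert(i);
    return NULL;
}

/* Consumes NUM_ITEMS items in the calling thread and returns their sum. */
long run_buffer_demo(void)
{
    pthread_t p;
    long sum = 0;
    if (pthread_create(&p, NULL, producer, NULL) != 0)
        return -1;
    for (int i = 0; i < NUM_ITEMS; i++)
        sum += remove_item();
    pthread_join(p, NULL);
    return sum;
}
```

Whichever of the three interleavings the scheduler happens to pick on a given iteration, every item is delivered exactly once, so the sum is the same on every run.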
Multiple Condition Vars. (p159)
See any problem with this?

EatJuicyFruit()
{
  pthread_mutex_lock(&lock);
  while( apples==0 || oranges==0 )
  {
    pthread_cond_wait( &more_apples, &lock );
    pthread_cond_wait( &more_oranges, &lock );
  }
  /* CRITICAL SECTION: eat both an apple and an orange */
  pthread_mutex_unlock(&lock);
}
Solution
- Involves proper use of cond_wait
  - Shows how tricky pthreads code is to write correctly!
- Pilot: much less opportunity for deadlocks by comparison
- Also: message-passing handles both communication and synchronization!
  - Pthreads API only does inter-thread sync (you're using global variables for comm.)
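One possible repair of EatJuicyFruit, offered as a sketch rather than as the book's solution. The trouble with the original is that a thread can block on a condition variable for a fruit that is already available; here each pass through the loop waits only for a fruit that is actually missing, then rechecks both counts. AddApple, AddOrange and run_fruit_demo are assumed helpers for illustration.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t lock         = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  more_apples  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  more_oranges = PTHREAD_COND_INITIALIZER;
static int apples = 0, oranges = 0;

void EatJuicyFruit(void)
{
    pthread_mutex_lock(&lock);
    while (apples == 0 || oranges == 0) {
        /* Wait only on the condition that is false right now. */
        if (apples == 0)
            pthread_cond_wait(&more_apples, &lock);
        else
            pthread_cond_wait(&more_oranges, &lock);
    }
    assert(apples > 0 && oranges > 0);
    /* CRITICAL SECTION: eat both an apple and an orange */
    apples--;
    oranges--;
    pthread_mutex_unlock(&lock);
}

/* Assumed producers: change the count and signal inside one critical section. */
void AddApple(void)
{
    pthread_mutex_lock(&lock);
    apples++;
    pthread_cond_broadcast(&more_apples);
    pthread_mutex_unlock(&lock);
}

void AddOrange(void)
{
    pthread_mutex_lock(&lock);
    oranges++;
    pthread_cond_broadcast(&more_oranges);
    pthread_mutex_unlock(&lock);
}

static void *eater(void *arg)
{
    (void)arg;
    EatJuicyFruit();
    return NULL;
}

/* Returns the fruit left over after one eater runs (expected 0), or -1. */
int run_fruit_demo(void)
{
    pthread_t t;
    if (pthread_create(&t, NULL, eater, NULL) != 0)
        return -1;
    AddOrange();          /* oranges arrive first on purpose */
    AddApple();
    pthread_join(t, NULL);
    return apples + oranges;
}
```

The demo supplies the orange before the apple, the order in which unconditional waiting on more_apples and then more_oranges risks sleeping through a signal.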
Thread-specific data (TSD) (lots of typos)
- Variable that is global in scope (to all functions) but has a different value for each thread
- Identified by "key" (key_create func)
  - Each TS variable needs its own key
  - Use setspecific and getspecific with key:value pair
- Benefit: lower-level funcs can access these values without passing them down as args
Drawbacks to TSD
- Not terribly efficient since accessed via function call
  - Don't place in inner loops
- Can set up per-key destructor function
  - Useful for OOP (C++)
  - When thread exits, automatically called
- C++11 has TSD
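A sketch of the TSD calls: one key created once, a per-thread value stored with setspecific, a lower-level function that finds it with getspecific, and free() registered as the per-key destructor. The "thread id" payload and all function names are assumptions for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

static pthread_key_t  id_key;
static pthread_once_t id_key_once = PTHREAD_ONCE_INIT;

/* Runs exactly once, no matter how many threads call pthread_once. */
static void make_id_key(void)
{
    /* free() is the destructor: called at thread exit on each non-NULL value. */
    int rc = pthread_key_create(&id_key, free);
    assert(rc == 0);
    (void)rc;
}

/* Lower-level function: takes no argument, yet sees the caller's own value. */
static int current_thread_id(void)
{
    int *id = pthread_getspecific(id_key);
    return id != NULL ? *id : -1;
}

static void *worker(void *arg)
{
    pthread_once(&id_key_once, make_id_key);

    int *id = malloc(sizeof *id);
    if (id == NULL)
        return (void *)(intptr_t)-1;
    *id = (int)(intptr_t)arg;
    if (pthread_setspecific(id_key, id) != 0) {
        free(id);
        return (void *)(intptr_t)-1;
    }
    /* The value is fetched through the key, not passed down as an argument. */
    return (void *)(intptr_t)(current_thread_id() * 10);
}

/* Runs two workers with ids 1 and 2 and returns the sum of their statuses. */
int run_tsd_demo(void)
{
    pthread_t a, b;
    void *status_a, *status_b;

    if (pthread_create(&a, NULL, worker, (void *)(intptr_t)1) != 0)
        return -1;
    if (pthread_create(&b, NULL, worker, (void *)(intptr_t)2) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, &status_a);
    pthread_join(b, &status_b);
    return (int)(intptr_t)status_a + (int)(intptr_t)status_b;
}
```

Both threads use the same key but each reads back its own id, which is the whole point of TSD. Where the compiler supports it, C11 _Thread_local or C++11 thread_local variables give a similar effect without the function calls.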
Safety issues
- Deadlocks (familiar from CIS*3110)
- Lock hierarchies:
  - When a thread needs to acquire more than one mutex at a time
  - Make a rule that they be acquired in a consistent order (e.g., alphabetical by variable name)
  - Prevents circular wait
  - Unfortunately no easy way to enforce!
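A sketch of a lock hierarchy, using a numeric rank stored with each mutex instead of alphabetical order. The account structure, the rank field and the transfer scenario are assumptions for illustration; the rule itself (always lock the lower rank first) is the one described above.

```c
#include <assert.h>
#include <pthread.h>

#define TRANSFERS 10000   /* hypothetical number of transfers per thread */

typedef struct {
    int rank;                 /* position in the lock hierarchy (assumed field) */
    long balance;
    pthread_mutex_t lock;
} account;

static account savings  = { 1, 1000, PTHREAD_MUTEX_INITIALIZER };
static account chequing = { 2, 1000, PTHREAD_MUTEX_INITIALIZER };

/* Needs both mutexes; always acquires the lower-ranked one first. */
void transfer(account *from, account *to, long amount)
{
    assert(from->rank != to->rank);
    account *first  = (from->rank < to->rank) ? from : to;
    account *second = (first == from) ? to : from;

    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);
    from->balance -= amount;
    to->balance   += amount;
    pthread_mutex_unlock(&second->lock);
    pthread_mutex_unlock(&first->lock);
}

static void *savings_to_chequing(void *arg)
{
    (void)arg;
    for (int i = 0; i < TRANSFERS; i++)
        transfer(&savings, &chequing, 1);
    return NULL;
}

static void *chequing_to_savings(void *arg)
{
    (void)arg;
    for (int i = 0; i < TRANSFERS; i++)
        transfer(&chequing, &savings, 1);
    return NULL;
}

/* Runs opposing transfers concurrently; returns the combined balance or -1. */
long run_transfer_demo(void)
{
    pthread_t a, b;
    if (pthread_create(&a, NULL, savings_to_chequing, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, chequing_to_savings, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return savings.balance + chequing.balance;
}
```

If transfer() locked from before to instead, the two threads would request the mutexes in opposite orders and could deadlock in a circular wait. Nothing in pthreads checks the rank rule, so it holds only as long as every caller follows it.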
Monitors
- Very OO: encapsulates shared data with methods that manipulate it
  - Methods take care of acquiring needed mutexes, deal with cond. vars.
  - Prevents programmer logic errors leading to deadlock or violation of C/S by hiding mutexes/c.v.'s
- Not provided directly in pthread.h
  - Build yourself using cond. vars. (pattern in book)
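A sketch of a hand-built monitor in C. This is not the book's pattern: the account_monitor type and its three operations are assumptions for illustration. The idea shown is that callers never touch the mutex or the condition variable; only the monitor's functions do.

```c
#include <assert.h>
#include <pthread.h>

/* The monitor: shared data bundled with the mutex and c.v. that guard it. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  funds_added;
    long balance;
} account_monitor;

static account_monitor acct = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0
};

/* Monitor method: adds funds and wakes any blocked withdrawals. */
void monitor_deposit(account_monitor *m, long amount)
{
    pthread_mutex_lock(&m->lock);
    m->balance += amount;
    pthread_cond_broadcast(&m->funds_added);
    pthread_mutex_unlock(&m->lock);
}

/* Monitor method: blocks until the balance covers the withdrawal. */
void monitor_withdraw(account_monitor *m, long amount)
{
    pthread_mutex_lock(&m->lock);
    while (m->balance < amount)              /* recheck after every wakeup */
        pthread_cond_wait(&m->funds_added, &m->lock);
    assert(m->balance >= amount);
    m->balance -= amount;
    pthread_mutex_unlock(&m->lock);
}

/* Monitor method: reads the balance under the lock. */
long monitor_balance(account_monitor *m)
{
    pthread_mutex_lock(&m->lock);
    long value = m->balance;
    pthread_mutex_unlock(&m->lock);
    return value;
}

static void *withdraw_50(void *arg)
{
    monitor_withdraw(arg, 50);
    return NULL;
}

/* One thread withdraws 50 while the caller deposits 30 twice.
   Returns the final balance, or -1 if the thread could not be created. */
long run_monitor_demo(void)
{
    pthread_t t;
    if (pthread_create(&t, NULL, withdraw_50, &acct) != 0)
        return -1;
    monitor_deposit(&acct, 30);
    monitor_deposit(&acct, 30);
    pthread_join(t, NULL);
    return monitor_balance(&acct);
}
```

C cannot stop a caller from reaching into the struct, so the encapsulation here is by convention; in C++ the same pattern would make the mutex and condition variable private members.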
Next time
- Look at Successive Over-relaxation case study (p174-187) to prepare