Parallel Computing: Programming Shared Address Space Platforms (Pthread) Jin, Hai

Parallel Computing: Programming Shared Address Space Platforms (Pthread) Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology

Thread Basics! All memory in the logical machine model of a thread is globally accessible to every thread.! The stack corresponding to the function call is generally treated as being local to the thread for liveness reasons. 2

Thread Basics 3

Thread Basics The logical machine model of a thread-based parallel programming paradigm. 4

Thread Basics! Cannot assume any order of execution.! Any such order must be explicitly established using synchronization mechanisms.! Issues on deadlock, starvation and performance. 5

Pthreads API! The Pthreads API is defined in the ANSI/ IEEE POSIX 1003.1-1995 standard.! The subroutines which comprise the Pthreads API can be informally grouped into three major classes:! Thread management! Mutexes! Condition variables 6

Thread Management! The first class of functions work directly on threads - creating, detaching, joining, etc.! They include functions to set/query thread attributes (joinable, scheduling etc.) 7

Creating Threads! Initially, only a single, default thread. All other threads must be explicitly created by the programmer.! routine: pthread_create(thread,attr,start_routine,arg)! thread: unique identifier for the new thread (pthread_t)! attr: attribute object used to set thread attributes. (pthread_attr) You can specify a thread attributes object, or NULL for the default values.! start_routine: C routine that the thread will execute.! arg: A single argument that may be passed to start_routine. It must be passed by reference. NULL may be used if no argument is to be passed.! If successful, the pthread_create() function shall return zero; otherwise, an error number shall be returned to indicate the error. 8

Thread Attributes! By default, a thread is created with certain attributes. Some of these attributes can be changed by the programmer via the thread attribute object.! pthread_attr_init(attr) and pthread_attr_destroy(attr) are used to initialize/destroy the thread attribute object.! Other routines are then used to query/set specific attributes in the thread attribute object. 9

Terminating Thread! There are several ways in which a Pthread may be terminated:! The thread makes a call to the pthread_exit() subroutine.! The thread is cancelled by another thread via the pthread_cancel() routine.! The entire process is terminated due to a call to the exit subroutine. 10

Terminating Thread! routine: pthread_exit(status)! used to explicitly exit a thread.! The programmer may optionally specify a termination status, which is stored as a void pointer for any thread that may join the calling thread.! Cleanup: the pthread_exit() routine does not close files; any files opened inside the thread will remain open after the thread is terminated. 11

Example #include <pthread.h> #include <stdio.h> #include <stdlib.h> #define NUM_THREADS 5 void *PrintHello(void *threadid) { int tid; tid = (int) threadid; printf("hello World! It's me, thread #%d!\n", tid); pthread_exit(null); 12

Example int main(int argc, char *argv[]) { pthread_t threads[num_threads]; int rc, t; for(t=0;t<num_threads;t++){ printf("in main: creating thread %d\n", t); rc = pthread_create(&threads[t], NULL, PrintHello, (void *)t); if (rc){ printf("error; return code from pthread_create() is %d\n", rc); exit(-1); pthread_exit(null); 13

Passing Arguments to Threads! The pthread_create() routine permits the programmer to pass one argument to the thread start routine.! For cases where multiple arguments must be passed, this limitation is easily overcome by creating a structure which contains all of the arguments, and then passing a pointer to that structure in the pthread_create () routine.! All arguments must be passed by reference and cast to (void *). 14

Passing Arguments to Threads struct two_args { int arg1; int arg2; ; void *needs_2_args(void *ap) { struct two_args *argp; int a1, a2; argp = (struct two_args *) ap; a1 = argp->arg1; a2 = argp->arg2; free (argp); pthread_exit(null); 15

Passing Arguments to Threads int main(int argc, char *argv[]) { pthread_t t; struct two_args *ap; int rc; ap = (struct two_args *) malloc (sizeof (struct two_args)); ap->arg1 = 1; ap->arg2 = 2; rc = pthread_create(&t, NULL, needs_2_args, (void *) ap); pthread_exit(null); 16

Joining & Detaching Threads! Routines: pthread_join (threadid,status) pthread_detach (threadid,status) pthread_attr_setdatachstate (attr,detachstate) pthread_attr_getdetachstate (attr,datachstate)! "Joining" is one way to accomplish synchronization between threads.! The pthread_join() subroutine blocks the calling thread until the specified threadid thread terminates.! The programmer is able to obtain the target thread's termination return status if it was specified in the target thread's call to pthread_exit(). 17

Joining & Detaching Threads! When a thread is created, one of its attributes defines whether it is joinable or detached.! Only threads that are created as joinable can be joined. If a thread is created as detached, it can never be joined.! To explicitly create a thread as joinable or detached, the attr argument in the pthread_create() routine is used:! Declare a pthread attribute variable of the pthread_attr_t data type! Initialize the attribute variable with pthread_attr_init()! Set the attribute detached status with pthread_attr_setdetachstate()! When done, free library resources used by the attribute with pthread_attr_destroy() 18

Example void *BusyWork(void *null) { pthread_exit((void *) 0); 19

Example int main (int argc, char *argv[]) { pthread_attr_t attr; int rc, t; void *status; /* Initialize and set thread detached attribute */ pthread_attr_init(&attr); pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE); /* Free attribute and wait for the other threads */ pthread_attr_destroy(&attr); for(t=0; t<num_threads; t++) { rc = pthread_join(thread[t], &status); printf("completed join with thread %d status= %ld\n",t, (long)status); pthread_exit(null); 20

Synchronization Issues! When multiple threads attempt to manipulate the same data item, the results can often be incoherent if proper care is not taken to synchronize them.! Consider: /* each thread tries to update variable best_cost as follows */ if (my_cost < best_cost) best_cost = my_cost;! Assume that there are two threads, the initial value of best_cost is 100, and the values of my_cost are 50 and 75 at threads t1 and t2.! Depending on the schedule of the threads, the value of best_cost could be 50 or 75! 21

Mutex! The second class of functions deal with synchronization, called a "mutex", which is an abbreviation for "mutual exclusion".! Mutex functions provide for creating, destroying, locking and unlocking mutexes.! They are also supplemented by mutex attribute functions that set or modify attributes associated with mutexes. 22

Creating & Destroying Mutex! Routines: pthread_mutex_init (mutex,attr) pthread_mutex_destroy (mutex) pthread_mutexattr_init (attr) pthread_mutexattr_destroy (attr)! Mutex must be declared with type pthread_mutex_t, and must be initialized before they can be used.! There are two ways to initialize a mutex variable:! Statically, when it is declared. For example: pthread_mutex_t mymutex = PTHREAD_MUTEX_INITIALIZER;! Dynamically, with the pthread_mutex_init() routine. This method permits setting mutex object attributes, attr (which may be specified as NULL to accept defaults).! The mutex is initially unlocked. 23

Locking & Unlocking Mutex! Routines: pthread_mutex_lock (mutex) pthread_mutex_trylock (mutex) pthread_mutex_unlock (mutex)! pthread_mutex_lock() is used by a thread to acquire a lock on the specified mutex variable.! pthread_mutex_unlock() will unlock a mutex if called by the owning thread. An error will be returned if:! the mutex was already unlocked! the mutex is owned by another thread! pthread_mutex_trylock() will attempt to lock a mutex. However, if the mutex is already locked, the routine will return immediately with an EBUSY" error code. This routine may be useful in preventing deadlock conditions. 24

Example 1! We can now write our previously incorrect code segment as: pthread_mutex_t minimum_value_lock;... main() {... pthread_mutex_init(&minimum_value_lock, NULL);... void *find_min(void *list_ptr) {... pthread_mutex_lock(&minimum_value_lock); if (my_cost < best_cost) best_cost = my_cost; pthread_mutex_unlock(&minimum_value_lock); 25

Example 2! The producer-consumer scenario imposes the following constraints:! The producer thread must not overwrite the shared buffer when the previous task has not been picked up by a consumer thread.! The consumer threads must not pick up tasks until there is something present in the shared data structure.! Individual consumer threads should pick up tasks one at a time. 26

Example 2 pthread_mutex_t task_queue_lock; int task_available;... main() {... task_available = 0; pthread_mutex_init(&task_queue_lock, NULL);... 27

Example 2 void *producer(void *producer_thread_data) {... while (!done()) { inserted = 0; create_task(&my_task); while (inserted == 0) { pthread_mutex_lock(&task_queue_lock); if (task_available == 0) { insert_into_queue(my_task); task_available = 1; inserted = 1; pthread_mutex_unlock(&task_queue_lock); 28

Example 2 void *consumer(void *consumer_thread_data) {... while (!done()) { extracted = 0; while (extracted == 0) { pthread_mutex_lock(&task_queue_lock); if (task_available == 1) { extract_from_queue(&my_task); task_available = 0; extracted = 1; pthread_mutex_unlock(&task_queue_lock); process_task(my_task); 29

Overheads of Locking! Locks represent serialization points since critical sections must be executed by threads one after another.! Encapsulating large segments of the program within locks can lead to significant performance degradation.! It is often possible to reduce the idling overhead associated with locks using pthread_mutex_trylock. 30

Alleviating Locking Overhead /* using pthread_mutex_trylock routine */ Pthread_mutax_t trylock_lock = PTHREAD_MUTEX_INITIALIZER; lock_status=pthread_mutex_trylock(&trylock_lock); if (lock_status == EBUSY) { /* do something else */ else { /* do one thing */ pthread_mutex_unlock(&trylock_lock); 31

Monitor! Mutexs provide powerful synchronization tools. However,! Lock() and unlock() are scattered among several threads. Therefore, it is difficult to understand their effects! Usage must be correct in all the threads.! One bad thread (or one programming error) can kill the whole system.! Monitor is a high-level abstraction that may provide a convenient and effective mechanism for thread synchronization 32

Monitor! Local data variables are accessible only by the monitor! thread enters monitor by invoking one of its procedures! Only one thread may be executing in the monitor at a time 33

Monitor with Condition Variables! Monitor does not need to code certain synchronization constraints explicitly.! However, it is not sufficiently powerful for modeling some other synchronization schemes.! An additional synchronization mechanism, i.e., condition variable, is required. 34

Condition Variables! The third class of functions address communications between threads that share a mutex.! A condition variable allows a thread to block itself until specified data reaches a predefined state.! A condition variable indicates an event and has no value.! One cannot store a value into nor retrieve a value from a condition variable.! If a thread must wait for an event to occur, that thread waits on the corresponding condition variable.! A condition variable has a queue for those threads that are waiting the corresponding event to occur to wait on.! If another thread causes the event to occur, that thread simply signals the corresponding condition variable. 35

Condition Variables! This class includes functions to create, destroy, wait and signal based upon specified variable values.! Functions to set/query condition variable attributes are also included.! A condition variable is always used in conjunction with a mutex lock. 36

Creating & Destroying Condition Variables! Routines: pthread_cond_init (condition, attr) pthread_cond_destroy (condition) pthread_condattr_init (attr) pthread_condattr_destroy (attr)! Condition variables must be declared with type pthread_cond_t, and must be initialized before they can be used.! There are two ways to initialize a condition variable:! Statically, when it is declared. For example: pthread_cond_t myconvar = PTHREAD_COND_INITIALIZER;! Dynamically, with the pthread_cond_init() routine. This method permits setting condition variable object attributes, attr (which may be specified as NULL to accept defaults). 37

Waiting and Signaling on Condition Variables! Routines: pthread_cond_wait (condition, mutex) pthread_cond_signal (condition) pthread_cond_broadcast (condition)! pthread_cond_wait() blocks the calling thread until the specified condition is signalled.! This routine should be called while mutex is locked, and it will automatically release the mutex while it waits.! After signal is received and thread is awakened, mutex will be automatically locked for use by the thread.! The programmer is then responsible for unlocking mutex when the thread is finished with it. 38

Waiting and Signaling on Condition Variables! pthread_cond_signal() is used to signal (or wake up) another thread which is waiting on the condition variable.! It should be called after mutex is locked, and! must unlock mutex in order for pthread_cond_wait() routine to complete.! The pthread_cond_broadcast() routine unlocks all of the threads blocked on the condition variable. 39

Waiting and Signaling on Condition Variables! Proper locking and unlocking of the associated mutex variable is essential when using these routines. For example:! Failing to lock the mutex before calling pthread_cond_wait() may cause it NOT to block.! Failing to unlock the mutex after calling pthread_cond_signal() may not allow a matching pthread_cond_wait() routine to complete (it will remain blocked). 40

Producer-Consumer Using Condition Variables pthread_cond_t cond_queue_empty, cond_queue_full; pthread_mutex_t task_queue_cond_lock; int task_available; /* other data structures here */ main() { /* declarations and initializations */ task_available = 0; pthread_cond_init(&cond_queue_empty, NULL); pthread_cond_init(&cond_queue_full, NULL); pthread_mutex_init(&task_queue_cond_lock, NULL); /* create and join producer and consumer threads */ 41

Producer-Consumer Using Condition Variables void *producer(void *producer_thread_data) { while (!done()) { create_task(); pthread_mutex_lock(&task_queue_cond_lock); while (task_available == 1) pthread_cond_wait(&cond_queue_empty, &task_queue_cond_lock); insert_into_queue(); task_available = 1; pthread_cond_signal(&cond_queue_full); pthread_mutex_unlock(&task_queue_cond_lock); 42

Producer-Consumer Using Condition Variables void *consumer(void *consumer_thread_data) { while (!done()) { pthread_mutex_lock(&task_queue_cond_lock); while (task_available == 0) pthread_cond_wait(&cond_queue_full, &task_queue_cond_lock); my_task = extract_from_queue(); task_available = 0; pthread_cond_signal(&cond_queue_empty); pthread_mutex_unlock(&task_queue_cond_lock); process_task(my_task); 43

Composite Synchronization Constructs! By design, Pthreads provide support for a basic set of operations.! Higher level constructs can be built using basic synchronization constructs. 44

! To find the solution for a twodimensional Laplace equation simply: 1. Initialise T ij to some initial guess. 2. Apply the boundary conditions. 3. For each internal mesh point set " T ij *!=!(T (i+1)j!+! T (i-1)j!+! T i(j+1)!+! T i(j-1) )/4. 4. Replace old solution T with new estimate T*. 5. If solution does not satisfy tolerance, repeat from step 2. 45

! We need a new synchronization structure called barrier to synchronize the computation between consecutive iteration steps no thread can come across the barrier till all the threads arrive.! For barrier we need! Total number of threads considered! Number of threads have arrived! A mutex and a condition variable 46

typedef struct { pthread_mutex_t count_lock; pthread_cond_t ok_to_proceed; int count; mylib_barrier_t; void mylib_init_barrier(mylib_barrier_t *b) { b -> count = 0; pthread_mutex_init(&(b -> count_lock), NULL); pthread_cond_init(&(b -> ok_to_proceed), NULL); 47

void mylib_barrier (mylib_barrier_t *b, int num_threads) { pthread_mutex_lock(&(b -> count_lock)); b -> count ++; if (b -> count == num_threads) { b -> count = 0; pthread_cond_broadcast(&(b -> ok_to_proceed)); else while (pthread_cond_wait(&(b -> ok_to_proceed), &(b -> count_lock))!= 0); pthread_mutex_unlock(&(b -> count_lock)); 48

Read-Write Locks! In many applications, a data structure is read frequently but written infrequently. For such applications, we should use read-write locks.! A read lock is granted when there are other threads that may already have read locks.! If there is a write lock on the data (or if there are queued write locks), the thread performs a condition wait.! If there are multiple threads requesting a write lock, they must perform a condition wait.! With this description, we can design functions for! read locks mylib_rwlock_rlock,! write locks mylib_rwlock_wlock, and! unlocking mylib_rwlock_unlock. 49

Read-Write Locks! The lock data type mylib_rwlock_t holds the following:! a count of the number of readers,! the writer (a 0/1 integer specifying whether a writer is present),! a count pending_writers of pending writers,! a condition variable readers_proceed that is signaled when readers can proceed,! a condition variable writer_proceed that is signaled when one of the writers can proceed, and! a mutex read_write_lock associated with the shared data structure 50

Read-Write Locks typedef struct { int readers; int writer; int pending_writers; pthread_cond_t readers_proceed; pthread_cond_t writer_proceed; pthread_mutex_t read_write_lock; mylib_rwlock_t; void mylib_rwlock_init (mylib_rwlock_t *l) { l->readers = l->writer = l->pending_writers = 0; pthread_mutex_init(&(l->read_write_lock), NULL); pthread_cond_init(&(l->readers_proceed), NULL); pthread_cond_init(&(l->writer_proceed), NULL); 51

Read-Write Locks void mylib_rwlock_rlock(mylib_rwlock_t *l) { /* if there is a write lock, perform condition wait. else increment count of readers and grant read lock */ pthread_mutex_lock(&(l->read_write_lock)); while (l->writer > 0) pthread_cond_wait(&(l->readers_proceed), &(l->read_write_lock)); l->readers ++; pthread_mutex_unlock(&(l->read_write_lock)); 52

Read-Write Locks void mylib_rwlock_wlock(mylib_rwlock_t *l) { /* if there are readers or writers, increment pending writers count and wait. On being awoken, decrement pending writers count and increment writer count */ pthread_mutex_lock(&(l->read_write_lock)); l->pending_writers ++; while ((l->writer > 0) (l->readers > 0)) { pthread_cond_wait(&(l->writer_proceed), &(l->read_write_lock)); l->pending_writers --; l->writer ++; pthread_mutex_unlock(&(l->read_write_lock)); 53

Read-Write Locks void mylib_rwlock_unlock(mylib_rwlock_t *l) { /* if there is a write lock then unlock, else if there are read locks, decrement count of read locks. If the count is 0 and there is a pending writer, let it through, else if there are pending readers, let them all go through */ pthread_mutex_lock(&(l->read_write_lock)); if (l->writer > 0) l->writer = 0; else if (l->readers > 0) l->readers --; if ((l->readers == 0) && (l->pending_writers > 0)) pthread_cond_signal(&(l->writer_proceed)); else if (l->readers > 0) pthread_cond_broadcast(&(l->readers_proceed)); pthread_mutex_unlock(&(l->read_write_lock)); 54