Parallel Computing: Programming Shared Address Space Platforms (Pthread) Jin, Hai

Parallel Computing: Programming Shared Address Space Platforms (Pthread) Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology

Thread Basics
- All memory in the logical machine model of a thread is globally accessible to every thread.
- The stack corresponding to a function call is generally treated as local to the thread, for liveness reasons.

Thread Basics
[Figure: the logical machine model of a thread-based parallel programming paradigm.]

Thread Basics
- No order of execution among threads can be assumed.
- Any required order must be established explicitly using synchronization mechanisms.
- Threaded programs must also contend with deadlock, starvation, and performance issues.

Pthreads API
- The Pthreads API is defined in the ANSI/IEEE POSIX 1003.1-1995 standard.
- The subroutines that make up the Pthreads API can be informally grouped into three major classes:
  - Thread management
  - Mutexes
  - Condition variables

Thread Management
- The first class of functions works directly on threads: creating, detaching, joining, etc.
- It includes functions to set and query thread attributes (joinable, scheduling, etc.).

Creating Threads
- Initially, a program has only a single, default thread. All other threads must be explicitly created by the programmer.
- Routine: pthread_create(thread, attr, start_routine, arg)
  - thread: pointer to a pthread_t that receives the unique identifier of the new thread.
  - attr: pointer to an attribute object (pthread_attr_t) used to set thread attributes. Pass a thread attributes object, or NULL for the default values.
  - start_routine: the C routine that the thread will execute.
  - arg: a single argument that may be passed to start_routine. It must be passed by reference, as a pointer cast to (void *). NULL may be used if no argument is to be passed.
- If successful, pthread_create() returns zero; otherwise, it returns an error number that indicates the error.

Thread Attributes
- By default, a thread is created with certain attributes. Some of these attributes can be changed by the programmer via the thread attribute object.
- pthread_attr_init(attr) and pthread_attr_destroy(attr) initialize and destroy the thread attribute object.
- Other routines are then used to query or set specific attributes in the thread attribute object.
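The init, set, query, destroy sequence for an attribute object can be sketched as follows. This is a minimal illustration rather than anything from the slides: the helper names `echo_arg` and `run_with_attributes` are hypothetical, and the choice of stack size as the attribute to set is an assumption made for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Start routine: hands its argument straight back as the exit status. */
static void *echo_arg(void *arg)
{
    return arg;
}

/* Hypothetical helper: creates one thread through an explicit attribute
 * object and returns the stack size read back from that object. */
size_t run_with_attributes(size_t requested_stack)
{
    pthread_attr_t attr;
    pthread_t tid;
    size_t actual_stack = 0;
    int detach_state = -1;
    void *status = NULL;
    int rc;

    rc = pthread_attr_init(&attr);                 /* start from the defaults */
    assert(rc == 0);
    rc = pthread_attr_setstacksize(&attr, requested_stack);
    assert(rc == 0);
    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
    assert(rc == 0);

    /* Query the attribute object to see what the new thread will get. */
    rc = pthread_attr_getstacksize(&attr, &actual_stack);
    assert(rc == 0);
    rc = pthread_attr_getdetachstate(&attr, &detach_state);
    assert(rc == 0 && detach_state == PTHREAD_CREATE_JOINABLE);

    rc = pthread_create(&tid, &attr, echo_arg, &actual_stack);
    assert(rc == 0);
    rc = pthread_attr_destroy(&attr);  /* no longer needed once the thread exists */
    assert(rc == 0);

    rc = pthread_join(tid, &status);
    assert(rc == 0 && status == &actual_stack);
    return actual_stack;
}
```

Calling `run_with_attributes(1024 * 1024)` from `main()` requests a 1 MiB stack. The attribute object only describes the thread to be created, so it can be destroyed as soon as `pthread_create()` has returned.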

Terminating Thread
- There are several ways in which a Pthread may be terminated:
  - The thread returns from its start routine.
  - The thread makes a call to the pthread_exit() subroutine.
  - The thread is cancelled by another thread via the pthread_cancel() routine.
  - The entire process is terminated due to a call to the exit() subroutine.

Terminating Thread
- Routine: pthread_exit(status)
- It is used to explicitly exit a thread.
- The programmer may optionally specify a termination status, which is stored as a void pointer for any thread that may join the calling thread.
- Cleanup: pthread_exit() does not close files; any files opened inside the thread remain open after the thread terminates.
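Two of the termination paths above, an explicit pthread_exit() with a status and a pthread_cancel() from another thread, can be sketched as follows. The function names and the status value 42 are illustrative assumptions; the sketch also relies on sleep() being a cancellation point and on the default (deferred) cancellation type.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <unistd.h>

/* Exits explicitly and hands a small integer back as its status. */
static void *worker_exits(void *arg)
{
    (void)arg;
    pthread_exit((void *)(intptr_t)42);
    return NULL;                       /* not reached */
}

/* Never finishes on its own; sleep() is a cancellation point. */
static void *worker_sleeps(void *arg)
{
    (void)arg;
    for (;;)
        sleep(1);
    return NULL;                       /* not reached */
}

/* Hypothetical driver: returns the status passed to pthread_exit(). */
int demo_termination(void)
{
    pthread_t t1, t2;
    void *s1 = NULL, *s2 = NULL;
    int rc;

    rc = pthread_create(&t1, NULL, worker_exits, NULL);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, worker_sleeps, NULL);
    assert(rc == 0);

    rc = pthread_cancel(t2);           /* request cancellation of the sleeper */
    assert(rc == 0);

    pthread_join(t1, &s1);             /* s1 receives the pthread_exit() status */
    pthread_join(t2, &s2);             /* s2 receives PTHREAD_CANCELED */
    assert(s2 == PTHREAD_CANCELED);
    return (int)(intptr_t)s1;
}
```

The joining thread cannot tell from the call itself how the target ended; it has to compare the status against `PTHREAD_CANCELED`, as done here.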

Example
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NUM_THREADS 5

    /* Start routine: prints the thread number passed in by main(). */
    void *PrintHello(void *threadid)
    {
        long tid = (long) threadid;   /* the number travels inside the pointer value */
        printf("Hello World! It's me, thread #%ld!\n", tid);
        pthread_exit(NULL);
    }

    int main(int argc, char *argv[])
    {
        pthread_t threads[NUM_THREADS];
        int rc;
        long t;

        for (t = 0; t < NUM_THREADS; t++) {
            printf("In main: creating thread %ld\n", t);
            rc = pthread_create(&threads[t], NULL, PrintHello, (void *) t);
            if (rc) {
                printf("ERROR; return code from pthread_create() is %d\n", rc);
                exit(-1);
            }
        }
        /* Ends only the main thread; the process lives on until the other threads finish. */
        pthread_exit(NULL);
    }

Passing Arguments to Threads
- The pthread_create() routine permits the programmer to pass one argument to the thread start routine.
- When multiple arguments must be passed, this limitation is easily overcome by creating a structure that contains all of the arguments and passing a pointer to that structure to pthread_create().
- All arguments must be passed by reference and cast to (void *).

Passing Arguments to Threads
    #include <pthread.h>
    #include <stdlib.h>

    struct two_args {
        int arg1;
        int arg2;
    };

    void *needs_2_args(void *ap)
    {
        struct two_args *argp = (struct two_args *) ap;
        int a1, a2;

        a1 = argp->arg1;
        a2 = argp->arg2;
        free(argp);            /* the thread owns the argument block and releases it */
        /* ... use a1 and a2 ... */
        pthread_exit(NULL);
    }

    int main(int argc, char *argv[])
    {
        pthread_t t;
        struct two_args *ap;
        int rc;

        ap = (struct two_args *) malloc(sizeof(struct two_args));
        if (ap == NULL)
            exit(-1);
        ap->arg1 = 1;
        ap->arg2 = 2;
        rc = pthread_create(&t, NULL, needs_2_args, (void *) ap);
        if (rc)
            exit(-1);
        pthread_exit(NULL);
    }

Joining & Detaching Threads
- Routines:
    pthread_join(threadid, status)
    pthread_detach(threadid)
    pthread_attr_setdetachstate(attr, detachstate)
    pthread_attr_getdetachstate(attr, detachstate)
- "Joining" is one way to accomplish synchronization between threads.
- The pthread_join() subroutine blocks the calling thread until the specified threadid thread terminates.
- The programmer is able to obtain the target thread's termination return status if it was specified in the target thread's call to pthread_exit().

Joining & Detaching Threads
- When a thread is created, one of its attributes defines whether it is joinable or detached.
- Only threads that are created as joinable can be joined. If a thread is created as detached, it can never be joined.
- To explicitly create a thread as joinable or detached, the attr argument of the pthread_create() routine is used:
  1. Declare a pthread attribute variable of the pthread_attr_t data type.
  2. Initialize the attribute variable with pthread_attr_init().
  3. Set the attribute detached status with pthread_attr_setdetachstate().
  4. When done, free library resources used by the attribute with pthread_attr_destroy().

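Example
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NUM_THREADS 3   /* thread count is not given on the slide; 3 is an assumed value */

    void *BusyWork(void *arg)
    {
        /* ... the thread's work goes here ... */
        pthread_exit((void *) 0);
    }

    int main(int argc, char *argv[])
    {
        pthread_t thread[NUM_THREADS];
        pthread_attr_t attr;
        int rc, t;
        void *status;

        /* Initialize and set thread detached attribute */
        pthread_attr_init(&attr);
        pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);

        /* Thread creation is elided on the slide; this loop is a reconstruction */
        for (t = 0; t < NUM_THREADS; t++) {
            rc = pthread_create(&thread[t], &attr, BusyWork, NULL);
            if (rc)
                exit(-1);
        }

        /* Free attribute and wait for the other threads */
        pthread_attr_destroy(&attr);
        for (t = 0; t < NUM_THREADS; t++) {
            rc = pthread_join(thread[t], &status);
            printf("Completed join with thread %d status= %ld\n", t, (long) status);
        }
        pthread_exit(NULL);
    }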

Synchronization Issues
- When multiple threads attempt to manipulate the same data item, the results can often be incoherent if proper care is not taken to synchronize them.
- Consider:
    /* each thread tries to update variable best_cost as follows */
    if (my_cost < best_cost)
        best_cost = my_cost;
- Assume that there are two threads, the initial value of best_cost is 100, and the values of my_cost are 50 and 75 at threads t1 and t2.
- Depending on the schedule of the threads, the value of best_cost could be 50 or 75! The value 75 results when both threads pass the test against 100 and t2 writes last.

Mutex
- The second class of functions deals with synchronization through a "mutex", which is an abbreviation for "mutual exclusion".
- Mutex functions provide for creating, destroying, locking, and unlocking mutexes.
- They are supplemented by mutex attribute functions that set or modify attributes associated with mutexes.

Creating & Destroying Mutex
- Routines:
    pthread_mutex_init(mutex, attr)
    pthread_mutex_destroy(mutex)
    pthread_mutexattr_init(attr)
    pthread_mutexattr_destroy(attr)
- Mutex variables must be declared with type pthread_mutex_t, and must be initialized before they can be used.
- There are two ways to initialize a mutex variable:
  - Statically, when it is declared. For example:
      pthread_mutex_t mymutex = PTHREAD_MUTEX_INITIALIZER;
  - Dynamically, with the pthread_mutex_init() routine. This method permits setting mutex object attributes, attr (which may be specified as NULL to accept defaults).
- The mutex is initially unlocked.

Locking & Unlocking Mutex
- Routines:
    pthread_mutex_lock(mutex)
    pthread_mutex_trylock(mutex)
    pthread_mutex_unlock(mutex)
- pthread_mutex_lock() is used by a thread to acquire a lock on the specified mutex variable. If the mutex is already locked by another thread, the call blocks until the mutex is unlocked.
- pthread_mutex_unlock() will unlock a mutex if called by the owning thread. It is an error to call it in the following cases. An error code is guaranteed only for error-checking and recursive mutex types; with the default type the behavior is undefined:
  - the mutex was already unlocked
  - the mutex is owned by another thread
- pthread_mutex_trylock() attempts to lock a mutex. If the mutex is already locked, the routine returns immediately with an EBUSY error code. This routine may be useful in preventing deadlock conditions.
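Dynamic initialization with an attribute object, and the error returns described above, can be sketched with an error-checking mutex. The function name `errorcheck_demo` is hypothetical, and the choice of PTHREAD_MUTEX_ERRORCHECK is an assumption made so that the misuse is reported through return codes rather than left undefined.

```c
#define _POSIX_C_SOURCE 200809L   /* exposes PTHREAD_MUTEX_ERRORCHECK on strict compilers */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: returns the code from unlocking a mutex that is not
 * locked, and stores the code from locking it twice in *relock_rc. */
int errorcheck_demo(int *relock_rc)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int unlock_rc, rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&m, &attr);     /* dynamic initialization with attributes */
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);       /* the mutex keeps its own copy of the settings */

    unlock_rc = pthread_mutex_unlock(&m);   /* mutex is not locked */

    pthread_mutex_lock(&m);
    *relock_rc = pthread_mutex_lock(&m);    /* already held by this thread */
    pthread_mutex_unlock(&m);

    pthread_mutex_destroy(&m);
    return unlock_rc;
}
```

With this mutex type, the stray unlock reports EPERM and the second lock reports EDEADLK instead of blocking forever, which makes it a convenient choice while debugging.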

Example 1
- We can now write our previously incorrect code segment as:
    pthread_mutex_t minimum_value_lock;
    ...
    int main() {
        ...
        pthread_mutex_init(&minimum_value_lock, NULL);
        ...
    }

    void *find_min(void *list_ptr) {
        ...
        pthread_mutex_lock(&minimum_value_lock);
        if (my_cost < best_cost)
            best_cost = my_cost;
        pthread_mutex_unlock(&minimum_value_lock);
    }
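A complete, compilable version of the guarded update might look like the sketch below. It reuses the names `minimum_value_lock` and `best_cost` and the values 100, 50, and 75 from the slides; the worker count, the two extra costs, and the functions `offer_cost` and `find_best_cost` are assumptions added for illustration.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_WORKERS 4   /* illustrative thread count, not from the slides */

static pthread_mutex_t minimum_value_lock = PTHREAD_MUTEX_INITIALIZER;
static int best_cost;

/* Each worker offers one candidate cost; test and update happen under the lock. */
static void *offer_cost(void *arg)
{
    int my_cost = *(int *)arg;

    pthread_mutex_lock(&minimum_value_lock);
    if (my_cost < best_cost)
        best_cost = my_cost;
    pthread_mutex_unlock(&minimum_value_lock);
    return NULL;
}

/* Hypothetical driver: returns the minimum found by all workers. */
int find_best_cost(void)
{
    int costs[NUM_WORKERS] = {50, 75, 120, 60};
    pthread_t tid[NUM_WORKERS];
    int i, rc;

    best_cost = 100;                   /* initial value used on the slide */
    for (i = 0; i < NUM_WORKERS; i++) {
        rc = pthread_create(&tid[i], NULL, offer_cost, &costs[i]);
        assert(rc == 0);
    }
    for (i = 0; i < NUM_WORKERS; i++)
        pthread_join(tid[i], NULL);
    return best_cost;
}
```

Because the comparison and the assignment form one critical section, the result is 50 for every schedule of the workers.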

Example 2
- The producer-consumer scenario imposes the following constraints:
  - The producer thread must not overwrite the shared buffer when the previous task has not been picked up by a consumer thread.
  - The consumer threads must not pick up tasks until there is something present in the shared data structure.
  - Individual consumer threads should pick up tasks one at a time.

Example 2
    pthread_mutex_t task_queue_lock;
    int task_available;
    ...
    int main() {
        ...
        task_available = 0;
        pthread_mutex_init(&task_queue_lock, NULL);
        ...
    }

    void *producer(void *producer_thread_data) {
        ...
        while (!done()) {
            inserted = 0;
            create_task(&my_task);
            while (inserted == 0) {
                /* poll until the single slot is free */
                pthread_mutex_lock(&task_queue_lock);
                if (task_available == 0) {
                    insert_into_queue(my_task);
                    task_available = 1;
                    inserted = 1;
                }
                pthread_mutex_unlock(&task_queue_lock);
            }
        }
    }

    void *consumer(void *consumer_thread_data) {
        ...
        while (!done()) {
            extracted = 0;
            while (extracted == 0) {
                /* poll until a task is present */
                pthread_mutex_lock(&task_queue_lock);
                if (task_available == 1) {
                    extract_from_queue(&my_task);
                    task_available = 0;
                    extracted = 1;
                }
                pthread_mutex_unlock(&task_queue_lock);
            }
            process_task(my_task);   /* done outside the critical section */
        }
    }

Overheads of Locking
- Locks represent serialization points, since critical sections must be executed by threads one after another.
- Encapsulating large segments of the program within locks can lead to significant performance degradation.
- It is often possible to reduce the idling overhead associated with locks by using pthread_mutex_trylock().

Alleviating Locking Overhead
    /* using pthread_mutex_trylock routine */
    pthread_mutex_t trylock_lock = PTHREAD_MUTEX_INITIALIZER;

    lock_status = pthread_mutex_trylock(&trylock_lock);
    if (lock_status == EBUSY) {
        /* do something else */
    } else {
        /* do one thing */
        pthread_mutex_unlock(&trylock_lock);
    }
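The two outcomes of pthread_mutex_trylock() can be observed with a small sketch in which the caller optionally holds the lock while a second thread probes it. The names `try_once` and `trylock_probe` are hypothetical; `trylock_lock` follows the slide.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

static pthread_mutex_t trylock_lock = PTHREAD_MUTEX_INITIALIZER;

/* Tries the lock once and reports the result as the thread's exit status. */
static void *try_once(void *arg)
{
    int status;

    (void)arg;
    status = pthread_mutex_trylock(&trylock_lock);
    if (status == 0)
        pthread_mutex_unlock(&trylock_lock);   /* acquired: release again */
    return (void *)(intptr_t)status;
}

/* Hypothetical driver: returns what a second thread sees from trylock,
 * with the caller holding the lock (hold_lock != 0) or not. */
int trylock_probe(int hold_lock)
{
    pthread_t t;
    void *status = NULL;
    int rc;

    if (hold_lock)
        pthread_mutex_lock(&trylock_lock);
    rc = pthread_create(&t, NULL, try_once, NULL);
    assert(rc == 0);
    pthread_join(t, &status);
    if (hold_lock)
        pthread_mutex_unlock(&trylock_lock);
    return (int)(intptr_t)status;
}
```

`trylock_probe(1)` yields EBUSY because the probing thread never blocks on the held mutex, while `trylock_probe(0)` yields 0.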

Monitor
- Mutexes provide powerful synchronization tools. However:
  - lock() and unlock() calls are scattered among several threads, so it is difficult to understand their effects.
  - Usage must be correct in all the threads.
  - One bad thread (or one programming error) can kill the whole system.
- A monitor is a high-level abstraction that may provide a convenient and effective mechanism for thread synchronization.

Monitor
- Local data variables are accessible only through the monitor's procedures.
- A thread enters the monitor by invoking one of its procedures.
- Only one thread may be executing in the monitor at a time.

Monitor with Condition Variables
- With a monitor, the programmer does not need to code certain synchronization constraints explicitly.
- However, a monitor alone is not sufficiently powerful for modeling some other synchronization schemes.
- An additional synchronization mechanism, the condition variable, is required.
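C has no monitor construct, but the mutual-exclusion part of the idea can be approximated by keeping the data and its mutex in one structure and touching it only through a few procedures. The sketch below is an assumption-laden illustration: the type `counter_monitor_t`, its procedures, and the thread and iteration counts are all hypothetical, and it covers only mutual exclusion, not waiting for events.

```c
#include <assert.h>
#include <pthread.h>

/* Monitor-style counter: by convention the fields are touched only through
 * the procedures below (C cannot enforce this). */
typedef struct {
    pthread_mutex_t lock;
    long value;
} counter_monitor_t;

void counter_init(counter_monitor_t *c)
{
    pthread_mutex_init(&c->lock, NULL);
    c->value = 0;
}

/* Entering a procedure takes the lock, so one thread is inside at a time. */
void counter_add(counter_monitor_t *c, long n)
{
    pthread_mutex_lock(&c->lock);
    c->value += n;
    pthread_mutex_unlock(&c->lock);
}

long counter_get(counter_monitor_t *c)
{
    long v;

    pthread_mutex_lock(&c->lock);
    v = c->value;
    pthread_mutex_unlock(&c->lock);
    return v;
}

#define MON_THREADS 4        /* illustrative values */
#define MON_INCREMENTS 10000

static void *bump(void *arg)
{
    counter_monitor_t *c = arg;
    int i;

    for (i = 0; i < MON_INCREMENTS; i++)
        counter_add(c, 1);
    return NULL;
}

/* Hypothetical driver: returns the final counter value. */
long run_counter_monitor(void)
{
    counter_monitor_t c;
    pthread_t tid[MON_THREADS];
    long total;
    int i, rc;

    counter_init(&c);
    for (i = 0; i < MON_THREADS; i++) {
        rc = pthread_create(&tid[i], NULL, bump, &c);
        assert(rc == 0);
    }
    for (i = 0; i < MON_THREADS; i++)
        pthread_join(tid[i], NULL);
    total = counter_get(&c);
    pthread_mutex_destroy(&c.lock);
    return total;
}
```

The worker threads never call lock or unlock themselves, so the locking discipline lives in one place instead of being scattered across threads.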

Condition Variables
- The third class of functions addresses communication between threads that share a mutex.
- A condition variable allows a thread to block itself until specified data reaches a predefined state.
- A condition variable indicates an event and has no value: one cannot store a value into, or retrieve a value from, a condition variable.
- If a thread must wait for an event to occur, that thread waits on the corresponding condition variable.
- A condition variable has a queue on which threads wait for the corresponding event to occur.
- If another thread causes the event to occur, that thread simply signals the corresponding condition variable.

Condition Variables
- This class includes functions to create, destroy, wait, and signal based upon specified variable values.
- Functions to set and query condition variable attributes are also included.
- A condition variable is always used in conjunction with a mutex lock.

Creating & Destroying Condition Variables
- Routines:
    pthread_cond_init(condition, attr)
    pthread_cond_destroy(condition)
    pthread_condattr_init(attr)
    pthread_condattr_destroy(attr)
- Condition variables must be declared with type pthread_cond_t, and must be initialized before they can be used.
- There are two ways to initialize a condition variable:
  - Statically, when it is declared. For example:
      pthread_cond_t myconvar = PTHREAD_COND_INITIALIZER;
  - Dynamically, with the pthread_cond_init() routine. This method permits setting condition variable object attributes, attr (which may be specified as NULL to accept defaults).

Waiting and Signaling on Condition Variables
- Routines:
    pthread_cond_wait(condition, mutex)
    pthread_cond_signal(condition)
    pthread_cond_broadcast(condition)
- pthread_cond_wait() blocks the calling thread until the specified condition is signalled.
- This routine should be called while the mutex is locked; it automatically releases the mutex while it waits.
- After the signal is received and the thread is awakened, the mutex is automatically locked again for use by the thread.
- The programmer is then responsible for unlocking the mutex when the thread is finished with it.

Waiting and Signaling on Condition Variables
- pthread_cond_signal() is used to signal (or wake up) another thread that is waiting on the condition variable.
- It should be called after the mutex is locked, and the caller must then unlock the mutex so that the pthread_cond_wait() routine can complete.
- The pthread_cond_broadcast() routine unblocks all of the threads blocked on the condition variable.

Waiting and Signaling on Condition Variables
- Proper locking and unlocking of the associated mutex variable is essential when using these routines. For example:
  - Failing to lock the mutex before calling pthread_cond_wait() may cause it NOT to block.
  - Failing to unlock the mutex after calling pthread_cond_signal() may not allow a matching pthread_cond_wait() routine to complete (it will remain blocked).

Producer-Consumer Using Condition Variables
    pthread_cond_t cond_queue_empty, cond_queue_full;
    pthread_mutex_t task_queue_cond_lock;
    int task_available;
    /* other data structures here */

    int main() {
        /* declarations and initializations */
        task_available = 0;
        pthread_cond_init(&cond_queue_empty, NULL);
        pthread_cond_init(&cond_queue_full, NULL);
        pthread_mutex_init(&task_queue_cond_lock, NULL);
        /* create and join producer and consumer threads */
    }

    void *producer(void *producer_thread_data) {
        while (!done()) {
            create_task();
            pthread_mutex_lock(&task_queue_cond_lock);
            /* sleep until the slot is empty; the loop rechecks after every wakeup */
            while (task_available == 1)
                pthread_cond_wait(&cond_queue_empty, &task_queue_cond_lock);
            insert_into_queue();
            task_available = 1;
            pthread_cond_signal(&cond_queue_full);
            pthread_mutex_unlock(&task_queue_cond_lock);
        }
    }

    void *consumer(void *consumer_thread_data) {
        while (!done()) {
            pthread_mutex_lock(&task_queue_cond_lock);
            /* sleep until a task is present */
            while (task_available == 0)
                pthread_cond_wait(&cond_queue_full, &task_queue_cond_lock);
            my_task = extract_from_queue();
            task_available = 0;
            pthread_cond_signal(&cond_queue_empty);
            pthread_mutex_unlock(&task_queue_cond_lock);
            process_task(my_task);   /* done outside the critical section */
        }
    }
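The slide code leaves done(), create_task(), and the queue helpers undefined. A self-contained variant, under the assumption of a one-entry integer "queue", one producer, two consumers, and a fixed task count, could look like this. The variable `slot`, the constants, and `run_producer_consumer` are hypothetical; the lock and condition variable names follow the slides.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_TASKS 100       /* illustrative values, not from the slides */
#define NUM_CONSUMERS 2

static pthread_mutex_t task_queue_cond_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond_queue_empty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t cond_queue_full = PTHREAD_COND_INITIALIZER;
static int task_available = 0;
static int slot;            /* the one-entry "queue" */

/* Produces the tasks 1..NUM_TASKS, one at a time. */
static void *producer(void *arg)
{
    int task;

    (void)arg;
    for (task = 1; task <= NUM_TASKS; task++) {
        pthread_mutex_lock(&task_queue_cond_lock);
        while (task_available == 1)          /* wait until the slot is empty */
            pthread_cond_wait(&cond_queue_empty, &task_queue_cond_lock);
        slot = task;
        task_available = 1;
        pthread_cond_signal(&cond_queue_full);
        pthread_mutex_unlock(&task_queue_cond_lock);
    }
    return NULL;
}

/* Consumes a fixed share of the tasks and adds them to a private sum. */
static void *consumer(void *arg)
{
    long *sum = arg;
    int n, my_task;

    for (n = 0; n < NUM_TASKS / NUM_CONSUMERS; n++) {
        pthread_mutex_lock(&task_queue_cond_lock);
        while (task_available == 0)          /* wait until a task is present */
            pthread_cond_wait(&cond_queue_full, &task_queue_cond_lock);
        my_task = slot;
        task_available = 0;
        pthread_cond_signal(&cond_queue_empty);
        pthread_mutex_unlock(&task_queue_cond_lock);
        *sum += my_task;                     /* "process" outside the lock */
    }
    return NULL;
}

/* Hypothetical driver: returns the sum of all consumed tasks. */
long run_producer_consumer(void)
{
    pthread_t prod, cons[NUM_CONSUMERS];
    long sums[NUM_CONSUMERS] = {0};
    long total = 0;
    int i, rc;

    task_available = 0;
    rc = pthread_create(&prod, NULL, producer, NULL);
    assert(rc == 0);
    for (i = 0; i < NUM_CONSUMERS; i++) {
        rc = pthread_create(&cons[i], NULL, consumer, &sums[i]);
        assert(rc == 0);
    }
    pthread_join(prod, NULL);
    for (i = 0; i < NUM_CONSUMERS; i++) {
        pthread_join(cons[i], NULL);
        total += sums[i];
    }
    return total;
}
```

Every task is consumed exactly once, so the total equals 1 + 2 + ... + 100 = 5050. Unlike the mutex-only version, waiting threads sleep instead of polling the flag.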

Composite Synchronization Constructs
- By design, Pthreads provide support for a basic set of operations.
- Higher-level constructs can be built using basic synchronization constructs.

To find the solution of a two-dimensional Laplace equation:
1. Initialise $T_{i,j}$ to some initial guess.
2. Apply the boundary conditions.
3. For each internal mesh point, set
   $T^{*}_{i,j} = \left(T_{i+1,j} + T_{i-1,j} + T_{i,j+1} + T_{i,j-1}\right)/4$.
4. Replace the old solution $T$ with the new estimate $T^{*}$.
5. If the solution does not satisfy the tolerance, repeat from step 2.
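The five steps above can be sketched serially before any threads are involved. The grid size, the tolerance test on the largest per-point change, the fixed boundary of 1.0, and the function names are all assumptions chosen for illustration; the slides do not specify them.

```c
#include <assert.h>
#include <string.h>

#define GRID 8   /* points per side, boundary included (illustrative) */

/* Repeats steps 3 to 5 until the largest change in a sweep is below tol.
 * Returns the number of sweeps performed; the result is left in T. */
int jacobi_laplace(double T[GRID][GRID], double tol, int max_iters)
{
    double Tnew[GRID][GRID];
    int iter, i, j;

    memcpy(Tnew, T, sizeof(Tnew));          /* carries the boundary values over */
    for (iter = 1; iter <= max_iters; iter++) {
        double max_change = 0.0;

        for (i = 1; i < GRID - 1; i++) {
            for (j = 1; j < GRID - 1; j++) {
                double change;

                /* step 3: average of the four neighbours from the old solution */
                Tnew[i][j] = (T[i + 1][j] + T[i - 1][j] +
                              T[i][j + 1] + T[i][j - 1]) / 4.0;
                change = Tnew[i][j] - T[i][j];
                if (change < 0.0)
                    change = -change;
                if (change > max_change)
                    max_change = change;
            }
        }
        memcpy(T, Tnew, sizeof(Tnew));      /* step 4: new estimate replaces old */
        if (max_change < tol)               /* step 5: tolerance check */
            return iter;
    }
    return max_iters;
}

/* Hypothetical demo: boundary held at 1.0, interior guess 0.0.
 * Returns the value at the centre of the grid after convergence. */
double jacobi_demo(void)
{
    double T[GRID][GRID];
    int i, j;

    for (i = 0; i < GRID; i++)
        for (j = 0; j < GRID; j++)
            T[i][j] = (i == 0 || i == GRID - 1 || j == 0 || j == GRID - 1)
                          ? 1.0 : 0.0;      /* steps 1 and 2 */
    jacobi_laplace(T, 1e-10, 10000);
    return T[GRID / 2][GRID / 2];
}
```

With a constant boundary the exact solution is that constant everywhere, so the centre value approaches 1.0. In a threaded version each thread would own a block of rows, and every sweep must finish on all threads before the next one starts.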

- We need a new synchronization structure, called a barrier, to synchronize the computation between consecutive iteration steps: no thread can cross the barrier until all the threads have arrived.
- For a barrier we need:
  - the total number of threads considered
  - the number of threads that have arrived
  - a mutex and a condition variable

    typedef struct {
        pthread_mutex_t count_lock;
        pthread_cond_t ok_to_proceed;
        int count;
    } mylib_barrier_t;

    void mylib_init_barrier(mylib_barrier_t *b) {
        b->count = 0;
        pthread_mutex_init(&(b->count_lock), NULL);
        pthread_cond_init(&(b->ok_to_proceed), NULL);
    }

    void mylib_barrier(mylib_barrier_t *b, int num_threads) {
        pthread_mutex_lock(&(b->count_lock));
        b->count++;
        if (b->count == num_threads) {
            /* last thread to arrive: reset the count and release everyone */
            b->count = 0;
            pthread_cond_broadcast(&(b->ok_to_proceed));
        } else {
            /* retries only if the wait itself fails; a spurious wakeup is not detected here */
            while (pthread_cond_wait(&(b->ok_to_proceed), &(b->count_lock)) != 0)
                ;
        }
        pthread_mutex_unlock(&(b->count_lock));
    }
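POSIX allows pthread_cond_wait() to return without a matching signal, so a waiting thread should recheck a predicate. One common way to do that for a reusable barrier is a generation counter, sketched below together with a two-phase usage example. This is a variant, not the slide's implementation: the `generation` field, the `gen_barrier_*` names, and the demo are assumptions.

```c
#include <assert.h>
#include <pthread.h>

typedef struct {
    pthread_mutex_t count_lock;
    pthread_cond_t ok_to_proceed;
    int count;
    unsigned generation;   /* assumption: extra field, bumped once per barrier episode */
} gen_barrier_t;

void gen_barrier_init(gen_barrier_t *b)
{
    b->count = 0;
    b->generation = 0;
    pthread_mutex_init(&b->count_lock, NULL);
    pthread_cond_init(&b->ok_to_proceed, NULL);
}

void gen_barrier_wait(gen_barrier_t *b, int num_threads)
{
    unsigned my_generation;

    pthread_mutex_lock(&b->count_lock);
    my_generation = b->generation;
    if (++b->count == num_threads) {
        b->count = 0;                  /* last arrival: open the barrier */
        b->generation++;
        pthread_cond_broadcast(&b->ok_to_proceed);
    } else {
        /* A wakeup counts only if the generation has actually changed. */
        while (my_generation == b->generation)
            pthread_cond_wait(&b->ok_to_proceed, &b->count_lock);
    }
    pthread_mutex_unlock(&b->count_lock);
}

#define BAR_THREADS 4      /* illustrative thread count */

static gen_barrier_t barrier;
static int phase1[BAR_THREADS];
static int phase2[BAR_THREADS];

static void *two_phase(void *arg)
{
    int id = *(int *)arg;

    phase1[id] = id + 1;                           /* phase 1: write own slot */
    gen_barrier_wait(&barrier, BAR_THREADS);
    phase2[id] = phase1[(id + 1) % BAR_THREADS];   /* phase 2: read a neighbour's slot */
    return NULL;
}

/* Hypothetical driver: returns the sum of all phase-2 values. */
int run_barrier_demo(void)
{
    pthread_t tid[BAR_THREADS];
    int ids[BAR_THREADS];
    int i, rc, sum = 0;

    gen_barrier_init(&barrier);
    for (i = 0; i < BAR_THREADS; i++) {
        ids[i] = i;
        phase1[i] = 0;
        rc = pthread_create(&tid[i], NULL, two_phase, &ids[i]);
        assert(rc == 0);
    }
    for (i = 0; i < BAR_THREADS; i++)
        pthread_join(tid[i], NULL);
    for (i = 0; i < BAR_THREADS; i++)
        sum += phase2[i];
    return sum;
}
```

If any thread crossed the barrier early it would read a neighbour's slot while it still held 0, and the sum would fall short of 1 + 2 + 3 + 4 = 10.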

Read-Write Locks
- In many applications, a data structure is read frequently but written infrequently. For such applications, we should use read-write locks.
- A read lock is granted even when other threads may already hold read locks.
- If there is a write lock on the data (or if there are queued write locks), the thread performs a condition wait.
- If there are multiple threads requesting a write lock, they must perform a condition wait.
- With this description, we can design functions for:
  - read locks: mylib_rwlock_rlock,
  - write locks: mylib_rwlock_wlock, and
  - unlocking: mylib_rwlock_unlock.

Read-Write Locks
- The lock data type mylib_rwlock_t holds the following:
  - a count readers of the number of readers,
  - the writer (a 0/1 integer specifying whether a writer is present),
  - a count pending_writers of pending writers,
  - a condition variable readers_proceed that is signaled when readers can proceed,
  - a condition variable writer_proceed that is signaled when one of the writers can proceed, and
  - a mutex read_write_lock associated with the shared data structure.

Read-Write Locks
    typedef struct {
        int readers;
        int writer;
        int pending_writers;
        pthread_cond_t readers_proceed;
        pthread_cond_t writer_proceed;
        pthread_mutex_t read_write_lock;
    } mylib_rwlock_t;

    void mylib_rwlock_init(mylib_rwlock_t *l) {
        l->readers = l->writer = l->pending_writers = 0;
        pthread_mutex_init(&(l->read_write_lock), NULL);
        pthread_cond_init(&(l->readers_proceed), NULL);
        pthread_cond_init(&(l->writer_proceed), NULL);
    }

Read-Write Locks
    void mylib_rwlock_rlock(mylib_rwlock_t *l) {
        /* if there is a write lock, perform condition wait.
           else increment count of readers and grant read lock */
        pthread_mutex_lock(&(l->read_write_lock));
        while (l->writer > 0)
            pthread_cond_wait(&(l->readers_proceed), &(l->read_write_lock));
        l->readers++;
        pthread_mutex_unlock(&(l->read_write_lock));
    }

Read-Write Locks
    void mylib_rwlock_wlock(mylib_rwlock_t *l) {
        /* if there are readers or writers, increment pending writers
           count and wait. On being awoken, decrement pending writers
           count and increment writer count */
        pthread_mutex_lock(&(l->read_write_lock));
        l->pending_writers++;
        while ((l->writer > 0) || (l->readers > 0)) {
            pthread_cond_wait(&(l->writer_proceed), &(l->read_write_lock));
        }
        l->pending_writers--;
        l->writer++;
        pthread_mutex_unlock(&(l->read_write_lock));
    }

Read-Write Locks
    void mylib_rwlock_unlock(mylib_rwlock_t *l) {
        /* if there is a write lock then unlock, else if there are
           read locks, decrement count of read locks. If the count
           is 0 and there is a pending writer, let it through, else
           if there are pending readers, let them all go through */
        pthread_mutex_lock(&(l->read_write_lock));
        if (l->writer > 0)
            l->writer = 0;
        else if (l->readers > 0)
            l->readers--;
        if ((l->readers == 0) && (l->pending_writers > 0))
            pthread_cond_signal(&(l->writer_proceed));
        else
            /* The slide tests l->readers > 0 here, which never wakes readers
               blocked behind a departing writer; broadcasting unconditionally
               is harmless when nobody is waiting. */
            pthread_cond_broadcast(&(l->readers_proceed));
        pthread_mutex_unlock(&(l->read_write_lock));
    }
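POSIX also defines a ready-made read-write lock type, pthread_rwlock_t, which can stand in for a hand-built lock where it is available. The sketch below uses that standard type rather than mylib_rwlock_t; the shared pair of counters, the thread counts, and `run_rwlock_demo` are assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L   /* exposes pthread_rwlock_t on strict compilers */
#include <assert.h>
#include <pthread.h>

#define RW_READERS 3      /* illustrative values */
#define RW_WRITES 1000

static pthread_rwlock_t table_lock = PTHREAD_RWLOCK_INITIALIZER;
static long shared_a, shared_b;   /* invariant under the lock: shared_a == shared_b */

/* Updates both values under the write lock, so readers never see them differ. */
static void *writer(void *arg)
{
    int i;

    (void)arg;
    for (i = 0; i < RW_WRITES; i++) {
        pthread_rwlock_wrlock(&table_lock);
        shared_a++;
        shared_b++;
        pthread_rwlock_unlock(&table_lock);
    }
    return NULL;
}

/* Checks the invariant under the read lock; several readers may hold it at once. */
static void *reader(void *arg)
{
    long *mismatches = arg;
    int i;

    for (i = 0; i < RW_WRITES; i++) {
        pthread_rwlock_rdlock(&table_lock);
        if (shared_a != shared_b)
            (*mismatches)++;
        pthread_rwlock_unlock(&table_lock);
    }
    return NULL;
}

/* Hypothetical driver: returns the number of inconsistencies observed. */
long run_rwlock_demo(void)
{
    pthread_t w, r[RW_READERS];
    long mismatches[RW_READERS] = {0};
    long problems = 0;
    int i, rc;

    shared_a = shared_b = 0;
    rc = pthread_create(&w, NULL, writer, NULL);
    assert(rc == 0);
    for (i = 0; i < RW_READERS; i++) {
        rc = pthread_create(&r[i], NULL, reader, &mismatches[i]);
        assert(rc == 0);
    }
    pthread_join(w, NULL);
    for (i = 0; i < RW_READERS; i++) {
        pthread_join(r[i], NULL);
        problems += mismatches[i];
    }
    if (shared_a != RW_WRITES)       /* every write must have been applied */
        problems++;
    return problems;
}
```

A return value of 0 means no reader ever saw a half-applied update. Whether readers or writers are favored under contention is left to the implementation, which is one reason to build a custom lock such as mylib_rwlock_t when a specific policy is required.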