Thread Disclaimer: some slides are adapted from the book authors' slides with permission 1
IPC Recap Shared memory: share a memory region between processes; reads and writes go directly to the shared region; fast communication, but synchronization is very difficult. Message passing: exchange messages (send and receive); typically involves data copies (to/from a buffer); synchronization is easier, but communication is slower 2
Process Address Space Recap The process's view of memory: includes program code, global variables, dynamic memory, and the stack. Processor state: program counter (PC), stack pointer, and other CPU registers. OS resources: the various OS resources the process uses, e.g., open files, sockets, accounting information 3
Recap: Pipes
main() {
    char *s, buf[1024];
    int fds[2];
    s = "Hello World\n";
    /* create a pipe */
    pipe(fds);
    /* create a new process using fork */
    if (fork() == 0) {
        /* child process: all file descriptors, including the pipe,
           are inherited (copied) */
        write(fds[1], s, strlen(s));
        exit(0);
    }
    /* parent process */
    read(fds[0], buf, strlen(s));
    write(1, buf, strlen(s));
}
(*) Img. source: http://beej.us/guide/bgipc/output/html/multipage/pipes.html 4
Concurrent Programs Objects (tanks, planes, ...) are moving simultaneously. Now, imagine you implement each object as a process. Any problems? 5
Why Are Processes Not Always Ideal? Not memory efficient: each has its own address space (page tables) and OS resources (open files, sockets, pipes, ...). Sharing data between processes is not easy: no direct access to another process's address space; need to use IPC mechanisms 6
Better Solutions? We want to run things concurrently, i.e., multiple independent flows of control. We want to share memory easily: protection is not really a big concern; share code, data, files, sockets, ... We want to do these things efficiently: don't want to waste memory, and performance is very important 7
Thread 8
Thread in OS Lightweight process Process Address space CPU context: PC, registers, stack, OS resources Thread Address space CPU context: PC, registers, stack, OS resources Thread Process Thread 9
Thread in Architecture Logical processor http://www.pcstats.com/articleview.cfm?articleid=1302 10
Thread Lightweight process Own independent flow of control (execution): stack, thread-specific data (tid, ...). Everything else (address space, open files, ...) is shared. Shared: program code, (most) data, open files, sockets, pipes, environment (e.g., HOME). Private: registers, stack, thread-specific data, return value 11
Process vs. Thread Figure source: https://computing.llnl.gov/tutorials/pthreads/ 12
Thread Benefits Responsiveness: simple model for concurrent activities; no need to block on I/O. Resource sharing: easier and faster memory sharing (but be aware of synchronization issues). Economy: reduces context-switching and space overhead, so better performance. Scalability: exploits multicore CPUs 14
Thread Programming in UNIX Pthread IEEE POSIX standard threading API Pthread API Thread management create, destroy, detach, join, set/query thread attributes Synchronization Mutexes lock, unlock Condition variables signal/wait 15
Pthread API pthread_attr_init initialize the thread attributes object int pthread_attr_init(pthread_attr_t *attr); defines the attributes of the thread created pthread_create create a new thread int pthread_create(pthread_t *restrict thread, const pthread_attr_t *restrict attr, void *(*start_routine)(void*), void *restrict arg); upon success, a new thread id is returned in thread pthread_join wait for thread to exit int pthread_join(pthread_t thread, void **value_ptr); calling process blocks until thread exits pthread_exit terminate the calling thread void pthread_exit(void *value_ptr); make return value available to the joining thread 16
Pthread Example 1
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

int sum; /* data shared by all threads */

void *runner(void *param) {
    int i, upper = atoi(param);
    sum = 0;
    for (i = 1; i <= upper; i++)
        sum += i;
    pthread_exit(0);
}

int main(int argc, char *argv[]) {
    pthread_t tid;        /* thread identifier */
    pthread_attr_t attr;  /* thread attributes */
    pthread_attr_init(&attr);
    /* create the thread */
    pthread_create(&tid, &attr, runner, argv[1]);
    /* wait for the thread to exit */
    pthread_join(tid, NULL);
    fprintf(stdout, "sum = %d\n", sum);
}

Quiz: final output?
$ ./a.out 10
sum = 55
17
Pthread Example 2
#include <pthread.h>
#include <stdio.h>

int arraya[10], arrayb[10];

void *routine1(void *param) {
    int var1, var2;
    /* ... */
    return NULL;
}

void *routine2(void *param) {
    int var1, var2, var3;
    /* ... */
    return NULL;
}

int main(int argc, char *argv[]) {
    pthread_t tid[2];
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    /* create the threads */
    pthread_create(&tid[0], &attr, routine1, NULL);
    pthread_create(&tid[1], &attr, routine2, NULL);
    pthread_join(tid[0], NULL);
    pthread_join(tid[1], NULL);
}
18
User-level Threads Kernel is unaware of threads: early UNIX and Linux did not support threads. A threading runtime handles context switching (setjmp/longjmp, ...). Advantage: no kernel support needed; fast (no kernel crossing). Disadvantage: blocking system calls. What happens? (the kernel blocks the whole process, so every thread stops) 19
Kernel-level Threads Native kernel support for threads: most modern OSes (Linux, Windows NT). Advantage: no threading runtime; native system call handling. Disadvantage: overhead (thread operations require kernel crossings) 20
Hybrid Threads Map many user-level threads onto many kernel threads (many-to-many model). Best of both worlds? 21
Threads: Advanced Topics Signal handling Thread pool Multicore 22
Signal Handling What is a signal? $ man 7 signal An OS-to-process notification: hey, wake up, you've got a packet on your socket; hey, wake up, your timer just expired. Which thread should a signal be delivered to? Any thread, e.g., kill(pid), or a specific thread, e.g., pthread_kill(tid) 23
Thread Pool Managing threads yourself can be cumbersome and costly: repeatedly creating/destroying threads as needed. Instead, create a set of threads ahead of time, and just ask them to execute your functions. # of threads ~ # of cores; no need to create/destroy threads many times. Many high-level parallel libraries use this, e.g., Intel TBB (Threading Building Blocks) 24
Single Core Vs. Multicore Execution Single core execution Multiple core execution 25
Synchronization Disclaimer: some slides are adapted from the book authors' slides with permission 26
Agenda Mutual exclusion Peterson's algorithm (software) Synchronization instructions (hardware) Spinlock High-level synchronization mechanisms Mutex Semaphore Monitor 27
Producer/Consumer Producer Thread Buffer[10] Consumer Thread 28
Producer/Consumer Producer while (true){ /* wait if buffer full */ while (counter == 10); /* produce data */ buffer[in] = sdata; in = (in + 1) % 10; Consumer while (true){ /* wait if buffer empty */ while (counter == 0); /* consume data */ sdata = buffer[out]; out = (out + 1) % 10; } /* update number of items in buffer */ counter++; } /* update number of items in buffer */ counter--; 29
Producer/Consumer Producer while (true){ /* wait if buffer full */ while (counter == 10); /* produce data */ buffer[in] = sdata; in = (in + 1) % 10; Consumer while (true){ /* wait if buffer empty */ while (counter == 0); /* consume data */ sdata = buffer[out]; out = (out + 1) % 10; } /* update number of items in buffer */ R1 = load (counter); R1 = R1 + 1; counter = store (R1); } /* update number of items in buffer */ R2 = load (counter); R2 = R2 - 1; counter = store (R2); 30
Check Yourself
int count = 0;
int main() {
    count = count + 1;
    return count;
}
$ gcc -O2 -S sync.c
movl  count(%rip), %eax
addl  $1, %eax
movl  %eax, count(%rip)
31
Race Condition Initial condition: counter = 5 Thread 1 Thread 2 R1 = load (counter); R1 = R1 + 1; counter = store (R1); R2 = load (counter); R2 = R2 - 1; counter = store (R2); What are the possible outcomes? 32
Race Condition Initial condition: counter = 5 Schedule 1: R1 = load (counter); R1 = R1 + 1; counter = store (R1); R2 = load (counter); R2 = R2 - 1; counter = store (R2); counter = 5 Schedule 2: R1 = load (counter); R1 = R1 + 1; R2 = load (counter); R2 = R2 - 1; counter = store (R1); counter = store (R2); counter = 4 Schedule 3: R1 = load (counter); R1 = R1 + 1; R2 = load (counter); R2 = R2 - 1; counter = store (R2); counter = store (R1); counter = 6 Why does this happen? 33
Race Condition A situation when two or more threads read and write shared data at the same time Correctness depends on the execution order Thread 1 Thread 2 R1 = load (counter); R1 = R1 + 1; counter = store (R1); read write R2 = load (counter); R2 = R2 - 1; counter = store (R2); How to prevent race conditions? 34
Critical Section Code sections of potential race conditions Thread 1 Thread 2 Do something.. R1 = load (counter); R1 = R1 + 1; counter = store (R1);... Do something Do something.. R2 = load (counter); R2 = R2 1; counter = store (R2);.. Do something Critical sections 35
Solution Requirements Mutual exclusion: if a thread is executing in its critical section, no other thread can enter its critical section. Progress: if no thread is executing in a critical section, some thread that wants to enter can do so. Bounded waiting: the wait (time/number of turns) must be bounded 36
Simple Solution (?): Use a Flag // wait while (in_cs) ; // enter critical section in_cs = true; Do something // exit critical section in_cs = false; T1 while(in_cs){}; in_cs = true; //enter T2 while(in_cs){}; in_cs = true; //enter Mutual exclusion is not guaranteed 37
Peterson's Solution A software solution (no h/w support) Two-process solution (multi-process extensions exist) The two processes share two variables: int turn: indicates whose turn it is to enter the critical section. boolean flag[2]: indicates whether a process is ready to enter the critical section. 38
Peterson's Solution Thread 1 Thread 2 do { flag[0] = TRUE; turn = 1; while (flag[1] && turn==1) {}; // critical section flag[0] = FALSE; // remainder section } while (TRUE) do { flag[1] = TRUE; turn = 0; while (flag[0] && turn==0) {}; // critical section flag[1] = FALSE; // remainder section } while (TRUE) Solution meets all three requirements Mutual exclusion: P0 and P1 cannot be in the critical section at the same time Progress: if P0 does not want to enter the critical region, P1 does no waiting Bounded waiting: a process waits for at most one turn 39
Peterson's Solution Limitations Only supports two processes: generalizing to more than two has been done, but is not very efficient. Assumes LOAD and STORE instructions are atomic: in reality, no guarantees. Assumes memory accesses are not reordered: the compiler reorders instructions (gcc -O2, -O3, ...), and out-of-order processors reorder instructions 40
Reordering by the CPU Initially X = Y = 0 Thread 0: X = 1; R1 = Y Thread 1: Y = 1; R2 = X Possible values of (R1, R2)? (0,1), (1,0), (1,1). But (0,0) is also possible on a PC: the CPU may reorder the accesses, effectively executing Thread 0: R1 = Y; X = 1 Thread 1: R2 = X; Y = 1 41
Summary Peterson's algorithm: a turn-based, software-only solution. Pros: no hardware support; satisfies all requirements (mutual exclusion, progress, bounded waiting). Cons: complicated; assumes program order, so it may not work on out-of-order processors 42
Race condition Recap A situation when two or more threads read and write shared data at the same time Correctness depends on the execution order Critical section Code sections of potential race conditions Mutual exclusion If a thread executes its critical section, no other threads can enter their critical sections 43
Recap Peterson's algorithm: a turn-based, software-only solution. Pros: no hardware support; satisfies all requirements (mutual exclusion, progress, bounded waiting). Cons: complicated; assumes program order, so it may not work on out-of-order processors 44
Today Hardware support Synchronization instructions Lock Spinlock Mutex 45
General solution Lock Protect critical section via a lock Acquire on enter, release on exit do { acquire lock; critical section release lock; remainder section } while(true); 46
How to Implement a Lock? Unicore processor No true concurrency: one thread runs at a time; threads are interrupted by OS scheduling events (timer interrupt, device interrupts). Disabling interrupts: threads can't be interrupted do { disable interrupts; critical section enable interrupts; remainder section } while(true); 47
How to Implement a Lock? Multicore processor True concurrency: more than one active thread sharing memory. Disabling interrupts doesn't solve the problem: more than one thread is executing at a time. Hardware support: synchronization instructions, atomic test&set instruction, atomic compare&swap instruction. What do we mean by atomic? All or nothing 48
TestAndSet Instruction Pseudo code:
boolean TestAndSet (boolean *target) {
    boolean rv = *target;
    *target = TRUE;
    return rv;
}
49
Spinlock using TestAndSet
int mutex;
init_lock (&mutex);
do {
    lock (&mutex);
    /* critical section */
    unlock (&mutex);
    /* remainder section */
} while(true);

void init_lock (int *mutex) { *mutex = 0; }
void lock (int *mutex) { while (TestAndSet(mutex)) ; }
void unlock (int *mutex) { *mutex = 0; }
50
CAS (Compare & Swap) Instruction Pseudo code:
int CAS(int *value, int oldval, int newval) {
    int temp = *value;
    if (*value == oldval)
        *value = newval;
    return temp;
}
51
Spinlock using CAS
int mutex;
init_lock (&mutex);
do {
    lock (&mutex);
    /* critical section */
    unlock (&mutex);
    /* remainder section */
} while(true);

void init_lock (int *mutex) { *mutex = 0; }
void lock (int *mutex) { while (CAS(mutex, 0, 1) != 0) ; }
void unlock (int *mutex) { *mutex = 0; }
52
What's Wrong With Spinlocks? Very wasteful: the waiting thread keeps burning CPU cycles while doing absolutely nothing but wait; 100% CPU utilization, but no useful work done; power consumption, fan noise, ... Useful when you hold the lock only briefly. Otherwise a better solution is needed 53