Concurrent Programming I. Anna Lina Ruscelli Scuola Superiore Sant Anna

Concurrent Programming I Anna Lina Ruscelli Scuola Superiore Sant Anna

Contact info Email a.ruscelli@sssup.it RTOS course web page http://retis.sssup.it/~marko/rtos.html RTOS course mailing list rtos-arezzo@retis.sssup.it Many course slides and other material are courtesy of Prof. Giuseppe Lipari Ing. Paolo Gai 2

Reference material PDF notes and slides (available on the web page) P.Ancilotti e M.Boari, Principi e tecniche di programmazione concorrente, Utet libreria, 1987 (In biblioteca, SOLO pagine 67-107). Paolo Ancilotti, Maurelio Boari, Anna Ciampolini e Giuseppe Lipari, Sistemi Operativi, Mc-Graw Hill, June 2004 Linux man pages (as a reference for POSIX programming). Notes on concurrent programming in UNIX systems. Other reference books are available on the web page 3

Outline Introduction to concurrency Model of concurrency: shared memory Critical Sections Synchronization Semaphores 4

The need of Concurrency There are many reasons for concurrency functional performance expressive power Functional many users may be connected to the same system at the same time each user can have its own processes that execute concurrently with the processes of the other users perform many operations concurrently for example, listen to the music, write with a word processor, burn a CD, etc... they are all different and independent activities they can be done at the same time 5

The need of Concurrency (2) Performance take advantage of blocking time while some thread waits for a blocking condition, another thread performs another operation parallelism in multi-processor machines if we have a multi-processor machine, independent activities can be carried out on different processors at the same time Expressive power many control applications are inherently concurrent concurrency support helps in expressing concurrency, making application development simpler 6

Concurrency model (theoretical) A system is a set of concurrent activities they can be processes or threads They interact in two ways they access the hardware resources (processor, disk, memory, etc.) they exchange data These activities compete for the resources and/or cooperate for some common objective 7

Resource A resource can be a HW resource like a I/O device a SW resource, i.e. a data structure in both cases, access to a resource must be regulated to avoid interference example 1 if two processes want to print on the same printer, their access must be sequentialized, otherwise the two printing could be intermingled! example 2 if two threads access the same data structure, the operation on the data must be sequentialised otherwise the data could be inconsistent! 8

Interaction model Activities can interact according to two fundamental models shared memory All activities access the same memory space message passing All activities communicate each other by sending messages through OS primitives we will analyze both models in the following slides 9

Cooperative vs. Competitive The interaction between concurrent activities (threads or processes) can be classified into: competitive concurrency different activities compete for the resources one activity does not know anything about the other ones the OS must manage the resources so to avoid conflicts be fair cooperative concurrency many activities cooperate to perform an operation every activity knows about the others they must synchronize on particular events interference 10

Competition Cooperative and competitive activities need different models of execution and synchronization competing activities need to be protected from each other separate memory spaces the allocation of the resource and the synchronization must be centralized competitive activities request for services to a central manager (the OS or some dedicated process) which allocates the resources in a fair way Client/Server model communication is usually done through messages the process model of execution is the best one 11

Competition (2) In a client/server system a server manages the resource exclusively for example, the printer if a process needs to access the resource, it sends a request to the server for example, printing a file, or asking for the status the server can send back the responses the server can also be on a remote system two basic primitives send and receive Server Client 1 Client 2 12

Cooperation Cooperative activities know about each other they do not need memory protection not using memory protection, we have less overhead they need to access the same data structures allocation of the resource is de-centralized shared memory is the best model the thread model of execution is the best one 13

Cooperation and Competition Competition is best resolved by using the message passing model however it can be implemented using a shared memory paradigm too Cooperation is best implemented by using the shared memory paradigm however, it can be realized by using pure message passing mechanisms shared memory or message passing? in the past, there were OSs that supported only shared memory or only message passing 14

Cooperation and Competition (2) A general purpose OS needs to support both models we need at least protection for competing activities we need to support client/server models. So we need messag passing primitives we need to support shared memory for reducing the overhead some special OS supports only one of the two for example, many RTOS support only shared memory 15

Interference There is a third kind of interaction, the interference It is due to two kinds of programming errors: interactions between processes that are not required by the semantic of the problem erroneous solution to the problems of interaction interference problems are usually time-dependent problems 16

Model of Concurrency Shared memory Critical section Synchronization 17

Shared memory Shared memory communication it was the first one to be supported in old OSs it is the simplest one and the closest to the machine all threads can access the same memory locations 18

Hardware analogy An abstract model that presents a good analogy is the following many HW CPU, each one running one activity (thread) one shared memory 19

Resource allocation Allocation of resource can be static: once the resource is granted, it is never revoked dynamic: resource can be granted and revoked dynamically manager Access to a resource can be dedicated: one activity at time only is granted access to the resource shared: many activities can access the resource at the same time mutual exclusion 20

Mutual exclusion problem We do not know in advance the relative speed of the processes hence, we do not know the order of execution of the hardware instructions recall the example of incrementing variable x incrementing x is not an atomic operation atomic behavior can be obtained using interrupt disabling or special atomic instructions 21

Example 1 /* Shared memory */ int x; void *threada(void *) {...; x = x + 1;...; void *threadb(void *) {...; x = x + 1;...;... LD R0, x (TA) x = 0, R0=0 LD R0, x (TB) x = 0, R0=0 INC R0 (TB) x = 0, R0=1 ST x, R0 (TB) x = 1, R0=0 INC R0 (TA) x = 1, R0=1 ST x, R0 (TA) x = 1, R0=0... Bad interleaving! 22

Example 2 // Shared object (sw resource) struct A_t { int a; int b; A; void A_init(A_t *x) { x->a=1; x->b=1; void A_inc(A_t *x) { x->a++; x->b++; void A_mul(A_t *x) { x->b*=2; x->a *=2 Consistency: After each operation, a == b a = a + 1; TA a = 2 b = b * 2; TB b = 2 b = b + 1; TA b = 3 a = a * 2; TB a = 4 void *threada(void *) {... A_inc(&A);... void * threadb(void *) {... A_mul(&B);... Resource in a non-consistent state!! 23

Consistency For any resource, we can state a set of consistency properties a consistency property Ci is a boolean expression on the values of the internal variables a consistency property must hold before and after each operation it does not need to hold during an operation if the operations are properly sequentialized, the consistency properties will always hold Formal verification let R be a resource and let C(R) be a set of consistency properties on the resource R C(R) = {Ci Definition: A concurrent program is correct if, for every possible interleaving of the operations on the resource, Ci C(R), Ci holds. 24

Example 3: Circular arrayimplementation of a FIFO queue tail head head: index of the first free element in the queue here will be inserted the next element tail: index of the first occupied element in the queue will be the one that will be extracted next time 25

Circular array: implementation of a FIFO queue 26 struct CA { int array[10]; int head, tail, num; void init(struct CA *ca) { ca->head=0; ca->tail=0; ca->num=0; boolean insert(struct CA *ca, int elem) { if (ca->num == 10) return false; else { ca->array[ca->head] = elem; ca->head = (ca->head + 1) % 10; ca->num ++; return true; boolean extract(struct CA *ca, int *elem) { if (ca->num == 0) return false; else { *elem = ca->array[ca->tail]; ca->tail = (ca->tail + 1) % 10; ca->num--; return true; consistency properties (suppose num++ and num-- atomic) C1: if (num == 0 num == 10) head == tail; C2: if (0 < num < 10) num == (head - tail) %10 C3: num == NI NE C4: (insert x) pre: if (num < 10) post: num == num + 1 && array[(head-1)%10] = x; C5: (extract &x) pre: if (num > 0) post: num == num -1 && x = array[(tail-1)%10];

Consistency properties consistency properties C1: if (num == 0 num == 10) head == tail; C2: if (0 < num < 10) num == (head - tail) %10 C3: num == NI NE C4: (insert x) pre: if (num < 10) post: num == num + 1 && array[(head-1)%10] = x; C5: (extract &x) pre: if (num > 0) post: num == num -1 && x = array[(tail-1)%10]; C1: when the queue is empty, or when the queue is full, head == tail C3: num is equal to the number of times that insert() has been called minus the number of times that extract() has been called C4: if element x has been inserted, eventually it must be extracted with an appropriate number of extracts C5: Every element that is extracted, has been inserted sometimes in the past. Last two can also be expressed as: Let (x1, x2,..., xk ) be the sequence of inserted elements, and let (y1, y2,..., yk ) be the sequence of extracted elements; then i = 1,..., k yi = xi 27

Consistency properties for struct CA 1. When the queue is empty, or when the queue is full, head == tail 2. num is equal to the number of times that insert() has been called minus the number of times that extract() has been called 3. If element x has been inserted, eventually it must be extracted with an appropriate number of extracts 4. Every element that is extracted, has been inserted sometimes in the past. Last two can also be expressed as: Let (x1, x2,..., xk ) be the sequence of inserted elements, and let (y1, y2,..., yk ) be the sequence of extracted elements; then i = 1,..., k yi = xi 28

Example 3: empty queue tail head head: index of the first free element in the queue here will be inserted the next element tail: index of the first occupied element in the queue will be the one that will be extracted next time the queue is empty, hence head == tail 29

Example 3: circular array - insert lavagna 30

Example 3: insert tail queue 5 3 8 5 2 5 4 5 9 num = (head - tail) % 8 num = 4; boolean insert(struct CA *ca, int elem) { if (ca->num == 10) return false; ca->array[ca->head] = elem; ca->head = (ca->head+1)%10; ca->num++; return true; insert(ca, 9); head and num have been increased 31

Example 3: circular array - extract lavagna 32

Example 3: concurrent insert (interference) If the insert() operation is performed by two processes, some consistency property may be violated! int insert_ca(struct CircularArray_t *a, int elem) void *threada(void *) {... insert_ca( &queue, 5);... void *threadb(void *) {... insert_ca( &queue, 2);... 33

Example 4: concurrent insert (interference) if (a->num == 10) return 0; else { a->array[a->head] = 5; a->head = (a->head + 1) % 10; (**) a->num ++; return 1; 34 TA if (a->num == 10) return 0; (TA) else { a->array[a->head] = 5; (TA) if (a->num == 10) return 0; (TB) else { a->array[a->head] = 2; (TB) a->head = (a->head + 1) % 10; (TB) (*) a->num ++; (TB) return 1; (TB) a->head = (a->head + 1) % 10; (TA) (**) a->num ++; (TA) return 1; (TA) if (a->num == 10) return 0; else { a->array[a->head] = 2; a->head = (a->head + 1) % 10; (*) a->num ++; return 1; TB

Example 4: concurrent insert tail 3 8 2 head Two threads, they both call insert(9). thread 1 calls insert(ca, 9); preemption by second thread second thread completes 9 there is a hole! At some point, the extract will read a 9 and a random value, instead of two 9s. boolean insert(struct CA *ca, int elem) { if (ca->num == 10) return false; ca->array[ca->head] = elem;... boolean insert(struct CA *ca, int elem) { if (ca->num == 10) return false; ca->array[ca->head] = elem; ca->head = (ca->head+1)%10; ca->num++; return true;... ca->head = (ca->head+1)%10; ca->num++; return true; 35

Example 3: Correctness of Circular Array implementation The previous program is not correct, as the last property is not verified the sequence of extracted elements does not correspond to the sequence of inserted elements The problem is that the first thread was preempted while updating the data structure in a critical point. we must prevent thread 2 from accessing the data structure while another thread is completing an operation on it 36

Example 3: Correctness Proving the non-correctness is easy, in the sense that we must find a counterexample Proving the correctness is a very complex task! it is necessary to prove the correctness for every possible interleaving of every operation, for every possible input data and for every possible internal state 37

Insert and Extract What happens if an insert() and an extract() are interleaved? Let s assume that increments and decrements are atomic operations Producer: thread that inserts elements Consumer: thread that extracts elements It can be proved that interleaving exactly one producer and one consumer does not bring any problem Proof: if 0 < num < 10, insert() and extract() are independent if num==0 if extract() begins before insert, it immediately returns false, if insert() begins before, extract will still return false, so it cannot interfere with insert same thing when num==10 Correctness is guaranteed for one consumer and one producer. 38

Insert and Extract II What happens if we exchange the sequence of instructions in insert()? boolean insert(struct CA *ca, int elem) { if (ca->num == 10) return false; else { ca->num++; ca->array[ca->head] = elem; ca->head = (ca->head+1)%10; return true; boolean extract(struct CA *ca, int *elem) { if (ca->num == 0) return false; else { *elem = ca->array[ca->tail]; ca->tail = (ca->tail + 1) % 10; ca->num--; return true; It is easy to prove that in this case insert() cannot be interleaved with extract() 39

Circular array properties a) If more than one thread executes insert_ca() inconsistency!! b) If we have only two threads one threads calls insert_ca() and the other thread calls extract_ca() no inconsistency! The order of the operations is important! a wrong order can make the object inconsistent even under the assumption b) the case is when num is incremented but the data has not yet been inserted in any case, the final result depends on the timings of the different requests (e.g, an insertion with the buffer full) 40

Exercise: non-atomic increment Problem: in the previous examples, we supposed that num++ and num-- are atomic operations what happens if they are not atomic? question: assuming that operation -- and ++ are not atomic, and assuming that we have only one producer and one consumer, can we make the Circular Array safe? hint: try to substitute variable num with two boolean variables, bool empty and bool full; 41

Outline Model of concurrency: shared memory Critical Sections 42

Critical section: definitions The shared object where the conflict may happen is a resource The parts of the code where the problem may happen are called critical sections A critical section is a sequence of operations that cannot be interleaved with other operations on the same resource Two critical sections on the same resource must be properly sequentialized We say that two critical sections on the same resource must execute in MUTUAL EXCLUSION 43

Mutual Exclusion There are three ways to obtain mutual exclusion 1. implementing the critical section as an atomic operation 2. disabling the preemption (system-wide) 3. selectively disabling the preemption (using semaphores and mutual exclusion) 44

1- Implementig atomic operations In single processor systems disable interrupts during a critical section non-voluntary context switch is disabled! Limitations: if the critical section is long, no interrupt can arrive during the critical section CLI; <critical section> STI; consider a timer interrupt that arrives every 1 msec. if a critical section lasts for more than 1 msec, a timer interrupt could be lost It must be done only for very short critical sections; Non voluntary context switch is disabled during the critical section Disabling interrupts is a very low level solution: it is not possible in user space. Concurrency is disabled during the critical section! we must avoid conflicts on the resource, not disabling interrupts! 45

Atomic operations on multiprocessors Disabling interrupts is not sufficient disabling interrupts on one processor lets a thread on another processor free to access the resource Solution: use lock() and unlock() operations define a flag s for each resource, and then surround a critical section with lock(s) and unlock(s); Problems: busy waiting: if the critical section is long, we waste a lot of time cannot be used in single processors! int s;... lock(s); <critical section> unlock(s);... 46

Low level synchronisation in SMP The atomicity problem cannot be solved by disabling the interrupts! If we disable the interrupts, we protect the code from interrupts. It is not easy to protect from other processors CPU 0 CPU 1... LD R0, x INC R0 ST x, RO...... LD R0, x INC R0 ST x, RO...... LD R0, x (CPU 0) LD R0, x (CPU 1) INC R0 (CPU 0) INC R0 (CPU 1) ST x, R0 (CPU 0) ST x, R0 (CPU 1)... 47

Low level synchronisation in SMP Most processors support some special instruction XCH Exchange register with memory location TST If memory location = 0, set location to 1 and return true (1), else return false (0) void xch(register R, memory x) { int tmp; tmp = R; R = x; x=tmp; XCH and TST are atomic! int tst(int x) { if (x == 1) return 0; else { x=1; return 1; 48

Locking in multi-processors We define one variable s If s == 0, then we can perform the critical operation If s == 1, then must wait before performing the critical operation Using XCH or TST we can implement two functions: lock() unlock() void lock(int s) { int a = 1; while (a==1) XCH (s,a); void unlock(int s) { s = 0; void lock(int x) { while (TST (s) == 0); 49

2 - Disabling preemption On single processor systems in some scheduler, it is possible to disable preemption for a limited interval of time problems: if a high priority critical thread needs to execute, it cannot make preemption and it is delayed even if the high priority task does not access the resource! disable_preemption(); <critical section> enable_preemption(); no context switch may happen during the critical section, but interrupts are enabled 50

3 - Selectively disabling preemption Some general mechanisms exist to implement mutual exclusion only between the processes that use a resource: Semaphores Mutex 51

Critical section: a general approach General techniques exist to protect critical sections Semaphores Mutex Properties: Interrupts always enabled Preemption always enabled Basic idea: if a thread is inside a critical section on a given resource all other threads are blocked upon entrance on the critical section on the same resource -> selectivity We will study such techniques in the following 52

Outline Model of concurrency: shared memory Synchronization 53

Synchronization: Producer/Consumer model Mutual exclusion is not the only problem we need a way of synchronize two or more threads example: producer/consumer suppose we have two threads, one produces some integers and sends them to another thread (PRODUCER) another one takes the integer and elaborates it (CONSUMER) Producer Consumer 54

Producer/Consumer: implementation with circular array Suppose that the two threads have different speeds for example, the producer is much faster than the consumer we need to store the temporary results of the producer in some memory buffer, so that no data are lost for our example, we will use the circular array structure, the CircularArray_t structure 55

Producer/Consumer II struct CircularArray_t queue; void *producer(void *) { bool res; int data; while(1) { <obtain data> while (!insert_ca(&queue, data)); void *consumer(void *) { bool res; int data; while(1) { while (!extract_ca(&queue, &data)); <use data> problems with this approach: if the queue is full, the producer actively waits if the queue is empty, the consumer actively waits 56

A more general approach We need to provide a general mechanism for synchronization and mutual exclusion requirements provide mutual exclusion between critical sections avoid two interleaved insert() operations (semaphores, mutexes) synchronize two threads on one condition for example, block the producer when the queue is full (semaphores, condition variables) 57

Outline Semaphores 58

A general mechanism for blocking tasks: semaphores The semaphore mechanism was first proposed by Dijkstra A semaphore is an abstract data type that consists of a counter a blocking queue operation wait operation signal The operations on a semaphore must be atomic the OS makes them atomic by appropriate low-level mechanisms 59

Semaphores definitions Semaphores are a basic mechanisms for providing synchronization it has been shown that every kind of synchronization and mutual exclusion can be implemented by using semaphores we will analyze possible implementation of the semaphore mechanism later typedef struct { <blocked queue> blocked; int counter; sem_t; void sem_init (sem_t &s, int n); void sem_wait (sem_t &s); void sem_post (sem_t &s); Note: the real prototype of sem_init is slightly different! 60

Wait and signal A wait operation has the following behavior if counter == 0, the requiring thread is blocked it is removed from the ready queue it is inserted in the blocked queue if counter > 0, then counter--; a post operation has the following behavior if counter == 0 and there is some blocked thread, unblock it the thread is removed from the blocked queue it is inserted in the ready queue otherwise, increment counter 61

Semaphores 62 void sem_init (sem_t *s, int n) { s->count=n;... void sem_wait(sem_t *s) { if (counter == 0) <block the thread> else counter--; void sem_post(sem_t *s) { if (<there are blocked threads>) <unblock a thread> else counter++;

Signal semantic What happens when a thread blocks on a semaphore? in general, it is inserted in a BLOCKED queue extraction from the blocking queue can follow different semantics: strong semaphore the threads are removed in well-specified order for example, the FIFO order is the fairest policy, or priority based ordering,... signal and suspend after the new thread has been unblocked, a thread switch happens signal and continue after the new thread has been unblocked, the thread that executed the signal continues to execute concurrent programs should not rely too much on the semaphore semantic 63

Mutual exclusion with semaphores: Mutex How to use a semaphore for critical sections define a semaphore initialized to 1 before entering the critical section, perform a wait after leaving the critical section, perform a post void *threada(void *arg) {... sem_wait(&s); <critical section> sem_post(&s);... sem_t s;... sem_init(&s, 1); void *threadb(void *arg) {... sem_wait(&s); <critical section> sem_post(&s);... 64

Mutual exclusion: example Semaphore counter 1 Blocked queue Ready queue TB TA 65

Mutual exclusion: example Semaphore counter 0 Blocked queue s.wait(); (TA) Ready queue TB TA 66

Mutual exclusion: example Semaphore counter 0 Blocked queue s.wait(); <critical section (1)> (TA) (TA) Ready queue TB TA 67

Mutual exclusion: example Semaphore counter 0 Blocked queue s.wait(); <critical section (1)> s.wait(); (TA) (TA) (TB) Ready queue TA TB 68

Mutual exclusion: example Semaphore counter 0 Blocked queue TB s.wait(); <critical section (1)> s.wait(); <critical section (2)> (TA) (TA) (TB) (TA) Ready queue TA 69

Mutual exclusion: example Semaphore counter 0 Blocked queue TB s.wait(); <critical section (1)> s.wait(); <critical section (2)> s.signal(); (TA) (TA) (TB) (TA) (TA) Ready queue TA 70

Mutual exclusion: example Semaphore counter 0 Blocked queue s.wait(); <critical section (1)> s.wait(); <critical section (2)> s.signal(); <critical section> (TA) (TA) (TB) (TA) (TA) (TB) Ready queue TA TB 71

Mutual exclusion: example Semaphore counter 1 Blocked queue Ready queue s.wait(); <critical section (1)> s.wait(); <critical section (2)> s.signal(); <critical section> s.signal(); (TA) (TA) (TB) (TA) (TA) (TB) (TB) TA TB 72

Synchronization with semaphores How to use a semaphore for synchronization define a semaphore initialized to 0 at the synchronization point, perform a wait when the synchronization point is reached, perform a post in the example, threada blocks until threadb wakes it up sem_t s;... void *threada(void *) {... sem_wait(&s);... sem_init(&s, 0); void *threadb(void *) {... sem_post(&s);... 73

Problem 1 How to make each thread waits for the other one? The first one that arrives at the synchronization point waits for the other one. Solution: use two semaphores! Semaphore sa(0), sb(0); void *threada(void *) {... sa.signal(); sb.wait();... void *threadb(void *) {... sb.signal(); sa.wait();... 74

Semaphores in POSIX sem_t sema; int sem_init(sem_t *s, int flag, int count); int sem_wait(sem_t *s); int sem_trywait(sem_t *s); int sem_post(sem_t *s); sem_t is the semaphore type; it is an opaque C structure sem_post is the normal signal operation. sem_init initializes the semaphore; if flag = 0, the semaphore is local to the process; if flag = 1, the semaphore is shared with other processes; count is the initial value of the counter sem_wait is the normal wait operation; sem_trywait does not block the task, but returns with error (< 0) if the semaphore counter is 0. 75

Producer/consumer Consider a producer/consumer system: now we want to implement a mailbox with a circular array avoiding busy wait one producer executes insert_ca() the producer must be blocked when the mailbox is full the producer will be unblocked when there is some space again one consumer executes extract_ca() the consumer must be blocked when the mailbox is empty the consumer will be unblocked when there is one new element and the queue is not empty We use appropriate semaphores to block these threads Initially we consider only one producer and one consumer 76

Producer/Consumer implementation struct CircularArray_t { int array[10]; int head, tail; sem_t empty, full; void init_ca(struct CircularArray_t *c) { c->head=0; c->tail=0; sem_init(&c->empty, 0); sem_init(&c->full, 10); void insert_ca(struct CircularArray_t *c, int elem){ sem_wait(&c->full); c->array[c->head] = elem; c->head = (c->head + 1) % 10; sem_post(&c->empty); 78 void extract_ca(struct CircularArray_t *c, int &elem) { sem_wait(&c->empty); elem = c->array[c->tail]; c->tail = (c->tail + 1) % 10; sem_post(c->full);

Producer/consumer properties Notice that the value of the counter of empty is the number of elements in the queue it is the number of times we can call extract without blocking the value of the counter of full is the complement of the elements in the queue it is the number of times we can call insert without blocking exercise prove that the implementation is correct insert_ca() never overwrites elements extract_ca() always gets an element of the queue 79

Proof of correctness When the number of elements in the queue is between 1 and 9, there is no problem; insert and extract work on different variables (head and tail respectively) and different elements of the array; The value of full and empty is always greater than 0, so neither the producer nor the consumer can block; When there is no element in the queue, head = tail, counter of empty = 0, counter of full = N; If extract begins before the end of insert, it will be blocked After an insert, there is an element in the queue, so we are in the previous case For symmetry, the same holds in the case of N elements in the queue. Again, head = tail, counter of empty = N, counter of full = 0; If insert begins before the end of an extract, it will be blocked After an extract, we fall back in the previous case 80

Multiple producers/consumers Suppose now there are many producers and many consumers; All producers will act on the same variable head, and all consumers on the same variable tail; If one producer preempts another producer, an inconsistency can arise Exercise: prove the above sentence Therefore, we need to combine synchronization and mutual exclusion we want to implement synchronization we want to protect the data structure 81

First solution struct CircularArray_t { int array[10]; int head, tail; sem_t full, empty; sem_t mutex; void init_ca(struct CircularArray_t*c) { c->head=0; c->tail=0; sem_init(&c->empty, 0); sem_init(&c->full, 10); sem_init(&c->mutex, 1); void insert_ca(struct CircularArray_t *c, int elem){ sem_wait(&c->mutex); sem_wait(&c->full); c->array[c->head]=elem; c->head = (c->head+1)%10; sem_post(&c->empty); sem_post(&c->mutex); void extract_ca(struct CircularArray_t *c, int *elem){ sem_wait(&c->mutex); sem_wait(&c->empty); elem = c->array[c->tail]; c->tail = (c->tail+1)%10; sem_post(&c->full); sem_post(&c->mutex); 82

Wrong solution The previous solution is wrong! Counter example: A consumer thread executes first, locks the mutex and blocks on the empty semaphore All other threads (producers or consumers) will block on the mutex Lesson learned: never block inside a mutex! 83

Correct solution struct CircularArray_t { int array[10]; int head, tail; Semaphore full, empty; Semaphore mutex; void init_ca(struct CircularArray_t*c) { c->head=0; c->tail=0; sem_init(&c->empty, 0); sem_init(&c->full, 10); sem_init(&c->mutex, 1); void insert_ca(struct CircularArray_t *c, int elem){ sem_wait(&c->full); sem_wait(&c->mutex); c->array[c->head]=elem; c->head = (c->head+1)%10; sem_post(&c->mutex); sem_post(&c->empty); void extract_ca(struct CircularArray_t *c, int *elem){ sem_wait(&c->empty); sem_wait(&c->mutex); elem = c->array[c->tail]; c->tail = (c->tail+1)%10; sem_post(&c->mutex); sem_post(&c->full); 84

Producers/Consumers: deadlock situation Deadlock situation a thread executes sem_wait(&c->mutex) and then blocks on a synchronisation semaphore to be unblocked another thread must enter a critical section guarded by the same mutex semaphore! so, the first thread cannot be unblocked and free the mutex! the situation cannot be solved, and the two threads will never proceed as a rule, never insert a blocking synchronization inside a critical section!!! 85

Internal implementation of semaphores wait()and signal()involve a possible threadswitch therefore they must be implemented as system calls! one blocked thread must be removed from state RUNNING and be moved in the semaphore blocking queue a semaphore is itself a shared resource wait()and signal()are critical sections! they must run with interrupt disabled and by using lock() and unlock() primitives 86

Readers/Writers One shared buffer Readers they read the content of the buffer many readers can read at the same time Writers they write in the buffer while one writer is writing no other readers or writers can access the buffer use semaphores to implement the resource 88

Readers/Writers: simple implementation struct Buffer_t { sem_t synch; sem_t s_r; int nr; ; void read_b(struct Buffer_t *b) { sem_wait(&b->s_r); b->nr++; if (b->nr==1) sem_wait(&b->synch); sem_post(&b->s_r); <read the buffer> sem_wait(&b->s_r); b->nr--; if (b->nr==0) sem_post(&b->synch); sem_post(&b->s_r); void init_b(struct Buffer_t *b) { sem_init(&b->synch, 1); sem_init(&b->s_r, 1); b->nr=0; void write_b(struct Buffer_t *b) { sem_wait(&b->synch); <write the buffer> sem_post(&b->synch); 89 Real-Time Operating Systems A.A. 2009-2010

Readers/Writers: more than one pending writer struct Buffer_t { sem_t synch, mutex; sem_t s_r, s_w; int nr, nw; ; void read_b(struct Buffer_t *b) { sem_wait(&b->s_r); b->nr++; if (b->nr==1) sem_wait(&b->synch); sem_post(&b->s_r); <read the buffer> sem_wait(&b->s_r); b->nr--; if (b->nr==0) sem_post(&b->synch); sem_post(&b->s_r); void init_b(struct Buffer_t *b) { sem_init(&b->synch, 1); sem_init(&b->mutex, 1); sem_init(&b->s_r, 1); sem_init(&b->s_w, 1); b->nr=0; b->nw=0; void write_b(struct Buffer_t *b) { sem_wait(&b->s_w); nw++; if (nw==1) sem_wait(&b->synch); sem_post(&b->s_w); sem_wait(&b->mutex); <write the buffer> sem_post(&b->mutex); sem_wait(&b->s_w); nw--; if (nw==0) sem_post(&b->synch); sem_post(&b->s_w); 90

Readers/Writers: starvation A reader will be blocked for a finite time The writer suffers starvation Suppose we have 2 readers (R1 and R2) and 1 writer W1 and suppose that R1 starts to read while R1 is reading, W1 blocks because it wants to write now R2 starts to read now R1 finishes, but, since R2 is reading, W1 cannot be unblocked before R2 finishes to read, R1 starts to read again when R2 finishes, W1 cannot be unblocked because R1 is reading a solution readers should not be counted whenever there is a writer waiting for them 91 A.L. Ruscelli

Readers/Writers: priority to writers! struct Buffer_t { sem_t synch, synch1; sem_t s_r, s_w; int nr, nw; ; void init_b(struct Buffer_t *b) { sem_init(&b->synch, 1); sem_init(&b->synch1, 1); sem_init(&b->s_r, 1); sem_init(&b->s_w, 1); b->nr=0; b->nw=0; void read_b(struct Buffer_t *b) { sem_wait(&b->synch1); sem_wait(&b->s_r); nr++; if (nr==1) sem_wait(&b->synch); sem_post(&b->s_r); sem_post(&b->synch1); <read the buffer> sem_wait(&b->s_r); nr--; if (nr==0) sem_post(&b->synch); sem_post(&b->s_r); void write_b(struct Buffer_t *b) { sem_wait(&b->s_w); nw++; if (nw==1) sem_wait(&b->synch1); sem_post(&b->s_w); sem_wait(&b->synch); <write the buffer> sem_post(&b->synch); sem_wait(&b->s_w); nw--; if (nw == 0) sem_post(&b->synch1); sem_post(&b->s_w); 92

Readers/Writers: problem Now, there is starvation for readers the readers/writers problem can be solved in general? no starvation for readers no starvation for writers solution maintain a FIFO ordering with requests if at least one writer is blocked, every next reader blocks if at least one reader is blocked, every next writer blocks we can do that using the private semaphores technique 93

Problem2: Synchronization of N threads Generalize the threads synchronization problem to N threads The first N-1 threads must block waiting for the last one First solution (more elegant) Second solution (more practical) 94

First solution to problem 2 #include <pthread.h> #include <semaphore.h> #define N 8 sem_t s[n][n]; void init() { int i, j; for (i=0; i<n; i++) for(j=0; j<n; j++) sem_init(&s[i][j], 0, 0); void *thread(void *arg) { int k = (int) arg; int j; printf("th%d: before synch\n", k); for (j=0; j<n; j++) if (j!=k) sem_post(&s[k][j]); for (j=0; j<n; j++) if (j!=k) sem_wait(&s[j][k]); printf("th%d: after synch\n", k); 95 int main() { pthread_t tid[n]; int i; init(); for (i=0; i<n; i++) pthread_create(&tid[i], 0, thread, (void *)i); for (i=0; i<n; i++) pthread_join(tid[i], 0); printf("main: exiting\n"); Elegant solution but it uses many semaphores!

Second solution to problem 2 Practical solution. We need a mutex semaphore, a counter and a semaphore to block threads. struct synch { int count; sem_t m; // mutex sem_t b; // blocked int N; // number of threads ; void initsynch(struct synch *s, int n) { int i; s->count = 0; sem_init(&s->m, 0, 1); sem_init(&s->b, 0, 0); s->n = n; 96 void my_synch(struct synch *s){ int i; sem_wait(&s->m); if (++s->count < s->n) { sem_post(&s->m); sem_wait(&s->b); else { for (i=0; i < s->n - 1; i++) sem_post(&s->b); sem_post(&s->m); struct synch sp; void *thread(void *arg){... my_synch(&sp);...