Monitors; Software Transactional Memory Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico March 17, 2016 CPD (DEI / IST) Parallel and Distributed Computing 10 2016-03-17 1 / 39
Outline

  - Monitors vs Semaphores
  - Software Transactional Memory

SMP Programming Errors

  Shared-memory parallel programming:
  - saves the programmer from having to map data onto multiple processors
  - opens up a range of new errors arising from unanticipated conflicts over shared resources

  Race condition: the output of a program depends on the timing of the threads in the team.
  Deadlock: threads block on a locked resource that will never become free.
  Livelock: threads keep working on individual tasks that the ensemble cannot finish.

Critical Regions

  Critical region: a section of code that accesses a shared resource and must not be executed concurrently by another thread.

  Code structure: non-critical, entry region, critical region, leave region, non-critical.

  Unfortunate notation: the critical region is really in the data, but the guards are in the code.

Monitors

  Semaphores are low-level synchronization primitives:
  - inherently unstructured
  - usage of the semaphore must be correct in all regions of the program

  Monitors provide a structured concurrent programming primitive that concentrates the responsibility for correctness in a few modules.

  Monitors encapsulate:
  - items of data
  - procedures that operate on this set of data

  In object-oriented programming, monitors synchronize calls to the methods of a class.

Monitor Concept

  Already in the 1970s, concurrent programmers faced the following problems:
  - concurrent programs were unreliable and hard to write
  - semaphores were well understood, but they were often used incorrectly and a compiler could provide no help in using them

  Brinch-Hansen and Hoare proposed monitors, which:
  - provide a facility for synchronizing concurrent programs that is easy to use and can also be checked by a compiler
  - grant mutually exclusive access when threads (processes) have to access the same variables
  - allow one thread to wait for a condition that another thread can cause, without creating a deadlock

Monitors

  Monitor: a monitor is essentially a shared class with explicit queues.

  - A monitor region is code that needs to be executed as one indivisible operation with respect to a particular monitor.
  - A monitor enforces this one-thread-at-a-time execution of its monitor regions.
  - The only way a thread can enter a monitor is by arriving at the beginning of one of the monitor regions associated with that monitor.
  - The only way a thread can move forward and execute the monitor region is by acquiring the monitor.

Monitors

  - When a thread arrives at the beginning of a monitor region, it is placed into an entry set for the associated monitor.
  - When the thread finishes executing the monitor region, it exits and releases the monitor.
  - A thread can suspend itself inside the monitor by executing a wait command. When a thread executes a wait, it releases the monitor and enters a wait set.
  - The thread will stay suspended in the wait set until some time after another thread executes a notify command inside the monitor.
  - When a thread executes a notify, it continues to own the monitor until it releases the monitor of its own accord, either by executing a wait or by completing the monitor region.

Simple Monitor Operation

  [Figure: a monitor with three areas (Entry Set, The Owner, Wait Set) and the transitions between them: enter, acquire, release, acquire, release and exit.]

Three Types of Monitors

  [Figure: Brinch-Hansen and Hoare; Signal and continue; Java. Monitor graphics by Theodore Norvell.]

Threads in Java

  Threads in Java are integrated into the language.

    class dummythread extends Thread {
        int id;
        public dummythread(int id) { this.id = id; }
        public void run() {
            System.out.println("Hello World from thread " + id);
        }
    }

    dummythread dt = new dummythread(42);
    dt.start();
    dt.join();   // may throw InterruptedException

Threads in Java

  Local variables: every thread has its own set of local variables stored in a stack frame, to which no other thread has access.

  wait, notify and sleep are used for inter-thread communication:
  - Object.wait: the calling thread goes to sleep until some other thread wakes it up.
  - Object.notify: wakes up one thread that is waiting on the same object.
  - Thread.sleep: the calling thread goes to sleep for a specified number of milliseconds, unless some other thread interrupts it first.

  interrupt: one thread can interrupt another with Thread.interrupt. The most common use is to wake up a sleeping thread prematurely, or to abort a long I/O operation.

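  As a minimal sketch of the interrupt mechanism described above, the following Java fragment starts a thread that sleeps for a minute and wakes it up at once with interrupt. The class name InterruptDemo and the method wakeSleeper are hypothetical names chosen for this illustration.

```java
// Sketch (hypothetical names): waking a sleeping thread with interrupt().
final class InterruptDemo {

    // Returns true if the sleeper left Thread.sleep through an InterruptedException.
    static boolean wakeSleeper() {
        final boolean[] wokenEarly = { false };
        Thread sleeper = new Thread(() -> {
            try {
                Thread.sleep(60_000);          // would sleep for one minute
            } catch (InterruptedException e) {
                wokenEarly[0] = true;          // woken prematurely by interrupt()
            }
        });
        sleeper.start();
        sleeper.interrupt();                   // sets the interrupt status of the sleeper
        try {
            sleeper.join();                    // join makes the sleeper's write visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return wokenEarly[0];
    }

    public static void main(String[] args) {
        System.out.println(wakeSleeper());
    }
}
```

  If the interrupt arrives before the thread has reached sleep, the pending interrupt status makes sleep throw immediately, so the sketch does not depend on timing.
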
Mutex in Java

  Mutual exclusion is straightforward in Java: use the synchronized modifier.

    synchronized void doit() {
        /* critical section */
    }

  The synchronized modifier ensures that, for a given object, at most one thread at a time executes doit() or any other synchronized method of that object.

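  A small sketch of what synchronized buys in practice: several threads increment a shared counter, and because increment is synchronized no update is lost. The class SyncCounterDemo and its run helper are assumptions made for this example, not part of the original slides.

```java
// Sketch (hypothetical names): a counter protected by the object's own monitor.
final class SyncCounterDemo {
    private int count = 0;

    // Both methods lock the same SyncCounterDemo instance.
    synchronized void increment() { count++; }

    synchronized int get() { return count; }

    // Starts `threads` workers, each incrementing `perThread` times; returns the final count.
    static int run(int threads, int perThread) {
        SyncCounterDemo c = new SyncCounterDemo();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) c.increment();
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return c.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 100_000)); // prints 400000
    }
}
```

  Removing synchronized from increment would turn count++ into a race condition, and the final value could then be smaller than expected.
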
Example: Producer-Consumer Problem

  Construct a FIFO from a finite-length buffer:
  - only two operations: append and remove
  - append and remove exclude each other
  - a producer should suspend on a full buffer
  - a consumer should suspend on an empty buffer

Producer-Consumer with Semaphores

    void producer() {
        int item;
        while (TRUE) {
            item = produce_item();
            /* claim empty spot */
            sem_wait(&not_full);
            sem_wait(&mutex);
            /* insert, protected by mutex */
            insert_item(item);
            sem_post(&mutex);
            /* signal filled-in spot */
            sem_post(&not_empty);
        }
    }

    void consumer() {
        int item;
        while (TRUE) {
            /* claim spot in buffer */
            sem_wait(&not_empty);
            sem_wait(&mutex);
            /* remove, protected by mutex */
            item = remove_item();
            sem_post(&mutex);
            /* signal empty spot */
            sem_post(&not_full);
            consume_item(item);
        }
    }

Producer-Consumer with Java Monitors

    class Buffer {
        private Object[] buf;
        private int in = 0;     // index for put
        private int out = 0;    // index for get
        private int count = 0;  // number of items
        private int size;

        Buffer(int size) {
            this.size = size;
            buf = new Object[size];
        }

        synchronized public void put(Object o) {...}
        synchronized public Object get() {...}
    }

Producer-Consumer with Java Monitors

    synchronized public void put(Object o) {
        while (count >= size) {          // buffer full: wait
            try { wait(); }
            catch (InterruptedException e) {}
        }
        buf[in] = o;
        ++count;
        in = (in + 1) % size;
        notifyAll();                     // [count > 0]
    }

    synchronized public Object get() {
        while (count == 0) {             // buffer empty: wait
            try { wait(); }
            catch (InterruptedException e) {}
        }
        Object o = buf[out];
        buf[out] = null;                 // display purposes
        --count;
        out = (out + 1) % size;
        notifyAll();                     // [count < size]
        return (o);
    }

Producer-Consumer with Java Monitors

  The solution is more structured than with semaphores:
  - data and procedures are encapsulated in a single module
  - mutual exclusion is provided automatically by the implementation
  - producer and consumer processes see only the abstract put and get

Limitations of Previous Approach

  - there is only one queue for threads waiting at an object, so all of the waiting threads must always be woken up
  - the thread awakened by notify() or notifyAll() does not get immediate control of the lock. By the time the thread runs, the condition may no longer be true, so it must check the condition again.

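  One way around the single-queue limitation, shown here only as a sketch and not as part of the original lecture, is to use java.util.concurrent.locks.ReentrantLock with two Condition objects, so that producers and consumers wait in separate queues and a single signal suffices. The class name TwoConditionBuffer and the transfer driver are hypothetical. Note that the condition is still re-checked in a while loop, for the reason given above.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Sketch (hypothetical names): a bounded FIFO with one wait queue per condition.
final class TwoConditionBuffer {
    private final Object[] buf;
    private int in = 0, out = 0, count = 0;
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notFull = lock.newCondition();   // producers wait here
    private final Condition notEmpty = lock.newCondition();  // consumers wait here

    TwoConditionBuffer(int size) { buf = new Object[size]; }

    void put(Object o) throws InterruptedException {
        lock.lock();
        try {
            while (count == buf.length) notFull.await();  // re-check after every wake-up
            buf[in] = o;
            in = (in + 1) % buf.length;
            count++;
            notEmpty.signal();                            // wake one consumer only
        } finally {
            lock.unlock();
        }
    }

    Object get() throws InterruptedException {
        lock.lock();
        try {
            while (count == 0) notEmpty.await();
            Object o = buf[out];
            buf[out] = null;
            out = (out + 1) % buf.length;
            count--;
            notFull.signal();                             // wake one producer only
            return o;
        } finally {
            lock.unlock();
        }
    }

    // Hypothetical driver: a producer sends 1..n, the caller consumes and sums them.
    static int transfer(int n) {
        TwoConditionBuffer b = new TwoConditionBuffer(2);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) b.put(i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        int sum = 0;
        try {
            for (int i = 0; i < n; i++) sum += (Integer) b.get();
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(transfer(100)); // prints 5050
    }
}
```
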
Transactions

  Transaction: the operations in a transaction either all occur or none occur.

  - Atomic operation:
      Commit: takes effect
      Abort: effects rolled back (usually retried)
  - Linearizable: transactions appear to happen in one-at-a-time order

  Transactional memory: a section of code with reads and writes to shared memory which logically occur at a single instant in time.

Software Transactional Memory

  Software Transactional Memory (STM) has been proposed as an alternative to lock-based synchronization.

  Concurrency unlocked:
  - no thread control when entering critical regions
  - if there are no memory access conflicts during thread execution, the operations executed by the thread are accepted
  - in case of conflict, the program state is rolled back to the state it was in before the thread entered the critical region

Software Transactional Memory

  Benefits of STM:
  - Optimistic: increased concurrency
  - Composable: define atomic set of operations
  - Conditional Critical Regions

Optimistic

  Increased concurrency: threads are not blocked.

  Conflicts only arise when more than one thread accesses the same memory position.

  Conflicts are rare, hence a small number of roll-backs.

Composable Atomic Operations

  The keyword atomic allows the definition of the set of operations that make up the transaction.

    atomic {
        delete(t1, item);
        add(t2, item);
    }

    atomic {
        newnode->prev = node;
        newnode->next = node->next;
        node->next->prev = newnode;
        node->next = newnode;
    }

  - either all operations happen or none at all
  - the remaining threads never see intermediate values

Conditional Critical Regions

  The default action when a transaction fails is to retry.

  What if the successful completion of a transaction depends on some variable?

  Example: a consumer accessing an empty queue.
  - the thread enters a cycle of failures (i.e., active wait!)

Conditional Critical Regions

  If the successful completion of a transaction depends on some variable, use a guard condition.

    atomic (queuesize > 0) {
        remove item from queue
    }

  If the condition is not satisfied, the thread is blocked until a commit has been made that affects the condition.

Conditional Critical Regions

  The keyword retry allows the thread to abort and block at any point in the transaction:

    atomic {
        if (queuesize > 0)
            remove item from queue
        else
            retry;
    }

Conditional Critical Regions

  STM includes the possibility of an alternative course of action when a transaction fails: the orelse keyword.

    atomic {
        delete(t1, item) orelse delete(t2, item);
        add(t3, item);
    }

  - if the delete in t1 fails, a delete in t2 is attempted
  - if the delete in t2 fails, the whole transaction retries
  - add is only performed after a successful delete from either t1 or t2

Software Transactional Memory

  Benefits of STM:
  - Optimistic: increased concurrency
  - Composable: define atomic set of operations
  - Conditional Critical Regions

  Problems with STM:
  - overhead for conflict detection, both computational and memory
  - overhead from commit
  - cannot be used when operations cannot be undone (e.g., I/O)

STM Overhead

  Advantage of STM: far fewer conflicts.
  - most transactions will commit

  However, every commit has a potentially large overhead...

  Note that if there are no conflicts, the only overhead of the mutex approach is in locking and unlocking.

Implementation Issues

  Transaction log:
  - each read and write in a transaction is logged to a thread-local transaction log
  - writes go to the log only, not to memory
  - at the end, the transaction tries to commit to memory
  - if the commit fails, discard the log and retry the transaction

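  The transaction-log steps above can be sketched in Java as follows. This is a deliberately simplified illustration under stated assumptions, not how any particular STM is implemented: every name (TinyStm, Ref, Tx, atomically, run) is hypothetical, values are plain ints, and a single global lock serializes commits. Reads record the version they saw, writes stay in the log, and commit validates the read log before publishing the writes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Sketch (hypothetical names): a log-based STM with one global commit lock.
final class TinyStm {
    private static final Object COMMIT_LOCK = new Object();

    // A shared location: a value plus a version that is bumped on every commit.
    static final class Ref {
        private int value;
        private long version;
        Ref(int initial) { value = initial; }
    }

    // Per-transaction log; each Tx object is used by a single thread only.
    static final class Tx {
        private final Map<Ref, Long> readLog = new HashMap<>();
        private final Map<Ref, Integer> writeLog = new HashMap<>();

        int read(Ref r) {
            Integer pending = writeLog.get(r);
            if (pending != null) return pending;       // read our own buffered write
            synchronized (COMMIT_LOCK) {
                readLog.putIfAbsent(r, r.version);     // remember the version first seen
                return r.value;
            }
        }

        void write(Ref r, int v) { writeLog.put(r, v); }  // log only, not memory

        boolean commit() {
            synchronized (COMMIT_LOCK) {
                for (Map.Entry<Ref, Long> e : readLog.entrySet())
                    if (e.getKey().version != e.getValue()) return false;  // conflict
                for (Map.Entry<Ref, Integer> e : writeLog.entrySet()) {
                    e.getKey().value = e.getValue();
                    e.getKey().version++;
                }
                return true;
            }
        }
    }

    // Runs the body until its commit succeeds; a failed attempt just drops its log.
    static void atomically(Consumer<Tx> body) {
        while (true) {
            Tx tx = new Tx();
            body.accept(tx);
            if (tx.commit()) return;
        }
    }

    // Hypothetical driver: concurrent transactional increments of one counter.
    static int run(int threads, int perThread) {
        Ref counter = new Ref(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++)
                    atomically(tx -> tx.write(counter, tx.read(counter) + 1));
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try { t.join(); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        synchronized (COMMIT_LOCK) { return counter.value; }
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000)); // prints 4000
    }
}
```

  Rolling back is trivial in this design because nothing was written to shared memory before the commit; the cost is paid at commit time, when the read log has to be validated.
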
Implementation Issues

  Commit-time locking:
  - uses a global clock
  - each memory location maintains an access time, T_a
  - marks the time at the beginning of the transaction, T_t
  - for every read/write, if T_a > T_t, abort the transaction
  - during commit, all write locations are locked and their access times updated

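  The timestamp rule above (abort when T_a > T_t) can be illustrated with a single-threaded Java sketch. All names here (GlobalClockDemo, Cell, Abort, staleReadAborts, freshRead) are assumptions for the illustration; the per-cell synchronized methods stand in for the per-location locks taken during commit, and the write log is omitted.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch (hypothetical names): the T_a > T_t check of commit-time locking.
final class GlobalClockDemo {
    static final AtomicLong CLOCK = new AtomicLong();   // the global clock

    // Thrown when a transaction touches a location that changed after it started.
    static final class Abort extends RuntimeException {}

    static final class Cell {
        private int value;
        private long accessTime;                         // T_a of this location

        // Read on behalf of a transaction that started at time txStart (T_t).
        synchronized int read(long txStart) {
            if (accessTime > txStart) throw new Abort(); // T_a > T_t: abort
            return value;
        }

        // Commit step for one write location: store the value and advance T_a.
        synchronized void commitWrite(int v) {
            value = v;
            accessTime = CLOCK.incrementAndGet();
        }
    }

    // A transaction that started before another commit must abort on its next read.
    static boolean staleReadAborts() {
        Cell c = new Cell();
        long txStart = CLOCK.get();   // T_t
        c.read(txStart);              // allowed: T_a <= T_t
        c.commitWrite(42);            // some other transaction commits
        try {
            c.read(txStart);          // now T_a > T_t
            return false;
        } catch (Abort e) {
            return true;
        }
    }

    // A transaction that starts after the commit sees the new value.
    static int freshRead() {
        Cell c = new Cell();
        c.commitWrite(42);
        return c.read(CLOCK.get());
    }

    public static void main(String[] args) {
        System.out.println(staleReadAborts()); // prints true
        System.out.println(freshRead());       // prints 42
    }
}
```
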
Implementation Issues

  Commit-time locking: roll-back is simple; commit is expensive...

  Encounter-time locking:
  - memory positions accessed inside a transaction are locked
  - the thread has exclusive access to them during execution of the transaction
  - the remaining threads abort immediately when accessing one of these positions

Hardware Support

  Hardware support for transactional memory was proposed long ago.
  - the performance of STM would improve considerably!

  Sun's Rock processor: the first multicore (16 cores) designed for hardware transactional memory.

  Special assembly instructions:
  - chkpt <fail pc>: begin a transaction
  - commit: commit the transaction

Distributed STM

  The concept of Software Transactional Memory can be extended to distributed systems.

  IST's Fenix is based on a distributed STM concept.

STM in Fenix

  Fenix is a large web application with a rich domain model.

  Before STM (2005), Fenix had major problems:
  - frequent bugs
  - poor performance

  Root of the problems: locks used for concurrency control.

  Idea: wrap each HTTP request in a transaction.

STM in Fenix

  The JVSTM went into production in September 2005.

  Major benefits:
  - the data-corruption errors disappeared (they were caused mostly by misplaced locks)
  - there was a perceived increase in performance after an initial warm-up
  - new functionality is developed significantly faster (it requires less coding and less debugging)

Review

  - Monitors vs Semaphores
  - Software Transactional Memory

Next Class

  Foster's design methodology:
  - partitioning
  - communication
  - agglomeration
  - mapping

  Application examples:
  - boundary value problem
  - finding the maximum
  - n-body problem