
The University of Texas at Arlington
Lecture 10: Threading and Parallel Programming Constraints
CSE 5343/4342 Embedded Systems II

Lab 3: Windows Threads (Win32 threading API)
Objectives: Convert serial applications to a threaded version.
Lab Assignment: Use Windows threads to thread the serial code to compute Pi using 8 threads.

Numerical Integration Example
The integral of 4.0/(1 + x^2) over [0, 1] equals pi; the serial code below approximates it with the midpoint rule:

    #include <stdio.h>

    static long num_steps = 100000;
    double step, pi;

    int main()
    {
        int i;
        double x, sum = 0.0;

        step = 1.0 / (double) num_steps;
        for (i = 0; i < num_steps; i++) {
            x = (i + 0.5) * step;
            sum = sum + 4.0 / (1.0 + x * x);
        }
        pi = step * sum;
        printf("Pi = %f\n", pi);
        return 0;
    }
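Since the lab asks for a threaded version, here is a minimal sketch of the same computation split across 8 threads. It uses C++ std::thread rather than the Win32 API the lab calls for, and the names (partial, workers) are illustrative, not from the lab code.

    #include <cstdio>
    #include <thread>
    #include <vector>

    static const long num_steps = 100000;
    static const int  num_threads = 8;

    int main()
    {
        double step = 1.0 / (double) num_steps;
        std::vector<double> partial(num_threads, 0.0);   // one slot per thread: no shared writes, no lock needed
        std::vector<std::thread> workers;

        for (int t = 0; t < num_threads; t++) {
            workers.emplace_back([t, step, &partial]() {
                double sum = 0.0;
                // Each thread handles an interleaved subset of the steps.
                for (long i = t; i < num_steps; i += num_threads) {
                    double x = (i + 0.5) * step;
                    sum += 4.0 / (1.0 + x * x);
                }
                partial[t] = sum;
            });
        }
        for (auto &w : workers) w.join();

        double pi = 0.0;
        for (double s : partial) pi += s * step;
        std::printf("Pi = %f\n", pi);
        return 0;
    }

Each thread writes only its own slot of partial, so the threads never touch the same memory location and no synchronization beyond join() is required.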

More Task Decomposition: Dependence Graph
Graph = {vertices, (directed) edges}
A vertex (node) for each:
    variable assignment (except index variables)
    constant
    operator or function call
Directed edges (arrows) indicate the use of variables and constants for:
    data flow
    control flow

Dependence Graph Example #1

    for (i = 0; i < 3; i++)
        a[i] = b[i] / 2.0;

The dependence graph has a division node per iteration: b[0] and the constant 2 feed a[0], b[1] and 2 feed a[1], b[2] and 2 feed a[2]. The three iterations share no data, so domain decomposition is possible.
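To make "domain decomposition possible" concrete, here is a minimal sketch (illustrative names and data, not from the slides) that splits the independent iterations of the loop above across two threads:

    #include <thread>

    const int N = 3;
    double a[N];
    double b[N] = {2.0, 4.0, 6.0};   // illustrative input values

    // Each thread processes its own contiguous chunk of iterations,
    // which is safe because no iteration reads another iteration's result.
    void process_range(int begin, int end)
    {
        for (int i = begin; i < end; i++)
            a[i] = b[i] / 2.0;
    }

    int main()
    {
        std::thread t1(process_range, 0, N / 2);
        std::thread t2(process_range, N / 2, N);
        t1.join();
        t2.join();
        return 0;
    }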

Dependence Graph Example #2

    for (i = 1; i < 4; i++)
        a[i] = a[i-1] * b[i];

Each multiplication needs the result of the previous one: a[0] and b[1] produce a[1], which with b[2] produces a[2], which with b[3] produces a[3]. Because the iterations form a chain, no domain decomposition is possible.

Dependence Graph Example #3

    a = f(x, y, z);
    b = g(w, x);
    t = a + b;
    c = h(z);
    s = t / c;

The calls f(x, y, z), g(w, x), and h(z) depend only on the inputs w, x, y, and z, so they can run concurrently; t = a + b must wait for f and g, and s = t / c must wait for t and c. Task decomposition with 3 CPUs is possible (one for each of f, g, and h), as sketched below.
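A minimal sketch of that task decomposition using C++ std::async (the bodies of f, g, and h are placeholders, since the slide does not define them):

    #include <cstdio>
    #include <future>

    // Placeholder implementations standing in for the f, g, h of the slide.
    double f(double x, double y, double z) { return x + y + z; }
    double g(double w, double x)           { return w * x; }
    double h(double z)                     { return z + 1.0; }

    int main()
    {
        double w = 1.0, x = 2.0, y = 3.0, z = 4.0;

        // f, g, and h have no dependences on each other, so each can run on its own CPU.
        auto fa = std::async(std::launch::async, f, x, y, z);
        auto fb = std::async(std::launch::async, g, w, x);
        auto fc = std::async(std::launch::async, h, z);

        double a = fa.get();
        double b = fb.get();
        double c = fc.get();
        double t = a + b;      // waits (via get) for f and g
        double s = t / c;      // waits for t and c

        std::printf("s = %f\n", s);
        return 0;
    }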

Multi-threading Concepts
Multi-threading concepts are needed in order to obtain maximum performance from multi-core microprocessors. These concepts include:
    Creating, terminating, suspending, and resuming threads
    Thread synchronization methods: semaphores, mutexes, locks, and critical sections

Using Threads
Benefits of using threads include:
    Increased performance
    Better resource utilization
    Efficient data sharing
However, there are risks in using threads:
    Data race conditions
    Deadlocks
    Code complexity
    Portability issues
    Testing and debugging difficulty

Waiting for Threads
Blocking versus non-blocking: looping on a condition is expensive, because the waiting thread is scheduled even when there is no work, stealing CPU time from threads that are performing work.
It is hard to find the right balance: locking is usually either too much or not enough, and Thread.Sleep is inflexible.
Better option: just wait for it, i.e., block until the work is actually done, as in the sketch below.
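A rough illustration of the difference (names are illustrative, not from the course code): the polling loop keeps waking up to check a flag, while the blocking join() sleeps until the worker is finished.

    #include <atomic>
    #include <chrono>
    #include <thread>

    std::atomic<bool> done{false};   // completion flag used only by the polling version

    void worker()
    {
        /* ... perform the actual work ... */
        done = true;
    }

    int main()
    {
        std::thread t(worker);

        // Non-blocking wait (polling): this thread keeps getting scheduled even
        // though it has no work, and it notices completion only at the next check.
        while (!done)
            std::this_thread::sleep_for(std::chrono::milliseconds(10));

        // Blocking wait: join() puts this thread to sleep and the scheduler wakes
        // it exactly when the worker finishes.
        t.join();
        return 0;
    }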

Synchronization
Synchronization controls the relative order of thread execution and resolves conflicts among threads.
Threads sometimes need to wait for other threads to be in a known state before continuing.
In shared-memory systems, constraints have to be imposed to obtain the proper order of execution or to avoid corrupted or locked data.
Two basic types of synchronization:
    1. mutual exclusion
    2. condition synchronization

Thread Synchronization
Two or more threads cooperate; one thread waits for another to be in a known state before continuing.
Lack of synchronization leads to data corruption and lockups.
Synchronization means using methods and constructs to enforce the required behavior.

Mutual Exclusion
Program logic used to ensure single-thread access to a critical region.
One thread holds a critical section of code containing shared data while one or more other threads wait for access; the other threads are blocked from entering the critical section until the first thread is done.
Proper synchronization techniques ensure that only one thread is allowed access to a critical section at any one instant.
The major challenge of threaded programming is to implement critical sections in such a way that multiple threads perform mutually exclusive operations on them and never execute a critical section simultaneously.

Mutual Exclusion done by a Critical Section
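A minimal sketch of mutual exclusion done by a critical section, using C++ std::mutex (counter and increment are illustrative names, not from the course code):

    #include <cstdio>
    #include <mutex>
    #include <thread>

    long counter = 0;          // shared data
    std::mutex counter_mutex;  // protects the critical section below

    void increment(int times)
    {
        for (int i = 0; i < times; i++) {
            std::lock_guard<std::mutex> guard(counter_mutex);  // acquire the lock
            counter++;                                          // critical section: one thread at a time
        }                                                       // lock released when guard goes out of scope
    }

    int main()
    {
        std::thread t1(increment, 100000);
        std::thread t2(increment, 100000);
        t1.join();
        t2.join();
        std::printf("counter = %ld\n", counter);   // always 200000, because access is mutually exclusive
        return 0;
    }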

Condition Synchronization
Condition synchronization allows a thread to wait until a specific condition is reached (e.g., using semaphores or condition variables).

Using the Mutex
The most common method of making sure that two threads take turns before accessing a given object is to use a shared lock. Since only one thread at a time can have the lock, other threads wait their turn.
Similar to a lock is the Mutex object. Only one thread can lock the Mutex at a time, and that same thread must then release it.
The key difference between a Mutex and a standard lock is that a Mutex can work across processes for more advanced scenarios.

Deadlocks
A deadlocked thread waits for a resource that will never become available.
Self-deadlock (recursive deadlock): a thread tries to acquire a resource it already holds.
Lock-ordering deadlock (more common): thread A locks resource R1 and then tries to lock resource R2, while thread B locks R2 and then tries to lock R1. In some schedules, A has acquired R1 and is waiting for R2 while B has acquired R2 and is waiting for R1.
As implied by the name, deadlocks are not good; they need to be avoided at all costs.

Deadlocks (cont'd)
Deadlocks occur when a thread waits for a condition that never occurs. They commonly result from competition between threads for system resources held by other threads.
The four necessary conditions for a deadlock are:
    Mutual exclusion condition
    Hold and wait condition
    No preemption condition
    Circular wait condition

Deadlock.cpp
This program illustrates the potential for deadlock in a bad locking hierarchy. It is possible for one thread to lock both critical sections and avoid deadlock; however, concurrent programs that rely on a particular order of execution without enforcing that order will eventually fail.
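Deadlock.cpp itself is not reproduced here; the following is a minimal sketch (illustrative names) of the lock-ordering deadlock described above. If thread A acquires R1 and thread B acquires R2 before either takes its second lock, both block forever.

    #include <mutex>
    #include <thread>

    std::mutex R1, R2;   // two shared resources (illustrative)

    void threadA()
    {
        std::lock_guard<std::mutex> first(R1);    // A holds R1 ...
        std::lock_guard<std::mutex> second(R2);   // ... then waits for R2
    }

    void threadB()
    {
        std::lock_guard<std::mutex> first(R2);    // B holds R2 ...
        std::lock_guard<std::mutex> second(R1);   // ... then waits for R1: possible deadlock
    }

    int main()
    {
        std::thread ta(threadA), tb(threadB);
        ta.join();
        tb.join();   // may never return if the bad interleaving occurs
        return 0;
    }

Acquiring the locks in the same global order in both threads (or acquiring both at once with std::scoped_lock) removes the circular wait and enforces the order instead of merely relying on it.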

Race Conditions
Race conditions are the most common errors in concurrent programs. They occur because the programmer assumes a particular order of execution but does not guarantee that order through synchronization.
A data race refers to a storage conflict: it occurs when two or more threads simultaneously access the same memory location while at least one thread is updating that location. This results in two possible conflicts: read/write conflicts and write/write conflicts.
Race conditions are usually not obvious; the errors occur unexpectedly and unpredictably. Locks are the key to avoidance.
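A minimal sketch of a read/write race (the bank-balance example is illustrative, not from the course code): both threads can pass the check before either performs the write, so the program acts on stale data.

    #include <cstdio>
    #include <thread>

    int balance = 100;   // shared data, accessed without synchronization

    void withdraw(int amount)
    {
        if (balance >= amount) {          // read: check the shared value
            // The other thread may run here and also pass the check.
            balance = balance - amount;   // write: act on what is now a stale value
        }
    }

    int main()
    {
        std::thread t1(withdraw, 80);
        std::thread t2(withdraw, 80);
        t1.join();
        t2.join();
        // Both checks can succeed before either write, leaving balance at -60
        // even though each thread "checked" before withdrawing.
        std::printf("balance = %d\n", balance);
        return 0;
    }

Holding a lock around the whole check-then-update sequence, as in the mutual exclusion sketch earlier, guarantees that the check and the write happen atomically.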

Using Synchronization
Synchronization is about making sure that threads take turns when they need to, typically to access some shared object. Depending on your specific application needs, you will find that different options make more sense than others.
Operating systems have to provide some support for atomic operations. Windows simplifies this process since it has built-in support for suspending a thread at the scheduler level when necessary. In this manner, one thread can be put to sleep until a certain condition occurs in another thread. By letting one thread sleep instead of repeatedly checking whether another thread is done, performance is dramatically improved.

Synchronization Primitives
Synchronization is typically performed by three types of primitives: semaphores, locks, and condition variables.
These primitives are implemented with atomic operations and use a memory fence or barrier, a processor-dependent operation that ensures threads see other threads' memory operations by maintaining a reasonable order.

Semaphores
Introduced by Edsger Dijkstra (1968).
A semaphore is a form of counter that allows multiple threads access to a resource by incrementing or decrementing the semaphore. The typical use is protecting a shared resource of which at most n instances are allowed to exist simultaneously.
Use P to acquire a resource and V to release it. Because a semaphore embodies a capacity, it can be represented by an integer. Semaphores are created with a specified capacity, and once that number of threads have acquired it (P, proberen), subsequent access is blocked until a slot opens up through a release (V, verhogen).

Semaphores (cont'd)
P and V need to be atomic to protect the semaphore variable. The P operation busy-waits (or perhaps sleeps) until a resource is available, whereupon it immediately claims one.
A semaphore with a capacity of one is a binary semaphore; it is essentially a Mutex, except that any thread can release it, not just the thread that acquired it.
Semaphores can be used across processes as well. Semaphores are not as frequently used anymore.
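A minimal sketch of how P and V can be made atomic by protecting the counter with a lock and a condition variable (a teaching sketch under those assumptions, not the implementation used by any particular OS):

    #include <condition_variable>
    #include <mutex>

    class Semaphore {
    public:
        explicit Semaphore(int capacity) : count(capacity) {}

        void P()                                          // proberen: acquire one slot
        {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [this] { return count > 0; });    // sleep until a slot is free
            --count;                                      // claim it; atomic because the lock is held
        }

        void V()                                          // verhogen: release one slot
        {
            std::lock_guard<std::mutex> lk(m);
            ++count;
            cv.notify_one();                              // wake one waiting thread
        }

    private:
        std::mutex m;                   // protects the semaphore variable
        std::condition_variable cv;
        int count;                      // the remaining capacity
    };

A Semaphore constructed with capacity 1 behaves as the binary semaphore described above.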

Semaphores (example)
Producer/consumer threads sharing a semaphore p_sem:

Producer:
    void producer() {
        while (1) {
            produce_data();
            p_sem->release();   // V operation
        }
    }

Consumer:
    void consumer() {
        while (1) {
            p_sem->wait();      // P operation
            consume_data();
        }
    }

Locks
Locks ensure that only a single thread can have access to a resource.
Coarse-grained locks have higher lock contention than finer-grained ones.
Locks can be realized by binary semaphores (plus an initialization entity):
    Acquire(): waits for the lock state to be unlocked and then sets the state to locked.
    Release(): changes the lock state from locked to unlocked.

Critical Section Implementation
To avoid deadlocks, locks should mostly be used inside critical sections that have a single entry point and a single exit point:

    <critical section start>
    <acquire lock A>
    (operate on shared memory protected by lock A)
    <release lock A>
    <critical section end>

Locks
Locking restricts access to an object to one thread.
    Minimize locking/synchronization whenever possible.
    Make objects thread-safe when appropriate.
    Acquire locks late, release them early; the shorter the duration, the better.
    Lock only when necessary.

Locking Example (C#)

    private object padlock = new object();

    public void CoordinateWork() {
        (new Thread(PerformWork)).Start();
        (new Thread(PerformWork)).Start();
    }

    private void PerformWork() {
        while (true) {
            lock (padlock) {
                /* GET NEXT ITEM */
            }   // the lock is released automatically at the end of the lock block
            /* DO WORK HERE */
        }
    }

Lock Types
Mutex: the simplest lock; can include a timer attribute for release, or a try-finally block to guarantee release.
Recursive lock: can be repeatedly acquired by the owning thread (used in recursive functions), which avoids recursive (self-)deadlocks.
Read-write locks: allow simultaneous read access for multiple threads but limit write access to only one thread. Use them when multiple threads need to read shared data but rarely need to write it. Granularity (how much is locked) matters.
Spin locks: waiting threads spin, or poll the state of the lock, rather than being blocked. Used mostly on multiprocessor systems, since the spinning processor is essentially blocked. Use them when the hold time of the lock is short (i.e., shorter than the cost of blocking and waking a thread).
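A minimal sketch of a read-write lock using C++17 std::shared_mutex (illustrative names): many readers may hold the lock at the same time, while a writer gets exclusive access.

    #include <shared_mutex>
    #include <vector>

    std::vector<int> data;        // shared data (illustrative)
    std::shared_mutex rw_lock;    // read-write lock protecting data

    int read_sum()
    {
        std::shared_lock<std::shared_mutex> lk(rw_lock);   // shared mode: many readers at once
        int sum = 0;
        for (int v : data)
            sum += v;
        return sum;
    }

    void append(int value)
    {
        std::unique_lock<std::shared_mutex> lk(rw_lock);   // exclusive mode: one writer only
        data.push_back(value);
    }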

Condition Variables
Condition variables are usually user-mode objects that cannot be shared across processes.
In general, a condition variable is a way to signal a specific condition that a thread is waiting on while holding a lock on a specific resource. To prevent deadlocks, the following atomic operations on a condition variable can be used: Wait(L), Signal(L), and Broadcast(L).
Condition variables enable threads to atomically release a lock and enter the sleeping state. They can be used with critical sections or slim reader/writer (SRW) locks. Condition variables support operations that "wake one" or "wake all" waiting threads. After a thread is woken, it re-acquires the lock it released when it entered the sleeping state.

Condition Variables
Suppose a thread holds a lock on a specific resource but cannot proceed until a particular condition occurs. In this case the thread can release the lock, but it will need the lock returned when the condition occurs.
wait() releases the lock and lets the next thread waiting on the resource use it. The condition the original thread was waiting on is associated with the condition variable and handed to the new lock holder.
When the new thread is finished with the resource, it checks the condition variable and returns the resource to the original holder by using the signal() or broadcast() operations; broadcast() enables all threads waiting on that resource to run.

Example: Condition Variable

    Condition C;
    Lock L;
    bool LC = false;    // true while unconsumed data is available

    void producer() {
        while (1) {
            L->acquire();           // start critical section
            while (LC == true) {
                C->wait(L);         // wait until the consumer has taken the data
            }
            // produce the next data
            LC = true;
            C->signal(L);           // wake a waiting consumer
            L->release();           // end critical section
        }
    }

    void consumer() {
        while (1) {
            L->acquire();           // start critical section
            while (LC == false) {
                C->wait(L);         // wait until the producer has made data
            }
            // consume the next data
            LC = false;
            C->signal(L);           // wake a waiting producer
            L->release();           // end critical section
        }
    }

Message Passing
A message is a special method of communication used to transfer information or a signal from one domain to another. In multi-threading environments, the domain is the boundary of a thread.
Message passing (as in MPI, used in distributed computing, parallel processing, etc.) is a method for communicating between threads or processes.

Messages
Thread communication within a process is known as intra-process communication; messages between threads in different processes use inter-process communication.
Semaphores, locks, and condition variables are used to synchronize the operation of threads; these synchronization primitives convey status and access information. To communicate data, thread messaging is used, as in the sketch below.
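A minimal sketch of intra-process thread messaging built from a lock and a condition variable (illustrative names, not a specific library's API):

    #include <condition_variable>
    #include <cstdio>
    #include <mutex>
    #include <queue>
    #include <string>
    #include <thread>

    std::queue<std::string> mailbox;      // messages waiting to be received
    std::mutex mbox_lock;                 // protects the mailbox
    std::condition_variable mbox_cv;      // signals "a message has arrived"

    void send(const std::string &msg)
    {
        std::lock_guard<std::mutex> lk(mbox_lock);
        mailbox.push(msg);
        mbox_cv.notify_one();             // wake a receiver, if one is waiting
    }

    std::string receive()
    {
        std::unique_lock<std::mutex> lk(mbox_lock);
        mbox_cv.wait(lk, [] { return !mailbox.empty(); });   // block until a message arrives
        std::string msg = mailbox.front();
        mailbox.pop();
        return msg;
    }

    int main()
    {
        std::thread sender([] { send("work item 1"); });
        std::printf("received: %s\n", receive().c_str());
        sender.join();
        return 0;
    }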

Summary
For synchronization, an understanding of atomic operations will help avoid deadlocks and eliminate race conditions.
Use a proper synchronization-construct-based framework for threaded applications, and prefer higher-level synchronization constructs over primitive types (more OS support).
An application must not contain any possibility of a deadlock scenario.
Threads can perform message passing using different approaches: intra-process and inter-process.
It is important to understand how the threading features of third-party libraries are implemented; different implementations may cause applications to fail in unexpected ways.