Introduction to Parallel Programming, Part 4: Confronting Race Conditions


Intel Software College

Objectives

At the end of this module you should be able to:
- Give practical examples of ways that threads may contend for shared resources
- Write an OpenMP program that contains a reduction
- Describe what race conditions are and explain how to eliminate them
- Define deadlock and explain ways to prevent it

Motivating Example

    double area, pi, x;
    int i, n;
    ...
    area = 0.0;
    for (i = 0; i < n; i++) {
        x = (i + 0.5)/n;
        area += 4.0/(1.0 + x*x);
    }
    pi = area / n;

What happens when we make the for loop parallel?

Race Condition

A race condition is nondeterministic behavior caused by the times at which two or more threads access a shared variable.

For example, suppose both Thread A and Thread B are executing the statement

    area += 4.0/(1.0 + x*x);

One Timing: Correct Sum

    Value of area    Thread A    Thread B
    11.667           +3.765
    15.432                       +3.563
    18.995

Thread A reads area (11.667) and adds 3.765, storing 15.432; Thread B then reads the updated value and adds 3.563, giving the correct total 18.995.

Another Timing: Incorrect Sum

    Value of area    Thread A        Thread B
    11.667           reads 11.667    reads 11.667
    15.432           +3.765
    15.230                           +3.563

Both threads read area while it is 11.667. Thread A adds 3.765 and stores 15.432, but Thread B adds 3.563 to its stale copy and stores 15.230, overwriting Thread A's update.

Another Race Condition Example

    struct Node {
        struct Node *next;
        int data;
    };

    struct List {
        struct Node *head;
    };

    void AddHead (struct List *list, struct Node *node)
    {
        node->next = list->head;
        list->head = node;
    }

Original Singly-Linked List
[Diagram: list.head points to node_a]

Thread 1 after Stmt. 1 of AddHead
[Diagram: Thread 1 is inserting node_b; node_b.next points to node_a, but list.head still points to node_a]

Thread 2 Executes AddHead
[Diagram: Thread 2 inserts node_c completely; node_c.next points to node_a and list.head now points to node_c; node_b still points to node_a but is not reachable from the list]

Thread 1 After Stmt. 2 of AddHead
[Diagram: list.head now points to node_b, whose next points to node_a; node_c has been lost]

Why Race Conditions Are Nasty

Programs with race conditions exhibit nondeterministic behavior:
- Sometimes they give the correct result
- Sometimes they give an erroneous result

Programs often work correctly on trivial data sets and a small number of threads. Errors are more likely to occur as the number of threads and/or the execution time increases. Hence debugging race conditions can be difficult.

Mutual Exclusion

We can prevent the race conditions described earlier by ensuring that only one thread at a time references and updates a shared variable or data structure. Mutual exclusion refers to a kind of synchronization that allows only a single thread or process at a time to access a shared resource. Mutual exclusion is implemented using some form of locking.

Flags Don't Guarantee Mutual Exclusion

    int flag = 0;

    void AddHead (struct List *list, struct Node *node)
    {
        while (flag != 0)
            /* wait */ ;
        flag = 1;
        node->next = list->head;
        list->head = node;
        flag = 0;
    }

Flags Don't Guarantee Mutual Exclusion (continued)

The next several slides animate one interleaving of this code. Thread 1 tests flag, sees 0, and exits the while loop. Before Thread 1 executes flag = 1, Thread 2 also tests flag and sees 0. Each thread then sets flag to 1 and both proceed to update the list at the same time, producing exactly the lost-insertion race shown earlier. Finally, both threads set flag back to 0. Mutual exclusion was not guaranteed because testing flag and setting it are two separate operations.

Locking Mechanism

The previous method failed because checking the value of flag and setting its value were two distinct operations. We need some sort of atomic test-and-set; the operating system provides functions to do this. The generic term lock refers to a synchronization mechanism used to control access to shared resources.

Critical Sections

A critical section is a portion of code that threads execute in a mutually exclusive fashion. The critical pragma in OpenMP immediately precedes a statement or block representing a critical section.

Good news: critical sections eliminate race conditions.
Bad news: critical sections are executed sequentially.
More bad news: you have to identify critical sections yourself.

Reminder: Motivating Example

    double area, pi, x;
    int i, n;
    ...
    area = 0.0;
    for (i = 0; i < n; i++) {
        x = (i + 0.5)/n;
        area += 4.0/(1.0 + x*x);
    }
    pi = area / n;

Where is the critical section?

Solution #1

    double area, pi, x;
    int i, n;
    ...
    area = 0.0;
    #pragma omp parallel for private(x)
    for (i = 0; i < n; i++) {
        x = (i + 0.5)/n;
    #pragma omp critical
        area += 4.0 / (1.0 + x*x);
    }
    pi = area / n;

This ensures area will end up with the correct value. How can we do better?

Solution #2

    double area, pi, tmp, x;
    int i, n;
    ...
    area = 0.0;
    #pragma omp parallel for private(x,tmp)
    for (i = 0; i < n; i++) {
        x = (i + 0.5)/n;
        tmp = 4.0/(1.0 + x*x);
    #pragma omp critical
        area += tmp;
    }
    pi = area / n;

This reduces the amount of time spent in the critical section. How can we do better?

Solution #3

    double area, pi, tmp, x;
    int i, n;
    ...
    area = 0.0;
    #pragma omp parallel private(tmp)
    {
        tmp = 0.0;
    #pragma omp for private(x)
        for (i = 0; i < n; i++) {
            x = (i + 0.5)/n;
            tmp += 4.0/(1.0 + x*x);
        }
    #pragma omp critical
        area += tmp;
    }
    pi = area / n;

Why is this better? Each thread accumulates into its own private tmp and enters the critical section only once, to add its partial sum, instead of once per loop iteration.

Reductions

Given an associative binary operator ⊕, the expression

    a₁ ⊕ a₂ ⊕ a₃ ⊕ ... ⊕ aₙ

is called a reduction. The π-finding program performs a sum-reduction.

OpenMP reduction Clause

Reductions are so common that OpenMP provides a reduction clause for the parallel for pragma. It eliminates the need for:
- Creating a private variable
- Dividing the computation into the accumulation of local answers that contribute to the global result

Solution #4

    double area, pi, x;
    int i, n;
    ...
    area = 0.0;
    #pragma omp parallel for private(x) \
            reduction(+:area)
    for (i = 0; i < n; i++) {
        x = (i + 0.5)/n;
        area += 4.0/(1.0 + x*x);
    }
    pi = area / n;

Important: Lock Data, Not Code

Locks should be associated with data objects, and different data objects should have different locks. Suppose a lock is associated with a critical section of code instead of a data object:
- Mutual exclusion can be lost if the same object is manipulated by two different functions
- Performance can be lost if two threads manipulating different objects attempt to execute the same function

Example: Hash Table Creation
[Diagram: a hash table whose buckets are all initially empty (every head pointer NULL)]

Locking Code: Inefficient

    #pragma omp parallel for private(index)
    for (i = 0; i < elements; i++) {
        index = hash(element[i]);
    #pragma omp critical
        insert_element (element[i], index);
    }

The single critical section serializes every insertion, even when two threads target different buckets.

Locking Data: Efficient

    /* Declaration: static variable */
    omp_lock_t hash_lock[HASH_TABLE_SIZE];

    /* Initialization: inside function main */
    for (i = 0; i < HASH_TABLE_SIZE; i++)
        omp_init_lock(&hash_lock[i]);

    /* Use */
    void insert_element (ELEMENT e, int i)
    {
        omp_set_lock (&hash_lock[i]);
        /* Code to insert element e */
        omp_unset_lock (&hash_lock[i]);
    }

Locks in General Threads Model

If access to a shared resource is controlled by a synchronization mechanism, we say the resource is protected by a lock. Examples of locks are semaphores, signals, mutexes, and monitors. We'll use the generic terms lock and unlock; different thread models have different functions/methods.

Win32 Synchronization Primitives

Semaphore
- Most general and slowest
- Can synchronize two or more threads from multiple processes

Mutex
- Binary semaphore
- Faster than the more general semaphore

Critical section
- Synchronizes threads within the same process
- Much faster than semaphore and mutex

Critical Sections with Pthreads

    /* Create mutex, a static variable */
    pthread_mutex_t mutex_name = PTHREAD_MUTEX_INITIALIZER;

    /* Use mutex in program */
    pthread_mutex_lock (&mutex_name);
    /* Critical section operations */
    pthread_mutex_unlock (&mutex_name);

Locks Are Dangerous

Suppose a lock is used to guarantee mutually exclusive access to shared variables. Imagine two threads, each with its own critical section:

    Thread A        Thread B
    a += 5;         b += 5;
    b += 7;         a += 7;
    a += b;         a += b;
    a += 11;        b += 11;

Faulty Implementation

    Thread A                Thread B
    lock (lock_a);          lock (lock_b);
    a += 5;                 b += 5;
    lock (lock_b);          lock (lock_a);
    b += 7;                 a += 7;
    a += b;                 a += b;
    unlock (lock_b);        unlock (lock_a);
    a += 11;                b += 11;
    unlock (lock_a);        unlock (lock_b);

What happens if both threads reach their second lock call at the same time?

Deadlock

A situation involving two or more threads (processes) in which no thread may proceed because each is waiting for a resource held by another. It can be represented by a resource allocation graph:

    Thread A --wants--> sem_b --held by--> Thread B
    Thread B --wants--> sem_a --held by--> Thread A

A graph of a deadlock contains a cycle.

More on Deadlocks

A program exhibits a global deadlock if every thread is blocked, and a local deadlock if only some of the threads in the program are blocked. A deadlock is another example of nondeterministic behavior exhibited by a parallel program: adding debugging output to detect the source of a deadlock can change the timing and reduce the chance of the deadlock occurring.

Four Conditions for Deadlock

- Mutually exclusive access to a resource
- Threads hold onto resources they have while they wait for additional resources
- Resources cannot be taken away from threads
- Cycle in the resource allocation graph

Preventing Deadlock

Eliminate one of the four necessary conditions:
- Don't allow mutually exclusive access to the resource
- Don't allow threads to wait while holding resources
- Allow resources to be taken away from threads
- Make sure the resource allocation graph cannot have a cycle

Deadlock Prevention Strategies

Don't allow mutually exclusive access to the resource
- Make the resource shareable

Don't allow threads to wait while holding resources
- Only request resources when holding none; that means holding only one resource at a time, or requesting all resources at once

Allow resources to be taken away from threads
- Allow preemption; this works for the CPU and memory but doesn't work for locks

Ensure no cycle in the resource allocation graph
- Rank resources; threads must acquire resources in order

Correct Implementation

Both threads must lock lock_a before lock_b:

    Thread A                Thread B
    lock (lock_a);          lock (lock_a);
    a += 5;                 lock (lock_b);
    lock (lock_b);          b += 5;
    b += 7;                 a += 7;
    a += b;                 a += b;
    unlock (lock_b);        unlock (lock_a);
    a += 11;                b += 11;
    unlock (lock_a);        unlock (lock_b);

Another Problem with Locks

Every call to the function lock should be matched with a call to unlock, representing the start and the end of the critical section. A program may be syntactically correct (i.e., may compile) without having matching calls: a programmer may forget the unlock call, or may pass the wrong argument to unlock. A thread that never releases a shared resource creates a deadlock.

Deadlock Detection

By keeping track of blocked threads, a run-time system can often detect global and local deadlocks:
- Intel Thread Checker does lock-cycle analysis to detect the potential for deadlock
- Windows Vista has a wait chain traversal mechanism that can help debuggers diagnose deadlocks
- The Java HotSpot VM has a deadlock detection utility

References

Andrea C. Arpaci-Dusseau and Remzi H. Arpaci-Dusseau, "Deadlock," CS 537: Introduction to Operating Systems, Computer Sciences Department, University of Wisconsin-Madison.

Jim Beveridge and Robert Wiener, Multithreading Applications in Win32, Addison-Wesley (1997).

Richard H. Carver and Kuo-Chung Tai, Modern Multithreading: Implementing, Testing, and Debugging Java and C++/Pthreads/Win32 Programs, Wiley-Interscience (2006).

Michael J. Quinn, Parallel Programming in C with MPI and OpenMP, McGraw-Hill (2004).

Brent E. Rector and Joseph M. Newcomer, Win32 Programming, Addison-Wesley (1997).

N. Wirth, Programming in Modula-2, Springer (1985).
