Concurrent Programming with OpenMP


Concurrent Programming with OpenMP
Parallel and Distributed Computing
Department of Computer Science and Engineering (DEI), Instituto Superior Técnico
March 7, 2016

Outline

- Shared Memory Concurrent Programming
- Review of Operating Systems: PThreads
- OpenMP
- Parallel Clauses
- Private / Shared Variables

Shared-Memory Systems

Uniform Memory Access (UMA) architecture, also known as Symmetric Shared-Memory Multiprocessor (SMP).

[Figure: four processors (P), each with its own cache, connected to a shared main memory and I/O.]

Fork/Join Parallelism

Cheap creation/termination of tasks invites Incremental Parallelization: the process of converting a sequential program to a parallel program a little bit at a time.

- Initially, only the master thread is active; it executes the sequential code.
- Fork: the master thread creates (or awakens) additional threads to execute the parallel code.
- Join: at the end of the parallel code, the created threads die or are suspended.

[Figure: timeline showing the master thread repeatedly forking a team of other threads and joining them.]

Fork/Join Parallelism

read(A, B);
x = initX(A, B);    /* the three init calls are independent */
y = initY(A, B);
z = initZ(A, B);

for (i = 0; i < N_ENTRIES; i++)     /* iterations are independent */
    x[i] = compX(y[i], z[i]);

for (i = 1; i < N_ENTRIES; i++) {   /* x[i] depends on x[i-1] */
    x[i] = solveX(x[i-1]);
    z[i] = x[i] + y[i];
}
...
finalize1(&x, &y, &z);
finalize2(&x, &y, &z);
finalize3(&x, &y, &z);

Processes and Threads

[Figure: two processes, A and B. Each process has its own environment (global data, shared code, system resources) and communicates with other processes via interprocess communication. Within a process, threads 1 through n each have their own private data and stack.]

POSIX Threads (PThreads): Creation

int pthread_create(pthread_t *thread, pthread_attr_t *attr,
                   void *(*start_routine)(void *), void *arg);

Example:

pthread_t pt_worker;

void *thread_function(void *args) {
    /* thread code */
}

pthread_create(&pt_worker, NULL, thread_function, (void *) thread_args);

PThreads: Termination and Synchronization

void pthread_exit(void *value_ptr);
int pthread_join(pthread_t thread, void **value_ptr);
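A short sketch (the square() worker and its values are hypothetical, not from the lecture) of how a result travels from pthread_exit() to pthread_join():

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical worker: returns its result through pthread_exit(). */
void *square(void *arg) {
    long n = (long) arg;
    long *result = malloc(sizeof(long));
    *result = n * n;
    pthread_exit(result);       /* delivered to the second arg of pthread_join */
}

int main(void) {
    pthread_t t;
    void *ret;
    pthread_create(&t, NULL, square, (void *) 7L);
    pthread_join(t, &ret);      /* blocks until t terminates */
    printf("result = %ld\n", *(long *) ret);    /* prints 49 */
    free(ret);
    return 0;
}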

PThread Example: Summing the Values in Matrix Rows

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>

#define N    8      /* number of rows (value assumed for illustration) */
#define SIZE 10     /* row length     (value assumed for illustration) */

int buffer[N][SIZE];

void *sum_row(void *ptr) {
    int index = 0, sum = 0;
    int *b = (int *) ptr;
    while (index < SIZE - 1)
        sum += b[index++];      /* sum row */
    b[index] = sum;             /* store sum in last col. */
    pthread_exit(NULL);
}

int main(void) {
    int i, j;
    pthread_t tid[N];

    for (i = 0; i < N; i++)
        for (j = 0; j < SIZE - 1; j++)
            buffer[i][j] = rand() % 10;

    for (i = 0; i < N; i++)
        if (pthread_create(&tid[i], NULL, sum_row, (void *) &(buffer[i])) != 0) {
            printf("Error creating thread, id=%d\n", i);
            exit(-1);
        } else
            printf("Created thread w/ id %d\n", i);

    for (i = 0; i < N; i++)
        pthread_join(tid[i], NULL);
    printf("All threads have concluded\n");

    for (i = 0; i < N; i++) {
        for (j = 0; j < SIZE; j++)
            printf(" %d ", buffer[i][j]);
        printf("Row %d\n", i);
    }
    exit(0);
}

PThreads: Synchronization

int count;

void *sum_row(void *ptr) {
    int index = 0, sum = 0;
    int *b = (int *) ptr;
    while (index < SIZE - 1)
        sum += b[index++];      /* sum row */
    b[index] = sum;             /* store sum in last col. */
    count++;
    pthread_exit(NULL);
}

Problem?

PThreads: Synchronization

int pthread_mutex_init(pthread_mutex_t *mutex, pthread_mutexattr_t *attr);
int pthread_mutex_lock(pthread_mutex_t *mutex);
int pthread_mutex_unlock(pthread_mutex_t *mutex);

Example:

pthread_mutex_t count_lock;

pthread_mutex_init(&count_lock, NULL);

pthread_mutex_lock(&count_lock);
atomic_function();
pthread_mutex_unlock(&count_lock);

PThreads: Synchronization Example

int count;
pthread_mutex_t count_lock;

void *sum_row(void *ptr) {
    int index = 0, sum = 0;
    int *b = (int *) ptr;
    while (index < SIZE - 1)
        sum += b[index++];      /* sum row */
    b[index] = sum;             /* store sum in last col. */
    pthread_mutex_lock(&count_lock);
    count++;                    /* now updated atomically */
    pthread_mutex_unlock(&count_lock);
    pthread_exit(NULL);
}

int main(void) {
    /* ... */
    pthread_mutex_init(&count_lock, NULL);
    /* ... */
}

OpenMP

What is OpenMP? An open specification for Multi-Threaded, Shared-Memory Parallelism. It is a standard Application Programming Interface (API) consisting of:

- Preprocessor (compiler) directives
- Library calls
- Environment variables

More info at www.openmp.org
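As a first taste, a minimal sketch that touches all three pieces (assumes a compiler with OpenMP support; with gcc, compile with -fopenmp):

#include <stdio.h>
#include <omp.h>                /* library calls */

int main(void) {
    #pragma omp parallel        /* compiler directive: fork a team of threads */
    printf("hello from thread %d of %d\n",
           omp_get_thread_num(), omp_get_num_threads());
    return 0;
}

The team size can then be controlled without recompiling, e.g. OMP_NUM_THREADS=4 ./a.out (environment variable).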

OpenMP vs Threads

(Supposedly) better than threads:
- Simpler programming model
- Separate a program into serial and parallel regions, rather than T concurrently-executing threads

Similar to threads:
- Programmer must detect dependencies
- Programmer must prevent data races

Parallel Programming Recipes

Threads:
1. Start with a parallel algorithm
2. Implement, keeping in mind: data races, synchronization, threading syntax
3. Test & debug
4. Go to step 2

OpenMP:
1. Start with some algorithm
2. Implement serially, ignoring: data races, synchronization, threading syntax
3. Test & debug
4. "Automagically" parallelize with relatively few annotations that specify parallelism and synchronization

OpenMP Directives

Parallelization directives:
- parallel region
- parallel for
- parallel sections
- task

Data environment directives:
- shared, private, threadprivate, reduction, etc.

Synchronization directives:
- barrier, critical

C / C++ Directives Format

#pragma omp directive-name [clause, ...]

- Case sensitive.
- Long directive lines may be continued on succeeding lines by escaping the newline character with a \ at the end of the directive line.
- A directive always applies to the next statement, which must be a structured block.

Examples:

#pragma omp ...
statement

#pragma omp ...
{
    statement1;
    statement2;
    statement3;
}

Parallel Region

#pragma omp parallel [clauses]

- Creates N parallel threads
- All execute the subsequent block
- All wait for each other at the end of the block (barrier synchronization)

How Many Threads?

The number of threads created is determined by, in order of precedence:
- Use of the omp_set_num_threads() library function
- Setting of the OMP_NUM_THREADS environment variable
- Implementation default: usually the number of CPUs

It is possible to query the number of CPUs:

int omp_get_num_procs(void);
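A small sketch of the precedence rules (illustrative only):

#include <stdio.h>
#include <omp.h>

int main(void) {
    /* 3. Implementation default: usually one thread per CPU/core. */
    printf("available processors: %d\n", omp_get_num_procs());

    /* 2. Environment variable: OMP_NUM_THREADS=8 ./a.out */

    /* 1. The library call takes precedence over the environment variable. */
    omp_set_num_threads(2);
    #pragma omp parallel
    {
        if (omp_get_thread_num() == 0)
            printf("team size = %d\n", omp_get_num_threads());   /* prints 2 */
    }
    return 0;
}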

Parallel Region Example

main() {
    printf("Serial Region 1\n");
    omp_set_num_threads(4);
    #pragma omp parallel
    {
        printf("Parallel Region\n");
    }
    printf("Serial Region 2\n");
}

Output? ("Serial Region 1" once, "Parallel Region" four times, once per thread, then "Serial Region 2" once.)

Thread Count and Id API

#include <omp.h>

int omp_get_thread_num();
int omp_get_num_threads();
void omp_set_num_threads(int num);

Example:

#pragma omp parallel
{
    if (!omp_get_thread_num())
        master();
    else
        slave();
}

Work Sharing Directives

- Always occur within a parallel region
- Divide the execution of the enclosed code region among the members of the team
- Do not create new threads

The two main directives are:
- parallel for
- parallel sections

Parallel For

#pragma omp parallel
#pragma omp for [clauses]
for ( ; ; ) {
    ...
}

- Each thread executes a subset of the iterations
- All threads synchronize at the end of the parallel for

Restrictions:
- No data dependencies between iterations
- Program correctness must not depend upon which thread executes a particular iteration

This is the paradigm of Data Parallelism.
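For instance, a sketch of an element-wise vector update, where every iteration is independent (array contents are made up for illustration):

#include <stdio.h>

#define N 1000

int main(void) {
    static double a[N], b[N];
    int i;
    for (i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

    /* No dependencies between iterations: safe to divide among threads. */
    #pragma omp parallel
    #pragma omp for
    for (i = 0; i < N; i++)
        a[i] += b[i];

    printf("a[%d] = %.1f\n", N - 1, a[N - 1]);   /* 3 * (N-1) = 2997.0 */
    return 0;
}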

Handy Shortcut

#pragma omp parallel
#pragma omp for
for ( ; ; ) {
    ...
}

is equivalent to

#pragma omp parallel for
for ( ; ; ) {
    ...
}

PThread Example Revisited

(The same row-summing program shown earlier: sum_row() plus a main() that fills the matrix, creates one thread per row, joins them all, and prints the result.)

PThread Example Revisited

The OpenMP version replaces explicit thread creation and joining with a single parallel for:

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <omp.h>

#define N    8      /* number of rows (value assumed for illustration) */
#define SIZE 10     /* row length     (value assumed for illustration) */

int buffer[N][SIZE];

void *sum_row(void *ptr) {
    int index = 0, sum = 0;
    int *b = (int *) ptr;
    while (index < SIZE - 1)
        sum += b[index++];      /* sum row */
    b[index] = sum;             /* store sum in last col. */
    return NULL;
}

int main(void) {
    int i, j;

    for (i = 0; i < N; i++)
        for (j = 0; j < SIZE - 1; j++)
            buffer[i][j] = rand() % 10;

    #pragma omp parallel for
    for (i = 0; i < N; i++)
        sum_row(buffer[i]);

    printf("All threads have concluded\n");

    for (i = 0; i < N; i++) {
        for (j = 0; j < SIZE; j++)
            printf(" %d ", buffer[i][j]);
        printf("Row %d\n", i);
    }
    exit(0);
}

Multiple Work Sharing Directives

Several may occur within the same parallel region:

#pragma omp parallel
{
    #pragma omp for
    for ( ; ; ) {
        ...
    }
    #pragma omp for
    for ( ; ; ) {
        ...
    }
}

There is an implicit barrier at the end of each for.

Parallel Sections

Functional Parallelism: several blocks are executed in parallel.

#pragma omp parallel
{
    #pragma omp sections
    {
        #pragma omp section
        { a = ...; b = ...; }
        #pragma omp section     /* <- delimiter! */
        { c = ...; d = ...; }
        #pragma omp section
        { e = ...; f = ...; }
        #pragma omp section
        { g = ...; h = ...; }
    } /* omp end sections */
} /* omp end parallel */

Handy Shortcut

#pragma omp parallel
#pragma omp sections
{
    ...
}

is equivalent to

#pragma omp parallel sections
{
    ...
}

OpenMP Memory Model

Concurrent programs access two types of data:
- Shared data, visible to all threads
- Private data, visible to a single thread (often stack-allocated)

Threads:
- Global variables are shared
- Local variables are private

OpenMP: all variables are shared by default. Some exceptions:
- the loop variable of a parallel for is private
- stack (local) variables in called subroutines are private

By using data directives, some variables can be made private or given other special characteristics.

Private Variables

#pragma omp parallel for private( list )

Makes a private copy, for each thread, of each variable in the list:
- No storage association with the original object
- All references are to the local object
- Values are undefined on entry and exit

Also applies to other region and work-sharing directives.

Shared Variables

#pragma omp parallel for shared( list )

Similarly, there is a shared data directive. A shared variable exists in a single location, which all threads can read and write. It is the programmer's responsibility to ensure that multiple threads access shared variables properly (synchronization is discussed next).
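To see why this responsibility matters, a sketch of an unsynchronized update to a shared variable (a data race; the tools to fix it, such as critical sections and reductions, come in the next class):

#include <stdio.h>

int main(void) {
    int i, sum = 0;             /* shared by all threads */

    #pragma omp parallel for shared(sum)
    for (i = 0; i < 100000; i++)
        sum += 1;               /* read-modify-write is not atomic: a race */

    /* Typically prints less than 100000: concurrent updates get lost. */
    printf("sum = %d\n", sum);
    return 0;
}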

Example PThread vs OpenMP

PThreads:

// shared, globals
int n, *x, *y;

void loop() {
    int i;  // private, stack
    for (i = 0; i < n; i++)
        x[i] += y[i];
}

OpenMP:

#pragma omp parallel \
        shared(n, x, y) private(i)
{
    #pragma omp for
    for (i = 0; i < n; i++)
        x[i] += y[i];
}

Example PThread vs OpenMP

PThreads:

// shared, globals
int n, *x, *y;

void loop() {
    int i;  // private, stack
    for (i = 0; i < n; i++)
        x[i] += y[i];
}

OpenMP:

#pragma omp parallel for
for (i = 0; i < n; i++)
    x[i] += y[i];

Example of private Clause

for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
        a[i][j] = b[i][j] + c[i][j];

Make the outer loop parallel, to reduce the number of forks/joins, and give each thread its own private copy of variable j:

#pragma omp parallel for private(j)
for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
        a[i][j] = b[i][j] + c[i][j];

firstprivate / lastprivate Clauses

As mentioned, the values of private variables are undefined on entry and exit: a private variable within a region has no storage association with the same variable outside of the region.

firstprivate(list): variables in list are initialized with the value the original variable had before entering the parallel construct.

lastprivate(list): the thread that executes the sequentially last iteration or section updates the value of the variables in list.

Example of firstprivate / lastprivate Clauses

main() {
    a = 1;
    #pragma omp parallel for private(i), firstprivate(a), lastprivate(b)
    for (i = 0; i < n; i++) {
        ...
        b = a + i;  /* a undefined, unless declared firstprivate */
        ...
    }
    a = b;          /* b undefined, unless declared lastprivate */
}

threadprivate Variables

Private variables are private on a parallel-region basis; threadprivate variables are global variables that are private throughout the execution of the program.

#pragma omp threadprivate(x)

Initial data is undefined, unless copyin is used:

copyin(list): the data of the master thread is copied to the threadprivate copies.
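A small sketch of copyin (the variable x and its value are made up for illustration):

#include <stdio.h>
#include <omp.h>

int x;                          /* one copy per thread, persists across regions */
#pragma omp threadprivate(x)

int main(void) {
    x = 42;                     /* set in the master thread only */

    #pragma omp parallel copyin(x)
    {
        /* every thread starts with x == 42, then modifies its own copy */
        x += omp_get_thread_num();
        printf("thread %d: x = %d\n", omp_get_thread_num(), x);
    }
    return 0;
}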

Example of threadprivate Clause

#include <omp.h>

int a, b, i, tid;
float x;
#pragma omp threadprivate(a, x)

main() {
    printf("1st Parallel Region:\n");
    #pragma omp parallel private(b, tid)
    {
        tid = omp_get_thread_num();
        a = tid;
        b = tid;
        x = 1.1 * tid + 1.0;
        printf("Thread %d: a,b,x = %d %d %f\n", tid, a, b, x);
    } /* end of parallel region */

    printf("2nd Parallel Region:\n");
    #pragma omp parallel private(tid)
    {
        tid = omp_get_thread_num();
        /* a and x keep their per-thread values; b is undefined here */
        printf("Thread %d: a,b,x = %d %d %f\n", tid, a, b, x);
    } /* end of parallel region */
}

Review

- Shared Memory Concurrent Programming
- Review of Operating Systems: PThreads
- OpenMP
- Parallel Clauses
- Private / Shared Variables

Next Class

More on OpenMP:
- Synchronization
- Conditional Parallelism
- Reduction Clause
- Scheduling Options