OpenMP - II
Diego Fabregat-Traver and Prof. Paolo Bientinesi
HPAC, RWTH Aachen
fabregat@aices.rwth-aachen.de
WS15/16

References
- B. Chapman, G. Jost, R. van der Pas. Using OpenMP: Portable Shared Memory Parallel Programming. The MIT Press, 2007.
- Tim Mattson. Introduction to OpenMP. http://openmp.org/mp-documents/intro_to_openmp_mattson.pdf

Previously on... Parallel Programming
OpenMP follows the fork-join programming model: the initial (master) thread executes the sequential regions; when it encounters a parallel region, it forks a team of worker threads, and the team joins back into a single thread at the end of the region.
[Figure: execution timeline alternating sequential regions (master thread only) and parallel regions (team of threads).]

Previously on... Parallel Programming
The OpenMP API consists of:
- Compiler directives
- Library routines
- Environment variables
Header: #include <omp.h>
Compilation: {gcc|icc} -fopenmp <source.c> -o <executable.x>
A minimal example follows.
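As a quick refresher, here is a minimal sketch exercising all three parts of the API (the file name hello.c is illustrative):

#include <stdio.h>
#include <omp.h>   /* library routines */

int main(void) {
    /* Compiler directive: fork a team of threads */
    #pragma omp parallel
    {
        printf("Hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}

Compile with gcc -fopenmp hello.c -o hello.x; the environment variable OMP_NUM_THREADS controls the default team size.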

Previously on... Parallel Programming
The parallel construct and the SPMD approach. The most important construct:
#pragma omp parallel [clause [, clause] ...]
- Creates a team of threads to execute the parallel region (fork)
- Has an implicit barrier at the end of the region (join)
- It does not distribute the work
So far we distributed the work ourselves, based on the thread id and the number of threads (SPMD, à la MPI); a minimal sketch follows.
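A hedged sketch of SPMD-style work distribution, assuming an illustrative array a of size N; each thread computes its own block of indices:

#include <stdio.h>
#include <omp.h>

#define N 1000

int main(void) {
    double a[N];

    #pragma omp parallel
    {
        int id  = omp_get_thread_num();
        int nth = omp_get_num_threads();
        /* Block distribution: each thread gets a contiguous chunk */
        int chunk = (N + nth - 1) / nth;            /* ceiling division */
        int lo = id * chunk;
        int hi = (lo + chunk < N) ? lo + chunk : N;
        for (int i = lo; i < hi; i++)
            a[i] = 2.0 * i;
    }

    printf("a[10]: %f\n", a[10]); /* 20.000000 */
    return 0;
}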

Previously on... Parallel Programming
Shared vs. private variables:
- Shared: a single instance that every thread can read and write
- Private: each thread has its own copy, which the other threads cannot read or write (unless a pointer to it is passed around)
So far, variables were shared or private depending on where they were declared. See, for instance, 02b.axpy-omp.c.

Exercise 3 (pi.c)
Mathematically, we know that
$$ \pi = \int_0^1 \frac{4}{1 + x^2} \, dx $$
Numerically, we can approximate the integral as a sum of rectangles:
$$ \pi \approx \sum_{i=0}^{N} \frac{4}{1 + x_i^2} \, \Delta x $$
where each rectangle has width $\Delta x$ and height $F(x_i)$, evaluated at the middle of interval $i$.
Source: Timothy Mattson, Intel.

Exercise 3 (pi.c)
Use the #pragma omp parallel construct to parallelize the code below so that 4 threads collaborate in the computation of π. Pay attention to shared vs. private variables!

#include <stdio.h>
#include <stdlib.h>

#define NUM_STEPS 10000

int main( void )
{
    int i;
    double sum = 0.0, pi, x_i;
    double step = 1.0/NUM_STEPS;

    for ( i = 0; i < NUM_STEPS; i++ ) {
        x_i = (i + 0.5) * step;
        sum = sum + 4.0 / (1.0 + x_i * x_i);
    }
    pi = sum * step;

    printf("pi: %.15e\n", pi);
    return 0;
}

Hints:
- #pragma omp parallel num_threads(...)
- omp_set_num_threads(...)
- omp_get_num_threads(...)
- omp_get_thread_num(...)
Challenges:
- split the iterations of the loop among threads
- create an accumulator for each thread to hold its partial sum; the partial sums can later be combined into the global sum
(One possible solution sketch follows.)
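A possible SPMD solution sketch, assuming 4 threads, a cyclic distribution of iterations, and a per-thread accumulator combined in a critical section (one way to meet the challenges above, not the only one):

#include <stdio.h>
#include <omp.h>

#define NUM_STEPS 10000

int main(void) {
    double pi = 0.0;
    double step = 1.0 / NUM_STEPS;

    #pragma omp parallel num_threads(4)
    {
        int id  = omp_get_thread_num();
        int nth = omp_get_num_threads();
        double local_sum = 0.0;              /* private accumulator */
        /* Cyclic distribution of iterations among threads */
        for (int i = id; i < NUM_STEPS; i += nth) {
            double x_i = (i + 0.5) * step;
            local_sum += 4.0 / (1.0 + x_i * x_i);
        }
        #pragma omp critical
        pi += local_sum * step;              /* combine partial sums */
    }

    printf("pi: %.15e\n", pi);
    return 0;
}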

Parallel construct
#pragma omp parallel [clause [, clause] ...]
    structured-block
The following clauses apply:
- if
- num_threads
- shared, private, firstprivate, default
- reduction
- copyin
- proc_bind

Parallel construct - if clause
Conditional parallel execution: avoids the parallelization overhead when there is little work to parallelize.
Syntax: if (scalar-expression)
If the expression evaluates to true, the region executes in parallel; otherwise it executes sequentially.
Example:

int main( ... )
{
    [...]
    #pragma omp parallel if (n > 1000)
    {
        [...]
    }
    [...]
}
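A small runnable sketch (the function scale, the threshold 1000, and the vector size are illustrative): the region is executed by a team only when n is large enough.

#include <stdio.h>
#include <omp.h>

static void scale(double *v, int n, double alpha) {
    /* Parallelize only when the vector is large enough
       to amortize the fork-join overhead */
    #pragma omp parallel if (n > 1000)
    {
        int id  = omp_get_thread_num();
        int nth = omp_get_num_threads(); /* 1 if the if clause is false */
        for (int i = id; i < n; i += nth)
            v[i] *= alpha;
    }
}

int main(void) {
    double v[2000];
    for (int i = 0; i < 2000; i++) v[i] = i;
    scale(v, 2000, 5.0);
    printf("v[1]: %f\n", v[1]);   /* prints 5.000000 */
    return 0;
}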

Parallel construct - num_threads clause
Specifies how many threads should execute the region. The runtime may decide to use fewer threads than specified (never more).
Syntax: num_threads (scalar-integer-expression)
Example:

int main( ... )
{
    [...]
    #pragma omp parallel num_threads (nths)
    {
        [...]
    }
    [...]
}

Parallel construct - Data-sharing attributes
Shared-memory programming model: variables are shared by default.
Shared:
- All variables visible upon entry to the construct
- Static variables
Private:
- Variables declared within a parallel region
- (Stack) variables in functions called from within a parallel region

Parallel construct

int N = 10;

int main( void )
{
    double array[N];
    #pragma omp parallel
    {
        int i, myid;
        double thread_array[N];
        [...]
        for ( i = 0; i < N; i++ )
            thread_array[i] = myid * array[i];
        function( thread_array );
    }
}

double function( double *arg )
{
    static int cnt;
    double local_array[N];
    [...]
}

Within the parallel region:
- Shared: array, N, cnt
- Private: i, myid, thread_array, local_array
Note: lexical extent vs. dynamic/runtime extent (cnt and local_array appear only in the dynamic extent, since function is called from within the region).

Parallel construct - General rules for data-sharing clauses
- The clauses default, private, shared, and firstprivate allow changing the default behavior.
- A clause consists of the keyword and a comma-separated list of variables in parentheses. For instance: private(a,b)
- Variables must be visible in the lexical extent of the directive.
- A variable may appear in only one clause. Exception: a variable may appear in both firstprivate and lastprivate (coming later).

Parallel construct - shared clause
Syntax: shared (item-list)
- Specifies that the variables in the comma-separated list are shared among threads.
- There is one single instance, and each thread can freely read and modify its value.
- When the parallel region finishes, the final value resides in the shared space, where the master thread can access it.
CAREFUL: since every thread can access it, race conditions may occur. Synchronize/order the accesses when needed (e.g., with the critical construct), as in the sketch below.
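A hedged sketch of the race-condition pitfall and one fix (the variable name counter is illustrative): without the critical construct, concurrent read-modify-write updates to the shared counter may be lost.

#include <stdio.h>
#include <omp.h>

int main(void) {
    int counter = 0;   /* shared among all threads */

    #pragma omp parallel shared(counter)
    {
        /* counter++ alone would be a race: the load, add, and store
           can interleave between threads. The critical construct
           serializes the update. */
        #pragma omp critical
        counter++;
    }

    printf("counter: %d\n", counter); /* equals the number of threads */
    return 0;
}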

Parallel construct - private clause
Syntax: private (item-list)
- Specifies data that will be replicated so that each thread has its own local copy.
- Changes made to this data by one thread are not visible to the other threads.
- Values are undefined upon entry to and exit from the construct.
- The storage lasts until the block in which it is created exits.

Parallel construct - firstprivate clause
Syntax: firstprivate (item-list)
- Variables in the list are private.
- In addition, each copy is initialized with the value the corresponding original variable had when the construct was encountered.
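A minimal sketch contrasting private and firstprivate (variable names are illustrative):

#include <stdio.h>
#include <omp.h>

int main(void) {
    int b = 1, c = 1;

    #pragma omp parallel private(b) firstprivate(c) num_threads(2)
    {
        /* b is private: its value here is undefined until assigned */
        b = 10 + omp_get_thread_num();
        /* c is firstprivate: each copy starts at 1 */
        c += omp_get_thread_num();
        printf("thread %d: b = %d, c = %d\n",
               omp_get_thread_num(), b, c);
    }

    /* The private copies do not write back: c is still 1 here.
       Per the slide above, the value of b after the region is
       formally undefined. */
    printf("after region: c = %d\n", c); /* 1 */
    return 0;
}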

Parallel construct - default clause
Syntax: default ( shared | none )
- default(shared) causes all variables to be shared by default.
- default(none) requires that each variable's data-sharing attribute be explicitly listed in a data-sharing clause.
- Only one single default clause may be specified.
- It is considered good programming practice to always use default(none) to enforce the explicit listing of data-sharing attributes, as in the sketch below.
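A small sketch, assuming illustrative names: with default(none), the compiler rejects the region unless every variable referenced inside is listed.

#include <stdio.h>
#include <omp.h>

int main(void) {
    int n = 100;
    double alpha = 5.0, partial = 0.0;

    /* Every variable referenced inside must be listed explicitly;
       omitting, e.g., alpha here would be a compile-time error. */
    #pragma omp parallel default(none) shared(n, alpha) private(partial)
    {
        partial = alpha * n * omp_get_thread_num();
        printf("thread %d: partial = %f\n",
               omp_get_thread_num(), partial);
    }
    return 0;
}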

Exercise on data sharing: Think about it!
Given the following sample code:

int A = 1, B = 1, C = 1;
#pragma omp parallel private(B) firstprivate(C)
{
    [...]
}

- Are A, B, and C shared or private to each thread inside the parallel region?
- What are their initial values inside the region?
- What are their values after the parallel region?
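If you want to check your answers empirically, here is a small verification sketch (keeping in mind that the value of B inside the region is formally undefined until assigned):

#include <stdio.h>
#include <omp.h>

int main(void) {
    int A = 1, B = 1, C = 1;

    #pragma omp parallel private(B) firstprivate(C) num_threads(2)
    {
        /* A is shared; C starts at 1 in every thread;
           B is uninitialized here until we assign it. */
        B = 100;
        C += omp_get_thread_num();
        printf("thread %d: A = %d, B = %d, C = %d\n",
               omp_get_thread_num(), A, B, C);
    }

    printf("after region: A = %d, C = %d\n", A, C); /* A = 1, C = 1 */
    return 0;
}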

Parallel construct - reduction clause
Specifies a reduction operation.
Syntax: reduction (operator : list)
Predefined operators: +, *, -, &, |, ^, &&, ||, max, min
Each reduction operator has an initializer and a combiner. For instance:
- + : initializer: 0; combiner: accum += var
- * : initializer: 1; combiner: accum *= var
Arrays are not allowed in a reduction clause!

Parallel construct - reduction clause
- For each list item, each thread gets a private copy.
- The private copy is initialized with the initializer value.
- At the end of the region, the original item is updated with the values of the private copies, using the combiner.
See 04.parallel-reduction.c and the sketch below.
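A minimal sketch of these mechanics (a sum of per-thread contributions; the name total is illustrative):

#include <stdio.h>
#include <omp.h>

int main(void) {
    int total = 0;

    /* Each thread gets a private copy of total, initialized to 0
       (the initializer of +). On exit, the copies are combined
       into the original with += (the combiner). */
    #pragma omp parallel num_threads(4) reduction(+ : total)
    {
        total += omp_get_thread_num() + 1;
    }

    /* With 4 threads: 1 + 2 + 3 + 4 = 10 */
    printf("total: %d\n", total);
    return 0;
}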

Exercise 3 (pi.c) - reduction clause
This time use the parallel construct in combination with the reduction clause to compute π.

#include <stdio.h>
#include <stdlib.h>

#define NUM_STEPS 10000

int main( void )
{
    int i;
    double sum = 0.0, pi, x_i;
    double step = 1.0/NUM_STEPS;

    for ( i = 0; i < NUM_STEPS; i++ ) {
        x_i = (i + 0.5) * step;
        sum = sum + 4.0 / (1.0 + x_i * x_i);
    }
    pi = sum * step;

    printf("pi: %.15e\n", pi);
    return 0;
}

Hints:
- #pragma omp parallel reduction(...)
- omp_get_num_threads(...)
- omp_get_thread_num(...)
Challenges:
- split the iterations of the loop among threads
- create an accumulator for each thread to hold its partial sum, which can later be combined into the global sum. Let reduction take care of this!

Worksharing constructs
Distribute the execution of code among the team members:
- Loop construct
- Sections construct
- Single construct

Loop construct
Syntax:

#pragma omp for [clause [, clause] ...]
    for-loop

Example:

#pragma omp parallel
{
    #pragma omp for
    for (i = 0; i < n; i++)
        [...]
}

The iterations of the associated loop are executed in parallel by the team of threads that encounters it.
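A complete sketch, assuming an illustrative array a of size N: the directive splits the iterations among the team, so no manual index arithmetic is needed.

#include <stdio.h>
#include <omp.h>

#define N 1000

int main(void) {
    double a[N];
    int i;

    #pragma omp parallel
    {
        /* The iterations are divided among the threads of the team;
           the loop variable i is made private automatically. */
        #pragma omp for
        for (i = 0; i < N; i++)
            a[i] = 2.0 * i;
    }

    printf("a[10]: %f\n", a[10]); /* 20.000000 */
    return 0;
}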

Loop construct - Canonical form
Restrictions on the form of the loop simplify compiler optimizations:
- Only for loops.
- The number of iterations must be countable: an integer counter is incremented (decremented) until some specified upper (lower) bound is reached.
- Remember: one entry point, one exit point. No break statements; continue and exit are allowed.

Loop construct - Canonical form
Loops must have the canonical form:

for (init-expr; var relop b; incr-expr)

where:
- init-expr initializes the loop counter var via an integer expression
- relop is one of: <, <=, >, >=
- b is also an integer expression
- incr-expr increments or decrements var by an integer amount:
  ++var, var++, --var, var--
  var += incr, var -= incr
  var = var + incr, var = var - incr
Example: for (i = 0; i < n; i += 4)

Exercise 2 (axpy.c) - revisited
Use the parallel and for constructs to parallelize the code below so that 4 threads collaborate in the computation of z.

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int i, N = 10;
    double x[N], y[N], z[N], alpha = 5.0;

    for (i = 0; i < N; i++) {
        x[i] = i;
        y[i] = 2.0*i;
    }

    for (i = 0; i < N; i++)
        z[i] = alpha * x[i] + y[i];

    // Print results. Should output [0, 7, 14, 21, ...]
    return 0;
}

Hints:
- #pragma omp parallel num_threads(...)
- #pragma omp for

Exercise 3 (pi.c) - reduction + for
Use the parallel and for constructs to parallelize the code below. Use also the reduction clause.

#include <stdio.h>
#include <stdlib.h>

#define NUM_STEPS 10000

int main( void )
{
    int i;
    double sum = 0.0, pi, x_i;
    double step = 1.0/NUM_STEPS;

    for ( i = 0; i < NUM_STEPS; i++ ) {
        x_i = (i + 0.5) * step;
        sum = sum + 4.0 / (1.0 + x_i * x_i);
    }
    pi = sum * step;

    printf("pi: %.15e\n", pi);
    return 0;
}

Hints:
- #pragma omp parallel reduction(...)
- #pragma omp for

Loop construct - Clauses
#pragma omp for [clause [, clause] ...]
The following clauses apply:
- private, firstprivate, lastprivate
- reduction
- schedule
- collapse
- ordered
- nowait

Loop construct - Clauses: Data-sharing attributes
- private and firstprivate behave as in the parallel construct.
- Important: the iteration variable is made private by default. That is, in for (i = 0; i < n; i += 4), i is made private automatically.
- lastprivate (list): the last value of a private variable listed in this clause is available after the construct completes. By "last" we mean the value from the iteration that would come last in a sequential execution.
See 05.loop-lastprivate.c and the sketch below.
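A minimal lastprivate sketch (names are illustrative): after the loop, x holds the value it had in the sequentially last iteration, regardless of which thread executed it.

#include <stdio.h>
#include <omp.h>

#define N 100

int main(void) {
    int i;
    double x = 0.0;

    #pragma omp parallel
    {
        /* x is private inside the loop; the copy from the
           sequentially last iteration (i == N-1) is written
           back to the original x when the loop completes. */
        #pragma omp for lastprivate(x)
        for (i = 0; i < N; i++)
            x = 0.5 * i;
    }

    printf("x: %f\n", x);  /* 0.5 * (N-1) = 49.500000 */
    return 0;
}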