
OpenMP I
Diego Fabregat-Traver and Prof. Paolo Bientinesi
HPAC, RWTH Aachen
fabregat@aices.rwth-aachen.de
WS16/17

OpenMP References
- B. Chapman, G. Jost, R. van der Pas. Using OpenMP: Portable Shared Memory Parallel Programming. The MIT Press, 2007.
- Tim Mattson. Introduction to OpenMP. http://openmp.org/mp-documents/intro_to_openmp_mattson.pdf

OpenMP
- API for shared-memory parallelism
- Steered by the OpenMP ARB (industry, research)
- Supported by compilers on most platforms
- Not a programming language; mainly annotations to the (sequential) code
- The OpenMP API consists of:
  - Compiler directives
  - Library routines
  - Environment variables
- Simple to use, high-level, incremental parallelism
- Performance oriented
- Data (and task) parallelism

(Very brief) History of OpenMP
- SC97: a group of HPC experts (industry, research) presented OpenMP, proposing a unified model to program shared-memory systems.
- A company was set up to own and maintain the new standard: the OpenMP Architecture Review Board (OpenMP ARB).
- Efforts have gone into extending the standard, developing implementations, teaching and spreading the word, and cOMPunity, for the interaction between vendors, researchers, and users.
- Originally designed primarily to exploit concurrency in structured loop nests.

Main ideas
User gives a high-level specification of the portions of code to be executed in parallel:

    int main( ... ) {
        ...
        #pragma omp parallel
        {
            <region executed by multiple threads>
        }
        ...
    }

pragma (pragmatic): tells the compiler to use some compiler-dependent feature/extension.

Main ideas (II)
User may provide additional information on how to parallelize:

    #pragma omp parallel num_threads(4)
    omp_set_schedule( static | dynamic | ... );

- OpenMP takes care of the low-level details of creating threads, execution, assigning work, ...
- Provides relatively easy variable scoping, synchronization, and primitives to avoid data races.
- Usage:

    #include "omp.h"
    [gcc|icc] -fopenmp <source.c> -o <executable.x>

Hello world!
Exercise 1: Warming up. Write an OpenMP multi-threaded program where each thread prints "Hello world!".

    #include <stdio.h>
    #include <stdlib.h>

    int main( void ) {
        printf("Hello world!\n");
        return 0;
    }

Hint: #pragma omp parallel

Main ideas (III)
Fork-join paradigm. [Figure: the initial (master) thread executes the sequential regions; at each parallel region it forks a team of worker threads, and the team joins back into a single thread at the end of the region.]
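
A minimal sketch (not on the slide) illustrating fork-join, using the runtime routines introduced on the next slide:

    #include <stdio.h>
    #include <omp.h>

    int main( void ) {
        /* sequential region: only the initial (master) thread runs */
        printf("Before: %d thread(s)\n", omp_get_num_threads());

        #pragma omp parallel
        {
            /* fork: a team of threads executes this block */
            printf("Inside: thread %d of %d\n",
                   omp_get_thread_num(), omp_get_num_threads());
        }   /* join: implicit barrier; the workers disband */

        /* sequential region again */
        printf("After: back to 1 thread\n");
        return 0;
    }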

Incremental parallelism
A common approach to writing OpenMP programs:
- Identify parallelism in your sequential code
- Incremental parallelism: introduce directives in one portion of the code, leaving the rest untouched
- Once tested, move on to the next region to be parallelized, until the target speedup is achieved

First steps
Directives:
- Syntax: #pragma omp <construct> [<clause> [<clause>]]
- Most constructs apply to structured blocks: one entry point, one exit point

Routines (some examples):

    omp_set_num_threads( int nthreads );
    int nthreads = omp_get_num_threads();
    int id = omp_get_thread_num();

Environment variables (an example):

    export OMP_NUM_THREADS=4; ./program.x

Hello world! I'm thread X!
Exercise 1b. Extend exercise 1 (code below) so that 4 threads execute the parallel region and each of them also prints its thread id.

    #include <stdio.h>
    #include <stdlib.h>
    #include "omp.h"

    int main( void ) {
        #pragma omp parallel
        printf("Hello world!\n");
        return 0;
    }

Hints:
- #pragma omp parallel num_threads(...)
- omp_get_num_threads()
- omp_set_num_threads(...)
- omp_get_thread_num()
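
A possible solution sketch (not from the slides; it uses the num_threads clause, though omp_set_num_threads or OMP_NUM_THREADS would work equally well):

    #include <stdio.h>
    #include "omp.h"

    int main( void ) {
        #pragma omp parallel num_threads(4)
        {
            /* each of the 4 threads executes this block */
            printf("Hello world! I'm thread %d!\n", omp_get_thread_num());
        }
        return 0;
    }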

Exercise 2 (axpy.c)
Use the #pragma omp parallel construct to parallelize the code below so that 4 threads collaborate in the computation of z.

    #include <stdio.h>
    #include <stdlib.h>

    int main( int argc, char *argv[] ) {
        int i, N = 10;
        double x[N], y[N], z[N], alpha = 5.0;

        for ( i = 0; i < N; i++ ) {
            x[i] = i;
            y[i] = 2.0 * i;
        }

        for ( i = 0; i < N; i++ )
            z[i] = alpha * x[i] + y[i];

        // Print results. Should output [0, 7, 14, 21, ...]
        return 0;
    }

Hints:
- #pragma omp parallel num_threads(...)
- omp_set_num_threads(...)
- omp_get_num_threads()
- omp_get_thread_num()

Challenge: split the iterations of the for loop among threads.
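
One possible solution sketch (mine, not the course's; it uses a cyclic distribution of iterations based on the thread id, the SPMD style formalized on the next slide):

    #include <stdio.h>
    #include <omp.h>

    #define N 10

    int main( void ) {
        int i;
        double x[N], y[N], z[N], alpha = 5.0;

        for ( i = 0; i < N; i++ ) {
            x[i] = i;
            y[i] = 2.0 * i;
        }

        #pragma omp parallel num_threads(4)
        {
            int id   = omp_get_thread_num();
            int nths = omp_get_num_threads();
            int j;
            /* cyclic split: thread id computes iterations id, id+nths, ... */
            for ( j = id; j < N; j += nths )
                z[j] = alpha * x[j] + y[j];
        }   /* implicit barrier: all of z is ready here */

        for ( i = 0; i < N; i++ )
            printf("%.0f ", z[i]);        /* expect: 0 7 14 21 ... */
        printf("\n");
        return 0;
    }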

The parallel construct and the SPMD approach
The most important construct:

    #pragma omp parallel [clause [, clause] ...]

- Creates a team of threads to execute the parallel region (fork)
- Has an implicit barrier at the end of the region (join)
- It does not distribute the work
- So far we distributed the work manually, based on the thread id and the number of threads (SPMD)

Exercise 3 (pi.c)
Mathematically, we know that:

    \pi = \int_0^1 \frac{4}{1+x^2} \, dx

Numerically, we can approximate the integral as a sum of rectangles:

    \pi \approx \sum_{i=0}^{N-1} \frac{4}{1+x_i^2} \, \Delta x

where each rectangle has width \Delta x and height F(x_i) at the middle of interval i.

Source: Timothy Mattson, Intel.

Exercise 3 (pi.c)
Use the #pragma omp parallel construct to parallelize the code below so that 4 threads collaborate in the computation of π. Pay attention to shared vs private variables!

    #include <stdio.h>
    #include <stdlib.h>

    #define NUM_STEPS 10000

    int main( void ) {
        int i;
        double sum = 0.0, pi, x_i;
        double step = 1.0/NUM_STEPS;

        for ( i = 0; i < NUM_STEPS; i++ ) {
            x_i = (i + 0.5) * step;
            sum = sum + 4.0 / (1.0 + x_i * x_i);
        }

        pi = sum * step;
        printf("pi: %.15e\n", pi);
        return 0;
    }

Hints:
- #pragma omp parallel num_threads(...)
- omp_set_num_threads(...)
- omp_get_num_threads()
- omp_get_thread_num()

Challenges:
- split the iterations of the loop among threads
- create an accumulator for each thread to hold the partial sums, which can later be combined to generate the global sum
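
A possible solution sketch (mine; one private partial sum per thread, combined sequentially after the region, along the lines of the challenges above):

    #include <stdio.h>
    #include <omp.h>

    #define NUM_STEPS   10000
    #define NUM_THREADS 4

    int main( void ) {
        int i;
        double pi = 0.0;
        double step = 1.0 / NUM_STEPS;
        double partial[NUM_THREADS] = { 0.0 };   /* one accumulator per thread */

        #pragma omp parallel num_threads(NUM_THREADS)
        {
            int id   = omp_get_thread_num();
            int nths = omp_get_num_threads();
            int j;
            double x_j, sum = 0.0;               /* private partial sum */

            for ( j = id; j < NUM_STEPS; j += nths ) {
                x_j = (j + 0.5) * step;
                sum += 4.0 / (1.0 + x_j * x_j);
            }
            partial[id] = sum;   /* no race: each thread writes its own slot */
        }

        for ( i = 0; i < NUM_THREADS; i++ )      /* combine the partial sums */
            pi += partial[i];
        pi *= step;

        printf("pi: %.15e\n", pi);
        return 0;
    }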

Shared vs private variables
- Shared: a single instance that every thread can read/write
- Private: each thread has its own copy, and other threads cannot read/write it (unless a pointer to it is given)
- So far, variables were shared or private depending on where they were declared
- See, for instance, 02b.axpy-omp.c

Variable Scope
[Figure: a process contains the program code and the shared data, accessible to all threads; each thread (Thread 0, Thread 1, ...) additionally holds its own private data.]

Parallel construct: Syntax in detail

    #pragma omp parallel [clause [, clause] ...]
    structured-block

The following clauses apply:
- if
- num_threads
- shared, private, firstprivate, default
- reduction
- copyin
- proc_bind

Parallel construct: if clause
- Conditional parallel execution: avoid the parallelization overhead when there is little work to parallelize
- Syntax: if (scalar-logical-expression)
- If the logical expression evaluates to true, the region executes in parallel

Example:

    int main( ... ) {
        [...]
        #pragma omp parallel if (n > 1000)
        {
            [...]
        }
        [...]
    }

Parallel construct: num_threads clause
- Specifies how many threads should execute the region
- The runtime may decide to use fewer threads than specified (never more)
- Syntax: num_threads (scalar-integer-expression)

Example:

    int main( ... ) {
        [...]
        #pragma omp parallel num_threads (nths)
        {
            [...]
        }
        [...]
    }

Parallel construct: Data-sharing attributes
- Shared-memory programming model: variables are shared by default
- Shared:
  - All variables visible upon entry to the construct
  - Static variables
- Private:
  - Variables declared within the parallel region
  - (Stack) variables in functions called from within the parallel region

Parallel construct: Example

    int N = 10;

    int main( void ) {
        double array[N];

        #pragma omp parallel
        {
            int i, myid;
            double thread_array[N];
            [...]
            for ( i = 0; i < N; i++ )
                thread_array[i] = myid * array[i];
            function( thread_array );
        }
    }

    double function( double *arg ) {
        static int cnt;
        double local_array[N];
        [...]
    }

Within the parallel region:
- Shared: array, N, cnt
- Private: i, myid, thread_array, local_array

Note: lexical extent vs dynamic/runtime extent.

Parallel construct: General rules for data-sharing clauses
- The clauses default, private, shared, and firstprivate allow changing the default behavior
- A clause consists of the keyword and a comma-separated list of variables in parentheses, for instance: private(a,b)
- Variables must be visible in the lexical extent of the directive
- A variable can only appear in one clause
  - Exception: a variable can appear in both firstprivate and lastprivate (coming later)

Parallel construct: shared clause
- Syntax: shared (item-list)
- Specifies that the variables in the comma-separated list are shared among threads
- A single instance exists; each thread can freely read and modify its value
- When the parallel region finishes, the final values reside in the shared space, where the master thread can access them
- CAREFUL: since every thread can access them, race conditions may occur; synchronize/order the accesses when needed (e.g., with the critical construct)
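
A minimal sketch (not on the slide) of the kind of race the warning refers to, and how the critical construct avoids it:

    #include <stdio.h>
    #include <omp.h>

    int main( void ) {
        int count = 0;                    /* shared by default */

        #pragma omp parallel num_threads(4) shared(count)
        {
            /* a plain count++ here would be a data race:
               a read-modify-write on a shared variable */
            #pragma omp critical
            count++;
        }

        printf("count: %d\n", count);     /* one increment per thread: 4 */
        return 0;
    }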

Parallel construct: private clause
- Syntax: private (item-list)
- Specifies data that will be replicated so that each thread has its own local copy
- Changes made to this data by one thread are not visible to the other threads
- Values are undefined upon entry to and exit from the construct
- The storage lasts until the block in which it is created exits

Parallel construct: firstprivate clause
- Syntax: firstprivate (item-list)
- Variables in the list are private
- Each copy is also initialized with the value the corresponding original variable had when the construct was encountered

Parallel construct: default clause
- Syntax: default ( shared | none )
- default(shared) causes all variables to be shared by default
- default(none) requires each variable's data-sharing attribute to be explicitly listed in a data-sharing clause
- Only one default clause may be specified
- It is considered good programming practice to always use default(none), to enforce the explicit listing of data-sharing attributes
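
A minimal sketch of that practice (the variables n and scale are hypothetical, chosen for illustration):

    #include <stdio.h>
    #include <omp.h>

    int main( void ) {
        int n = 1000;
        double scale = 2.0;

        /* with default(none), omitting n or scale from the clauses
           is a compile-time error instead of a silent default */
        #pragma omp parallel default(none) shared(n) firstprivate(scale)
        {
            if ( omp_get_thread_num() == 0 )
                printf("n = %d, scale = %.1f\n", n, scale);
        }
        return 0;
    }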

Exercise on data sharing: Think about it!
Given the following sample code:

    int A = 1, B = 1, C = 1;
    #pragma omp parallel private(B) firstprivate(C)
    {
        [...]
    }

- Are A, B, and C shared or private to each thread inside the parallel region?
- What are their initial values inside the region?
- What are their values after the parallel region?
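
A sketch (my addition) to check your answers empirically, without stating them in the text; note that reading a private variable before assigning to it is undefined, which is why B is written first:

    #include <stdio.h>
    #include <omp.h>

    int main( void ) {
        int A = 1, B = 1, C = 1;

        #pragma omp parallel num_threads(2) private(B) firstprivate(C)
        {
            B = 100 + omp_get_thread_num();   /* B undefined on entry: assign before use */
            printf("thread %d sees: A=%d B=%d C=%d\n",
                   omp_get_thread_num(), A, B, C);
        }

        printf("after the region: A=%d B=%d C=%d\n", A, B, C);
        return 0;
    }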

Parallel construct: reduction clause
- Specifies a reduction operation
- Syntax: reduction (operator:list)
- Predefined operators: +, *, -, &, |, ^, &&, ||, max, min
- list is a list of one or more variables

Example:

    #pragma omp parallel reduction(+:mysum)

Parallel construct: reduction clause
Each reduction operator has an initializer and a combiner. For instance:
- +  initializer: 0; combiner: accum += var
- *  initializer: 1; combiner: accum *= var

- For each list item, each thread gets a private copy
- The private copy is initialized with the initializer value
- At the end of the region, the original item is updated with the values of the private copies, using the combiner

Compare 03c.pi-omp-manual-red.c vs 03d.pi-omp-red.c
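
A sketch of how exercise 3 collapses with the reduction clause (my version; the course files named above are the reference):

    #include <stdio.h>
    #include <omp.h>

    #define NUM_STEPS 10000

    int main( void ) {
        double sum = 0.0;
        double step = 1.0 / NUM_STEPS;

        #pragma omp parallel num_threads(4) reduction(+:sum)
        {
            int id   = omp_get_thread_num();
            int nths = omp_get_num_threads();
            int j;
            double x_j;
            /* each thread accumulates into its private copy of sum,
               initialized to 0; the copies are combined with + at the join */
            for ( j = id; j < NUM_STEPS; j += nths ) {
                x_j = (j + 0.5) * step;
                sum += 4.0 / (1.0 + x_j * x_j);
            }
        }

        printf("pi: %.15e\n", sum * step);
        return 0;
    }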

Parallel construct: User-defined reductions
We can also define our own reduction operators. The syntax is:

    #pragma omp declare reduction (reduction-identifier : typename : combiner) [initializer-clause]

- reduction-identifier is the identifier we want to give to our reduction, e.g., mymax
- typename is the datatype to which the reduction applies, e.g., int

(continues in the next slide...)

Parallel construct: User-defined reductions

    #pragma omp declare reduction (reduction-identifier : typename : combiner) [initializer-clause]

combiner is an expression or function that specifies how to combine the partial result of each thread with the global result. It must be expressed in terms of the variables omp_in and omp_out, e.g.:

    omp_out = my_max_function(omp_out, omp_in)

where

    int my_max_function( int omp_out, int omp_in ) {
        if ( omp_out > omp_in ) return omp_out;
        else return omp_in;
    }

is defined somewhere in our code.

Parallel construct: User-defined reductions

    #pragma omp declare reduction (reduction-identifier : typename : combiner) [initializer-clause]

initializer-clause is an expression or function that specifies how to initialize the private copy of each thread. It must be expressed in terms of the variable omp_priv, e.g.:

    initializer (omp_priv = INT_MIN)

See 05b.reduction-max-userdefined.c
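
Putting the three slides together, a complete sketch of a user-defined max reduction (mine, not the course's 05b file; it assumes an OpenMP 4.0-capable compiler, where declare reduction was introduced):

    #include <stdio.h>
    #include <limits.h>
    #include <omp.h>

    int my_max_function( int omp_out, int omp_in ) {
        return omp_out > omp_in ? omp_out : omp_in;
    }

    #pragma omp declare reduction (mymax : int : omp_out = my_max_function(omp_out, omp_in)) initializer (omp_priv = INT_MIN)

    int main( void ) {
        int a[8] = { 3, 1, 4, 1, 5, 9, 2, 6 };
        int m = INT_MIN;

        #pragma omp parallel reduction(mymax:m)
        {
            int id   = omp_get_thread_num();
            int nths = omp_get_num_threads();
            int j;
            /* each thread reduces its share into its private m,
               initialized to INT_MIN by the initializer clause */
            for ( j = id; j < 8; j += nths )
                if ( a[j] > m ) m = a[j];
        }

        printf("max: %d\n", m);           /* 9 */
        return 0;
    }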