Introduction to OpenMP

Introduction to OpenMP p. 1/?? Introduction to OpenMP

Basics and Simple SIMD
Nick Maclaren
nmm1@cam.ac.uk
September 2017

Introduction to OpenMP p. 2/?? Terminology

I am abusing the term SIMD; tough!
Strictly, it refers to a type of parallel hardware
In this course, I mean a type of program design

Introduction to OpenMP p. 3/?? KISS

That stands for Keep It Simple and Stupid
Kelly Johnson, lead engineer at the Skunk Works
It should be written above every programmer's desk
As I tell myself every time I shoot myself in the foot!
It's rule number one for OpenMP use
Actually, it's one key to all parallel programming
Problems increase exponentially with complexity
That sounds ridiculous, but it's true

Introduction to OpenMP p. 4/?? Fortran For SIMD (1)

Fortran 90 rules, OK?
No competition in either coding or optimisation
Followed by Fortran 77, then C++ and lastly C
Start with clean, clear whole array expressions
Then call BLAS or expand calls manually
Unfortunately, that relies on an excellent compiler
And, to a first approximation, none of them are

Introduction to OpenMP p. 5/?? Fortran For SIMD (2)

Compilers may not parallelise array expressions
They are better at parallelising DO loops
Also, why is MATMUL sometimes very slow?
The compiler could call D/ZGEMM itself
There are sometimes options to do so
E.g. gfortran -fexternal-blas
But expanding such code is easy and reliable
So using it makes debugging a lot easier

Introduction to OpenMP p. 6/?? Fortran Syntax (1)

This course covers modern free format only
As always in Fortran, case is ignored entirely
Leading spaces are also ignored in free format
Fortran directives take the form of comments
Lines starting !$OMP and a space
Most of them have start and end forms

    !$OMP PARALLEL DO [ clauses ]
        < structured block >
    !$OMP END PARALLEL DO

Introduction to OpenMP p. 7/?? Fortran Syntax (2)

Directives may be continued, like Fortran statements

    !$OMP PARALLEL DO &
    !$OMP [ clauses ]          ! Note the !$OMP
        < structured block >
    !$OMP END PARALLEL DO

    !$OMP PARALLEL DO &
    !$OMP & [ clauses ]        ! Note the second &
        < structured block >
    !$OMP END PARALLEL DO

Introduction to OpenMP p. 8/?? Fortran Syntax (3)

You can also write PARALLELDO etc.
You can omit END PARALLEL DO and END DO
But not any other END directive
I don't recommend doing either; it's poor style
You may see them in other people's code

Introduction to OpenMP p. 9/?? Fortran Structured Block

A sequence of whole Fortran statements or constructs
Always entered at the top and left at the bottom
No branching in or out, however it is done
Including RETURN, GOTO, CYCLE, EXIT, END=, EOR= and ERR=
Allowed to branch around within a structured block
You can execute STOP within one, too

Introduction to OpenMP p. 10/?? C++ vs C

This course covers only the C subset of C++
But it will describe C++ where that is relevant
There are two main reasons for this:
Version 3.0 needed for serious C++ support
A lot of extra complications and gotchas
Compilers are likely to be incompatible or buggy
Use either C or C++ for doing the practicals
More detailed guidelines are given in a moment

Introduction to OpenMP p. 11/?? C++ Assumptions

This course assumes the following restrictions:
In the serial code, not affected by OpenMP:
You can use all of C++, including the STL
In any parallel region or OpenMP construct:
Don't use any constructors or destructors
Don't update containers or iterators
Parallelise only C-style arrays and loops
All specimen answers are in C only

Introduction to OpenMP p. 12/?? C++ STL

That is the Standard Template Library
Very serial compared with Fortran
But it does allow some scope for parallelisation
Not supported by OpenMP until version 3.0
Most unclear how much of that will work; see later
The University of Geneva has an OpenMP version
See http://tech.unige.ch/omptl/
Feedback appreciated if you experiment with it

Introduction to OpenMP p. 13/?? Safe C++ Use

You can use vector, deque and array (only)
Based on built-in integer, floating-point or complex
Can use them as normal outside parallel regions
Use &front( ) to get a C pointer to the first element
Using the container directly is covered later
Inside parallel regions, use that pointer only
Access elements using only operators * and [ ]
Do not update the container in any way
And, if in any doubt, don't use the container at all

Introduction to OpenMP p. 14/?? C/C++ Syntax (1)

C/C++ directives take the form of pragmas
They are all lines starting #pragma omp
As always in C/C++, case is significant
Leading spaces are also usually ignored

    #pragma omp parallel for [ clauses ]
        < structured block >

Note that there is no end form, unlike Fortran
It is critical that the block really is a block
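
For example (a minimal C sketch, not from the slides): the pragma applies
only to the single statement that follows it, so braces are what make a
group of statements into one structured block.

    #include <stdio.h>

    int main(void) {
        #pragma omp parallel
        {   /* the braces make both statements one structured block */
            printf("hello\n");
            printf("world\n");
        }
        /* without the braces, only the first printf would be parallel */
        return 0;
    }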

Introduction to OpenMP p. 15/?? C/C++ Syntax (2)

Warning: watch out for macro expansions
Occasionally one can generate a statement sequence
Can continue lines as normal in C/C++
End the line with a backslash (\)

    #pragma omp parallel for \
        [ clauses ]

Introduction to OpenMP p. 16/?? C/C++ Structured Block (1)

I recommend using one of three forms:
A for statement
A compound statement: { ... ; ... }
A simple expression statement
Includes a void function call and assignment
Several more are allowed, but those are the sanest
Always entered at the top and left at the bottom
No branching in or out, however it is done
Including return, goto, catch/throw, setjmp/longjmp, raise/abort/signal etc.

Introduction to OpenMP p. 17/?? C/C++ Structured Block (2)

Note that this applies to all functions called, too
Including ones called implicitly by C++
Called functions must return to the structured block
Allowed to branch around within a structured block
E.g. catch/throw is OK, if entirely inside the block
You can call exit() within one, too
Clean programs should have little trouble
Chaos awaits if you break these rules
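
A minimal C sketch of these rules (the give_up flag is just an invented
condition for illustration):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        int give_up = 0;
        #pragma omp parallel
        {
            int i;
            for (i = 0; i < 10; ++i) {
                if (i == 5)
                    continue;        /* fine: the branch stays inside the block */
            }
            /* return 0;  would be ILLEGAL: it branches out of the block */
            if (give_up)
                exit(1);             /* allowed: exit() may be called inside */
        }
        return 0;
    }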

Introduction to OpenMP p. 18/?? Course Conventions (1)

We will often use C/C++ case conventions
As well as spaces between all keywords
This works for Fortran as well as C and C++
!$OMP and #pragma omp are called sentinels
Clauses are separated by commas, in any order
We will describe their syntax as we use them
The syntax is almost entirely language-independent
Names and expressions match the language

Introduction to OpenMP p. 19/?? Course Conventions (2)

Examples of using OpenMP are in both Fortran and C
C and C++ are almost identical in their OpenMP use
Some are given in only one of Fortran and C
They are all intended to illustrate a single point
Ignore their detailed syntax, as it's not important
Please interrupt if you can't follow them

Introduction to OpenMP p. 20/?? Library

There is a runtime library of auxiliary routines
C/C++: #include <omp.h>
Fortran: USE OMP_LIB or: INCLUDE 'omp_lib.h'
The compiler chooses which, not the user
The most useful routines are covered as we use them
We shall mention the three most important here

Introduction to OpenMP p. 21/?? omp_get_wtime (1)

This returns the elapsed (wall-clock) time in seconds
As a double precision floating-point result
The time is since some arbitrary time in the past
It is guaranteed to be fixed for one execution
It is guaranteed to be the same for all threads

Introduction to OpenMP p. 22/?? omp_get_wtime (2)

Fortran, C++ and C examples:

    WRITE ( *, "('Time taken ',F0.3,' seconds')" ) &
        omp_get_wtime ( )

    std::cout << "Time taken " << omp_get_wtime ( ) <<
        " seconds" << std::endl ;

    printf ( "Time taken %.3f seconds\n", omp_get_wtime ( ) ) ;

Introduction to OpenMP p. 23/?? omp_get_thread_num

This returns the number of the thread executing it
It is a default integer, from 0 upwards
Fortran, C++ and C examples:

    WRITE ( *, "('Current thread ',I0)" ) &
        omp_get_thread_num ( )

    std::cout << "Current thread " <<
        omp_get_thread_num ( ) << std::endl ;

    printf ( "Current thread %d\n", omp_get_thread_num ( ) ) ;

Introduction to OpenMP p. 24/?? omp_get_num_threads

This returns the number of threads being used
It is a default integer, from 1 upwards
Fortran, C++ and C examples:

    WRITE ( *, "('Number of threads ',I0)" ) &
        omp_get_num_threads ( )

    std::cout << "Number of threads " <<
        omp_get_num_threads ( ) << std::endl ;

    printf ( "Number of threads %d\n", omp_get_num_threads ( ) ) ;

Introduction to OpenMP p. 25/?? Warning: Oversimplification

We are going to skip a lot of essential details
Just going to show the basics of OpenMP usage
For now, don't worry if there are loose ends
We return and cover these topics in more detail
Three essentials to OpenMP parallelism:
Establishing a team of threads
Saying if data is shared or private
Sharing out the work between threads

Introduction to OpenMP p. 26/?? The Parallel Directive

This introduces a parallel region
Syntax: sentinel parallel [clauses]

Fortran:

    !$OMP PARALLEL
        < code of structured block >
    !$OMP END PARALLEL

C/C++:

    #pragma omp parallel
    {
        < code of structured block >
    }

Introduction to OpenMP p. 27/?? How It Works

When thread A encounters a parallel directive
It logically creates a team of sub-threads
It then becomes thread 0 in the new team
That thread is also called the master thread
The team then executes the structured block
When the structured block has completed execution
The sub-threads collapse back to thread A
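
A minimal C sketch of this fork-join behaviour, using the library
routines introduced above:

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        printf("serial: only the original thread\n");
        #pragma omp parallel
        {
            /* every thread in the team executes this block */
            printf("in team: thread %d of %d\n",
                   omp_get_thread_num(), omp_get_num_threads());
        }
        printf("serial again: the team has collapsed\n");
        return 0;
    }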

Introduction to OpenMP p. 28/?? A Useful Trick

Difficulty working out what OpenMP is doing?
Try printing out omp_get_thread_num ( )
You can then see which thread is executing
Plus any other useful data at the same time
E.g. a tag indicating which print statement
In DO/for loops, the index as well
Yes, I did that, to check I had understood
But you shouldn't really do I/O in parallel

Introduction to OpenMP p. 29/?? Data Environment

The data environment is also critical
But the basic principles are very simple
Variables are either shared or private
Warning: you need to keep them separate
Outside all parallel regions, very little difference
Standard language rules for serial execution apply
Everything is executing in master thread 0
Most differences apply only within parallel regions

Introduction to OpenMP p. 30/?? Shared Data

Shared means global across the program
The same name means the same location in all threads
Can pass pointers from one thread to another
Don't update in one thread and access in another
Without appropriate synchronisation between the actions
I.e. can use in parallel if read-only in all threads
Such race conditions are the main cause of obscure bugs
Not just in OpenMP, but in any shared memory language
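
A minimal C sketch of the hazard (counter and limit are just
illustrative names; the synchronisation directives come later in the
course):

    #include <stdio.h>

    int main(void) {
        int counter = 0;            /* shared by default */
        const int limit = 1000;     /* also shared, but read-only: safe */
        #pragma omp parallel
        {
            int i;
            for (i = 0; i < limit; ++i)
                counter = counter + 1;   /* RACE: unsynchronised update */
        }
        /* the result is unpredictable on most systems */
        printf("counter = %d\n", counter);
        return 0;
    }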

Introduction to OpenMP p. 31/?? Arrays and Structures

Those rules apply to base elements, not whole arrays
Can update different elements of a shared array
But watch out for Fortran's aliasing rules
It is unclear if this applies to members of structures
Very complicated and language dependent
And the C standard is hopelessly inconsistent here
So KISS: Keep It Simple and Stupid
Clean, simple code will have no problems

Introduction to OpenMP p. 32/?? Private Data

Private means each thread has a separate copy
The same name means a different location in each thread
Don't pass pointers to such data to other threads
Don't set shared pointers to private data
This applies even to global master thread 0
The above is not like POSIX's rules
OpenMP is more restrictive, for better optimisation

Introduction to OpenMP p. 33/?? Default Rules

Start by assuming that everything is shared
The following variables are private:
Indices in OpenMP-parallelised DO/for-loops
C automatic variables declared inside parallel regions
Fortran DO-loop and implied-DO indices, and a bit more
That's often enough on its own for the simplest cases
You can override the defaults, when you need to
By adding clauses to the parallel directive
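
A minimal C sketch where the defaults suffice (the parallel for
directive itself is introduced a few slides below):

    #include <stdio.h>

    int main(void) {
        double data[100];               /* shared: declared outside the region */
        int i;
        #pragma omp parallel for
        for (i = 0; i < 100; ++i) {
            double x = i * 0.5;         /* private: automatic, inside the region */
            data[i] = x;                /* i is private, as a parallelised index */
        }
        printf("%f\n", data[99]);
        return 0;
    }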

Introduction to OpenMP p. 34/?? Specifying Usage

Specify shared by shared(<names>)
But private(<names>) is far more often needed

    !$OMP PARALLEL shared ( array ), private ( x, y, z, i, j, k )
        < code of structured block >
    !$OMP END PARALLEL

Clauses are identical for Fortran and C/C++
It's good practice to specify everything explicitly
But it makes no difference to the code generated
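
A minimal C sketch of the private clause in action (tag is just a
placeholder name):

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int tag = -1;                      /* would be shared by default */
        #pragma omp parallel private(tag)
        {
            tag = omp_get_thread_num();    /* each thread has its own tag */
            printf("thread %d sees tag %d\n", omp_get_thread_num(), tag);
        }
        printf("outside, tag is still %d\n", tag);   /* the outer copy is untouched */
        return 0;
    }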

Introduction to OpenMP p. 35/?? Default(none)

You can set the default to effectively unset
Should give a diagnostic if you forget to declare a variable
Very like the Fortran statement IMPLICIT NONE

    !$OMP PARALLEL default ( none ), shared ( array ), &
    !$OMP & private ( i, j, k )
        < code of structured block >
    !$OMP END PARALLEL

Only allowed in combination with parallel directives
Some variables have defaults even if it is used
Usually omitted, because of space on the slides
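
The same in C, as a minimal sketch; forgetting shared(nthreads) below
should draw a compile-time diagnostic:

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int nthreads = 0;
        #pragma omp parallel default(none) shared(nthreads)
        {
            if (omp_get_thread_num() == 0)     /* only thread 0 writes: no race */
                nthreads = omp_get_num_threads();
        }
        printf("ran with %d threads\n", nthreads);
        return 0;
    }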

Introduction to OpenMP p. 36/?? Parallelising Loops (1)

This is the main SIMD work-sharing directive
Obviously, each iteration must be independent
I.e. the order must not matter in any way
Similar rules to Fortran DO CONCURRENT
The next slide describes what that means
It's also relevant to C/C++ people
The same rules apply, unlike for ordinary for loops

Introduction to OpenMP p. 37/?? Parallelising Loops (2)

What does Fortran DO CONCURRENT require?
Mainly that no iteration may affect any other
And it is important to stress "in any way"
It's not just setting a variable used in a later one
Includes calling any procedures with state
Including all I/O, random numbers, and so on
Not hard, but be very cautious as you start coding
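
A minimal C sketch contrasting the two cases (a and b are just
placeholder arrays):

    #include <stdio.h>

    int main(void) {
        double a[100], b[100];
        int i;
        for (i = 0; i < 100; ++i) { a[i] = i; b[i] = 0.0; }

        /* independent: each iteration touches only its own elements */
        #pragma omp parallel for
        for (i = 0; i < 100; ++i)
            b[i] = 2.0 * a[i];

        /* NOT independent: iteration i reads what iteration i-1 wrote,
           so this loop must stay serial as written */
        for (i = 1; i < 100; ++i)
            a[i] = a[i] + a[i-1];

        printf("%f %f\n", b[99], a[99]);
        return 0;
    }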

Introduction to OpenMP p. 38/?? Fortran Form

Syntax: sentinel DO [clauses]
The loop variable is best declared as private

    !$OMP DO PRIVATE(<var>)
    DO <var> = <loop control>
        < structured block >
    END DO
    !$OMP END DO

Introduction to OpenMP p. 39/?? C/C++ Form

Syntax: sentinel for [clauses]
The loop variable is best declared as private

    #pragma omp for private(<var>)
    for ( <loop control> )
        < structured block >

<loop control> must be very Fortran-like
Most people do that anyway, so little problem
Will describe the rules later, for C/C++-only people

Introduction to OpenMP p. 40/?? Combining Them (1)

There are also combined parallel and work-sharing directives
You can use any clauses that are valid on either

Fortran:

    !$OMP PARALLEL DO [clauses]
        < code of structured block >
    !$OMP END PARALLEL DO

C/C++:

    #pragma omp parallel for [clauses]
        < code of structured block >

Introduction to OpenMP p. 41/?? Combining Them (2)

For now, we shall use the combined forms
We shall come back to the split forms later
That's mainly for convenience in the slides
Having two directives instead of one is messy
Also, the split forms are very deceptive
There are a lot of subtle gotchas to avoid
There is more you can do with the split forms
But, in the simple cases, both are equivalent

Introduction to OpenMP p. 42/?? Be Warned!

All threads execute all of a parallel block
Unless controlled by other directives; see later
In particular, apparently serial code is executed by every thread
Don't update any shared variables in such code
Reading them and calling procedures are fine
It's easier to use the combined forms safely
Always start by using them, where that is feasible
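
A minimal C sketch of the trap (calls is just an illustrative name):

    #include <stdio.h>

    int main(void) {
        int calls = 0;
        #pragma omp parallel
        {
            /* this looks serial, but EVERY thread executes it */
            calls = calls + 1;      /* unsafe: replicated update of shared data */
        }
        printf("calls = %d (roughly one per thread, and racy)\n", calls);
        return 0;
    }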

Introduction to OpenMP p. 43/?? Split Directives

The following code is badly broken:

    !$OMP PARALLEL
    !$OMP DO PRIVATE(i), REDUCTION(+:av)
        DO i = 1,size
            av = av+values(i)
        END DO
    !$OMP END DO
        av = av/size        ! Executed once on each thread
    !$OMP DO PRIVATE(i)
        DO i = 1,size
            values(i) = values(i)/av
        END DO
    !$OMP END DO
    !$OMP END PARALLEL
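
One safe way to write the same computation, using the combined forms
recommended above (a C sketch with placeholder data; values and size
match the Fortran names):

    #include <stdio.h>
    #define SIZE 1000

    int main(void) {
        double values[SIZE], av = 0.0;
        int i;
        for (i = 0; i < SIZE; ++i) values[i] = i + 1.0;

        #pragma omp parallel for reduction(+:av)
        for (i = 0; i < SIZE; ++i)
            av += values[i];

        av /= SIZE;               /* now in serial code: executed exactly once */

        #pragma omp parallel for
        for (i = 0; i < SIZE; ++i)
            values[i] /= av;

        printf("av = %f\n", av);
        return 0;
    }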

Introduction to OpenMP p. 44/?? Semi-Realistic Example

Let's use matrix addition as an example
This is just showing how OpenMP is used
You can do it in one line in Fortran 90
We need to compile using commands like:

    ifort/icc -O3 -ip -openmp ...
    gfortran/gcc -O3 -fopenmp ...

Introduction to OpenMP p. 45/?? Fortran Example

    SUBROUTINE add ( left, right, out )
        REAL ( KIND=dp ), INTENT ( IN ) :: left ( :, : ), right ( :, : )
        REAL ( KIND=dp ), INTENT ( OUT ) :: out ( :, : )
        INTEGER :: i, j
    !$OMP PARALLEL DO
        DO j = 1, UBOUND ( out, 2 )
            DO i = 1, UBOUND ( out, 1 )
                out ( i, j ) = left ( i, j ) + right ( i, j )
            END DO
        END DO
    !$OMP END PARALLEL DO
    END SUBROUTINE add

Introduction to OpenMP p. 46/?? C/C++ Example

    void add ( const double * left, const double * right,
               double * out, int size ) {
        int i, j ;
    #pragma omp parallel for private ( j )
        for ( i = 0 ; i < size ; ++ i )
            for ( j = 0 ; j < size ; ++ j ) {
                out [ j + i * size ] = left [ j + i * size ] +
                                       right [ j + i * size ] ;
            }
    }
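
A possible driver for the C version, timing it with omp_get_wtime as
introduced earlier (N is an arbitrary size chosen for illustration):

    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>
    #define N 2000

    void add ( const double * left, const double * right,
               double * out, int size ) ;

    int main(void) {
        double *a = malloc(N * N * sizeof(double));
        double *b = malloc(N * N * sizeof(double));
        double *c = malloc(N * N * sizeof(double));
        double start;
        int i;
        for (i = 0; i < N * N; ++i) { a[i] = i; b[i] = 2.0 * i; }
        start = omp_get_wtime();
        add(a, b, c, N);
        printf("Time taken %.3f seconds\n", omp_get_wtime() - start);
        free(a); free(b); free(c);
        return 0;
    }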

Introduction to OpenMP p. 47/?? So Far, So Good

Unfortunately, adding the directives is the easy bit
You now have enough information to start coding
There are a couple of very simple practicals
Then a few more details of using OpenMP
Then some more SIMD practicals

Introduction to OpenMP p. 48/?? Actual Examples

All details of what to do are in the handouts provided
Do check the results and look at the times
Errors often show up as wrong results or poor times
Do exactly what the instructions tell you to
They are intended to show specific problems and solutions
You can't match the tuned libraries
They use blocked algorithms, better for caching