Parallel Programming with OpenMP


Advanced Practical Programming for Scientists
Parallel Programming with OpenMP
Robert Gottwald, Thorsten Koch
Zuse Institute Berlin, June 9th, 2017

Sequential program

From the programmer's perspective:
- Statements are executed in the order in which they appear

On the CPU level this is not true:
- Instruction reordering due to compiler optimizations
- Inside the CPU: out-of-order execution to better utilize processing units

Parallel program
- For the CPU: completely independent programs executed on different cores
- The order of memory writes in one thread can look different when observed from another thread
- The programmer is responsible for all synchronization
- Statements from different threads can be interleaved in millions of ways, even for small programs

Gottwald, Koch APPFS 2017 Parallel Programming with OpenMP 1 / 6

Example

What can happen if these statements are executed in parallel?

    int x = 0;
    int y = 0;

    // Thread 1:
    x = 1;
    y = 1;

    // Thread 2:
    if (y == 1)
        printf("x is %i\n", x);

Executed sequentially, this program can only print "x is 1". When executed in parallel, printing "x is 0" is possible: memory operations might not become visible in the same order when observed from the other thread! Safe abstractions and a memory model are required for parallel programs.

OpenMP

- The OpenMP standard provides annotations for the C, C++, and Fortran languages
- If compiled without OpenMP, the program is still a valid sequential program
- The compiler takes care of thread handling and provides abstractions for synchronization

    double sum = 0.0;
    double vec[N];
    // initialize vec
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; ++i)
        sum += vec[i];

Parallel section and work sharing

- #pragma omp parallel starts a parallel section
- The parallel section is executed by all threads
- Work-sharing constructs are used to distribute work:
  #pragma omp for
  #pragma omp sections
  #pragma omp single
- Memory consistency is enforced with the #pragma omp flush operation
- Wait for all threads to reach a specified point with #pragma omp barrier

    #pragma omp parallel num_threads(4)
    {   // implicit flush
        // executed by all 4 threads
        printf("thread %i: hello world!\n", omp_get_thread_num());

        #pragma omp single
        {
            // executed on one thread
        }   // implicit barrier and flush

        // execute do_work1() and do_work2() in parallel
        #pragma omp sections
        {
            #pragma omp section
            do_work1();
            #pragma omp section
            do_work2();
        }   // implicit barrier and flush
    }

Sharing state

- By default, scoping rules determine whether variables are shared
- Sharing can also be specified explicitly with clauses:
  firstprivate(var)
  private(var)
  lastprivate(var)
  shared(var)
- The shared(var) clause is only useful together with a default(none) or default(private) clause

    int x = 0;  // x is shared (scope)
    int y = 0;  // y is private (clause)
    #pragma omp parallel num_threads(4) firstprivate(y)
    {
        // private(y) would make y uninitialized
        int z = 0;  // z is private (scope)
    }

Other Features

Many new features arrive with each new version:
- #pragma omp task for recursive work sharing (since OpenMP 3.0)
- #pragma omp simd for instruction-level parallelism (since OpenMP 4.0)
- Directives for GPU computing (since OpenMP 4.0)
- ...

Thank you for your attention!