Introduction to OpenMP: OpenMP basics; OpenMP directives, clauses, and library routines


Introduction to OpenMP. Outline: introduction, OpenMP basics, OpenMP directives, clauses, and library routines.

What is OpenMP? What does OpenMP stand for? Open specifications for Multi Processing, developed through collaborative work between interested parties from the hardware and software industry, government, and academia. OpenMP is an Application Program Interface (API) that may be used to explicitly direct multithreaded, shared-memory parallelism. API components: compiler directives, runtime library routines, and environment variables. OpenMP is a directive-based method to invoke parallel computations on shared-memory multiprocessors.

What is OpenMP? The OpenMP API is specified for C/C++ and Fortran. OpenMP is not intrusive to the original serial code: instructions appear in comment statements for Fortran and in pragmas for C/C++. OpenMP website: http://www.openmp.org. Materials in this lecture are taken from various OpenMP tutorials on that website and from other places.

Why OpenMP? OpenMP is portable: it is supported by HP, IBM, Intel, SGI, Sun, and others, and it is the de facto standard for writing shared-memory programs. OpenMP can be applied incrementally, one function or even one loop at a time, which makes it a convenient way to obtain a parallel program from a sequential one.

How to compile and run OpenMP programs? GNU and Intel compilers support OpenMP. To compile an OpenMP program: gcc example1.c -fopenmp, or icc example1.c -openmp (newer Intel compilers use -qopenmp). To run: ./a.out. In the default setting, the number of threads used is equal to the number of processors in the system. To change the number of threads: export OMP_NUM_THREADS=8 (bash).

OpenMP execution model. OpenMP uses the fork-join model of parallel execution. Every OpenMP program begins with a single master thread. The master thread executes sequentially until a parallel region is encountered, at which point it creates a team of parallel threads (FORK). When the threads in the team complete the parallel region, they synchronize and terminate, leaving only the master thread, which continues sequentially (JOIN).
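
The fork-join behavior can be seen with a minimal program; this is a sketch written for this lecture transcript, not code from the original slides, and the compile commands in the comment assume the GNU or Intel compilers mentioned above.

    /* hello_omp.c -- minimal fork-join sketch.
       Compile: gcc hello_omp.c -fopenmp   (or: icc hello_omp.c -qopenmp)
       Run:     ./a.out                                                     */
    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        printf("master thread, before the parallel region\n");  /* serial */

        #pragma omp parallel                                     /* FORK  */
        {
            printf("hello from thread %d of %d\n",
                   omp_get_thread_num(), omp_get_num_threads());
        }                                                        /* JOIN  */

        printf("master thread, after the parallel region\n");   /* serial */
        return 0;
    }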

OpenMP general code structure:

    #include <omp.h>

    main ()
    {
        int var1, var2, var3;

        /* Serial code */
        ...

        /* Beginning of parallel section: fork a team of threads,
           specify the variable scoping */
        #pragma omp parallel private(var1, var2) shared(var3)
        {
            /* Parallel section executed by all threads */
            ...
        }   /* All threads join the master thread and disband */

        /* Resume serial code */
        ...
    }

Data model: private and shared variables. Variables in the global data space are accessible by all parallel threads (shared variables). Variables in a thread's private space can only be accessed by that thread (private variables).

    #pragma omp parallel for private( privindx, privdbl )
    for ( i = 0; i < arraysize; i++ ) {
        for ( privindx = 0; privindx < 16; privindx++ ) {
            privdbl = ( (double) privindx ) / 16;
            y[i] = sin( exp( cos( - exp( sin(x[i]) ) ) ) ) + cos( privdbl );
        }
    }

The parallel-for loop index is private by default.

OpenMP directives. Format: #pragma omp directive-name [clause, ...] newline (use \ to continue a directive over multiple lines). Example: #pragma omp parallel default(shared) private(beta, pi). The scope of a directive is the following block of statements enclosed in { }.

Parallel region construct: a block of code that will be executed by multiple threads.

    #pragma omp parallel [clause ...]
    {
        ...
    }
    (implied barrier at the end of the region)

Example clauses: if (expression), private (list), shared (list), default (shared | private | none), reduction (operator : list), firstprivate (list), lastprivate (list). if (expression): the region is executed in parallel only if the expression evaluates to true. private (list): each listed variable is private and local to every thread (no relation to the variable of the same name outside the block). shared (list): data accessed by all threads. default (shared | private | none): the default scoping for variables not listed in any clause.
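
A small sketch of these clauses in combination (the variable names n, a, scale and the values used are invented for illustration, not taken from the slides); default(none) forces every variable used in the region to be scoped explicitly:

    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    int main(void)
    {
        int n = 100, tid;
        double scale = 2.0;
        double *a = malloc(n * sizeof(double));

        /* run in parallel only if n is large enough; every variable scoped */
        #pragma omp parallel if (n > 50) default(none) \
                shared(n, a) private(tid) firstprivate(scale)
        {
            tid = omp_get_thread_num();    /* private: one copy per thread   */
            /* firstprivate: scale starts with its value from before the region */
            printf("thread %d working\n", tid);

            #pragma omp for
            for (int i = 0; i < n; i++)
                a[i] = scale * i;          /* shared: all threads see array a */
        }

        printf("a[n-1] = %f\n", a[n - 1]);
        free(a);
        return 0;
    }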

The reduction clause:

    sum = 0.0;
    #pragma omp parallel for shared(n, x) private(i) reduction(+ : sum)
    for (i = 0; i < n; i++)
        sum = sum + x[i];

Without the reduction clause, a race condition on sum occurs. With the reduction clause, OpenMP generates code such that the race condition is avoided: each thread accumulates into a private copy of sum, and the copies are combined at the end. firstprivate (list): the private copies are initialized with the value the variable had before entering the block. lastprivate (list): the value from the sequentially last iteration (or section) is copied back to the variable on leaving the block.
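
Written out as a complete, compilable C program (a sketch: the array x, its size N, and the fill values are assumptions made for this illustration):

    #include <stdio.h>
    #include <omp.h>

    #define N 1000

    int main(void)
    {
        double x[N], sum = 0.0;
        for (int i = 0; i < N; i++)
            x[i] = 1.0;                    /* sample data */

        /* each thread accumulates into its own private copy of sum;
           the copies are combined with '+' at the end, so no race occurs */
        #pragma omp parallel for shared(x) reduction(+ : sum)
        for (int i = 0; i < N; i++)
            sum = sum + x[i];

        printf("sum = %f\n", sum);         /* prints 1000.000000 */
        return 0;
    }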

Work-sharing constructs:

    #pragma omp for [clause ...]
    #pragma omp sections [clause ...]
    #pragma omp single [clause ...]

The work is distributed over the threads. A work-sharing construct must be enclosed in a parallel region. There is no implied barrier on entry, but there is an implied barrier on exit (unless nowait is specified).

The omp for directive: example
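
A representative sketch of a loop parallelized with the omp for directive (a vector sum; the function name, array names, and size N are assumptions, not taken from the original slide):

    #define N 1000

    /* the for directive splits the N iterations among the team's threads;
       a, b, c are assumed to point to arrays of at least N doubles        */
    void vector_add(double *a, double *b, double *c)
    {
        #pragma omp parallel shared(a, b, c)
        {
            #pragma omp for        /* implied barrier at the end (no nowait) */
            for (int i = 0; i < N; i++)
                c[i] = a[i] + b[i];
        }
    }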

Schedule clause (decides how the loop iterations are divided among the threads): schedule (static | dynamic | guided [, chunk])
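
A short sketch of the schedule clause (the chunk size 4 and the helper expensive_work are chosen purely for illustration):

    extern double expensive_work(int i);   /* placeholder for a costly per-iteration job */

    void process(double *result, int n)
    {
        /* dynamic schedule: chunks of 4 iterations are handed to threads as
           they become free, which helps when iteration costs are uneven     */
        #pragma omp parallel for schedule(dynamic, 4)
        for (int i = 0; i < n; i++)
            result[i] = expensive_work(i);
    }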

The omp sections clause: example
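
A representative sketch of the sections construct (compute_x and compute_y are placeholders for two independent pieces of work, assumed for this illustration):

    extern void compute_x(void);   /* two independent tasks */
    extern void compute_y(void);

    void do_both(void)
    {
        #pragma omp parallel sections
        {
            #pragma omp section
            compute_x();           /* executed by one thread of the team       */

            #pragma omp section
            compute_y();           /* executed, possibly, by another thread    */
        }                          /* implied barrier: both sections are done  */
    }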

Synchronization: barrier.

    for (i = 0; i < N; i++)
        a[i] = b[i] + c[i];
    for (i = 0; i < N; i++)
        d[i] = a[i] + b[i];

Both loops are in a parallel region with no synchronization in between. What is the problem? The second loop reads elements of a that other threads may not yet have written in the first loop. Fix:

    for (i = 0; i < N; i++)
        a[i] = b[i] + c[i];
    #pragma omp barrier
    for (i = 0; i < N; i++)
        d[i] = a[i] + b[i];

Critical section.

    for (i = 0; i < N; i++) {
        sum += a[i];
    }

This cannot be parallelized correctly if sum is shared. Fix:

    for (i = 0; i < N; i++) {
        #pragma omp critical
        {
            sum += a[i];
        }
    }

OpenMP environment variables: OMP_NUM_THREADS (number of threads to use), OMP_SCHEDULE (schedule type and chunk size for loops declared with schedule(runtime)).

OpenMP runtime environment: omp_get_num_threads (number of threads in the current team), omp_get_thread_num (id of the calling thread within the team), omp_in_parallel (whether execution is currently inside a parallel region).
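
A small sketch of these routines in use; omp_set_num_threads is a standard routine added here for illustration, although it is not listed on the slide above:

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        omp_set_num_threads(4);            /* request 4 threads for this run   */
        printf("in parallel? %d\n", omp_in_parallel());   /* 0: still serial   */

        #pragma omp parallel
        {
            printf("thread %d of %d, in parallel? %d\n",
                   omp_get_thread_num(),
                   omp_get_num_threads(),
                   omp_in_parallel());     /* 1 inside the parallel region     */
        }
        return 0;
    }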

Sequential matrix multiply:

    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++) {
            c[i][j] = 0;
            for (k = 0; k < n; k++)
                c[i][j] = c[i][j] + a[i][k] * b[k][j];
        }

OpenMP matrix multiply:

    #pragma omp parallel for private(j, k)
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++) {
            c[i][j] = 0;
            for (k = 0; k < n; k++)
                c[i][j] = c[i][j] + a[i][k] * b[k][j];
        }

Summary: OpenMP provides a compact yet powerful programming model for shared-memory programming, and it is very easy to use OpenMP to create parallel programs. OpenMP preserves the sequential version of the program. Developing an OpenMP program: start from a sequential program; identify the code segments that take most of the time; determine whether the important loops can be parallelized (the loops may have critical sections, reduction variables, etc.); determine the shared and private variables; add directives.

MPI vs. OpenMP.
Pure MPI pro: portable to distributed and shared memory machines; scales beyond one node; no data placement problem.
Pure MPI con: difficult to develop and debug; high latency, low bandwidth; explicit communication; large granularity; difficult load balancing.
Pure OpenMP pro: easy to implement parallelism; low latency, high bandwidth; implicit communication; coarse and fine granularity; dynamic load balancing.
Pure OpenMP con: only on shared memory machines; scales within one node only; possible data placement problem; no specific thread order.

Why hybrid? The hybrid MPI/OpenMP paradigm is the software trend for clusters of SMP architectures. It is elegant in concept and architecture: MPI is used across nodes and OpenMP within nodes. It makes good use of the shared-memory system resources (memory, latency, and bandwidth) and avoids the extra communication overhead of MPI within a node. OpenMP adds fine granularity (larger message sizes) and allows increased and/or dynamic load balancing. Some problems have two-level parallelism naturally, and some problems can only use a restricted number of MPI tasks. A hybrid code can have better scalability than both pure MPI and pure OpenMP.

A pseudo hybrid code:

    program hybrid
        call MPI_INIT ()
        call MPI_COMM_RANK ( )
        call MPI_COMM_SIZE ( )
        ! some computation and MPI communication
        !$OMP PARALLEL DO SHARED(n)
        do i = 1, n
            ! computation
        enddo
        ! some computation and MPI communication
        call MPI_FINALIZE ()
    end
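
The same structure sketched in C; the use of MPI_Init_thread with MPI_THREAD_FUNNELED and the problem size n are choices made for this illustration, not part of the original pseudo code:

    #include <mpi.h>
    #include <omp.h>

    int main(int argc, char **argv)
    {
        int rank, size, provided;
        int n = 1000;                          /* assumed problem size */

        /* MPI_THREAD_FUNNELED: only the master thread will make MPI calls */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* ... some computation and MPI communication ... */

        #pragma omp parallel for shared(n)     /* OpenMP threads within each MPI rank */
        for (int i = 0; i < n; i++) {
            /* computation on this rank's share of the data */
        }

        /* ... some computation and MPI communication ... */

        MPI_Finalize();
        return 0;
    }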