Parallel Programming: OpenMP


Xianyi Zeng (xzeng@utep.edu), Department of Mathematical Sciences, The University of Texas at El Paso. November 10, 2016.

An Overview of OpenMP
OpenMP (Open Multi-Processing) is an Application Program Interface (API) for shared-memory parallelism with multiple threads. It supports C/C++ and Fortran and has three components: compiler directives, runtime library routines, and environment variables. It encourages an incremental programming style: start with a serial program and insert compiler directives to enable parallelism. Both fine-grain and coarse-grain parallelism are supported.

An Overview of OpenMP
OpenMP accomplishes parallelism exclusively through the use of threads. Compiler directives denote parallel regions, following a fork/join model, and parallel regions may be nested. The standard specifies nothing about parallel I/O. The memory model is relaxed-consistency: threads need not maintain exact consistency with main memory at all times. To build with gcc/g++, turn on OpenMP support by adding the flag -fopenmp: gcc -fopenmp -o...
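The build step can be sketched end to end as follows (hello.c and hello are placeholder names for illustration, not files from the lecture):

```shell
# Compile with OpenMP support enabled, then run with four threads.
gcc -fopenmp -o hello hello.c
OMP_NUM_THREADS=4 ./hello
```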

An Overview of OpenMP: Compiler Directives
Syntax: #pragma omp directive-name [clause, ...]
A compiler directive applies to the one succeeding statement; use a block to enclose multiple statements. Compiler directives are case sensitive. Examples:
#pragma omp parallel: the code is to be executed by multiple threads in parallel.
#pragma omp for: the work in a for loop is divided among threads.
#pragma omp parallel for: a shortcut combining the previous two.
#pragma omp master: only the master thread executes the code.
#pragma omp critical: create a critical section.
#pragma omp barrier: all threads pause at the barrier until every thread has reached it.

An Overview of OpenMP: Run-time Library Routines
Include omp.h; these routines then work like ordinary C/C++ functions. Purposes include: setting and querying the number of threads; querying a thread's unique identifier (thread ID); querying the thread team size; querying whether execution is inside a parallel region, and at what nesting level; setting, initializing and terminating locks; querying wall-clock time; etc.
Example 1: get the total number of threads: int Nthreads = omp_get_num_threads();
Example 2: get the thread ID of the current thread: int tid = omp_get_thread_num();

An Overview of OpenMP: Environment Variables
These work like other Unix environment variables; use export so they are passed to the executed program (process). All 13 environment variable names are uppercase, and the values assigned to them are not case-sensitive. A few of them:
OMP_NUM_THREADS: the default number of threads for a parallel region.
OMP_THREAD_LIMIT: the maximum number of OpenMP threads for the whole program.
OMP_NESTED: enable/disable nested parallelism.
OMP_MAX_ACTIVE_LEVELS: the maximum number of nested active parallel regions.
OMP_PROC_BIND: enable/disable binding of threads to processors.
OMP_STACKSIZE: the stack size for non-master threads (default is about 2 MB).
Etc.
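A typical shell session setting a few of these variables before launching an executable might look like this (./prog is a placeholder name):

```shell
# The variables are read once when the program starts.
export OMP_NUM_THREADS=4       # default team size for parallel regions
export OMP_NESTED=true         # allow nested parallel regions
export OMP_STACKSIZE=4M        # stack size for non-master threads
./prog
```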

Example 1: Hello World
Output example:
Hello world from thread 1 ( out of 4 ).
Hello world from thread 0 ( out of 4 ).
Hello world from thread 2 ( out of 4 ).
Hello world from thread 3 ( out of 4 ).

Example 2: Data parallelism One can combine the two directives into one: #pragma omp parallel for

Example 3: Intermediate Variables Solution: use the private clause to make area private.

Data Sharing Clauses
shared(list): the default, with the exception of loop indices (in C/C++) and locally declared variables.
private(list): make the variables in the list private to each thread.
firstprivate(list): initialize the private copies with the value from the master thread.
lastprivate(list): copy the value from the sequentially last iteration (or section) back into the master thread's copy.
default(...): change the default data-sharing attribute to one of the others.
threadprivate(list): a private variable lives on the thread stack and only for the duration of the parallel region; a threadprivate variable, however, may live on the heap and can persist across parallel regions.

Example 4: Private Variables lastprivate is only valid with the for and sections directives.

Example 5: Sections Each section is executed by one thread.

Example 6: Nested For Loops In C/C++, the inner index j is shared by default! In Fortran, however, both i and j are private by default.
Correction 1: make j private by declaring it inside the loop, for ( int j = 0; j < 4; j++ ), i.e., the style supported by the C99 standard.
Correction 2: explicitly declare private(j) in the compiler directive.

Example 7: An Old Friend The result is inconsistent and unpredictable due to a data race.

Example 7: An Old Friend Use the run-time lock/unlock routines to avoid the data race.

Example 7: An Old Friend Use the compiler directive #pragma omp critical to mark a critical section.

Example 7: An Old Friend Use the reduction clause.

Reduction Operators
+: compute the sum; initialize with zero.
max: compute the maximum; initialize with the smallest number possible.
min: compute the minimum; initialize with the largest number possible.
Bit operations, logical operations, etc.

Example 8: Prime Number Counter

Resources
The LLNL tutorial on OpenMP: https://computing.llnl.gov/tutorials/openmp/
The XSEDE workshop.