An Introduction to OpenMP

What Is OpenMP?

OpenMP is:
- An Application Program Interface (API) that may be used to explicitly direct multi-threaded, shared-memory parallelism
- Comprised of three primary API components: compiler directives, runtime library routines, and environment variables
- Portable: the API is specified for C/C++ and Fortran, and has been implemented on multiple platforms, including most Unix platforms and Windows NT
- Standardized: jointly defined and endorsed by a group of major computer hardware and software vendors; expected to become an ANSI standard (still pending)

What Is OpenMP? (continued)

What does OpenMP stand for? Open specifications for Multi Processing, developed through collaborative work between interested parties from the hardware and software industry, government, and academia.

OpenMP is not:
- Meant for distributed-memory parallel systems (by itself)
- Necessarily implemented identically by all vendors
- Guaranteed to make the most efficient use of shared memory (currently there are no data locality constructs)

OpenMP History: Ancient History

In the early 1990s, vendors of shared-memory machines supplied similar, directive-based Fortran programming extensions:
- The user would augment a serial Fortran program with directives specifying which loops were to be parallelized
- The compiler would be responsible for automatically parallelizing such loops across the SMP processors

Implementations were all functionally similar, but were diverging (as usual). The first attempt at a standard was the draft for ANSI X3H5 in 1994. It was never adopted, largely due to waning interest as distributed-memory machines became popular.

OpenMP History: More Recent History

The OpenMP standard specification started in the spring of 1997, taking over where ANSI X3H5 had left off, as newer shared-memory machine architectures started to become prevalent. Partners in the OpenMP standard specification included (disclaimer: all partner names derived from the OpenMP web site):
- Compaq / Digital
- Hewlett-Packard Company
- Intel Corporation
- Sun Microsystems, Inc.
- U.S. Department of Energy ASC program

OpenMP History: Documentation Release History
- Oct 1997: Fortran version 1.0
- Oct 1998: C/C++ version 1.0
- Nov 2000: Fortran version 2.0
- Mar 2002: C/C++ version 2.0
- May 2005: C/C++ and Fortran version 2.5
- ???: version 3.0

Goals of OpenMP
- Standardization: provide a standard among a variety of shared-memory architectures/platforms
- Lean and mean: establish a simple and limited set of directives for programming shared-memory machines. Significant parallelism can be implemented by using just 3 or 4 directives.
- Ease of use: provide the capability to incrementally parallelize a serial program, unlike message-passing libraries, which typically require an all-or-nothing approach; provide the capability to implement both coarse-grain and fine-grain parallelism
- Portability: supports Fortran (77, 90, and 95), C, and C++; public forum for the API and membership

OpenMP Programming Model

Shared-memory, thread-based parallelism: OpenMP is based upon the existence of multiple threads in the shared-memory programming paradigm. A shared-memory process consists of multiple threads.

Explicit parallelism: OpenMP is an explicit (not automatic) programming model, offering the programmer full control over parallelization.

Fork-join model: OpenMP uses the fork-join model of parallel execution:
- All OpenMP programs begin as a single process: the master thread. The master thread executes sequentially until the first parallel region construct is encountered.
- FORK: the master thread then creates a team of parallel threads.
- The statements in the program that are enclosed by the parallel region construct are then executed in parallel among the various team threads.
- JOIN: when the team threads complete the statements in the parallel region construct, they synchronize and terminate, leaving only the master thread.
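A minimal sketch of the fork-join model in C (the order of the printed lines is nondeterministic; compile with an OpenMP-capable compiler, e.g. gcc -fopenmp):

#include <omp.h>
#include <stdio.h>

int main(void)
{
    printf("master thread: serial execution before the region\n");

    /* FORK: a team of threads executes the structured block */
    #pragma omp parallel
    {
        printf("hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }   /* JOIN: implied barrier; only the master continues */

    printf("master thread: serial execution after the region\n");
    return 0;
}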

Fork-Join Model (diagram: the master thread forks a team of threads at each parallel region and joins them at its end)

OpenMP Programming Model (continued)

Compiler directive based: most OpenMP parallelism is specified through the use of compiler directives which are embedded in C/C++ or Fortran source code.

Nested parallelism support: the API provides for the placement of parallel constructs inside of other parallel constructs. Implementations may or may not support this feature.

Dynamic threads: the API provides for dynamically altering the number of threads that may be used to execute different parallel regions. Implementations may or may not support this feature.

OpenMP Programming Model (continued)

I/O: OpenMP specifies nothing about parallel I/O. This is particularly important if multiple threads attempt to write to or read from the same file. If every thread conducts I/O on a different file, the issues are not as significant. It is entirely up to the programmer to ensure that I/O is conducted correctly within the context of a multi-threaded program.

FLUSH often? OpenMP provides a "relaxed-consistency" and "temporary" view of thread memory (in their words). In other words, threads can "cache" their data and are not required to maintain exact consistency with real memory all of the time. When it is critical that all threads view a shared variable identically, the programmer is responsible for ensuring that the variable is FLUSHed by all threads as needed. More on this later...
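A minimal sketch of the flush idea, using the classic spin-wait flag pattern (an illustrative example only; real code should prefer higher-level synchronization constructs over hand-rolled flag polling):

#include <omp.h>
#include <stdio.h>

int main(void)
{
    int flag = 0;

    #pragma omp parallel num_threads(2)
    {
        if (omp_get_thread_num() == 0) {
            flag = 1;                   /* producer sets the flag... */
            #pragma omp flush(flag)     /* ...and makes the write visible */
        } else {
            do {                        /* consumer re-reads flag from memory */
                #pragma omp flush(flag)
            } while (!flag);
            printf("thread 1 observed the flag\n");
        }
    }
    return 0;
}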

OpenMP Directives

Directives are embedded in the application source using C's #pragma mechanism:

#pragma omp [directive] [clause(s)] [newline]

- #pragma omp: required for all OpenMP C/C++ directives.
- Directive: a valid OpenMP directive. Must appear after the pragma and before any clauses.
- Clause(s): optional. Clauses can be in any order, and repeated as necessary unless otherwise restricted.
- Newline: required. Precedes the structured block which is enclosed by this directive.

Example: #pragma omp parallel default(shared) private(beta,pi)
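The example directive, shown in context (a minimal sketch; the variable names beta and pi come from the slide, everything else is illustrative):

#include <omp.h>
#include <stdio.h>

int main(void)
{
    float beta = 0.0f, pi = 0.0f;

    /* directive, two clauses, newline, then the structured block */
    #pragma omp parallel default(shared) private(beta, pi)
    {
        /* each thread works on its own (uninitialized) copies of beta and pi */
        beta = 1.0f;
        pi = 4.0f * beta;
        printf("thread %d: pi = %f\n", omp_get_thread_num(), pi);
    }
    return 0;
}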

OpenMP Programming Model

General rules:
- Case sensitive (because this is C; Fortran directives are not case sensitive)
- Directives follow the conventions of the C/C++ standards for compiler directives
- Only one directive-name may be specified per directive
- Each directive applies to at most one succeeding statement, which must be a structured block
- Long directive lines can be "continued" on succeeding lines by escaping the newline character with a backslash ("\") at the end of a directive line, as shown below
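An illustrative sketch of the continuation rule:

#include <omp.h>
#include <stdio.h>

int main(void)
{
    /* the backslash escapes the newline: one logical directive
       spanning two physical lines */
    #pragma omp parallel num_threads(2) \
                         default(shared)
    {
        printf("hello from thread %d\n", omp_get_thread_num());
    }
    return 0;
}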

OpenMP Directives (PARALLEL Directive)

Purpose: a parallel region is a block of code that will be executed by multiple threads. This is the fundamental OpenMP parallel construct.

Format:

#pragma omp parallel [clause...] newline

Some clauses:
- if (scalar_expression)
- private (list)
- shared (list)
- default (shared | none)
- firstprivate (list)
- reduction (operator: list)
- copyin (list)
- num_threads (integer-expression)

Then your structured block of code.
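A minimal sketch combining several of these clauses on one parallel region (the problem size n and the threshold in the if clause are illustrative):

#include <omp.h>
#include <stdio.h>

int main(void)
{
    int n = 1000, sum = 0;

    #pragma omp parallel if(n > 100) num_threads(4) reduction(+:sum)
    {
        /* each thread accumulates into a private copy of sum;
           the copies are combined with + at the join */
        int id = omp_get_thread_num();
        int nthreads = omp_get_num_threads();
        for (int i = id; i < n; i += nthreads)
            sum += i;
    }
    printf("sum = %d (expected %d)\n", sum, n * (n - 1) / 2);
    return 0;
}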

OpenMP Directives (PARALLEL Directive)

Notes:
- When a thread reaches a PARALLEL directive, it creates a team of threads and becomes the master of the team. The master is a member of that team and has thread number 0 within it. (Remember the thread pools mentioned on Monday? This is that concept in action.)
- Starting from the beginning of the parallel region, the code is duplicated and all threads execute it.
- There is an implied barrier at the end of a parallel region. Only the master thread continues execution past this point. (Note that Fortran marks the end of the region explicitly with an END PARALLEL directive.)
- If any thread terminates within a parallel region, all threads in the team will terminate, and the work done up until that point is undefined.

OpenMP Directives (PARALLEL Directive)

How many threads? The number of threads in a parallel region is determined by the following factors, in order of precedence:
1. Evaluation of the IF clause
2. Setting of the NUM_THREADS clause
3. Use of the omp_set_num_threads() library function
4. Setting of the OMP_NUM_THREADS environment variable
5. Implementation default - usually the number of CPUs on a node, though it could be dynamic (see the next slide)

Threads are numbered from 0 (master thread) to N-1.
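A minimal sketch of the precedence rules (even if OMP_NUM_THREADS is also set in the shell, the clause wins):

#include <omp.h>
#include <stdio.h>

int main(void)
{
    omp_set_num_threads(8);             /* library call: third in precedence */

    #pragma omp parallel num_threads(4) /* clause: second in precedence, wins here */
    {
        if (omp_get_thread_num() == 0)  /* the master thread is number 0 */
            printf("team size: %d\n", omp_get_num_threads());
    }
    return 0;
}   /* prints "team size: 4" */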

OpenMP Directives (PARALLEL Directive)

Dynamic threads: use the omp_get_dynamic() library function to determine if dynamic threads are enabled. If supported, the two methods available for enabling dynamic threads are:
- The omp_set_dynamic() library routine
- Setting of the OMP_DYNAMIC environment variable to TRUE

Nested parallel regions: use the omp_get_nested() library function to determine if nested parallel regions are enabled. The two methods available for enabling nested parallel regions (if supported) are:
- The omp_set_nested() library routine
- Setting of the OMP_NESTED environment variable to TRUE

If not supported, a parallel region nested within another parallel region results in the creation of a new team consisting of one thread, by default.
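A minimal sketch of querying and requesting both features (whether the inner regions actually get more than one thread depends on the implementation):

#include <omp.h>
#include <stdio.h>

int main(void)
{
    printf("dynamic enabled: %d\n", omp_get_dynamic());
    printf("nested enabled:  %d\n", omp_get_nested());

    omp_set_dynamic(0);   /* request a fixed thread count */
    omp_set_nested(1);    /* request nested parallelism (if supported) */

    #pragma omp parallel num_threads(2)
    {
        int outer = omp_get_thread_num();

        /* if nesting is unsupported, each inner team has just one thread */
        #pragma omp parallel num_threads(2)
        {
            printf("outer thread %d, inner thread %d\n",
                   outer, omp_get_thread_num());
        }
    }
    return 0;
}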

OpenMP Directives (PARALLEL Directive)

Clauses:
- IF clause: if present, it must evaluate to .TRUE. (Fortran) or non-zero (C/C++) in order for a team of threads to be created. Otherwise, the region is executed serially by the master thread. (Recall: this is evaluated first.)
- The remaining clauses are described in the tutorial, in the Data Scope Attribute Clauses section. We may cover some if time permits.

Restrictions:
- A parallel region must be a structured block that does not span multiple routines or code files
- It is illegal to branch into or out of a parallel region
- Only a single IF clause is permitted
- Only a single NUM_THREADS clause is permitted
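A minimal sketch of the IF clause: parallelize only when the problem is big enough to be worth the thread-management overhead (the threshold of 1000 is illustrative):

#include <omp.h>
#include <stdio.h>

void process(int n)
{
    /* serial execution (team of one) when n <= 1000; a full team otherwise */
    #pragma omp parallel if(n > 1000)
    {
        if (omp_get_thread_num() == 0)
            printf("n = %d, team size = %d\n", n, omp_get_num_threads());
    }
}

int main(void)
{
    process(10);      /* prints a team size of 1 */
    process(100000);  /* prints the full team size */
    return 0;
}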

How Do Threads Interact?

OpenMP is a multi-threading, shared-address model: threads communicate by sharing variables. Unintended sharing of data causes race conditions.

Race condition: when the program's outcome changes as the threads are scheduled differently.

To control race conditions, use synchronization to protect data conflicts. Synchronization is expensive, so change how data is accessed to minimize the need for synchronization, as the sketch below illustrates.
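A minimal sketch of a race condition and one (expensive) way to suppress it; restructuring the access pattern, e.g. with a reduction clause, would avoid most of the synchronization cost here:

#include <omp.h>
#include <stdio.h>

int main(void)
{
    int racy = 0, safe = 0;

    #pragma omp parallel num_threads(4)
    {
        for (int i = 0; i < 100000; i++) {
            racy++;            /* unsynchronized read-modify-write:
                                  updates are lost, result varies per run */

            #pragma omp critical
            safe++;            /* protected, but serializes the threads */
        }
    }
    printf("racy = %d (often less than 400000), safe = %d\n", racy, safe);
    return 0;
}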