
Linux Magazine, March 2004
Copyright Linux Magazine 2004

EXTREME LINUX
Using OpenMP, Part 3
by Forrest Hoffman

This is the third and final column in a series on shared memory parallelization using OpenMP. Often used to improve performance of scientific models on symmetric multi-processor (SMP) machines or SMP nodes in a Linux cluster, OpenMP consists of a portable set of compiler directives, library calls, and environment variables. It's supported by a wide range of FORTRAN and C/C++ compilers for Linux and commercial supercomputers.

OpenMP is based on the fork and join model of execution in which a team of threads is spawned (or forked) at the beginning of a concurrent section of code (called a parallel region) and subsequently killed (or joined) at the end of the parallel region. OpenMP is portable across platforms and is intended for use in programs that execute correctly either sequentially (that is, when compiled without OpenMP enabled) or in parallel (with OpenMP enabled).

An introduction to the concepts and syntax of OpenMP directives was presented in January's column (available online). February's column (also available online) covered more directives and all of the library functions and environment variables. Both previous columns included example C code demonstrating many of the features of OpenMP. This month's column presents the remaining directives and OpenMP's data environment clauses.

Reviewing Constructs

OpenMP directives take the form

    #pragma omp directive-name [clause[[,] clause]...]

and sit just above the structured code blocks that they affect. A directive, along with all the clauses that modify it and the subsequent structured block of code, constitutes what is called a construct. We've already seen how to use the parallel construct.
It's the fundamental construct that starts parallel execution. The work-sharing constructs -- for, sections, and single -- distribute the execution of associated program statements among the thread team members that encounter them. Combined parallel work-sharing constructs are shortcuts for parallel regions containing only one work-sharing construct. The combined constructs are parallel for (used in January's example program) and parallel sections.

The Last of the Directives

The sections and parallel sections directives are used to declare blocks of code that can be executed concurrently. While the for and parallel for directives spread loop iterations across thread team members, sections and parallel sections spread non-iterative blocks of code across threads in a team. Each section or structured block is executed once by one of the threads.

For example, some code may call a series of subroutines to compute physics processes on each surface of a cube. Since processes on each face can be computed independently and each has its own subroutine, the sections or parallel sections directives can be used to tell the compiler that computations for each section of code may completely overlap. Such a construct might look like this:

    void do_physics()
    {
    #pragma omp parallel sections
        {
    #pragma omp section
            top_physics();
    #pragma omp section
            bottom_physics();
    #pragma omp section
            left_physics();
    #pragma omp section
            right_physics();
    #pragma omp section
            front_physics();
    #pragma omp section
            rear_physics();
        }
    }

Here, we used the combined parallel sections directive instead of having separate parallel and sections directives. Within the structured block of the parallel sections construct, each statement that may be concurrently executed has its own section directive. As a result, the program is free to completely overlap the computation of all these subroutines by distributing them among threads in the team.

When the code snippet above is compiled (with sufficiently time-consuming subroutines), it should run about twice as fast with two threads (with OpenMP enabled) as it does when compiled and run without OpenMP. In the example below, the code is first compiled and run with OpenMP disabled.
Then the code is compiled with OpenMP support (enabled by the -mp flag on the compile line when using the Portland Group compiler) and run with two threads on a dual-processor Pentium III.

    [node01]$ pgcc -O -o sections sections.c
    [node01]$ time ./sections

    real 0m41.205s
    user 0m41.201s
    sys 0m0.002s
    [node01]$ pgcc -mp -O -o sections sections.c
    [node01]$ OMP_NUM_THREADS=2 time ./sections
    41.19user 0.15system 0:20.70elapsed 199%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (134major+14minor)pagefaults 0swaps

As you can see, the serial version ran in 41.2 seconds. The OpenMP parallel version (using two threads) still consumed 41.2 seconds of user time, but real elapsed time was only 20.7 seconds. Therefore, using only very simple compiler directives, we were able to use both processors on an SMP machine to cut wallclock time in half.

The single directive in a parallel region identifies a block of code to be executed by only one thread in the team. The thread that executes this code block need not be the master thread -- the block is usually executed by the first thread that encounters it. The following code snippet demonstrates this feature.

    #pragma omp parallel private(tid)
    {
        tid = omp_get_thread_num();
    #pragma omp single
        printf("%d: Starting process_block1\n", tid);
        process_block1();
    #pragma omp single nowait
        printf("%d: Starting process_block2\n", tid);
        process_block2();
    #pragma omp single
        printf("%d: All done\n", tid);
    }

The code contains a parallel region for which the variable tid is private to each thread. Within the parallel region, a single directive sits above each of the printf statements so that the messages are printed only once no matter how many threads are executing statements in the parallel region. The thread id, obtained from the call to omp_get_thread_num() and stored in the private variable tid, is printed by whichever thread executes each printf statement.

When compiled and run, you can see that thread one executed the first and third print statements, while thread zero (the master thread) executed the one in the middle.
    [node01]$ pgcc -mp -O -o single single.c
    [node01]$ OMP_NUM_THREADS=2 ./single
    1: Starting process_block1
    0: Starting process_block2
    1: All done

There is an implied barrier at the end of a single construct. As a result, after one thread executes the print statement, all other threads must "catch up" to the barrier point before they all simultaneously execute the next statements. The nowait clause can be used to eliminate the implied barrier. In the example code above, all threads begin executing process_block1() simultaneously, because of the implied barrier at the end of the single construct above it. However, threads may begin executing process_block2() at slightly different times because the nowait clause is specified as part of the single construct above process_block2().

The master directive is similar to the single directive, although it requires that only the master thread execute the adjoining code block.

The critical directive is used to identify a section of code within a parallel region that should be executed by only one thread at a time. This directive should be used with caution, because too many criticals can result in frequent synchronization, thus slowing down processing. While critical constructs could be used for updating counters or performing similar reductions within parallel loops on global shared variables, the reduction clause is often better suited to that task.

The critical directive is often useful for queuing applications in which calls are made to obtain new requests from a shared queue. A critical directive above a function call that returns a request identifier prevents two or more threads from requesting a new identifier at the same time, which would be a race condition. For example, in the following code snippet, the critical directive sits above the call to get_next_request():

    #pragma omp parallel shared(request_queue) private(request_id,request_status)
    {
        for (;;) {
    #pragma omp critical (get_request)
            request_id = get_next_request(request_queue);
            printf("processing request %d\n", request_id);
            request_status = process_request(request_id);
            update_request_status(request_id, request_status);
        }
    }

As a result, this function is called by only one thread at a time, ensuring that each receives a unique request identifier.
Notice that the critical construct is contained within a parallel construct that identifies request_queue as a shared variable and request_id and request_status as variables private to each thread. The barrier directive provides a means for synchronizing all threads in a team. When encountered in the program, each thread in the team waits for all other team members to reach the same, specified point before collectively starting execution of the subsequent statements in parallel.
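As a rough, self-contained illustration of that waiting behavior, here is a minimal sketch. The function two_phase() and its parameters are hypothetical names, not taken from the column. The first loop uses nowait to drop its implied barrier, so the explicit barrier is the only thing that keeps a thread from reading elements of data that another thread has not yet written. The code also compiles and gives the same result without OpenMP enabled.

```c
#include <assert.h>

/* Hypothetical helper, not from the column: phase 1 fills data[],
 * phase 2 combines each element with its right-hand neighbour. */
void two_phase(int *data, int *out, int n)
{
    assert(n > 0);
#pragma omp parallel
    {
        int i;  /* declared inside the region, so each thread has its own copy */

        /* Phase 1: nowait removes the barrier normally implied
         * at the end of a for construct. */
#pragma omp for nowait
        for (i = 0; i < n; i++)
            data[i] = i * i;

        /* Every thread waits here until all of data[] has been written. */
#pragma omp barrier

        /* Phase 2: reads elements that other threads may have written. */
#pragma omp for
        for (i = 0; i < n; i++)
            out[i] = data[i] + data[(i + 1) % n];
    }
}
```

Calling two_phase(data, out, 4) on two four-element arrays fills data with the squares 0, 1, 4, 9 and out with the neighbour sums. Removing the barrier directive while keeping nowait would make the result depend on thread timing.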

The barrier directive is often useful for ensuring that all threads have completed some phase of work prior to exchanging results, as in the following code example.

    #pragma omp parallel
    {
        work_phase1();
    #pragma omp barrier
        exchange_results();
        work_phase2();
    }

Here work_phase1() is executed simultaneously by all threads in the team. As each thread returns from the routine, it waits for all threads to complete work_phase1() prior to calling exchange_results() and executing work_phase2(). In general, barriers should be avoided except where necessary to preserve the integrity of the data environment. Spending valuable time synchronizing threads that could operate completely independently is not a good use of computer time.

The atomic directive ensures that a memory location is updated atomically instead of allowing multiple threads to write to the same location at once. Only certain mathematical expressions may be used in the atomic construct. For example, the following piece of code contains a parallel for construct with an atomic directive within the loop to protect against simultaneous updates of an element of the ts array that is accessed through an index array.

    #pragma omp parallel for shared(ts, index)
    for (i = 0; i < SIZE; i++) {
    #pragma omp atomic
        ts[index[i]] += compute1(i);
    }

The advantage of using the atomic directive in this case is that multiple elements of ts can be simultaneously updated. If a critical directive had been used instead, all updates to ts would be serialized, resulting in poor performance.

The flush directive is used to synchronize shared objects in memory across a team of threads. A list of variables that must be synchronized can be provided with the flush directive. Alternatively, flush without a variable list synchronizes all shared objects (and probably incurs more overhead).

The ordered directive identifies a block of code that's executed in the order in which the loop iterations would run if they were executed sequentially.
An ordered directive must be within the extent of a for or parallel for construct. Moreover, the for or parallel for must also specify an ordered clause. In the following example, the compute1() routine is called within a parallel for construct containing an ordered clause. The print statement in compute1() has an ordered directive above it so that the output is generated in the expected sequential order.

    void compute1(int i)
    {
        int tid;

        tid = omp_get_thread_num();
    #pragma omp ordered
        printf("%d: compute1 called for iteration %d\n", tid, i);
        /* lots of work removed from here */
    }

    int main(int argc, char **argv)
    {
        int i;

    #pragma omp parallel for ordered schedule(dynamic)
        for (i = 0; i < 10; i++)
            compute1(i);
        exit(0);
    }

The parallel for directive also has a schedule clause, here schedule(dynamic), which specifies that iterations are handed out to threads dynamically. This clause causes each iteration to be assigned (in order) to the next available thread. In the output below, iteration 0 is assigned to the master thread (thread 0) and iteration 1 is assigned to thread 1. Since thread 1 completes its work first, the thread becomes available and is assigned iteration 2, the very next iteration.

    [node01]$ OMP_NUM_THREADS=2 time ./ordered
    0: compute1 called for iteration 0
    1: compute1 called for iteration 1
    1: compute1 called for iteration 2
    0: compute1 called for iteration 3
    0: compute1 called for iteration 4
    1: compute1 called for iteration 5
    0: compute1 called for iteration 6
    1: compute1 called for iteration 7
    0: compute1 called for iteration 8
    1: compute1 called for iteration 9
    user 0.16system 0:24.66elapsed 198%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (144major+16minor)pagefaults 0swaps

Thread Data Environment

The data environment for OpenMP threads in a team is controlled by the threadprivate directive and a variety of data sharing clauses. We've already used the most common of these clauses -- private and shared -- in examples. Table One contains a list of all OpenMP clauses, including the data sharing attribute clauses, and the directives with which they may be used.

Table One: All OpenMP clauses and the directives with which they may be used

    Clause          OpenMP Directives
    copyin          parallel
    copyprivate     single
    default         parallel
    firstprivate    parallel, for, sections, single
    if              parallel
    lastprivate     for, sections
    nowait          for, sections, single
    num_threads     parallel
    ordered         for
    private         parallel, for, sections, single
    reduction       parallel, for, sections
    schedule        for
    shared          parallel

The threadprivate directive is used to make various data objects, specified in a list along with the directive, private to each thread. As usual, the list is contained within parentheses and separated by commas. This amounts to creating a copy of the variable for each thread in the team. Each copy is initialized once prior to the first reference of that copy. As with all private objects, one thread may not reference another thread's copy of a threadprivate object. Within serial and master regions of the program, the master thread's copy of the object is used. threadprivate objects persist outside the parallel region in which they are copied only if the dynamic thread mechanism is disabled and the number of threads doesn't change.

The threadprivate directive must precede all references to any of the variables or objects in its list. In the following example, a counter variable called counter is declared and then followed by a threadprivate directive at the same level (not within subroutines) and prior to being referenced. In main(), a parallel loop calls bump_counter() ten times, printing out the counter's value in each iteration.

    int counter = 0;
    #pragma omp threadprivate(counter)

    int bump_counter()
    {
        counter++;
        return counter;
    }

    int main(int argc, char **argv)
    {
        int i;

    #pragma omp parallel for
        for (i = 0; i < 10; i++) {
            bump_counter();
            printf("%d: i=%d and my copy of counter = %d\n",
                   omp_get_thread_num(), i, counter);
        }
        exit(0);
    }

When run without OpenMP (or with only one thread), a single copy of counter is bumped ten times, resulting in a final value of 10. As seen below, when run with two threads, each copy of counter is bumped five times. This loop executes so quickly that all the output from thread zero appears before output from thread one.

    [node01]$ OMP_NUM_THREADS=2 ./tp
    0: i=0 and my copy of counter = 1
    0: i=1 and my copy of counter = 2
    0: i=2 and my copy of counter = 3
    0: i=3 and my copy of counter = 4
    0: i=4 and my copy of counter = 5
    1: i=5 and my copy of counter = 1
    1: i=6 and my copy of counter = 2
    1: i=7 and my copy of counter = 3
    1: i=8 and my copy of counter = 4
    1: i=9 and my copy of counter = 5

In addition to the threadprivate directive, a number of data sharing attribute clauses may be used with other directives to control whether data objects are shared or private, as well as how they are initialized before and saved after the associated code block.

If an existing variable is not specified in a sharing attribute clause or threadprivate directive when a parallel or work-sharing construct is encountered, it is shared. Static variables and heap allocated memory are also shared. However, the pointer to this memory may be either private or shared.
Automatic variables declared within a parallel region are private.
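These default rules can be seen in a small sketch. The function fill_squares() and its variable names are hypothetical, chosen only for illustration. One rule used here is not stated in the text above but comes from the OpenMP specification: the loop control variable of a for or parallel for construct is made private automatically.

```c
#include <assert.h>

/* Hypothetical example function; the names are illustrative only. */
void fill_squares(int *out, int n)
{
    int i;  /* loop control variable of the for construct: private to each thread */

    assert(n >= 0);
    /* out and n already exist when the construct is encountered and are
     * not named in any clause, so they are shared by default. */
#pragma omp parallel for
    for (i = 0; i < n; i++) {
        int tmp = i * i;  /* automatic variable declared inside the region: private */
        out[i] = tmp;     /* each iteration writes a different element, so no race */
    }
}
```

Because every iteration writes its own element of the shared array, no synchronization is needed. If tmp were instead declared before the pragma and not listed in a private clause, all threads would share one copy and the loop would contain a race.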

Most clauses accept a comma-separated list of variables contained within parentheses. Variables can't be specified in multiple clauses except for the firstprivate and lastprivate clauses. Not all clauses are valid for all directives. Table One provides a list of clauses and the directives with which they may be used. The combined parallel work-sharing constructs parallel for and parallel sections accept the same clauses as the for and sections constructs, respectively.

As we've already seen in previous examples, the private clause declares variables to be private for each thread in a team. When objects are declared private, new objects with automatic storage duration are allocated on each thread. These new private variables are used for the extent of the construct. The original objects have an indeterminate value upon entry to and exit from the construct.

The firstprivate clause has the same behavior as the private clause, except with regard to initialization of the private object. When used with a parallel construct, the firstprivate clause causes the specified variables to be initialized to the values of the original objects as they exist immediately prior to the parallel construct for the thread that encounters it. With a work-sharing construct, the initial value of new private objects is set to the value of the original object just prior to the point in time when the participating thread encountered the construct.

In a similar fashion, the lastprivate clause behaves just like private, except that the final values of the specified variables are saved to the original objects outside of the parallel or work-sharing constructs upon exit of the construct. Variables not assigned a value in the last iteration of a for or parallel for construct or by the last section of a sections or parallel sections construct have indeterminate values upon exit of the construct.

The shared clause makes specified objects shared among all threads in a team.
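The firstprivate and lastprivate behavior described above can be sketched as follows. The function last_offset_value() and its variables are hypothetical names used only for illustration. The sketch relies on nothing beyond the two clauses themselves, and it returns the same value whether or not OpenMP is enabled.

```c
#include <assert.h>

/* Hypothetical example function; the names are illustrative only. */
int last_offset_value(int offset, int n)
{
    int i;
    int x = -1;  /* original object; overwritten on exit because of lastprivate */

    /* Require at least one iteration so that x is always assigned,
     * staying clear of the indeterminate case described in the text. */
    assert(n > 0);

#pragma omp parallel for firstprivate(offset) lastprivate(x)
    for (i = 0; i < n; i++) {
        /* Each thread's private copy of offset starts with the caller's value. */
        x = offset + i;
    }

    /* x now holds the value assigned in the sequentially last iteration. */
    return x;
}
```

For example, last_offset_value(10, 5) returns 14, the value assigned when i is 4, regardless of which thread ran that iteration. With a plain private(x) clause instead, the value of x after the loop would be indeterminate.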
It is usually not necessary to specify objects created outside a construct as shared since this is the default behavior. However, the default clause, which requires either (shared) or (none) as a parameter, may be used to change this behavior. Specifying default(none) requires that each variable be listed explicitly in a data-sharing attribute clause, unless it's declared within the parallel construct.

The reduction clause performs a reduction on the scalar variables that appear in its variable list, using the operator specified in the clause. We used this clause in previous examples to sum up scalar variables across threads. Like the private clause, the reduction clause tells the compiler to create a private copy of the specified variables for each thread. Then at the end of the region for which the clause was specified, the original object is updated to reflect the combined result from all the threads based on the operator specified in the reduction clause.

The copyin clause provides a way to assign the same value to threadprivate variables for each thread in a team. The value of each variable in a copyin clause is copied from the master thread to the private copies on every other thread at the beginning of a parallel region.

Similarly, the copyprivate clause, which may only appear with the single directive, may be used to broadcast values of variables from the thread that executed the single construct to all other threads. This updating of private variables on each thread occurs after the execution of the code within the single construct and before any threads have left the implied barrier at the end of the construct.

These data-sharing attribute clauses provide a powerful mechanism for manipulating the data environment for threads. Using the clauses, you can avoid writing your own shared memory data handling software.

With a small number of fairly simple directives and powerful clauses, OpenMP can often be a very easy way to take advantage of shared memory systems for modeling and data processing. When combined with MPI for distributed memory parallelism, it can further improve performance and resource utilization on SMP clusters.

We didn't discuss nesting of OpenMP directives, and some details of directive and clause restrictions have been glossed over. So when you are ready to add OpenMP to your own code, be sure to read the specification documents on the OpenMP web site.

Forrest Hoffman is a computer modeling and simulation researcher at Oak Ridge National Laboratory. He can be reached at forrest@climate.ornl.gov.

Mango DSP Top manufacturer of multiprocessing video & imaging solutions.

Mango DSP Top manufacturer of multiprocessing video & imaging solutions. 1 of 11 3/3/2005 10:50 AM Linux Magazine February 2004 C++ Parallel Increase application performance without changing your source code. Mango DSP Top manufacturer of multiprocessing video & imaging solutions.

More information

Synchronization. Event Synchronization

Synchronization. Event Synchronization Synchronization Synchronization: mechanisms by which a parallel program can coordinate the execution of multiple threads Implicit synchronizations Explicit synchronizations Main use of explicit synchronization

More information

JANUARY 2004 LINUX MAGAZINE Linux in Europe User Mode Linux PHP 5 Reflection Volume 6 / Issue 1 OPEN SOURCE. OPEN STANDARDS.

JANUARY 2004 LINUX MAGAZINE Linux in Europe User Mode Linux PHP 5 Reflection Volume 6 / Issue 1 OPEN SOURCE. OPEN STANDARDS. 0104 Cover (Curtis) 11/19/03 9:52 AM Page 1 JANUARY 2004 LINUX MAGAZINE Linux in Europe User Mode Linux PHP 5 Reflection Volume 6 / Issue 1 LINUX M A G A Z I N E OPEN SOURCE. OPEN STANDARDS. THE STATE

More information

Lecture 4: OpenMP Open Multi-Processing

Lecture 4: OpenMP Open Multi-Processing CS 4230: Parallel Programming Lecture 4: OpenMP Open Multi-Processing January 23, 2017 01/23/2017 CS4230 1 Outline OpenMP another approach for thread parallel programming Fork-Join execution model OpenMP

More information

OpenMP Algoritmi e Calcolo Parallelo. Daniele Loiacono

OpenMP Algoritmi e Calcolo Parallelo. Daniele Loiacono OpenMP Algoritmi e Calcolo Parallelo References Useful references Using OpenMP: Portable Shared Memory Parallel Programming, Barbara Chapman, Gabriele Jost and Ruud van der Pas OpenMP.org http://openmp.org/

More information

ECE 574 Cluster Computing Lecture 10

ECE 574 Cluster Computing Lecture 10 ECE 574 Cluster Computing Lecture 10 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 1 October 2015 Announcements Homework #4 will be posted eventually 1 HW#4 Notes How granular

More information

15-418, Spring 2008 OpenMP: A Short Introduction

15-418, Spring 2008 OpenMP: A Short Introduction 15-418, Spring 2008 OpenMP: A Short Introduction This is a short introduction to OpenMP, an API (Application Program Interface) that supports multithreaded, shared address space (aka shared memory) parallelism.

More information

OpenMP - II. Diego Fabregat-Traver and Prof. Paolo Bientinesi WS15/16. HPAC, RWTH Aachen

OpenMP - II. Diego Fabregat-Traver and Prof. Paolo Bientinesi WS15/16. HPAC, RWTH Aachen OpenMP - II Diego Fabregat-Traver and Prof. Paolo Bientinesi HPAC, RWTH Aachen fabregat@aices.rwth-aachen.de WS15/16 OpenMP References Using OpenMP: Portable Shared Memory Parallel Programming. The MIT

More information

Session 4: Parallel Programming with OpenMP

Session 4: Parallel Programming with OpenMP Session 4: Parallel Programming with OpenMP Xavier Martorell Barcelona Supercomputing Center Agenda Agenda 10:00-11:00 OpenMP fundamentals, parallel regions 11:00-11:30 Worksharing constructs 11:30-12:00

More information

COMP4510 Introduction to Parallel Computation. Shared Memory and OpenMP. Outline (cont d) Shared Memory and OpenMP

COMP4510 Introduction to Parallel Computation. Shared Memory and OpenMP. Outline (cont d) Shared Memory and OpenMP COMP4510 Introduction to Parallel Computation Shared Memory and OpenMP Thanks to Jon Aronsson (UofM HPC consultant) for some of the material in these notes. Outline (cont d) Shared Memory and OpenMP Including

More information

Barbara Chapman, Gabriele Jost, Ruud van der Pas

Barbara Chapman, Gabriele Jost, Ruud van der Pas Using OpenMP Portable Shared Memory Parallel Programming Barbara Chapman, Gabriele Jost, Ruud van der Pas The MIT Press Cambridge, Massachusetts London, England c 2008 Massachusetts Institute of Technology

More information

EE/CSCI 451 Introduction to Parallel and Distributed Computation. Discussion #4 2/3/2017 University of Southern California

EE/CSCI 451 Introduction to Parallel and Distributed Computation. Discussion #4 2/3/2017 University of Southern California EE/CSCI 451 Introduction to Parallel and Distributed Computation Discussion #4 2/3/2017 University of Southern California 1 USC HPCC Access Compile Submit job OpenMP Today s topic What is OpenMP OpenMP

More information

Topics. Introduction. Shared Memory Parallelization. Example. Lecture 11. OpenMP Execution Model Fork-Join model 5/15/2012. Introduction OpenMP

Topics. Introduction. Shared Memory Parallelization. Example. Lecture 11. OpenMP Execution Model Fork-Join model 5/15/2012. Introduction OpenMP Topics Lecture 11 Introduction OpenMP Some Examples Library functions Environment variables 1 2 Introduction Shared Memory Parallelization OpenMP is: a standard for parallel programming in C, C++, and

More information

1 of 6 Lecture 7: March 4. CISC 879 Software Support for Multicore Architectures Spring Lecture 7: March 4, 2008

1 of 6 Lecture 7: March 4. CISC 879 Software Support for Multicore Architectures Spring Lecture 7: March 4, 2008 1 of 6 Lecture 7: March 4 CISC 879 Software Support for Multicore Architectures Spring 2008 Lecture 7: March 4, 2008 Lecturer: Lori Pollock Scribe: Navreet Virk Open MP Programming Topics covered 1. Introduction

More information

Computer Architecture

Computer Architecture Jens Teubner Computer Architecture Summer 2016 1 Computer Architecture Jens Teubner, TU Dortmund jens.teubner@cs.tu-dortmund.de Summer 2016 Jens Teubner Computer Architecture Summer 2016 2 Part I Programming

More information

OpenMP C and C++ Application Program Interface Version 1.0 October Document Number

OpenMP C and C++ Application Program Interface Version 1.0 October Document Number OpenMP C and C++ Application Program Interface Version 1.0 October 1998 Document Number 004 2229 001 Contents Page v Introduction [1] 1 Scope............................. 1 Definition of Terms.........................

More information

A brief introduction to OpenMP

A brief introduction to OpenMP A brief introduction to OpenMP Alejandro Duran Barcelona Supercomputing Center Outline 1 Introduction 2 Writing OpenMP programs 3 Data-sharing attributes 4 Synchronization 5 Worksharings 6 Task parallelism

More information

Shared Memory Parallelism - OpenMP

Shared Memory Parallelism - OpenMP Shared Memory Parallelism - OpenMP Sathish Vadhiyar Credits/Sources: OpenMP C/C++ standard (openmp.org) OpenMP tutorial (http://www.llnl.gov/computing/tutorials/openmp/#introduction) OpenMP sc99 tutorial

More information

Programming Shared Memory Systems with OpenMP Part I. Book

Programming Shared Memory Systems with OpenMP Part I. Book Programming Shared Memory Systems with OpenMP Part I Instructor Dr. Taufer Book Parallel Programming in OpenMP by Rohit Chandra, Leo Dagum, Dave Kohr, Dror Maydan, Jeff McDonald, Ramesh Menon 2 1 Machine

More information

OpenMP 2. CSCI 4850/5850 High-Performance Computing Spring 2018

OpenMP 2. CSCI 4850/5850 High-Performance Computing Spring 2018 OpenMP 2 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning Objectives

More information

Distributed Systems + Middleware Concurrent Programming with OpenMP

Distributed Systems + Middleware Concurrent Programming with OpenMP Distributed Systems + Middleware Concurrent Programming with OpenMP Gianpaolo Cugola Dipartimento di Elettronica e Informazione Politecnico, Italy cugola@elet.polimi.it http://home.dei.polimi.it/cugola

More information

Module 10: Open Multi-Processing Lecture 19: What is Parallelization? The Lecture Contains: What is Parallelization? Perfectly Load-Balanced Program

Module 10: Open Multi-Processing Lecture 19: What is Parallelization? The Lecture Contains: What is Parallelization? Perfectly Load-Balanced Program The Lecture Contains: What is Parallelization? Perfectly Load-Balanced Program Amdahl's Law About Data What is Data Race? Overview to OpenMP Components of OpenMP OpenMP Programming Model OpenMP Directives

More information

OpenMP I. Diego Fabregat-Traver and Prof. Paolo Bientinesi WS16/17. HPAC, RWTH Aachen

OpenMP I. Diego Fabregat-Traver and Prof. Paolo Bientinesi WS16/17. HPAC, RWTH Aachen OpenMP I Diego Fabregat-Traver and Prof. Paolo Bientinesi HPAC, RWTH Aachen fabregat@aices.rwth-aachen.de WS16/17 OpenMP References Using OpenMP: Portable Shared Memory Parallel Programming. The MIT Press,

More information

Data Environment: Default storage attributes

Data Environment: Default storage attributes COSC 6374 Parallel Computation Introduction to OpenMP(II) Some slides based on material by Barbara Chapman (UH) and Tim Mattson (Intel) Edgar Gabriel Fall 2014 Data Environment: Default storage attributes

More information
