Concurrent Programming with OpenMP
1 Concurrent Programming with OpenMP
Parallel and Distributed Computing
Department of Computer Science and Engineering (DEI)
Instituto Superior Técnico
October 11, 2012
CPD (DEI / IST) Parallel and Distributed Computing / 27
2 Outline
More on OpenMP:
- Synchronism
- Conditional Parallelism
- Reduction Clause
- Scheduling Options
- Task Directive
- Nested Parallelism
3 Thread Synchronization

  #pragma omp parallel [clauses]
  { ... }

Implicit barriers at the end of parallel (and other control constructs): execution continues only after all threads have completed. Can be overridden at for and sections directives with the nowait clause:

  #pragma omp for nowait
  for( ... ) { ... }
4 Example of nowait Usage

  int factorial(int number) {
    int fac = 1;
    #pragma omp parallel
    {
      int fac_private = 1;
      #pragma omp for nowait
      for(int n = 2; n <= number; ++n)
        fac_private *= n;
      #pragma omp atomic
      fac *= fac_private;
    }
    return fac;
  }
5 Explicit Synchronization

A barrier can be explicitly inserted within the parallel code:

  /* some multi-threaded code */
  #pragma omp barrier
  /* remainder of multi-threaded code */

[Figure: Tasks 0..n reach the barrier region at different times and sit idle until the last one arrives; only then do all proceed.]
6 Explicit Synchronization

  #pragma omp parallel
  {
    /* All threads execute this. */
    SomeCode();

    #pragma omp barrier

    /* All threads execute this, but not before
     * all threads have finished executing SomeCode(). */
    SomeMoreCode();
  }
7 Explicit Synchronization

Critical section, similar to mutexes in threads:

  #pragma omp critical [(name)]
  { ... }

- a thread waits at the beginning of a critical region until no other thread is executing a critical region with the same name
- all unnamed critical directives map to the same unspecified name

[Figure: Tasks 0..n queue at the critical region and enter one at a time, idling while another thread is inside.]
8 Example of the critical Directive

  int cnt = 0;
  #pragma omp parallel
  {
    #pragma omp for
    for(i = 0; i < 20; i++) {
      if(b[i] == 0) {
        #pragma omp critical
        cnt++;
      } /* end if */
      a[i] += b[i] * (i+1);
    } /* end for */
  } /* end parallel */
9 Explicit Synchronization

A critical section creates mutual exclusion in terms of the execution of a region of code. However, the objective is usually the mutual exclusion of access to data.

  #pragma omp atomic
  <statement>

- guarantees that the read and write of a memory location are atomic
- applies only to the statement immediately following it
10 Example of the atomic Directive

  int accum = 0;
  #pragma omp parallel
  {
    #pragma omp for
    for(i = 0; i < 20; i++) {
      if(b[i] == 0) {
        #pragma omp atomic
        accum += b[i] * (i+1);
      } /* end if */
    } /* end for */
  } /* end parallel */
11 Single Processor Region

Slightly different problem: how to have a single thread execute a region of the parallel section?

  #pragma omp single
  { ... }

- ideally suited for I/O or initialization
- which thread executes the region is not defined
- use master instead of single to guarantee that the master thread is the one that executes the region

[Figure: one task executes the single processor region while Tasks 0..n idle at its implicit barrier.]
12 Example of Single Processor Region

  #pragma omp parallel
  {
    #pragma omp single
    printf("beginning work1.\n");

    work1();

    #pragma omp single
    printf("finishing work1.\n");

    #pragma omp single nowait
    printf("finished work1 and beginning work2.\n");

    work2();
  }
13 Conditional Parallelism

Oftentimes, parallelism is only useful if the problem size is large enough. For regions with low computational effort, the overhead of parallelization exceeds the benefit.

  #pragma omp parallel if( expression )
  #pragma omp parallel sections if( expression )
  #pragma omp parallel for if( expression )

Execute in parallel if expression evaluates to true, otherwise execute sequentially.
14 Example of Conditional Parallelism

  for(i = 0; i < n; i++) {
    #pragma omp parallel for private(j,k) if(n-i > 100)
    for(j = i + 1; j < n; j++)
      for(k = i + 1; k < n; k++)
        a[j][k] = a[j][k] - a[i][k]*a[i][j] / a[j][j];
  }
16 The reduction Clause

How to parallelize the computation of an inner product?

  #pragma omp parallel for reduction(op:list)

- op is a binary operator (+, *, -, &, |, ^, &&, ||)
- list is a list of shared variables

Actions:
1. a private copy of each list variable is created for each thread
2. at the end of the reduction, the reduction operator is applied to all private copies of the variable, and the result is written to the global shared variable
17 Reduction Example

  int main() {
    int i, n = 100;
    float a[100], b[100], result = 0.0;

    #pragma omp parallel for
    for(i = 0; i < n; i++) {
      a[i] = i * 1.0;
      b[i] = i * 2.0;
    }

    #pragma omp parallel for reduction(+:result)
    for(i = 0; i < n; i++)
      result = result + (a[i] * b[i]);

    printf("final result = %f\n", result);
  }
18 Load Balancing

With irregular workloads, care must be taken in distributing the work over the threads.

Example: multiplication of two matrices C = A x B, where the A matrix is upper-triangular (all elements below the diagonal are 0). Rows near the bottom of A have far fewer non-zero terms, so iterations of the outer loop do very different amounts of work:

  #pragma omp parallel for private(j,k)
  for(i = 0; i < n; i++)
    for(j = 0; j < n; j++) {
      c[i][j] = 0.0;
      for(k = i; k < n; k++)
        c[i][j] += a[i][k] * b[k][j];
    }
19 The schedule Clause

Different options for work distribution among threads:

  schedule(static|dynamic|guided [,chunk])
  schedule(auto|runtime)

static [,chunk]
- iterations are divided into blocks of size chunk, and these blocks are assigned to the threads in a round-robin fashion
- in the absence of chunk, each thread executes one block of approximately N/P iterations, for a loop of length N and P threads

Example, loop of length N = 8 and P = 2 threads:

  TID   No chunk   Chunk = 2
  0     1-4        1-2, 5-6
  1     5-8        3-4, 7-8
23 The schedule Clause (cont.)

dynamic [,chunk]
- a block of chunk iterations is assigned to each thread (chunk defaults to 1 if not specified)
- when a thread finishes its block, it starts on the next one
- each block contains chunk iterations, except for the last block to be distributed, which may have fewer

guided [,chunk]
- same dynamic behavior as dynamic, but threads are assigned blocks of varying size
- the size of each block is proportional to the number of unassigned iterations divided by the number of threads, decreasing to chunk

auto
- the scheduling decision is delegated to the compiler and/or runtime system

runtime
- the iteration scheduling scheme is set at runtime through the environment variable OMP_SCHEDULE
24 Scheduling Options

Static scheduling:
- lower overhead
- may lead to higher workload imbalance

Chunks:
- larger chunks reduce overhead and may increase the cache hit rate
- smaller chunks allow finer balancing of the workload
27 Task Construct

The task directive allows for the definition of tasks to be performed, which are added to a pool and eventually executed by a thread in the team:

  #pragma omp task [clauses]
  { <structured block> }

Offers a flexible model for irregular parallelism.

Tasks are guaranteed to have completed:
- at thread barriers, either implicit or explicit
- at task barriers:

  #pragma omp taskwait
28 Example of Task Usage

  void postorder(node *p) {
    if (p->left) {
      #pragma omp task
      postorder(p->left);
    }
    if (p->right) {
      #pragma omp task
      postorder(p->right);
    }
    #pragma omp taskwait   // wait for descendants
    process(p->data);
  }
29 Nested Parallelism

Parallel regions can be nested (support is implementation dependent).

[Figure: the master thread forks a team; threads of that team fork teams of their own, and each nested team joins before its parent does.]

Must be enabled with the OMP_NESTED environment variable or the omp_set_nested() routine:
- if a parallel directive is encountered within another parallel directive, a new team of threads is created
- the new team contains only one thread, unless nested parallelism is enabled
31 Nested Parallelism

Set the number of threads per level:
- environment variable: OMP_NUM_THREADS (e.g., 4,3,2)
- runtime routine: omp_set_num_threads(), called inside a parallel region
- clause: add a num_threads() clause to a parallel directive

Set/get the maximum number of OpenMP threads available to the program:
- environment variable: OMP_THREAD_LIMIT
- runtime routine: omp_get_thread_limit()
35 Nested Parallelism

Set/get the maximum number of nested active parallel regions:
- environment variable: OMP_MAX_ACTIVE_LEVELS
- runtime routines: omp_set_max_active_levels(), omp_get_max_active_levels()

Library routines to determine:
- depth of nesting: omp_get_level(), omp_get_active_level()
- IDs of parent/grandparent/etc. threads: omp_get_ancestor_thread_num(level)
- team sizes of parent/grandparent/etc. teams: omp_get_team_size(level)
36 Review
More on OpenMP:
- Synchronism
- Conditional Parallelism
- Reduction Clause
- Scheduling Options
- Task Directive
- Nested Parallelism
37 Next Class
More on programming shared memory systems:
- Debugging
- Performance
Session 4: Parallel Programming with OpenMP Xavier Martorell Barcelona Supercomputing Center Agenda Agenda 10:00-11:00 OpenMP fundamentals, parallel regions 11:00-11:30 Worksharing constructs 11:30-12:00
More informationOpenMP C and C++ Application Program Interface Version 1.0 October Document Number
OpenMP C and C++ Application Program Interface Version 1.0 October 1998 Document Number 004 2229 001 Contents Page v Introduction [1] 1 Scope............................. 1 Definition of Terms.........................
More informationOpenMP Introduction. CS 590: High Performance Computing. OpenMP. A standard for shared-memory parallel programming. MP = multiprocessing
CS 590: High Performance Computing OpenMP Introduction Fengguang Song Department of Computer Science IUPUI OpenMP A standard for shared-memory parallel programming. MP = multiprocessing Designed for systems
More informationCS 470 Spring Mike Lam, Professor. OpenMP
CS 470 Spring 2018 Mike Lam, Professor OpenMP OpenMP Programming language extension Compiler support required "Open Multi-Processing" (open standard; latest version is 4.5) Automatic thread-level parallelism
More informationIntroduction to OpenMP
Introduction to OpenMP Lecture 4: Work sharing directives Work sharing directives Directives which appear inside a parallel region and indicate how work should be shared out between threads Parallel do/for
More informationCMSC 714 Lecture 4 OpenMP and UPC. Chau-Wen Tseng (from A. Sussman)
CMSC 714 Lecture 4 OpenMP and UPC Chau-Wen Tseng (from A. Sussman) Programming Model Overview Message passing (MPI, PVM) Separate address spaces Explicit messages to access shared data Send / receive (MPI
More informationIntroduction to OpenMP
Introduction to OpenMP Lecture 9: Performance tuning Sources of overhead There are 6 main causes of poor performance in shared memory parallel programs: sequential code communication load imbalance synchronisation
More informationAdvanced C Programming Winter Term 2008/09. Guest Lecture by Markus Thiele
Advanced C Programming Winter Term 2008/09 Guest Lecture by Markus Thiele Lecture 14: Parallel Programming with OpenMP Motivation: Why parallelize? The free lunch is over. Herb
More informationOpenMP. OpenMP. Portable programming of shared memory systems. It is a quasi-standard. OpenMP-Forum API for Fortran and C/C++
OpenMP OpenMP Portable programming of shared memory systems. It is a quasi-standard. OpenMP-Forum 1997-2002 API for Fortran and C/C++ directives runtime routines environment variables www.openmp.org 1
More informationEE/CSCI 451 Introduction to Parallel and Distributed Computation. Discussion #4 2/3/2017 University of Southern California
EE/CSCI 451 Introduction to Parallel and Distributed Computation Discussion #4 2/3/2017 University of Southern California 1 USC HPCC Access Compile Submit job OpenMP Today s topic What is OpenMP OpenMP
More informationOpenMP Tasking Model Unstructured parallelism
www.bsc.es OpenMP Tasking Model Unstructured parallelism Xavier Teruel and Xavier Martorell What is a task in OpenMP? Tasks are work units whose execution may be deferred or it can be executed immediately!!!
More informationReview. 35a.cpp. 36a.cpp. Lecture 13 5/29/2012. Compiler Directives. Library Functions Environment Variables
Review Lecture 3 Compiler Directives Conditional compilation Parallel construct Work-sharing constructs for, section, single Work-tasking Synchronization Library Functions Environment Variables 2 35a.cpp
More informationCompiling for GPUs. Adarsh Yoga Madhav Ramesh
Compiling for GPUs Adarsh Yoga Madhav Ramesh Agenda Introduction to GPUs Compute Unified Device Architecture (CUDA) Control Structure Optimization Technique for GPGPU Compiler Framework for Automatic Translation
More informationProgramming Shared Address Space Platforms using OpenMP
Programming Shared Address Space Platforms using OpenMP Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Topic Overview Introduction to OpenMP OpenMP
More informationLecture 16: Recapitulations. Lecture 16: Recapitulations p. 1
Lecture 16: Recapitulations Lecture 16: Recapitulations p. 1 Parallel computing and programming in general Parallel computing a form of parallel processing by utilizing multiple computing units concurrently
More informationQuestions from last time
Questions from last time Pthreads vs regular thread? Pthreads are POSIX-standard threads (1995). There exist earlier and newer standards (C++11). Pthread is probably most common. Pthread API: about a 100
More informationCluster Computing. Performance and Debugging Issues in OpenMP. Topics. Factors impacting performance. Scalable Speedup
Topics Scalable Speedup and Data Locality Parallelizing Sequential Programs Breaking data dependencies Avoiding synchronization overheads Performance and Debugging Issues in OpenMP Achieving Cache and
More informationShared Memory Programming with OpenMP
Shared Memory Programming with OpenMP (An UHeM Training) Süha Tuna Informatics Institute, Istanbul Technical University February 12th, 2016 2 Outline - I Shared Memory Systems Threaded Programming Model
More informationReview. Lecture 12 5/22/2012. Compiler Directives. Library Functions Environment Variables. Compiler directives for construct, collapse clause
Review Lecture 12 Compiler Directives Conditional compilation Parallel construct Work-sharing constructs for, section, single Synchronization Work-tasking Library Functions Environment Variables 1 2 13b.cpp
More informationProgramming Shared Memory Systems with OpenMP Part I. Book
Programming Shared Memory Systems with OpenMP Part I Instructor Dr. Taufer Book Parallel Programming in OpenMP by Rohit Chandra, Leo Dagum, Dave Kohr, Dror Maydan, Jeff McDonald, Ramesh Menon 2 1 Machine
More informationA common scenario... Most of us have probably been here. Where did my performance go? It disappeared into overheads...
OPENMP PERFORMANCE 2 A common scenario... So I wrote my OpenMP program, and I checked it gave the right answers, so I ran some timing tests, and the speedup was, well, a bit disappointing really. Now what?.
More informationCS 470 Spring Mike Lam, Professor. OpenMP
CS 470 Spring 2017 Mike Lam, Professor OpenMP OpenMP Programming language extension Compiler support required "Open Multi-Processing" (open standard; latest version is 4.5) Automatic thread-level parallelism
More informationIntroduction to OpenMP
Christian Terboven, Dirk Schmidl IT Center, RWTH Aachen University Member of the HPC Group terboven,schmidl@itc.rwth-aachen.de IT Center der RWTH Aachen University History De-facto standard for Shared-Memory
More informationAcknowledgments. Amdahl s Law. Contents. Programming with MPI Parallel programming. 1 speedup = (1 P )+ P N. Type to enter text
Acknowledgments Programming with MPI Parallel ming Jan Thorbecke Type to enter text This course is partly based on the MPI courses developed by Rolf Rabenseifner at the High-Performance Computing-Center
More informationMPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016
MPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016 Message passing vs. Shared memory Client Client Client Client send(msg) recv(msg) send(msg) recv(msg) MSG MSG MSG IPC Shared
More informationIntroduction to OpenMP. Martin Čuma Center for High Performance Computing University of Utah
Introduction to OpenMP Martin Čuma Center for High Performance Computing University of Utah mcuma@chpc.utah.edu Overview Quick introduction. Parallel loops. Parallel loop directives. Parallel sections.
More informationIntroduction to OpenMP
Introduction to OpenMP Christian Terboven 10.04.2013 / Darmstadt, Germany Stand: 06.03.2013 Version 2.3 Rechen- und Kommunikationszentrum (RZ) History De-facto standard for
More information