Practical in Numerical Astronomy, SS 2012 LECTURE 12
1 Practical in Numerical Astronomy, SS 2012, Lecture 12. Parallelization II: Open Multiprocessing (OpenMP). Lecturer: Eduard Vorobyov.
2 OpenMP is a shared-memory parallelism standard. It is designed for SMP (symmetric multiprocessing) machines. Wikipedia: symmetric multiprocessing involves a multiprocessor computer hardware architecture where two or more identical processors are connected to a single shared main memory and are controlled by a single OS instance. MPI is a distributed-memory parallelism standard. It is designed for computer clusters with distributed memory. Wikipedia: distributed memory refers to multiple-processor computer systems in which each processor has its own private memory.
3 Basic idea: the fork-join programming model.
[Diagram: master thread 0 passes through alternating serial and parallel regions; in the parallel regions it is joined by threads 1-4.]
1) The code starts as serial (non-parallel) and has only one master thread.
2) The master thread is forked into N threads when a parallel region is encountered (in this example four additional threads 1, 2, 3, 4 are created). Thread 0 remains the master of all five threads.
3) Each thread executes its part of the code in parallel with the other threads.
4) Upon completion of the parallel region, the threads are joined into one master thread, which continues execution in the serial region.
5) Calculations continue in serial mode until a new parallel region is reached.
4 Parallelizing a serial code using OpenMP directives.
The OpenMP standard offers the possibility of using the same source code with and without OpenMP parallelization (the MPI standard does not offer this!). This can only be achieved by hiding the OpenMP directives and commands in such a way that a normal compiler is unable to see them. For that purpose the following directive sentinel is introduced:

!$omp

Since the first character is an exclamation mark (!), a normal compiler will interpret the line as a comment and will ignore its content. But an OpenMP-compliant compiler will identify the complete sequence and will execute the commands that follow, e.g.:

!$omp PARALLEL DEFAULT(shared) PRIVATE(C, D) REDUCTION(+:a)
5 Making the FORTRAN compiler recognize OpenMP directives.
In order for the FORTRAN compiler to recognize OpenMP directives, one needs to compile the source code with a specific flag, which is compiler-dependent and tells the compiler to link the OpenMP libraries:

GNU Fortran compiler: gfortran -fopenmp
Intel Fortran compiler: ifort -openmp
PGI Fortran compiler: pgf90 -mp

Note that when using OpenMP all local arrays will be allocated on the stack. When porting existing code to OpenMP, this may lead to surprising results, especially segmentation faults if the stack size is limited.
6 Setting the number of threads in a parallel region.
The number of threads can be set by environment variables.
In the BASH shell: export OMP_NUM_THREADS=8
In the TCSH shell: setenv OMP_NUM_THREADS 8
Environment variables affect all OpenMP codes that are run from a given terminal.
7 The number of threads can also be set by OpenMP library calls:

subroutine OMPsetup
  integer omp_get_num_threads, omp_get_max_threads, omp_get_num_procs
  call omp_set_num_threads(8)   ! Sets the number of threads to 8
!$omp parallel                  ! Parallel region starts here
!$omp master                    ! The following commands are executed only by the master thread
  print *, 'num threads=', omp_get_num_threads()  ! Number of executing threads
  print *, 'max threads=', omp_get_max_threads()  ! Maximum possible number of threads
  print *, 'max cpus=', omp_get_num_procs()       ! Available number of processors
!$omp end master
!$omp end parallel              ! End of parallel region
end subroutine OMPsetup

Note that omp_set_num_threads is called from a serial part of the code. The library call to omp_set_num_threads supersedes the environment variable OMP_NUM_THREADS.
8 The PARALLEL construct.
The most important directive in OpenMP is the one in charge of defining the so-called parallel regions. Such a region is a block of code that is going to be executed by multiple threads running in parallel. Since a parallel region needs to be created/opened and destroyed/closed, two directives are necessary, forming a so-called directive pair: !$omp parallel -- !$omp end parallel.

  ... serial code ...
!$omp parallel
  write(*,*) "Hello"     ! parallel code
!$omp end parallel
  ... serial code ...

Since the code enclosed between the two directives is executed by each thread, the message Hello appears on the screen as many times as there are threads in the parallel region. Before and after the parallel region, the code is executed by only one thread, which is the normal behavior of serial programs.
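As a complete, compilable sketch of this directive pair (not from the slides; the program name is illustrative):

```fortran
! Compile with: gfortran -fopenmp hello_omp.f90
program hello_omp
  use omp_lib                  ! OpenMP runtime library interfaces
  implicit none
  call omp_set_num_threads(4)  ! request four threads
!$omp parallel
  write(*,*) 'Hello from thread', omp_get_thread_num()
!$omp end parallel
end program hello_omp
```

With four threads, the Hello line is printed four times, in an unpredictable order.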
9 Parallelizing a DO loop. The PRIVATE clause.

Serial DO loop:
  integer k
  do k = 1, 1000
    ...
  end do

Parallel DO loop:
  integer k
  call omp_set_num_threads(2)
!$omp parallel do private(k)
  do k = 1, 1000
    ...
  end do
!$omp end parallel do

In the serial case, the master thread does all the work (k = 1, 1000). In the parallel case, each thread computes part of the global DO loop: thread 0 takes k = 1, 500 and thread 1 takes k = 501, 1000.

Note that the same counter variable k has different values in each thread in the parallelized DO loop! To avoid memory conflicts, two copies of the variable k need to be created in memory. The clause PRIVATE(k) tells the compiler that each thread needs its own copy of the variable k. The PRIVATE clause can be very resource consuming: variables should be declared private only if they are modified inside the DO loop. Upon entering and after leaving the parallel DO loop, the variable k is undefined (in the serial DO loop, k = 1001 after leaving the loop).
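A minimal runnable sketch of such a loop (not from the slides): each thread writes a disjoint part of a shared array, with the counter private. The thread count is taken from OMP_NUM_THREADS when compiled with -fopenmp; without the flag the directives are comments and the result is the same.

```fortran
program private_demo
  implicit none
  integer :: k
  integer, dimension(1000) :: a
!$omp parallel do private(k) shared(a)
  do k = 1, 1000
    a(k) = 2*k               ! threads write disjoint elements of a
  end do
!$omp end parallel do
  print *, 'a(1) =', a(1), ' a(1000) =', a(1000)   ! 2 and 2000
end program private_demo
```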
10 Shared variables. The SHARED clause.
In contrast to the previous situation, sometimes there are variables which should be available to all threads inside the DO loop, because their values are needed by all threads or because all threads have to update their values.

program example
  implicit none
  integer :: n
  integer :: i
  real :: b
  real, dimension(10) :: a
  n = 10
  call omp_set_num_threads(4)       ! Setting the number of threads to 4
!$omp parallel do shared(a,n) private(i,b)   ! Parallel DO loop begins here
  do i = 1, n
    b = i + 1
    a(i) = b
  end do
!$omp end parallel do               ! Parallel DO loop ends here
end

In this example, the array a, the variable b, and the counter i are modified inside the DO loop. However, each iteration of the loop accesses a different element of the array a. Therefore, one need not create separate copies of the array a; such variables are declared as SHARED. Use SHARED when: a variable is not modified in the loop (as, e.g., n); a variable is an array in which each iteration of the loop accesses a different element.
11 Other DO loop clauses: FIRSTPRIVATE(list), LASTPRIVATE(list), REDUCTION(operator:list), SCHEDULE(type, chunk), ORDERED, DEFAULT.

The FIRSTPRIVATE clause. Private variables have an undefined value after entering the parallel do construct. But sometimes it is of interest that these local variables have the value of the original variable in the serial part of the code. This is achieved by including the variable in a FIRSTPRIVATE clause as follows:

  integer a, b
  a = 2
  b = 1
!$omp parallel do private(a) firstprivate(b)
  ...
!$omp end parallel do

In this example, the variable a has an undefined value at the beginning of the parallel region, while b has the value specified in the preceding serial region, namely b = 1.
12 The LASTPRIVATE clause. Private variables have an undefined value after leaving the parallel do construct. This is sometimes not convenient. By including a variable in a LASTPRIVATE clause, the original variable is updated with the value it would get in the last iteration of the DO loop, if this DO loop were executed in serial mode. For example:

  integer i, a
!$omp parallel do private(i) lastprivate(a)
  do i = 1, 1000
    a = i
  end do
!$omp end parallel do

After the parallel DO loop finishes, the variable a will be equal to 1000, which is the value it would have if the OpenMP directive did not exist.
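A small sketch combining both clauses (not from the slides; the values are illustrative): b enters each thread with its serial value, and a leaves the loop with the value of the sequentially last iteration.

```fortran
program first_last_demo
  implicit none
  integer :: i, a, b
  b = 10
!$omp parallel do firstprivate(b) lastprivate(a)
  do i = 1, 100
    a = i + b                ! each thread starts with b = 10
  end do
!$omp end parallel do
  print *, 'a =', a          ! sequentially last iteration: 100 + 10 = 110
end program first_last_demo
```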
13 The REDUCTION clause.

Serial code:
  integer i, a
  a = 0
  do i = 1, 1000
    a = a + i
  end do

Wrong OpenMP parallelization!
!$omp parallel do private(i) shared(a)
  do i = 1, 1000
    a = a + i
  end do
!$omp end parallel do

When a variable has been declared as SHARED because all threads need to modify its value, it is necessary to ensure that only one thread at a time writes/updates the memory location of the considered variable; otherwise unpredictable results will occur. By using the REDUCTION clause it is possible to solve this problem, since only one thread at a time is allowed to update the value of a, ensuring that the final result is the correct one:

!$omp parallel do reduction(+:a)
  do i = 1, 1000
    a = a + i
  end do
!$omp end parallel do
14 General syntax of the REDUCTION clause:
REDUCTION(operator or intrinsic function : variable list)

Initialization rules for variables in the variable list: a private copy of each variable in the list is created for each thread, as if the PRIVATE clause had been used. The resulting private copies are initialized following the rules shown in the table (e.g., 0 for the + operator, 1 for the * operator). At the end of the REDUCTION, the shared variable is updated to reflect the result of combining the final value of each of the private copies using the specified operator.
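The reduction example from the previous slide can be sketched as a complete program (not from the slides): each thread accumulates its own private copy of a, and the copies are combined with + at the end, so the result matches the serial sum.

```fortran
program reduction_demo
  implicit none
  integer :: i, a
  a = 0
!$omp parallel do reduction(+:a)
  do i = 1, 1000
    a = a + i                ! each thread accumulates a private copy of a
  end do
!$omp end parallel do
  print *, 'sum =', a        ! 1 + 2 + ... + 1000 = 500500
end program reduction_demo
```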
15 The SCHEDULE clause. Load balancing.

  call omp_set_num_threads(4)
!$omp parallel do private(k) shared(n)
  do k = 1, n
    ...
  end do
!$omp end parallel do

When a do-loop is parallelized and its iterations are distributed over the different threads, the simplest way of doing this is to give each thread the same number of iterations: n/4. But this is not always the best choice, since the computational cost of the iterations may not be equal for all of them. Therefore, different ways of distributing the iterations exist. The SCHEDULE clause allows the programmer to specify the scheduling for each do-loop, using the following syntax:

  call omp_set_num_threads(4)
!$omp parallel do private(k) shared(n) schedule(type, chunk)
  do k = 1, n
    ...
  end do
!$omp end parallel do
16 The SCHEDULE clause accepts two parameters. The first one, type, specifies the way in which the work is distributed over the threads. The second one, chunk, is an optional parameter specifying the size of the piece of work given to each thread.

STATIC: when this option is specified, the pieces of work created from the iteration space of the do-loop are distributed over the threads in the team following the order of their thread identification numbers. This assignment of work is done at the beginning of the do-loop and stays fixed during its execution. Example: with 3 threads, the DO-loop iteration space k = 1, 600, and no value of chunk specified, each thread gets one contiguous block of 200 iterations. STATIC is the best choice in most cases.
17 When SCHEDULE(DYNAMIC, chunk) is specified, the iteration space is divided into pieces of work of a size equal to chunk. If this optional parameter is not given, a size equal to one iteration is assumed. Each thread then gets one of these pieces of work; when a thread has finished its piece, it is assigned a new one, until no pieces of work are left. [Diagram: example of dynamic scheduling.] See also: the GUIDED and RUNTIME schedule types.
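A runnable sketch of dynamic scheduling (not from the slides; the chunk size of 10 is illustrative). The schedule only changes which thread runs which iterations, so combined with a reduction the result is deterministic:

```fortran
program schedule_demo
  implicit none
  integer :: k, total
  total = 0
!$omp parallel do schedule(dynamic, 10) reduction(+:total)
  do k = 1, 600
    total = total + k        ! chunks of 10 iterations are handed out on demand
  end do
!$omp end parallel do
  print *, 'total =', total  ! 1 + 2 + ... + 600 = 180300
end program schedule_demo
```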
18 The ORDERED clause. Eliminating the race condition.

program race_condition
  integer i
  integer, dimension(5) :: a, b
  a = 1
  b = 2
  call omp_set_num_threads(2)
!$omp parallel do private(i) shared(a,b)
  do i = 1, 4
    a(i+1) = a(i) + b(i)
  end do
!$omp end parallel do
end

Thread 0 computes a(2) = a(1)+b(1) and a(3) = a(2)+b(2), while thread 1 computes a(4) = a(3)+b(3) and a(5) = a(4)+b(4). PROBLEM: we have a data dependency between iterations, causing a so-called race condition, since thread 1 may read a(3) before thread 0 has written it. A solution is to use the ORDERED clause, which tells the compiler that some statements in the DO-loop need to be executed sequentially.
19 program no_race_condition
  integer i
  integer, dimension(5) :: a, b
  a = 1
  b = 2
  call omp_set_num_threads(2)
!$omp parallel do private(i) shared(a,b) ordered
  do i = 1, 4
!$omp ordered
    a(i+1) = a(i) + b(i)
!$omp end ordered
  end do
!$omp end parallel do
end

In this case, the threads do not run in parallel.

The DEFAULT(PRIVATE | SHARED | NONE) clause. When most of the variables used inside the DO-loop are going to be private/shared, it would be cumbersome to include all of them in one of the previous clauses. To avoid this, it is possible to specify what OpenMP has to do when nothing is said about a specific variable, i.e., to specify a default setting. For example:

!$omp parallel do default(private) shared(a)
20 Parallelization of implicit DO-loops. The WORKSHARE construct.
FORTRAN 90 array operations include implicit DO-loops and can be parallelized by the WORKSHARE construct.

Serial code:
  real, dimension(10) :: a, b, c
  ...
  a = 5.0 * cos(a) * sin(a)
  ...

Parallelized code:
  real, dimension(10) :: a, b, c
  ...
!$omp parallel workshare
  a = 5.0 * cos(a) * sin(a)
!$omp end parallel workshare
  ...

Not all compilers support parallelization of FORTRAN 90 array operations!
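A compilable sketch of the WORKSHARE construct (not from the slides; the array statement is illustrative):

```fortran
program workshare_demo
  implicit none
  real, dimension(10) :: a
  a = 1.0
!$omp parallel workshare
  a = 2.0 * a + 1.0          ! array statement, an implicit DO-loop over 10 elements
!$omp end parallel workshare
  print *, 'a(1) =', a(1)    ! 2.0 * 1.0 + 1.0 = 3.0
end program workshare_demo
```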
21 Parallelization of nested DO-loops.
When several nested do-loops are present, it is always convenient to parallelize the outermost one, since then the amount of work distributed over the different threads is maximal. Also, the number of times the !$omp parallel do -- !$omp end parallel do directive pair effectively acts is minimal, which implies minimal overhead due to the OpenMP directives.

Innermost loop parallelized:
  do i = 1, 10
    do j = 1, 10
!$omp parallel do private(k) shared(a,j,i)
      do k = 1, 10
        a(k,j,i) = i * j * k
      end do
!$omp end parallel do
    end do
  end do

Here the work to be computed in parallel is distributed i*j = 100 times, and each thread gets fewer than 10 iterations to compute, since only the innermost do-loop is parallelized.

Outermost loop parallelized:
!$omp parallel do private(i,j,k) shared(a)
  do i = 1, 10
    do j = 1, 10
      do k = 1, 10
        a(k,j,i) = i * j * k
      end do
    end do
  end do
!$omp end parallel do

Here the work is distributed only once, and the work given to each thread comprises at least j*k = 100 iterations. Therefore, in this second case better parallel performance is to be expected.
22 The SECTIONS construct.
The SECTIONS construct allows assigning a completely different task to each thread, leading to an MPMD (1) model of execution. Each section of code is executed once and only once by a thread in the team. The syntax of this construct is the following:

!$omp parallel sections clause1 clause2 ...
!$omp section
  ... code executed by one thread
!$omp section
  ... code executed by another thread
!$omp end parallel sections

Each block of code, to be executed by one of the threads, starts with an !$omp section directive and extends until the same directive is found again or until the closing directive !$omp end parallel sections is found. Any number of sections can be defined inside this directive pair, but only the existing number of threads is used to distribute the different blocks of code. This means that if the number of sections is larger than the number of available threads, some threads will execute more than one section of code in a serial fashion.
Allowed clauses: PRIVATE, FIRSTPRIVATE, LASTPRIVATE, REDUCTION.
(1) MPMD stands for Multiple Programs Multiple Data and refers to the case of having completely different programs/tasks which share or interchange information and which run simultaneously on different processors.
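A minimal runnable sketch of the construct (not from the slides; the two assignments stand in for two independent tasks):

```fortran
program sections_demo
  implicit none
  integer :: x, y
  x = 0
  y = 0
!$omp parallel sections
!$omp section
  x = 10                     ! task for one thread
!$omp section
  y = 20                     ! task for (possibly) another thread
!$omp end parallel sections
  print *, 'x + y =', x + y  ! each section runs exactly once: 30
end program sections_demo
```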
23 Calling serial subroutines inside a parallel region. The SINGLE construct.

  integer, dimension(0:3) :: a = 99
  integer :: i_am
  call omp_set_num_threads(4)
!$omp parallel private(i_am) shared(a)
  i_am = omp_get_thread_num()
  call work(a, i_am)
!$omp single
  print *, 'a = ', a
!$omp end single
!$omp end parallel

subroutine work(a, i_am)
  integer, dimension(0:3) :: a   ! becomes shared
  integer :: i_am                ! becomes private
  print *, 'work', i_am
  a(i_am) = i_am
end subroutine work

Dummy arguments inherit the data-sharing attributes of the associated actual arguments. The code enclosed in the SINGLE construct is executed by only one of the threads in the team, namely the one that first arrives at the opening directive !$omp single. All the remaining threads wait at the implied synchronization of the closing directive !$omp end single.

Result of execution:
work 1
work 3
a = 99, 1, 99, 3
work 2
work 0

What went wrong? The SINGLE construct was executed by one of the threads (1 or 3) before threads 2 and 0 completed execution of subroutine work.
24
  integer, dimension(0:3) :: a = 99
  integer :: i_am
  call omp_set_num_threads(4)
!$omp parallel private(i_am) shared(a)
  i_am = omp_get_thread_num()
  call work(a, i_am)
!$omp barrier                    ! All threads wait at the barrier
!$omp single
  print *, 'a = ', a
!$omp end single
!$omp end parallel

subroutine work(a, i_am)
  integer, dimension(0:3) :: a   ! becomes shared
  integer :: i_am                ! becomes private
  print *, 'work', i_am
  a(i_am) = i_am
end subroutine work

Result of execution:
work 1
work 3
work 2
work 0
a = 0, 1, 2, 3

The BARRIER directive represents an explicit synchronization between the different threads in the team. When it is encountered, each thread waits until all the other threads have reached this point.
25 Calling parallel subroutines inside a parallel region.

  call omp_set_num_threads(2)
!$omp parallel shared(s) private(p)
!$omp do private(j)
  do j = 1, ...
  end do
!$omp end do
  call sub(s, p)
!$omp end parallel
  ...
end

subroutine sub(s, p)
  integer :: s          ! shared
  integer :: p          ! private
  integer :: var, k     ! local variables are private
!$omp do private(k)
  do k = 1, 10
    ...                 ! Thread 0 will do the first 5 iterations
    ...                 ! Thread 1 will do the last 5 iterations
  end do
!$omp end do
  do k = 1, 10
    ...                 ! All threads will do the full 10 iterations
  end do
!$omp parallel do private(k)
  do k = 1, 10
    ...                 ! A PARALLEL directive inside another PARALLEL directive
  end do
!$omp end parallel do
end

A PARALLEL directive encountered dynamically inside another PARALLEL directive logically establishes a new team, which is composed of only the current thread, unless nested parallelism is enabled. We say that the loop is serialized: each thread performs all of its iterations itself.
26 The MASTER and CRITICAL constructs.
The code enclosed inside the MASTER construct is executed only by the master thread of the team. Meanwhile, all the other threads continue with their work. The syntax is as follows:

!$omp master
  ...
!$omp end master

In essence, this construct is similar to the !$omp single -- !$omp end single construct presented before, except that the thread executing the block of code is forced to be the master one instead of the first-arriving one.

The CRITICAL construct restricts access to the enclosed code to only one thread at a time. Examples of application of this directive pair are reading input from the keyboard/a file or updating the value of a shared variable. The syntax is the following:

!$omp critical
  ...
!$omp end critical

When a thread reaches the beginning of a critical section, it waits there until no other thread is executing the code in the critical section.
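A runnable sketch of the CRITICAL construct (not from the slides): serializing the updates of a shared counter makes the result deterministic, at the cost of losing parallelism inside the critical section. (For a simple sum, a REDUCTION would be more efficient; CRITICAL is shown here for illustration.)

```fortran
program critical_demo
  implicit none
  integer :: i, counter
  counter = 0
!$omp parallel do private(i) shared(counter)
  do i = 1, 100
!$omp critical
    counter = counter + 1  ! updates are serialized, so no increment is lost
!$omp end critical
  end do
!$omp end parallel do
  print *, 'counter =', counter   ! always 100
end program critical_demo
```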
27 The THREADPRIVATE construct.
Sometimes it is of interest to have global variables, but with values which are specific to each thread. An example could be a variable called my_id, which stores the thread identification number of each thread: this number is different for each thread, but it would be useful if its value were accessible from everywhere inside each thread and did not change from one parallel region to the next. When the program enters the first parallel region, a private copy of each variable marked as THREADPRIVATE is created for each thread.

  integer, save :: my_id          ! Variable must have the SAVE attribute
!$omp threadprivate(my_id)
!$omp parallel
  my_id = omp_get_thread_num()    ! Thread number is assigned to my_id
!$omp end parallel
  ...
!$omp parallel
  ...
!$omp end parallel

In this example, the variable my_id gets assigned the thread identification number of each thread during the first parallel region. In the second parallel region, the variable my_id keeps the values assigned to it in the first parallel region, since it is THREADPRIVATE.
28 OpenMP runtime library overview.
OpenMP Fortran library routines are external functions. Their names start with OMP_ and they usually have an integer or logical return type. These functions must be declared explicitly.

omp_set_num_threads   Set number of threads
omp_get_num_threads   Return number of threads in team
omp_get_max_threads   Return maximum number of threads
omp_get_thread_num    Get thread ID
omp_get_num_procs     Return number of available processors
omp_in_parallel       Check whether in a parallel region
omp_set_dynamic       Activate dynamic thread adjustment
omp_get_dynamic       Check for dynamic thread adjustment
omp_set_nested        Activate nested parallelism
omp_get_nested        Check for nested parallelism
29 References: OpenMP Application Program Interface, Version 3.0, May 2008. Also various web resources and books.
30 Assignment 9 (five extra points).
Parallelize your version of the Sedov test problem (or the Sod shock tube problem) using OpenMP directives (see Nigel's lecture on hyperbolic equations, Assignment 6). Use sufficiently high resolution so that the serial code runs for 1 minute minimum. Use different numbers of threads (2, 4, max). Calculate the speedup for the variable number of threads (2, 4, max) relative to the purely serial code. (Use "time ./your_code" in Linux to measure the run time of your code.) The report is due on
More informationMultithreading in C with OpenMP
Multithreading in C with OpenMP ICS432 - Spring 2017 Concurrent and High-Performance Programming Henri Casanova (henric@hawaii.edu) Pthreads are good and bad! Multi-threaded programming in C with Pthreads
More informationCOMP4510 Introduction to Parallel Computation. Shared Memory and OpenMP. Outline (cont d) Shared Memory and OpenMP
COMP4510 Introduction to Parallel Computation Shared Memory and OpenMP Thanks to Jon Aronsson (UofM HPC consultant) for some of the material in these notes. Outline (cont d) Shared Memory and OpenMP Including
More informationParallel Programming
Parallel Programming OpenMP Nils Moschüring PhD Student (LMU) Nils Moschüring PhD Student (LMU), OpenMP 1 1 Overview What is parallel software development Why do we need parallel computation? Problems
More informationShared memory programming
CME342- Parallel Methods in Numerical Analysis Shared memory programming May 14, 2014 Lectures 13-14 Motivation Popularity of shared memory systems is increasing: Early on, DSM computers (SGI Origin 3000
More informationIntroduction to OpenMP
Introduction to OpenMP Le Yan Objectives of Training Acquaint users with the concept of shared memory parallelism Acquaint users with the basics of programming with OpenMP Memory System: Shared Memory
More informationOpenMP Shared Memory Programming
OpenMP Shared Memory Programming John Burkardt, Information Technology Department, Virginia Tech.... Mathematics Department, Ajou University, Suwon, Korea, 13 May 2009.... http://people.sc.fsu.edu/ jburkardt/presentations/
More informationSpeeding Up Reactive Transport Code Using OpenMP. OpenMP
Speeding Up Reactive Transport Code Using OpenMP By Jared McLaughlin OpenMP A standard for parallelizing Fortran and C/C++ on shared memory systems Minimal changes to sequential code required Incremental
More informationOpenMP. António Abreu. Instituto Politécnico de Setúbal. 1 de Março de 2013
OpenMP António Abreu Instituto Politécnico de Setúbal 1 de Março de 2013 António Abreu (Instituto Politécnico de Setúbal) OpenMP 1 de Março de 2013 1 / 37 openmp what? It s an Application Program Interface
More informationLecture 4: OpenMP Open Multi-Processing
CS 4230: Parallel Programming Lecture 4: OpenMP Open Multi-Processing January 23, 2017 01/23/2017 CS4230 1 Outline OpenMP another approach for thread parallel programming Fork-Join execution model OpenMP
More informationSession 4: Parallel Programming with OpenMP
Session 4: Parallel Programming with OpenMP Xavier Martorell Barcelona Supercomputing Center Agenda Agenda 10:00-11:00 OpenMP fundamentals, parallel regions 11:00-11:30 Worksharing constructs 11:30-12:00
More informationOpenMP Library Functions and Environmental Variables. Most of the library functions are used for querying or managing the threading environment
OpenMP Library Functions and Environmental Variables Most of the library functions are used for querying or managing the threading environment The environment variables are used for setting runtime parameters
More information[Potentially] Your first parallel application
[Potentially] Your first parallel application Compute the smallest element in an array as fast as possible small = array[0]; for( i = 0; i < N; i++) if( array[i] < small ) ) small = array[i] 64-bit Intel
More informationShared Memory Parallelism using OpenMP
Indian Institute of Science Bangalore, India भ रत य व ज ञ न स स थ न ब गल र, भ रत SE 292: High Performance Computing [3:0][Aug:2014] Shared Memory Parallelism using OpenMP Yogesh Simmhan Adapted from: o
More informationParallel Programming: OpenMP
Parallel Programming: OpenMP Xianyi Zeng xzeng@utep.edu Department of Mathematical Sciences The University of Texas at El Paso. November 10, 2016. An Overview of OpenMP OpenMP: Open Multi-Processing An
More informationOpenMP Programming. Prof. Thomas Sterling. High Performance Computing: Concepts, Methods & Means
High Performance Computing: Concepts, Methods & Means OpenMP Programming Prof. Thomas Sterling Department of Computer Science Louisiana State University February 8 th, 2007 Topics Introduction Overview
More informationIntroduction to OpenMP. Lecture 4: Work sharing directives
Introduction to OpenMP Lecture 4: Work sharing directives Work sharing directives Directives which appear inside a parallel region and indicate how work should be shared out between threads Parallel do/for
More informationOpenMP threading: parallel regions. Paolo Burgio
OpenMP threading: parallel regions Paolo Burgio paolo.burgio@unimore.it Outline Expressing parallelism Understanding parallel threads Memory Data management Data clauses Synchronization Barriers, locks,
More informationIntroduction to OpenMP. Martin Čuma Center for High Performance Computing University of Utah
Introduction to OpenMP Martin Čuma Center for High Performance Computing University of Utah mcuma@chpc.utah.edu Overview Quick introduction. Parallel loops. Parallel loop directives. Parallel sections.
More informationIntroduction to OpenMP
Presentation Introduction to OpenMP Martin Cuma Center for High Performance Computing University of Utah mcuma@chpc.utah.edu September 9, 2004 http://www.chpc.utah.edu 4/13/2006 http://www.chpc.utah.edu
More informationProgramming with Shared Memory PART II. HPC Fall 2012 Prof. Robert van Engelen
Programming with Shared Memory PART II HPC Fall 2012 Prof. Robert van Engelen Overview Sequential consistency Parallel programming constructs Dependence analysis OpenMP Autoparallelization Further reading
More informationOpenMP Application Program Interface
OpenMP Application Program Interface DRAFT Version.1.0-00a THIS IS A DRAFT AND NOT FOR PUBLICATION Copyright 1-0 OpenMP Architecture Review Board. Permission to copy without fee all or part of this material
More informationAllows program to be incrementally parallelized
Basic OpenMP What is OpenMP An open standard for shared memory programming in C/C+ + and Fortran supported by Intel, Gnu, Microsoft, Apple, IBM, HP and others Compiler directives and library support OpenMP
More informationIntroduction to OpenMP
Introduction to OpenMP Ekpe Okorafor School of Parallel Programming & Parallel Architecture for HPC ICTP October, 2014 A little about me! PhD Computer Engineering Texas A&M University Computer Science
More informationProgramming with Shared Memory PART II. HPC Fall 2007 Prof. Robert van Engelen
Programming with Shared Memory PART II HPC Fall 2007 Prof. Robert van Engelen Overview Parallel programming constructs Dependence analysis OpenMP Autoparallelization Further reading HPC Fall 2007 2 Parallel
More informationTopics. Introduction. Shared Memory Parallelization. Example. Lecture 11. OpenMP Execution Model Fork-Join model 5/15/2012. Introduction OpenMP
Topics Lecture 11 Introduction OpenMP Some Examples Library functions Environment variables 1 2 Introduction Shared Memory Parallelization OpenMP is: a standard for parallel programming in C, C++, and
More informationCOSC 6374 Parallel Computation. Introduction to OpenMP. Some slides based on material by Barbara Chapman (UH) and Tim Mattson (Intel)
COSC 6374 Parallel Computation Introduction to OpenMP Some slides based on material by Barbara Chapman (UH) and Tim Mattson (Intel) Edgar Gabriel Fall 2015 OpenMP Provides thread programming model at a
More informationCompiling and running OpenMP programs. C/C++: cc fopenmp o prog prog.c -lomp CC fopenmp o prog prog.c -lomp. Programming with OpenMP*
Advanced OpenMP Compiling and running OpenMP programs C/C++: cc fopenmp o prog prog.c -lomp CC fopenmp o prog prog.c -lomp 2 1 Running Standard environment variable determines the number of threads: tcsh
More informationPractical stuff! ü OpenMP. Ways of actually get stuff done in HPC:
Ways of actually get stuff done in HPC: Practical stuff! Ø Message Passing (send, receive, broadcast,...) Ø Shared memory (load, store, lock, unlock) ü MPI Ø Transparent (compiler works magic) Ø Directive-based
More informationOpenMP Tutorial. Dirk Schmidl. IT Center, RWTH Aachen University. Member of the HPC Group Christian Terboven
OpenMP Tutorial Dirk Schmidl IT Center, RWTH Aachen University Member of the HPC Group schmidl@itc.rwth-aachen.de IT Center, RWTH Aachen University Head of the HPC Group terboven@itc.rwth-aachen.de 1 IWOMP
More informationOPENMP OPEN MULTI-PROCESSING
OPENMP OPEN MULTI-PROCESSING OpenMP OpenMP is a portable directive-based API that can be used with FORTRAN, C, and C++ for programming shared address space machines. OpenMP provides the programmer with
More informationIntroduction to OpenMP
1 Introduction to OpenMP NTNU-IT HPC Section John Floan Notur: NTNU HPC http://www.notur.no/ www.hpc.ntnu.no/ Name, title of the presentation 2 Plan for the day Introduction to OpenMP and parallel programming
More informationIntroduction to OpenMP
Introduction to OpenMP Ricardo Fonseca https://sites.google.com/view/rafonseca2017/ Outline Shared Memory Programming OpenMP Fork-Join Model Compiler Directives / Run time library routines Compiling and
More informationIntroduction to OpenMP. Rogelio Long CS 5334/4390 Spring 2014 February 25 Class
Introduction to OpenMP Rogelio Long CS 5334/4390 Spring 2014 February 25 Class Acknowledgment These slides are adapted from the Lawrence Livermore OpenMP Tutorial by Blaise Barney at https://computing.llnl.gov/tutorials/openmp/
More informationIntroduction to OpenMP. Martin Čuma Center for High Performance Computing University of Utah
Introduction to OpenMP Martin Čuma Center for High Performance Computing University of Utah m.cuma@utah.edu Overview Quick introduction. Parallel loops. Parallel loop directives. Parallel sections. Some
More informationIntroduction to OpenMP
Christian Terboven, Dirk Schmidl IT Center, RWTH Aachen University Member of the HPC Group terboven,schmidl@itc.rwth-aachen.de IT Center der RWTH Aachen University History De-facto standard for Shared-Memory
More informationParallel Processing Top manufacturer of multiprocessing video & imaging solutions.
1 of 10 3/3/2005 10:51 AM Linux Magazine March 2004 C++ Parallel Increase application performance without changing your source code. Parallel Processing Top manufacturer of multiprocessing video & imaging
More informationOpenMP: Open Multiprocessing
OpenMP: Open Multiprocessing Erik Schnetter June 7, 2012, IHPC 2012, Iowa City Outline 1. Basic concepts, hardware architectures 2. OpenMP Programming 3. How to parallelise an existing code 4. Advanced
More informationOpenMP Fundamentals Fork-join model and data environment
www.bsc.es OpenMP Fundamentals Fork-join model and data environment Xavier Teruel and Xavier Martorell Agenda: OpenMP Fundamentals OpenMP brief introduction The fork-join model Data environment OpenMP
More informationOpen Multi-Processing: Basic Course
HPC2N, UmeåUniversity, 901 87, Sweden. May 26, 2015 Table of contents Overview of Paralellism 1 Overview of Paralellism Parallelism Importance Partitioning Data Distributed Memory Working on Abisko 2 Pragmas/Sentinels
More informationUsing OpenMP. Rebecca Hartman-Baker Oak Ridge National Laboratory
Using OpenMP Rebecca Hartman-Baker Oak Ridge National Laboratory hartmanbakrj@ornl.gov 2004-2009 Rebecca Hartman-Baker. Reproduction permitted for non-commercial, educational use only. Outline I. About
More informationOpenMP - II. Diego Fabregat-Traver and Prof. Paolo Bientinesi WS15/16. HPAC, RWTH Aachen
OpenMP - II Diego Fabregat-Traver and Prof. Paolo Bientinesi HPAC, RWTH Aachen fabregat@aices.rwth-aachen.de WS15/16 OpenMP References Using OpenMP: Portable Shared Memory Parallel Programming. The MIT
More informationMulti-core Architecture and Programming
Multi-core Architecture and Programming Yang Quansheng( 杨全胜 ) http://www.njyangqs.com School of Computer Science & Engineering 1 http://www.njyangqs.com Programming with OpenMP Content What is PpenMP Parallel
More informationCMSC 714 Lecture 4 OpenMP and UPC. Chau-Wen Tseng (from A. Sussman)
CMSC 714 Lecture 4 OpenMP and UPC Chau-Wen Tseng (from A. Sussman) Programming Model Overview Message passing (MPI, PVM) Separate address spaces Explicit messages to access shared data Send / receive (MPI
More informationINTRODUCTION TO OPENMP & OPENACC. Kadin Tseng Boston University Research Computing Services
INTRODUCTION TO OPENMP & OPENACC Kadin Tseng Boston University Research Computing Services 2 Outline Introduction to OpenMP (for CPUs) Introduction to OpenACC (for GPUs) 3 Introduction to OpenMP (for CPUs)
More informationData Environment: Default storage attributes
COSC 6374 Parallel Computation Introduction to OpenMP(II) Some slides based on material by Barbara Chapman (UH) and Tim Mattson (Intel) Edgar Gabriel Fall 2014 Data Environment: Default storage attributes
More information!OMP #pragma opm _OPENMP
Advanced OpenMP Lecture 12: Tips, tricks and gotchas Directives Mistyping the sentinel (e.g.!omp or #pragma opm ) typically raises no error message. Be careful! The macro _OPENMP is defined if code is
More informationParallel Computing. Prof. Marco Bertini
Parallel Computing Prof. Marco Bertini Shared memory: OpenMP Implicit threads: motivations Implicit threading frameworks and libraries take care of much of the minutiae needed to create, manage, and (to
More informationOpenMP Lab on Nested Parallelism and Tasks
OpenMP Lab on Nested Parallelism and Tasks Nested Parallelism 2 Nested Parallelism Some OpenMP implementations support nested parallelism A thread within a team of threads may fork spawning a child team
More informationModule 11: The lastprivate Clause Lecture 21: Clause and Routines. The Lecture Contains: The lastprivate Clause. Data Scope Attribute Clauses
The Lecture Contains: The lastprivate Clause Data Scope Attribute Clauses Reduction Loop Work-sharing Construct: Schedule Clause Environment Variables List of Variables References: file:///d /...ary,%20dr.%20sanjeev%20k%20aggrwal%20&%20dr.%20rajat%20moona/multi-core_architecture/lecture%2021/21_1.htm[6/14/2012
More informationOpenMP Overview. in 30 Minutes. Christian Terboven / Aachen, Germany Stand: Version 2.
OpenMP Overview in 30 Minutes Christian Terboven 06.12.2010 / Aachen, Germany Stand: 03.12.2010 Version 2.3 Rechen- und Kommunikationszentrum (RZ) Agenda OpenMP: Parallel Regions,
More informationParallel Programming in C with MPI and OpenMP
Parallel Programming in C with MPI and OpenMP Michael J. Quinn Chapter 17 Shared-memory Programming 1 Outline n OpenMP n Shared-memory model n Parallel for loops n Declaring private variables n Critical
More informationA brief introduction to OpenMP
A brief introduction to OpenMP Alejandro Duran Barcelona Supercomputing Center Outline 1 Introduction 2 Writing OpenMP programs 3 Data-sharing attributes 4 Synchronization 5 Worksharings 6 Task parallelism
More informationProgramming Shared-memory Platforms with OpenMP
Programming Shared-memory Platforms with OpenMP John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 422/534 Lecture 7 31 February 2017 Introduction to OpenMP OpenMP
More informationIntroduction to OpenMP.
Introduction to OpenMP www.openmp.org Motivation Parallelize the following code using threads: for (i=0; i
More information
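To make the sentinel mechanism concrete, here is a minimal Fortran sketch (the program and variable names are illustrative, not from the lecture): a loop whose per-thread partial sums are combined with REDUCTION(+:a), while C and D are private scratch variables. The same source compiles with and without OpenMP.

```fortran
program sentinel_demo
  implicit none
  integer :: i
  real    :: a, c, d

  a = 0.0
  ! A normal compiler treats the next line as a comment; an OpenMP-compliant
  ! compiler forks a team of threads here. Each thread gets private copies
  ! of C and D; the per-thread values of A are summed at the join.
  !$OMP PARALLEL DO DEFAULT(shared) PRIVATE(c, d) REDUCTION(+:a)
  do i = 1, 1000
     c = real(i)        ! private scratch variable
     d = c * c
     a = a + d          ! reduction variable: accumulates sum of i**2
  end do
  !$OMP END PARALLEL DO

  print *, 'sum of squares = ', a
end program sentinel_demo
```

Compiled as `gfortran demo.f90`, the directives are ignored and the program runs serially; compiled as `gfortran -fopenmp demo.f90`, the loop is shared among threads. The printed result is the same either way, which is exactly the single-source property described above.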