Introduction Hybrid Programming. Sebastian von Alfthan 13/8/2008 CSC the Finnish IT Center for Science PRACE summer school 2008



2 Contents Introduction OpenMP in detail OpenMP programming models & performance Hybrid programming

3 The need for improved parallelism Top 500 trends show that in the coming years the #1 system will have a peak of 1 EF and the #500 system a peak of 1 PF General-purpose processors are not getting (very much) faster Massively parallel processors (MPP): a very large number of nodes connected with a fast interconnect, built from symmetric multiprocessor (SMP) nodes Hybrid architectures: Cell, GPGPU...

4 Hybrid programming - mixed mode Parallel programming model combining: Parallelization over one SMP node - shared memory parallelization: OpenMP (the de facto standard), POSIX threads (Pthreads, low level) Parallelization between nodes - distributed memory: MPI, PVM (obsolete) Here: MPI + OpenMP Is it faster? Sometimes - in most cases not... [Figure: a node with cores C1-C4 sharing the L3 cache and memory via OpenMP, with MPI between nodes over the SeaStar2 interconnect]

5 OpenMP: a brief introduction An API that can be used for multithreaded shared memory parallelization Fortran 77/9X and C/C++ are supported The current version implemented in compilers is 2.5 (covered in this talk); the OpenMP 3.0 specification has been released Enables one to parallelize one part of the program at a time Easy to do quick and dirty prototyping Efficient and well-scaling code still requires effort

6 OpenMP The OpenMP API has three components: 1) Compiler directives Express the shared memory parallelization Each is preceded by a sentinel, so a serial version still compiles 2) Runtime library routines A small number of library functions Examples: get the number of threads, get the rank of a thread... Can be discarded in the serial version via conditional compilation 3) Environment variables Bind threads to cores Specify the number of threads

7 A simple OpenMP program: F95
PROGRAM demo1
  USE omp_lib
  INTEGER :: omp_rank
!$omp parallel private(omp_rank)
  omp_rank = omp_get_thread_num()
  WRITE(*,*) "thread is ", omp_rank
!$omp end parallel
END PROGRAM demo1
> export OMP_NUM_THREADS=2
> ftn -mp demo1.f95
> aprun -n 1 ./a.out
thread is 0
thread is 1

8 A simple OpenMP program: C
#include <stdio.h>
#include "omp.h"
int main(int argc, char *argv[]) {
  int omp_rank;
  #pragma omp parallel private(omp_rank)
  {
    omp_rank = omp_get_thread_num();
    printf("thread is %d\n", omp_rank);
  }
}
> export OMP_NUM_THREADS=2
> cc -mp demo1.c
> aprun -n 1 ./a.out
thread is 0
thread is 1

9 OpenMP in detail

10 Directives Sentinels precede each OpenMP directive C/C++: #pragma omp Fortran free form: !$omp Fortran fixed form: !$omp, c$omp, *$omp A space in the sixth column begins a directive No space depicts a continuation line

11 Directives: parallel Starts a parallel region Prior to it there is only one thread, the master Creates a team of threads: master + slave threads At the end of the block there is a barrier and all shared data is synchronized Clauses: if(logical expression), private(list), shared(list), default(private/shared/none), firstprivate(list), reduction(operator:list), copyin(list), num_threads(integer)
!$omp parallel
!$omp end parallel

12 Directives: parallel clauses private(list) Comma-separated list of private variables Private variables are on the private stack of each thread Undefined initial value Undefined value after the parallel region firstprivate(list) Private variable whose initial value is the same as that of the original object lastprivate(list) Private variable; the thread that performs the last parallel iteration step or section copies its value to the original object

13 Directives: parallel clauses shared(list) Comma-separated list of shared variables All threads can write to, and read from, a shared variable Race condition if other threads access a variable while one writes to it Variables are shared by default (with some exceptions) default(private/shared/none) Sets the default for variables: shared, private or undefined In C/C++ default(private) is not allowed default(none) can be useful for debugging, since each variable then has to be declared manually

14 Directives: parallel clauses reduction(operator:list) Performs a reduction on the (scalar) variables in list A private reduction variable is created for each thread's partial result The private reduction variable is initialized to the operator's identity value (see table) After the parallel region the reduction operation is applied to the private variables and the result is aggregated into the shared variable

Operator (Fortran)  Initial value    Operator (C/C++)  Initial value
+                   0                +                 0
*                   1                *                 1
.AND.               .TRUE.           &&                1
.OR.                .FALSE.          ||                0
MAX                 smallest representable value
MIN                 largest representable value

15 Work-sharing directives: DO/for Directive instructing the compiler to share the work of a DO loop Fortran: !$OMP DO C/C++: #pragma omp for The directive is inside a parallel region, just prior to the DO loop Can also be combined with parallel: !$OMP PARALLEL DO The loop variable is private by default in Fortran Not in C/C++; there it has to be explicitly declared private
sumvar=0
!$omp parallel do reduction(+:sumvar)
do i=1,10
  sumvar=sumvar+1
end do
!$omp end parallel do
WRITE(*,*) "sum is ",sumvar
sum is 10

16 Work-sharing directives: DO/for clauses Clauses: schedule(type[,chunk]), ordered, nowait, private(list), firstprivate(list), lastprivate(list), shared(list), reduction(operator:list) schedule(type[,chunksize]) Defines how the iterations are divided over the threads: static, dynamic, guided ordered Iterations are performed in the same order as in the serial program nowait No barrier or synchronization at the end of the loop In Fortran it is declared in the !$omp end directive

17 Work-sharing directives: DO/for schedules schedule(static,[chunksize]) Iterations are divided into chunk-sized parts Chunks are statically assigned to threads The default chunk size is iterations/threads Low overhead Load balance can be problematic The static schedule is typically used by default if no schedule is defined (implementation dependent)

18 Work-sharing directives: DO/for schedules schedule(dynamic,[chunksize]) Iterations are divided into chunk-sized parts After a thread completes a chunk, it is dynamically assigned a new one The default chunksize is 1 Higher overhead Better load balance for unbalanced iterations schedule(guided,[chunksize]) Like dynamic, but the size of the chunks decreases exponentially The size of the first chunk is implementation dependent The size of the smallest chunk is chunksize The default chunksize is 1

19 Work-sharing directives: sections Defines sections that are executed by different threads
!$omp sections
!$omp section
write(*,*) "thread",omp_get_thread_num(),"section A"
!$omp section
write(*,*) "thread",omp_get_thread_num(),"section B"
!$omp end sections
thread 0 section A
thread 1 section B

20 Directives: Master !$OMP MASTER Specifies that the region should be executed only by the master thread
!$omp master
!code for master
!$omp end master
#pragma omp master
{
  //code for master
}

21 Directives: Single !$OMP SINGLE Specifies that the region should be executed by only one, arbitrary, thread Implicit barrier at the end directive (unless nowait is specified) Clauses: private(list), firstprivate(list), copyprivate(list), nowait
!$omp single
!code for any thread
!$omp end single
#pragma omp single
{
  //code for any thread
}

22 Directives: Critical !$OMP CRITICAL [name] A section that should be executed by only one thread at a time The optional name distinguishes different critical sections All unnamed critical sections are treated as the same section
!$omp critical
!code that is not thread-safe
!$omp end critical
#pragma omp critical
{
  //code that is not thread-safe
}

23 Directives: Atomic !$OMP ATOMIC Specifies that a memory location is to be updated atomically, by only one thread at a time Applies to only one statement Only certain kinds of expressions are allowed, in C/C++: a= a+= a-= a*= a/= a++ ++a a-- --a
!$omp atomic
var=...
#pragma omp atomic
var=...;

24 Directives: Barrier !$OMP BARRIER Synchronizes all threads at this point When a thread reaches a barrier it continues only after all threads have reached it
!$omp barrier
#pragma omp barrier

25 Directives: Flush !$OMP FLUSH [list] Synchronizes the memory of all threads Makes sure each thread has a consistent view of memory at this point Also required on cache-coherent systems, since changes to variables could still reside in registers Can also flush only the variables in list; if no list is given, all variables are flushed Implicit flush for several directives: BARRIER; PARALLEL (entry and exit); CRITICAL (entry and exit); DO (on exit); SECTIONS (on exit); SINGLE (on exit); ORDERED (entry and exit)

26 OpenMP: Run time library routines OMP_SET_NUM_THREADS OMP_GET_NUM_THREADS OMP_GET_MAX_THREADS OMP_GET_THREAD_NUM OMP_GET_NUM_PROCS OMP_IN_PARALLEL OMP_SET_DYNAMIC OMP_GET_DYNAMIC OMP_SET_NESTED OMP_GET_NESTED OMP_INIT_LOCK OMP_DESTROY_LOCK OMP_SET_LOCK OMP_UNSET_LOCK OMP_TEST_LOCK OMP_GET_WTIME OMP_GET_WTICK

27 OpenMP: Important environment variables OMP_NUM_THREADS Maximum number of threads OMP_NESTED TRUE or FALSE Enables or disables nested parallelism Not always supported Compiler specific flags for binding threads to cores PGI setenv MP_BIND yes Pathscale setenv PSC_OMP_AFFINITY TRUE setenv PSC_OMP_AFFINITY_GLOBAL TRUE GNU setenv GOMP_CPU_AFFINITY "0-3"

28 OpenMP: Compilation flags PGI -mp=nonuma Pathscale -mp GNU -fopenmp

29 OpenMP programming models & performance

30 OpenMP: Programming models Fine-grained: loop level, several local parallel regions PARALLEL DO Can be introduced in piecewise fashion Often simple to implement Performance benefits are limited Coarse-grained: parallel region extends over larger segments (or whole program) PARALLEL OMP_GET_NUM_THREADS OMP_GET_THREAD_NUM Divide work based on thread number Similar to MPI programming Often demands larger changes to program and algorithm Larger potential benefits

31 Case study: Matrix multiplication Naive serial matrix multiplication Slow algorithm - do not use in real code (use BLAS) n=m=p=1000 Execution time 1.92 s
t1=omp_get_wtime()
DO j=1,m
  DO i=1,n
    DO k=1,p
      c(i,j)=c(i,j)+a(i,k)*b(k,j)
    END DO
  END DO
END DO
t2=omp_get_wtime()
WRITE(*,*) "Execution time"&
  ,t2-t1

32 Case study: Matrix multiplication Fine-grained parallelization Static scheduling j,i,k private All other variables shared j-loop parallelized 4 threads (n=m=p=1000) Execution time 0.479 s Perfect speedup in this simple case
t1=omp_get_wtime()
!$omp PARALLEL DO
DO j=1,m
  DO i=1,n
    DO k=1,p
      c(i,j)=c(i,j)+a(i,k)*b(k,j)
    END DO
  END DO
END DO
!$OMP END PARALLEL DO
t2=omp_get_wtime()
WRITE(*,*) "Execution time"&
  ,t2-t1

33 Case study: Matrix multiplication Fine-grained parallelization Static scheduling j,i,k private All other variables shared k-loop parallelized 4 threads (n=m=p=1000) Execution time 1.8 s Thread overhead Synchronization
t1=omp_get_wtime()
DO j=1,m
  DO i=1,n
    csum=0
!$omp PARALLEL DO reduction(+:csum)
    DO k=1,p
      csum=csum+a(i,k)*b(k,j)
    END DO
!$OMP END PARALLEL DO
    c(i,j)=csum
  END DO
END DO
t2=omp_get_wtime()
WRITE(*,*) "Execution time"&
  ,t2-t1

34 Case study: Matrix multiplication Coarse-grained parallelization Simple domain decomposition 4 threads (n=m=p=1000) Execution time 0.49 s A few percent worse than the fine-grained version This is a best-case scenario for fine-grained parallelization
t1=omp_get_wtime()
!$omp PARALLEL &
!$OMP& PRIVATE(omp_rank,i,j,k) &
!$OMP& PRIVATE(imin,jmin,imax,jmax)
omp_rank=omp_get_thread_num()
imin=...
jmin=...
imax=...
jmax=...
DO j=jmin,jmax
  DO i=imin,imax
    DO k=1,p
      c(i,j)=c(i,j)+a(i,k)*b(k,j)
    END DO
  END DO
END DO
!$OMP END PARALLEL
t2=omp_get_wtime()
WRITE(*,*) "Execution time",t2-t1

35 OpenMP performance: False sharing Memory is read and written in whole cache lines 64 bytes on AMD Opteron False sharing: When a thread modifies part of a cache line, the whole cache line is marked as invalid When another thread attempts to read or modify another part of that cache line, it is forced to fetch a newer copy How to avoid: False sharing does not occur if all threads only read a variable The risk of false sharing is larger when threads work on data in small slices than when larger chunks are used

36 OpenMP performance: ccNUMA issues Cache-coherent non-uniform memory access (ccNUMA) Shared memory model Local memory closer to a processor is faster Caches are kept coherent Examples: some MPP nodes such as Cray XT5 nodes; large shared memory computers such as SGI Altix Uniform memory access (UMA) Only feasible for small systems Examples: Cray XT4 nodes, BlueGene/P nodes

37 OpenMP performance: ccNUMA issues In a NUMA node some of the memory is more expensive to access Can lead to severe performance problems OpenMP has no support for NUMA It does not specify where the data is stored It does not give tools to check where it is stored How to avoid: The system often uses a first-touch principle: the thread that first accesses the data will host it in its memory Use initialization loops that make sure each thread's data is local On some systems the low-level system call madvise can also be used

38 OpenMP performance: Overheads Amdahl's law If only parts of the program are parallelized (fine-grained), Amdahl's law limits performance In hybrid programs the number of threads is low, so this is less of a problem Thread management has a large overhead Avoid creating/destroying threads; use larger parallel regions Synchronization Avoid explicit and implicit barriers If you can use NOWAIT clauses, do it Avoid (if possible) BARRIER/CRITICAL/ORDERED/FLUSH Use named CRITICAL regions

39 OpenMP performance: DO/for directive PARALLEL DO can be more efficient than a DO directive inside a PARALLEL region (implementation dependent) If the iterations are well balanced use STATIC If there are load-balancing issues use GUIDED or DYNAMIC Small loops should not be parallelized For nested loops the inner loops should not be parallelized (see COLLAPSE in OpenMP 3.0) Use NOWAIT if possible

Overheads on a quad-core Cray XT4:
Schedule      2 threads  4 threads
PARALLEL      0.5 µs     1.0 µs
STATIC(1)     0.9 µs     1.3 µs
STATIC(64)    0.4 µs     0.7 µs
DYNAMIC(1)    34 µs      315 µs
DYNAMIC(64)   1.2 µs     2.7 µs
GUIDED(1)     15 µs      214 µs
GUIDED(64)    3.3 µs     6.2 µs

40 Hybrid programming

41 Hybrid programming Parallel programming model combining: OpenMP parallelization over one node MPI parallelization between nodes The hybrid model is closer to the hardware model of an SMP cluster - is it therefore always faster? No, there is a large body of work suggesting it is often slower There are a number of possible benefits and problems Analyze the program and the target platform to decide whether the benefits might yield improvements

42 Hybrid parallel programming models 1. No overlapping communication and computation: 1.1 MPI is called only outside parallel regions and by the master thread 1.2 MPI is called by several threads 2. Communication and computation overlap: while some of the threads communicate, the rest execute the application: 2.1 MPI is called only by the master thread 2.2 Communication is carried out by several threads 2.3 Each thread handles its own communication demands

43 MPI support for threading The MPI standard defines four levels of support: 0. MPI_THREAD_SINGLE Only one thread allowed 1. MPI_THREAD_FUNNELED Only the master thread is allowed to make MPI calls 2. MPI_THREAD_SERIALIZED All threads are allowed to make MPI calls, but not concurrently 3. MPI_THREAD_MULTIPLE No restrictions Some implementations support an additional model between 0 and 1: MPI calls are allowed only outside parallel regions; reported as MPI_THREAD_SINGLE

44 MPI support on Cray XT4/XT5
MPI_Init_thread(&argc,&argv,MPI_THREAD_MULTIPLE,&provided);
printf("supports level %d of %d %d %d %d\n",
       provided, MPI_THREAD_SINGLE, MPI_THREAD_FUNNELED,
       MPI_THREAD_SERIALIZED, MPI_THREAD_MULTIPLE);
Cray XT4 (xt-mpt):
> Supports level 1 of

45 MPI support on Cray XT4/XT5 The MPI library supports MPI_THREAD_FUNNELED Overlapping communication/computation is still possible Non-blocking communication can be started in a MASTER block It completes while the parallel region computes One communicating thread is able to saturate the interconnect This might not be true on all architectures - a possible problem for the funneled model

46 First hybrid program
int main(int argc, char *argv[]) {
  int rank, omp_rank, mpisupport;
  MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &mpisupport);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  #pragma omp parallel private(omp_rank)
  {
    omp_rank = omp_get_thread_num();
    printf("%d %d\n", rank, omp_rank);
  }
  MPI_Finalize();
}

47 Communication Communication inside a node is replaced by direct memory reads/writes Improved throughput and latency Decreased overhead from the MPI library Aggregated messages In many (data-parallel) algorithms the messages become larger as the number of MPI processes decreases Increased throughput for inter-node communication In some algorithms the number of messages is reduced, e.g. all-to-all Restrictions on calling MPI routines depend on the level of support: MPI only outside parallel regions - all other cores are idle MPI_THREAD_FUNNELED - other threads can compute during communication MPI_THREAD_MULTIPLE - best, but often not available

48 Case study: All-to-all on quad-core XT4 Collective operations are often performance bottlenecks Especially all-to-all operations A point-to-point implementation can be faster Hybrid implementation For all-to-all operations the (maximum) number of transfers decreases by a factor of #threads^2 The size of each message increases by a factor of #threads Allows overlapping communication and computation


51 Case study: All-to-all [Figures: all-to-all timings with 40 kB of data per node and with 400 kB of data per node]

52 Algorithmic issues The benefits of the hybrid approach are algorithm dependent, some examples: Limited parallelism in the MPI parallelization Additional levels of parallelism can be easier to implement with the hybrid approach E.g. a grid-based algorithm parallelized in only one dimension E.g. master-slave algorithms Embarrassingly parallel algorithms Can be used to speed up single tasks Can be used to increase the system size Domain decomposition (see the next case study)

53 Case study: Domain decomposition The number of atoms per cell is proportional to the number of threads The relative number of ghost particles is proportional to #threads^(-1/3) We can reduce communication by hybridizing the algorithm With four threads per process the number of ghost particles decreases by about 40%


55 Case study: Domain decomposition Fine-grained hybridization of an MD code A parallel region is entered each time the potential is evaluated The loop over atoms is parallelized with a static for Temporary array for the forces Shared Separate space for each thread Avoids the need for synchronization when Newton's third law is used The results are added to the real force array at the end of the parallel region
#pragma omp parallel
{
  ...
  zero(ptforce[thread][..][..]);
  ...
  #pragma omp for schedule(static,10)
  for (ii = 0; ii < atoms; ii++) {
    ...
    ptforce[thread][ii][..] += ...
    ptforce[thread][jj][..] += ...
  }
}
...
for (t = 0; t < threads; t++)
  force[..][..] += ptforce[t][..][..];
...

56 Case study: Domain decomposition

57 Load balance Good load balance is harder and harder to achieve as the number of MPI processes increases The hybrid approach decreases the number of processes One can dynamically change the number of threads per process Can improve load balance The hardware of SMP clusters restricts the usefulness E.g. a node with two quad-core processors (XT5): two MPI processes which can have 1-7 threads each

58 Case study: Master-slave algorithms Matrix multiplication as a demonstration of a master-slave algorithm Scaling is improved by going to a coarse-grained hybrid model It utilizes the following benefits: + Better load balancing due to fewer MPI processes + Message aggregation and reduced communication

59 Overlapping communication/computation If the level of support is at least MPI_THREAD_FUNNELED there are more options for overlapping Isend/Irecv are available as normal While the master thread communicates, the other threads can compute Can be difficult to utilize properly - load balancing is tricky With enough threads the master thread could be a dedicated communication thread

60 Memory issues The hybrid programming method can be used to decrease memory requirements Some algorithms have replicated data - hybridization can yield significant savings In domain decomposition algorithms there are fewer boundary data points Improved cache usage Many processors have shared caches L2 in Intel Core 2 L3 in AMD quad-core Shared data can reside in this cache - decreased cache pressure

61 Parallel I/O I/O is expensive and it is difficult to make it optimal Some approaches for parallel I/O: MPI-2 I/O Single writer with reduction Subset of writers/readers N writers/readers to N files [Figure: single writer, subset of writers, and N writers in a hybrid MPI setting]

62 Parallel I/O: a simple hybrid approach Every MPI process opens a file Good I/O bandwidth No communication needed Large filesystem stress, slow opens/closes Inconvenient as many files are created Hybridization: only one core per processor writes a shared array Achievable bandwidth is similar Decreases the number of files by a factor of #threads Easy to implement Allows overlapping of communication/computation

63 Summary The hybrid approach is difficult, but sometimes useful Performance of the hybrid approach is a tradeoff between greater overhead and decreased communication costs Direct benefits achieved without additional effort: All-to-all collective operations 2-5 times faster Parallel I/O with reduced filesystem stress in the N-writers case Message aggregation We expect the potential benefits to be even greater on the XT5

64 Questions!


More information

<Insert Picture Here> OpenMP on Solaris

<Insert Picture Here> OpenMP on Solaris 1 OpenMP on Solaris Wenlong Zhang Senior Sales Consultant Agenda What s OpenMP Why OpenMP OpenMP on Solaris 3 What s OpenMP Why OpenMP OpenMP on Solaris

More information

Introduction to OpenMP

Introduction to OpenMP Presentation Introduction to OpenMP Martin Cuma Center for High Performance Computing University of Utah mcuma@chpc.utah.edu September 9, 2004 http://www.chpc.utah.edu 4/13/2006 http://www.chpc.utah.edu

More information

SHARCNET Workshop on Parallel Computing. Hugh Merz Laurentian University May 2008

SHARCNET Workshop on Parallel Computing. Hugh Merz Laurentian University May 2008 SHARCNET Workshop on Parallel Computing Hugh Merz Laurentian University May 2008 What is Parallel Computing? A computational method that utilizes multiple processing elements to solve a problem in tandem

More information

Introduction to OpenMP. Martin Čuma Center for High Performance Computing University of Utah

Introduction to OpenMP. Martin Čuma Center for High Performance Computing University of Utah Introduction to OpenMP Martin Čuma Center for High Performance Computing University of Utah mcuma@chpc.utah.edu Overview Quick introduction. Parallel loops. Parallel loop directives. Parallel sections.

More information

Shared Memory Parallelism using OpenMP

Shared Memory Parallelism using OpenMP Indian Institute of Science Bangalore, India भ रत य व ज ञ न स स थ न ब गल र, भ रत SE 292: High Performance Computing [3:0][Aug:2014] Shared Memory Parallelism using OpenMP Yogesh Simmhan Adapted from: o

More information

Introduction to OpenMP

Introduction to OpenMP Introduction to OpenMP Le Yan HPC Consultant User Services Goals Acquaint users with the concept of shared memory parallelism Acquaint users with the basics of programming with OpenMP Discuss briefly the

More information

Introduction to OpenMP

Introduction to OpenMP Introduction to OpenMP Xiaoxu Guan High Performance Computing, LSU April 6, 2016 LSU HPC Training Series, Spring 2016 p. 1/44 Overview Overview of Parallel Computing LSU HPC Training Series, Spring 2016

More information

Department of Informatics V. HPC-Lab. Session 2: OpenMP M. Bader, A. Breuer. Alex Breuer

Department of Informatics V. HPC-Lab. Session 2: OpenMP M. Bader, A. Breuer. Alex Breuer HPC-Lab Session 2: OpenMP M. Bader, A. Breuer Meetings Date Schedule 10/13/14 Kickoff 10/20/14 Q&A 10/27/14 Presentation 1 11/03/14 H. Bast, Intel 11/10/14 Presentation 2 12/01/14 Presentation 3 12/08/14

More information

A common scenario... Most of us have probably been here. Where did my performance go? It disappeared into overheads...

A common scenario... Most of us have probably been here. Where did my performance go? It disappeared into overheads... OPENMP PERFORMANCE 2 A common scenario... So I wrote my OpenMP program, and I checked it gave the right answers, so I ran some timing tests, and the speedup was, well, a bit disappointing really. Now what?.

More information

EPL372 Lab Exercise 5: Introduction to OpenMP

EPL372 Lab Exercise 5: Introduction to OpenMP EPL372 Lab Exercise 5: Introduction to OpenMP References: https://computing.llnl.gov/tutorials/openmp/ http://openmp.org/wp/openmp-specifications/ http://openmp.org/mp-documents/openmp-4.0-c.pdf http://openmp.org/mp-documents/openmp4.0.0.examples.pdf

More information

Barbara Chapman, Gabriele Jost, Ruud van der Pas

Barbara Chapman, Gabriele Jost, Ruud van der Pas Using OpenMP Portable Shared Memory Parallel Programming Barbara Chapman, Gabriele Jost, Ruud van der Pas The MIT Press Cambridge, Massachusetts London, England c 2008 Massachusetts Institute of Technology

More information

Mango DSP Top manufacturer of multiprocessing video & imaging solutions.

Mango DSP Top manufacturer of multiprocessing video & imaging solutions. 1 of 11 3/3/2005 10:50 AM Linux Magazine February 2004 C++ Parallel Increase application performance without changing your source code. Mango DSP Top manufacturer of multiprocessing video & imaging solutions.

More information

OpenMP Algoritmi e Calcolo Parallelo. Daniele Loiacono

OpenMP Algoritmi e Calcolo Parallelo. Daniele Loiacono OpenMP Algoritmi e Calcolo Parallelo References Useful references Using OpenMP: Portable Shared Memory Parallel Programming, Barbara Chapman, Gabriele Jost and Ruud van der Pas OpenMP.org http://openmp.org/

More information

Introduction to OpenMP. Martin Čuma Center for High Performance Computing University of Utah

Introduction to OpenMP. Martin Čuma Center for High Performance Computing University of Utah Introduction to OpenMP Martin Čuma Center for High Performance Computing University of Utah mcuma@chpc.utah.edu Overview Quick introduction. Parallel loops. Parallel loop directives. Parallel sections.

More information

HPC Workshop University of Kentucky May 9, 2007 May 10, 2007

HPC Workshop University of Kentucky May 9, 2007 May 10, 2007 HPC Workshop University of Kentucky May 9, 2007 May 10, 2007 Part 3 Parallel Programming Parallel Programming Concepts Amdahl s Law Parallel Programming Models Tools Compiler (Intel) Math Libraries (Intel)

More information

MPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016

MPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016 MPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016 Message passing vs. Shared memory Client Client Client Client send(msg) recv(msg) send(msg) recv(msg) MSG MSG MSG IPC Shared

More information

Introduction to Standard OpenMP 3.1

Introduction to Standard OpenMP 3.1 Introduction to Standard OpenMP 3.1 Massimiliano Culpo - m.culpo@cineca.it Gian Franco Marras - g.marras@cineca.it CINECA - SuperComputing Applications and Innovation Department 1 / 59 Outline 1 Introduction

More information

ECE 574 Cluster Computing Lecture 10

ECE 574 Cluster Computing Lecture 10 ECE 574 Cluster Computing Lecture 10 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 1 October 2015 Announcements Homework #4 will be posted eventually 1 HW#4 Notes How granular

More information

Shared memory programming model OpenMP TMA4280 Introduction to Supercomputing

Shared memory programming model OpenMP TMA4280 Introduction to Supercomputing Shared memory programming model OpenMP TMA4280 Introduction to Supercomputing NTNU, IMF February 16. 2018 1 Recap: Distributed memory programming model Parallelism with MPI. An MPI execution is started

More information

OpenMP Application Program Interface

OpenMP Application Program Interface OpenMP Application Program Interface DRAFT Version.1.0-00a THIS IS A DRAFT AND NOT FOR PUBLICATION Copyright 1-0 OpenMP Architecture Review Board. Permission to copy without fee all or part of this material

More information

Lab: Scientific Computing Tsunami-Simulation

Lab: Scientific Computing Tsunami-Simulation Lab: Scientific Computing Tsunami-Simulation Session 4: Optimization and OMP Sebastian Rettenberger, Michael Bader 23.11.15 Session 4: Optimization and OMP, 23.11.15 1 Department of Informatics V Linux-Cluster

More information

A brief introduction to OpenMP

A brief introduction to OpenMP A brief introduction to OpenMP Alejandro Duran Barcelona Supercomputing Center Outline 1 Introduction 2 Writing OpenMP programs 3 Data-sharing attributes 4 Synchronization 5 Worksharings 6 Task parallelism

More information

MPI & OpenMP Mixed Hybrid Programming

MPI & OpenMP Mixed Hybrid Programming MPI & OpenMP Mixed Hybrid Programming Berk ONAT İTÜ Bilişim Enstitüsü 22 Haziran 2012 Outline Introduc/on Share & Distributed Memory Programming MPI & OpenMP Advantages/Disadvantages MPI vs. OpenMP Why

More information

OpenMP and MPI. Parallel and Distributed Computing. Department of Computer Science and Engineering (DEI) Instituto Superior Técnico.

OpenMP and MPI. Parallel and Distributed Computing. Department of Computer Science and Engineering (DEI) Instituto Superior Técnico. OpenMP and MPI Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico November 16, 2011 CPD (DEI / IST) Parallel and Distributed Computing 18

More information

OpenMP Shared Memory Programming

OpenMP Shared Memory Programming OpenMP Shared Memory Programming John Burkardt, Information Technology Department, Virginia Tech.... Mathematics Department, Ajou University, Suwon, Korea, 13 May 2009.... http://people.sc.fsu.edu/ jburkardt/presentations/

More information

Introduction to Parallel Computing

Introduction to Parallel Computing Portland State University ECE 588/688 Introduction to Parallel Computing Reference: Lawrence Livermore National Lab Tutorial https://computing.llnl.gov/tutorials/parallel_comp/ Copyright by Alaa Alameldeen

More information

Shared memory programming

Shared memory programming CME342- Parallel Methods in Numerical Analysis Shared memory programming May 14, 2014 Lectures 13-14 Motivation Popularity of shared memory systems is increasing: Early on, DSM computers (SGI Origin 3000

More information

Introduction to OpenMP

Introduction to OpenMP Introduction to OpenMP Le Yan Objectives of Training Acquaint users with the concept of shared memory parallelism Acquaint users with the basics of programming with OpenMP Memory System: Shared Memory

More information

Introduction to OpenMP. Martin Čuma Center for High Performance Computing University of Utah

Introduction to OpenMP. Martin Čuma Center for High Performance Computing University of Utah Introduction to OpenMP Martin Čuma Center for High Performance Computing University of Utah m.cuma@utah.edu Overview Quick introduction. Parallel loops. Parallel loop directives. Parallel sections. Some

More information

OpenMP and MPI. Parallel and Distributed Computing. Department of Computer Science and Engineering (DEI) Instituto Superior Técnico.

OpenMP and MPI. Parallel and Distributed Computing. Department of Computer Science and Engineering (DEI) Instituto Superior Técnico. OpenMP and MPI Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico November 15, 2010 José Monteiro (DEI / IST) Parallel and Distributed Computing

More information

[Potentially] Your first parallel application

[Potentially] Your first parallel application [Potentially] Your first parallel application Compute the smallest element in an array as fast as possible small = array[0]; for( i = 0; i < N; i++) if( array[i] < small ) ) small = array[i] 64-bit Intel

More information

A common scenario... Most of us have probably been here. Where did my performance go? It disappeared into overheads...

A common scenario... Most of us have probably been here. Where did my performance go? It disappeared into overheads... OPENMP PERFORMANCE 2 A common scenario... So I wrote my OpenMP program, and I checked it gave the right answers, so I ran some timing tests, and the speedup was, well, a bit disappointing really. Now what?.

More information

Overview: The OpenMP Programming Model

Overview: The OpenMP Programming Model Overview: The OpenMP Programming Model motivation and overview the parallel directive: clauses, equivalent pthread code, examples the for directive and scheduling of loop iterations Pi example in OpenMP

More information

Parallel Computing Using OpenMP/MPI. Presented by - Jyotsna 29/01/2008

Parallel Computing Using OpenMP/MPI. Presented by - Jyotsna 29/01/2008 Parallel Computing Using OpenMP/MPI Presented by - Jyotsna 29/01/2008 Serial Computing Serially solving a problem Parallel Computing Parallelly solving a problem Parallel Computer Memory Architecture Shared

More information

Programming with Shared Memory PART II. HPC Fall 2012 Prof. Robert van Engelen

Programming with Shared Memory PART II. HPC Fall 2012 Prof. Robert van Engelen Programming with Shared Memory PART II HPC Fall 2012 Prof. Robert van Engelen Overview Sequential consistency Parallel programming constructs Dependence analysis OpenMP Autoparallelization Further reading

More information

CSL 860: Modern Parallel

CSL 860: Modern Parallel CSL 860: Modern Parallel Computation Hello OpenMP #pragma omp parallel { // I am now thread iof n switch(omp_get_thread_num()) { case 0 : blah1.. case 1: blah2.. // Back to normal Parallel Construct Extremely

More information

Session 4: Parallel Programming with OpenMP

Session 4: Parallel Programming with OpenMP Session 4: Parallel Programming with OpenMP Xavier Martorell Barcelona Supercomputing Center Agenda Agenda 10:00-11:00 OpenMP fundamentals, parallel regions 11:00-11:30 Worksharing constructs 11:30-12:00

More information

Parallel Programming using OpenMP

Parallel Programming using OpenMP 1 OpenMP Multithreaded Programming 2 Parallel Programming using OpenMP OpenMP stands for Open Multi-Processing OpenMP is a multi-vendor (see next page) standard to perform shared-memory multithreading

More information

Parallel Programming using OpenMP

Parallel Programming using OpenMP 1 Parallel Programming using OpenMP Mike Bailey mjb@cs.oregonstate.edu openmp.pptx OpenMP Multithreaded Programming 2 OpenMP stands for Open Multi-Processing OpenMP is a multi-vendor (see next page) standard

More information

OpenMP. A parallel language standard that support both data and functional Parallelism on a shared memory system

OpenMP. A parallel language standard that support both data and functional Parallelism on a shared memory system OpenMP A parallel language standard that support both data and functional Parallelism on a shared memory system Use by system programmers more than application programmers Considered a low level primitives

More information

Using OpenMP. Rebecca Hartman-Baker Oak Ridge National Laboratory

Using OpenMP. Rebecca Hartman-Baker Oak Ridge National Laboratory Using OpenMP Rebecca Hartman-Baker Oak Ridge National Laboratory hartmanbakrj@ornl.gov 2004-2009 Rebecca Hartman-Baker. Reproduction permitted for non-commercial, educational use only. Outline I. About

More information

An Introduction to OpenMP

An Introduction to OpenMP Dipartimento di Ingegneria Industriale e dell'informazione University of Pavia December 4, 2017 Recap Parallel machines are everywhere Many architectures, many programming model. Among them: multithreading.

More information

UvA-SARA High Performance Computing Course June Clemens Grelck, University of Amsterdam. Parallel Programming with Compiler Directives: OpenMP

UvA-SARA High Performance Computing Course June Clemens Grelck, University of Amsterdam. Parallel Programming with Compiler Directives: OpenMP Parallel Programming with Compiler Directives OpenMP Clemens Grelck University of Amsterdam UvA-SARA High Performance Computing Course June 2013 OpenMP at a Glance Loop Parallelization Scheduling Parallel

More information

Distributed Systems + Middleware Concurrent Programming with OpenMP

Distributed Systems + Middleware Concurrent Programming with OpenMP Distributed Systems + Middleware Concurrent Programming with OpenMP Gianpaolo Cugola Dipartimento di Elettronica e Informazione Politecnico, Italy cugola@elet.polimi.it http://home.dei.polimi.it/cugola

More information

Parallel Programming: OpenMP

Parallel Programming: OpenMP Parallel Programming: OpenMP Xianyi Zeng xzeng@utep.edu Department of Mathematical Sciences The University of Texas at El Paso. November 10, 2016. An Overview of OpenMP OpenMP: Open Multi-Processing An

More information

Amdahl s Law. AMath 483/583 Lecture 13 April 25, Amdahl s Law. Amdahl s Law. Today: Amdahl s law Speed up, strong and weak scaling OpenMP

Amdahl s Law. AMath 483/583 Lecture 13 April 25, Amdahl s Law. Amdahl s Law. Today: Amdahl s law Speed up, strong and weak scaling OpenMP AMath 483/583 Lecture 13 April 25, 2011 Amdahl s Law Today: Amdahl s law Speed up, strong and weak scaling OpenMP Typically only part of a computation can be parallelized. Suppose 50% of the computation

More information

COSC 6385 Computer Architecture - Multi Processor Systems

COSC 6385 Computer Architecture - Multi Processor Systems COSC 6385 Computer Architecture - Multi Processor Systems Fall 2006 Classification of Parallel Architectures Flynn s Taxonomy SISD: Single instruction single data Classical von Neumann architecture SIMD:

More information

An Introduction to OpenMP

An Introduction to OpenMP An Introduction to OpenMP U N C L A S S I F I E D Slide 1 What Is OpenMP? OpenMP Is: An Application Program Interface (API) that may be used to explicitly direct multi-threaded, shared memory parallelism

More information

OpenMP. Application Program Interface. CINECA, 14 May 2012 OpenMP Marco Comparato

OpenMP. Application Program Interface. CINECA, 14 May 2012 OpenMP Marco Comparato OpenMP Application Program Interface Introduction Shared-memory parallelism in C, C++ and Fortran compiler directives library routines environment variables Directives single program multiple data (SPMD)

More information

Introduction to OpenMP. Rogelio Long CS 5334/4390 Spring 2014 February 25 Class

Introduction to OpenMP. Rogelio Long CS 5334/4390 Spring 2014 February 25 Class Introduction to OpenMP Rogelio Long CS 5334/4390 Spring 2014 February 25 Class Acknowledgment These slides are adapted from the Lawrence Livermore OpenMP Tutorial by Blaise Barney at https://computing.llnl.gov/tutorials/openmp/

More information

Parallel Programming with OpenMP. CS240A, T. Yang, 2013 Modified from Demmel/Yelick s and Mary Hall s Slides

Parallel Programming with OpenMP. CS240A, T. Yang, 2013 Modified from Demmel/Yelick s and Mary Hall s Slides Parallel Programming with OpenMP CS240A, T. Yang, 203 Modified from Demmel/Yelick s and Mary Hall s Slides Introduction to OpenMP What is OpenMP? Open specification for Multi-Processing Standard API for

More information

Lecture 14: Mixed MPI-OpenMP programming. Lecture 14: Mixed MPI-OpenMP programming p. 1

Lecture 14: Mixed MPI-OpenMP programming. Lecture 14: Mixed MPI-OpenMP programming p. 1 Lecture 14: Mixed MPI-OpenMP programming Lecture 14: Mixed MPI-OpenMP programming p. 1 Overview Motivations for mixed MPI-OpenMP programming Advantages and disadvantages The example of the Jacobi method

More information

Programming with Shared Memory PART II. HPC Fall 2007 Prof. Robert van Engelen

Programming with Shared Memory PART II. HPC Fall 2007 Prof. Robert van Engelen Programming with Shared Memory PART II HPC Fall 2007 Prof. Robert van Engelen Overview Parallel programming constructs Dependence analysis OpenMP Autoparallelization Further reading HPC Fall 2007 2 Parallel

More information

OpenMP Overview. in 30 Minutes. Christian Terboven / Aachen, Germany Stand: Version 2.

OpenMP Overview. in 30 Minutes. Christian Terboven / Aachen, Germany Stand: Version 2. OpenMP Overview in 30 Minutes Christian Terboven 06.12.2010 / Aachen, Germany Stand: 03.12.2010 Version 2.3 Rechen- und Kommunikationszentrum (RZ) Agenda OpenMP: Parallel Regions,

More information

Topics. Introduction. Shared Memory Parallelization. Example. Lecture 11. OpenMP Execution Model Fork-Join model 5/15/2012. Introduction OpenMP

Topics. Introduction. Shared Memory Parallelization. Example. Lecture 11. OpenMP Execution Model Fork-Join model 5/15/2012. Introduction OpenMP Topics Lecture 11 Introduction OpenMP Some Examples Library functions Environment variables 1 2 Introduction Shared Memory Parallelization OpenMP is: a standard for parallel programming in C, C++, and

More information

COSC 6374 Parallel Computation. Introduction to OpenMP. Some slides based on material by Barbara Chapman (UH) and Tim Mattson (Intel)

COSC 6374 Parallel Computation. Introduction to OpenMP. Some slides based on material by Barbara Chapman (UH) and Tim Mattson (Intel) COSC 6374 Parallel Computation Introduction to OpenMP Some slides based on material by Barbara Chapman (UH) and Tim Mattson (Intel) Edgar Gabriel Fall 2015 OpenMP Provides thread programming model at a

More information

COMP4300/8300: The OpenMP Programming Model. Alistair Rendell. Specifications maintained by OpenMP Architecture Review Board (ARB)

COMP4300/8300: The OpenMP Programming Model. Alistair Rendell. Specifications maintained by OpenMP Architecture Review Board (ARB) COMP4300/8300: The OpenMP Programming Model Alistair Rendell See: www.openmp.org Introduction to High Performance Computing for Scientists and Engineers, Hager and Wellein, Chapter 6 & 7 High Performance

More information

COMP4300/8300: The OpenMP Programming Model. Alistair Rendell

COMP4300/8300: The OpenMP Programming Model. Alistair Rendell COMP4300/8300: The OpenMP Programming Model Alistair Rendell See: www.openmp.org Introduction to High Performance Computing for Scientists and Engineers, Hager and Wellein, Chapter 6 & 7 High Performance

More information

OpenMP 4. CSCI 4850/5850 High-Performance Computing Spring 2018

OpenMP 4. CSCI 4850/5850 High-Performance Computing Spring 2018 OpenMP 4 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning Objectives

More information

OpenMP examples. Sergeev Efim. Singularis Lab, Ltd. Senior software engineer

OpenMP examples. Sergeev Efim. Singularis Lab, Ltd. Senior software engineer OpenMP examples Sergeev Efim Senior software engineer Singularis Lab, Ltd. OpenMP Is: An Application Program Interface (API) that may be used to explicitly direct multi-threaded, shared memory parallelism.

More information

Programming Shared-memory Platforms with OpenMP. Xu Liu

Programming Shared-memory Platforms with OpenMP. Xu Liu Programming Shared-memory Platforms with OpenMP Xu Liu Introduction to OpenMP OpenMP directives concurrency directives parallel regions loops, sections, tasks Topics for Today synchronization directives

More information

OpenMP: Open Multiprocessing

OpenMP: Open Multiprocessing OpenMP: Open Multiprocessing Erik Schnetter May 20-22, 2013, IHPC 2013, Iowa City 2,500 BC: Military Invents Parallelism Outline 1. Basic concepts, hardware architectures 2. OpenMP Programming 3. How to

More information

COMP Parallel Computing. SMM (2) OpenMP Programming Model

COMP Parallel Computing. SMM (2) OpenMP Programming Model COMP 633 - Parallel Computing Lecture 7 September 12, 2017 SMM (2) OpenMP Programming Model Reading for next time look through sections 7-9 of the Open MP tutorial Topics OpenMP shared-memory parallel

More information

Shared Memory Programming Model

Shared Memory Programming Model Shared Memory Programming Model Ahmed El-Mahdy and Waleed Lotfy What is a shared memory system? Activity! Consider the board as a shared memory Consider a sheet of paper in front of you as a local cache

More information

Hybrid MPI/OpenMP parallelization. Recall: MPI uses processes for parallelism. Each process has its own, separate address space.

Hybrid MPI/OpenMP parallelization. Recall: MPI uses processes for parallelism. Each process has its own, separate address space. Hybrid MPI/OpenMP parallelization Recall: MPI uses processes for parallelism. Each process has its own, separate address space. Thread parallelism (such as OpenMP or Pthreads) can provide additional parallelism

More information

OpenMP Application Program Interface

OpenMP Application Program Interface OpenMP Application Program Interface Version.0 - RC - March 01 Public Review Release Candidate Copyright 1-01 OpenMP Architecture Review Board. Permission to copy without fee all or part of this material

More information

Review. 35a.cpp. 36a.cpp. Lecture 13 5/29/2012. Compiler Directives. Library Functions Environment Variables

Review. 35a.cpp. 36a.cpp. Lecture 13 5/29/2012. Compiler Directives. Library Functions Environment Variables Review Lecture 3 Compiler Directives Conditional compilation Parallel construct Work-sharing constructs for, section, single Work-tasking Synchronization Library Functions Environment Variables 2 35a.cpp

More information