Introduction to Hybrid Programming. Sebastian von Alfthan, 13/8/2008, CSC, the Finnish IT Center for Science. PRACE summer school 2008
2 Contents
- Introduction
- OpenMP in detail
- OpenMP programming models & performance
- Hybrid programming
3 The need for improved parallelism
- Top 500 trends show that in the coming years #1 will have a peak of 1 EF and #500 a peak of 1 PF
- General purpose processors are not getting (very much) faster
- Massively parallel processors (MPP)
  - Very large number of nodes connected with a fast interconnect
  - Symmetric multiprocessor (SMP) nodes
- Hybrid architectures: Cell, GPGPU...
4 Hybrid programming - Mixed mode
- Parallel programming model combining:
  - Parallelization over one SMP node: shared memory
    - OpenMP, the de facto standard
    - POSIX threads (Pthreads), low level
  - Parallelization between nodes: distributed memory
    - MPI
    - PVM (obsolete)
- Here: MPI + OpenMP
- Is it faster? Sometimes; in most cases not...
[Figure: an SMP node with cores C1-C4 sharing memory and L3 cache (OpenMP), connected to other nodes over the SeaStar2 interconnect (MPI)]
5 OpenMP: a brief introduction
- An API that can be used for multithreaded shared memory parallelization
- Fortran 77/9X and C/C++ are supported
- Current version implemented in compilers is 2.5 (this talk); OpenMP 3.0 specs have been released
- Enables one to parallelize one part of the program at a time
  - Easy to do quick and dirty prototyping
  - Efficient and well scaling code still requires effort
6 OpenMP
The OpenMP API has three components:
1) Compiler directives
   - Express shared memory parallelization
   - Preceded by a sentinel; a serial version can still be compiled
2) Runtime library routines
   - Small number of library functions
   - Example: get number of threads, get rank of thread...
   - Can be discarded in the serial version via conditional compilation
3) Environment variables
   - Bind threads to cores
   - Specify number of threads
7 A simple OpenMP program: F95

  PROGRAM demo1
    USE omp_lib
    INTEGER :: omp_rank
  !$omp parallel private(omp_rank)
    omp_rank = omp_get_thread_num()
    WRITE(*,*) "thread is ", omp_rank
  !$omp end parallel
  END PROGRAM demo1

  > export OMP_NUM_THREADS=2
  > ftn -mp demo1.f95
  > aprun -n 1 ./a.out
  thread is 0
  thread is 1
8 A simple OpenMP program: C

  #include <stdio.h>
  #include <omp.h>

  int main(int argc, char *argv[]) {
    int omp_rank;
  #pragma omp parallel private(omp_rank)
    {
      omp_rank = omp_get_thread_num();
      printf("thread is %d\n", omp_rank);
    }
  }

  > export OMP_NUM_THREADS=2
  > cc -mp demo1.c
  > aprun -n 1 ./a.out
  thread is 0
  thread is 1
9 OpenMP in detail
10 Directives
- A sentinel precedes each OpenMP directive
  - C/C++: #pragma omp
  - Fortran free form: !$omp
  - Fortran fixed form: !$omp, c$omp, *$omp
    - A space in the sixth column begins a directive
    - No space depicts a continuation line
11 Directives: parallel
- Starts a parallel region
  - Prior to it there is only one thread, the master
  - Creates a team of threads: master + slave threads
  - At the end of the block there is a barrier and all shared data is synchronized
- Clauses: if(logical expression), private(list), shared(list), default(private/shared/none), firstprivate(list), reduction(operator:list), copyin(list), num_threads(integer)

  !$omp parallel
  ...
  !$omp end parallel
12 Directives: parallel clauses
- private(list)
  - Comma-separated list of private variables
  - Private variables are on the private stack of each thread
  - Undefined initial value
  - Undefined value after the parallel region
- firstprivate(list)
  - Private variable whose initial value is the same as that of the original object
- lastprivate(list)
  - Private variable
  - The thread that performs the last parallel iteration step or section copies its value to the original object
13 Directives: parallel clauses
- shared(list)
  - Comma-separated list of shared variables
  - All threads can write to, and read from, a shared variable
  - Race condition if other threads access a variable while one writes to it
  - Variables are shared by default (with some exceptions)
- default(private/shared/none)
  - Sets the default for variables to be shared, private or not defined
  - In C/C++ private is not allowed
  - none can be useful for debugging as each variable then has to be defined manually
14 Directives: parallel clauses
- reduction(operator:list)
  - Performs a reduction on the (scalar) variables in list
  - A private reduction variable is created for each thread's partial result
  - The private reduction variable is initialized to the operator's identity value (see table)
  - After the parallel region the reduction operation is applied to the private variables and the result is aggregated into the shared variable

  Operator (Fortran)  Initial value    Operator (C/C++)  Initial value
  +                   0                +                 0
  *                   1                *                 1
  .AND.               .TRUE.           &&                1
  .OR.                .FALSE.          ||                0
  MAX                 smallest value
  MIN                 largest value
15 Work-sharing directives: DO/for
- Directive instructing the compiler to share the work of a loop
  - Fortran: !$OMP DO
  - C/C++: #pragma omp for
- The directive is inside a parallel region, prior to the DO loop
- Can also be combined with parallel: !$OMP PARALLEL DO
- The loop variable is private by default in Fortran
  - Not in C/C++; there it has to be explicitly defined to be private

  sumvar = 0
  !$omp parallel do reduction(+:sumvar)
  do i = 1, 10
    sumvar = sumvar + 1
  end do
  !$omp end parallel do
  WRITE(*,*) "sum is ", sumvar

  sum is 10
16 Work-sharing directives: DO/for clauses
- Clauses: schedule(type[,chunk]), ordered, nowait, private(list), firstprivate(list), lastprivate(list), shared(list), reduction(operator:list)
- schedule(type[,chunksize])
  - Defines how the iterations are divided over the threads: static, dynamic, guided
- ordered
  - Iterations are performed in the same order as in the serial program
- nowait
  - No barrier or synchronization at the end of the loop
  - In Fortran it is declared in the !$omp end directive
17 Work-sharing directives: DO/for schedules
- schedule(static[,chunksize])
  - Iterations are divided into chunk-sized parts
  - Chunks are statically assigned to threads
  - Default chunk size is iterations/threads
  - Low overhead
  - Load balance can be problematic
- The static schedule is typically used by default if no schedule is defined (implementation dependent)
18 Work-sharing directives: DO/for schedules
- schedule(dynamic[,chunksize])
  - Iterations are divided into chunk-sized parts
  - After a thread completes a chunk, it is dynamically assigned a new one
  - Default value of chunksize is 1
  - Higher overhead
  - Better load balance for unbalanced iterations
- schedule(guided[,chunksize])
  - Like dynamic, but the size of the chunks decreases exponentially
  - Size of the first chunk is implementation dependent
  - Size of the smallest chunk is chunksize
  - Default value of chunksize is 1
19 Work-sharing directives: sections
- Defines sections that are executed by different threads

  !$omp sections
  !$omp section
  write(*,*) "thread", omp_get_thread_num(), "section A"
  !$omp section
  write(*,*) "thread", omp_get_thread_num(), "section B"
  !$omp end sections

  thread 0 section A
  thread 1 section B
20 Directives: master
- !$OMP MASTER
- Specifies that the region should only be executed by the master thread

  !$omp master
  ! code for master
  !$omp end master

  #pragma omp master
  {
    // code for master
  }
21 Directives: single
- !$OMP SINGLE
- Specifies that the region should only be executed by one arbitrary thread
- Implicit barrier at the end directive (unless nowait is defined)
- Clauses: private(list), firstprivate(list), copyprivate(list), nowait

  !$omp single
  ! code for any thread
  !$omp end single

  #pragma omp single
  {
    // code for any thread
  }
22 Directives: critical
- !$OMP CRITICAL [name]
- A section that should only be executed by one thread at a time
- The optional name specifies a distinct critical section
- All unnamed critical sections are treated as the same section

  !$omp critical
  ! code that is not thread-safe
  !$omp end critical

  #pragma omp critical
  {
    // code that is not thread-safe
  }
23 Directives: atomic
- !$OMP ATOMIC
- Specifies that a memory location is to be updated atomically, by only one thread at a time
- Applies to only one statement
- Only certain kinds of expressions are allowed
  - C/C++: a += ..., a -= ..., a *= ..., a /= ..., a++, ++a, a--, --a

  !$omp atomic
  var = ...

  #pragma omp atomic
  var = ...;
24 Directives: barrier
- !$OMP BARRIER
- Synchronizes all threads at this point
- When a thread reaches a barrier it only continues after all threads have reached it

  !$omp barrier

  #pragma omp barrier
25 Directives: flush
- !$OMP FLUSH [(list)]
- Synchronizes the memory of all threads
  - Makes sure each thread has a consistent view of memory at this point
  - Also required on cache-coherent systems; changes to variables could still reside in registers
- Can also flush only the variables in list; if no list is given, all variables are flushed
- Implicit flush for several directives:
  - BARRIER
  - PARALLEL: both entry and exit
  - CRITICAL: both entry and exit
  - DO: on exit
  - SECTIONS: on exit
  - SINGLE: on exit
  - ORDERED: both entry and exit
26 OpenMP: runtime library routines
- Threads: OMP_SET_NUM_THREADS, OMP_GET_NUM_THREADS, OMP_GET_MAX_THREADS, OMP_GET_THREAD_NUM, OMP_GET_NUM_PROCS, OMP_IN_PARALLEL
- Modes: OMP_SET_DYNAMIC, OMP_GET_DYNAMIC, OMP_SET_NESTED, OMP_GET_NESTED
- Locks: OMP_INIT_LOCK, OMP_DESTROY_LOCK, OMP_SET_LOCK, OMP_UNSET_LOCK, OMP_TEST_LOCK
- Timing: OMP_GET_WTIME, OMP_GET_WTICK
27 OpenMP: important environment variables
- OMP_NUM_THREADS: maximum number of threads
- OMP_NESTED (TRUE or FALSE): enables or disables nested parallelism; not always supported
- Compiler-specific flags for binding threads to cores:
  - PGI: setenv MP_BIND yes
  - Pathscale: setenv PSC_OMP_AFFINITY TRUE; setenv PSC_OMP_AFFINITY_GLOBAL TRUE
  - GNU: setenv GOMP_CPU_AFFINITY "0-3"
28 OpenMP: compilation flags
- PGI: -mp=nonuma
- Pathscale: -mp
- GNU: -fopenmp
29 OpenMP programming models & performance
30 OpenMP: programming models
- Fine-grained: loop level, several local parallel regions
  - PARALLEL DO
  - Can be introduced in a piecewise fashion
  - Often simple to implement
  - Performance benefits are limited
- Coarse-grained: a parallel region extends over larger segments (or the whole program)
  - PARALLEL, OMP_GET_NUM_THREADS, OMP_GET_THREAD_NUM
  - Divide work based on thread number
  - Similar to MPI programming
  - Often demands larger changes to program and algorithm
  - Larger potential benefits
31 Case study: Matrix multiplication
- Naive serial matrix multiplication
- Slow algorithm; do not use in real code (use BLAS)
- n = m = p = 1000
- Execution time 1.92 s

  t1 = omp_get_wtime()
  DO j = 1, m
    DO i = 1, n
      DO k = 1, p
        c(i,j) = c(i,j) + a(i,k)*b(k,j)
      END DO
    END DO
  END DO
  t2 = omp_get_wtime()
  WRITE(*,*) "Execution time", t2-t1
32 Case study: Matrix multiplication
- Fine-grained parallelization: j-loop parallelized
- Static scheduling; j, i, k private, all other variables shared
- 4 threads (n = m = p = 1000)
- Execution time 0.479 s
- Perfect speedup in this simple case

  t1 = omp_get_wtime()
  !$omp parallel do
  DO j = 1, m
    DO i = 1, n
      DO k = 1, p
        c(i,j) = c(i,j) + a(i,k)*b(k,j)
      END DO
    END DO
  END DO
  !$omp end parallel do
  t2 = omp_get_wtime()
  WRITE(*,*) "Execution time", t2-t1
33 Case study: Matrix multiplication
- Fine-grained parallelization: k-loop parallelized
- Static scheduling; j, i, k private, all other variables shared
- 4 threads (n = m = p = 1000)
- Execution time 1.8 s
  - Thread overhead
  - Synchronization

  t1 = omp_get_wtime()
  DO j = 1, m
    DO i = 1, n
  !$omp parallel do
      DO k = 1, p
        c(i,j) = c(i,j) + a(i,k)*b(k,j)
      END DO
  !$omp end parallel do
    END DO
  END DO
  t2 = omp_get_wtime()
  WRITE(*,*) "Execution time", t2-t1
34 Case study: Matrix multiplication
- Coarse-grained parallelization
- Simple domain decomposition
- 4 threads (n = m = p = 1000)
- Execution time 0.49 s
  - A few percent worse than the fine-grained version
  - This is a best-case scenario for fine-grained parallelization

  t1 = omp_get_wtime()
  !$omp parallel &
  !$omp& private(omp_rank,i,j,k) &
  !$omp& private(imin,jmin,imax,jmax)
  omp_rank = omp_get_thread_num()
  imin = ...
  jmin = ...
  imax = ...
  jmax = ...
  DO j = jmin, jmax
    DO i = imin, imax
      DO k = 1, p
        c(i,j) = c(i,j) + a(i,k)*b(k,j)
      END DO
    END DO
  END DO
  !$omp end parallel
  t2 = omp_get_wtime()
  WRITE(*,*) "Execution time", t2-t1
35 OpenMP performance: false sharing
- Memory is read and written in whole cache lines
  - 64 bytes on AMD Opteron
- False sharing
  - When a thread modifies part of a cache line, the whole cache line is marked as invalid
  - When another thread attempts to read or modify another part of the same cache line, it is forced to fetch a newer copy
- How to avoid
  - Does not occur if all threads only read from a variable
  - If data is used by threads in small slices, the risk of false sharing is larger than if larger chunks are used
36 OpenMP performance: ccNUMA issues
- Cache-coherent non-uniform memory access (ccNUMA)
  - Shared memory model
  - Local memory closer to a processor is faster
  - Caches are kept coherent
  - Examples: some MPP nodes such as Cray XT5 nodes; large shared memory computers such as SGI Altix
- Uniform memory access (UMA)
  - Only feasible for small systems
  - Examples: Cray XT4 nodes, BlueGene/P nodes
37 OpenMP performance: ccNUMA issues
- In a NUMA node some of the memory is more expensive to access
  - Can lead to severe performance problems
- OpenMP has no support for NUMA
  - Does not specify where the data is stored
  - Does not give tools to check where it is stored
- How to avoid
  - The system often uses a first-touch principle: the thread that first accesses the data will host it in its memory
  - Use initialization loops that make sure each thread's data is local
  - On some systems one can also use the low-level system call madvise
38 OpenMP performance: overheads
- Amdahl's law
  - If only parts of the program are parallelized (fine-grained), Amdahl's law limits performance
  - In hybrid programs the number of threads is low, so this is less of a problem
- Thread management has a large overhead
  - Avoid creating/destroying threads; use larger parallel regions
- Synchronization
  - Avoid explicit and implicit barriers
  - If you can use NOWAIT clauses, do it
  - Avoid (if possible) BARRIER/CRITICAL/ORDERED/FLUSH
  - Use named CRITICAL regions
39 OpenMP performance: DO/for directive
- PARALLEL DO can be more efficient than a DO directive inside a PARALLEL region (implementation dependent)
- If the iterations are well balanced use STATIC
- If there are load-balancing issues use GUIDED or DYNAMIC
- Small loops should not be parallelized
- In nested loops, inner loops should not be parallelized
  - See COLLAPSE in OpenMP 3.0
- Use NOWAIT if possible

  Overhead on a QC Cray XT4:
                2 threads  4 threads
  PARALLEL      0.5 µs     1.0 µs
  STATIC(1)     0.9 µs     1.3 µs
  STATIC(64)    0.4 µs     0.7 µs
  DYNAMIC(1)    34 µs      315 µs
  DYNAMIC(64)   1.2 µs     2.7 µs
  GUIDED(1)     15 µs      214 µs
  GUIDED(64)    3.3 µs     6.2 µs
40 Hybrid programming
41 Hybrid programming
- Parallel programming model combining:
  - OpenMP parallelization over one node
  - MPI parallelization between nodes
- The hybrid model is closer to the hardware model of an SMP cluster; is it therefore always faster?
  - No, there is a large body of work suggesting it's often slower
- There are a number of possible benefits and problems
  - Analyze the program and target platform to decide if the benefits might yield improvements
42 Hybrid parallel programming models
1. No overlap of communication and computation
   1. MPI is called only outside parallel regions, by the master thread
   2. MPI is called by several threads
2. Communication and computation overlap: while some of the threads communicate, the rest execute application code
   1. MPI is called only by the master thread
   2. Communication is carried out by several threads
   3. Each thread handles its own communication demands
43 MPI support for threading
- The MPI standard defines four levels of support:
  0. MPI_THREAD_SINGLE: only one thread allowed
  1. MPI_THREAD_FUNNELED: only the master thread is allowed to make MPI calls
  2. MPI_THREAD_SERIALIZED: all threads are allowed to make MPI calls, but not concurrently
  3. MPI_THREAD_MULTIPLE: no restrictions
- Some implementations support an additional model
  - 0.5: MPI calls are allowed only outside parallel regions; returns MPI_THREAD_SINGLE
44 MPI support on Cray XT4/XT5

  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
  printf("supports level %d of %d %d %d %d\n", provided,
         MPI_THREAD_SINGLE, MPI_THREAD_FUNNELED,
         MPI_THREAD_SERIALIZED, MPI_THREAD_MULTIPLE);

  Cray XT4 (xt-mpt):
  > Supports level 1 of 0 1 2 3
45 MPI support on Cray XT4/XT5
- The MPI library supports MPI_THREAD_FUNNELED
- Overlapping communication/computation is still possible
  - Non-blocking communication can be started in a MASTER block
  - It completes while the parallel region computes
- Able to saturate the interconnect with only one thread communicating
  - Might not be true on all architectures; a possible problem for the funneled model
46 First hybrid program

  int main(int argc, char *argv[]) {
    int rank, omp_rank, mpisupport;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &mpisupport);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  #pragma omp parallel private(omp_rank)
    {
      omp_rank = omp_get_thread_num();
      printf("%d %d\n", rank, omp_rank);
    }
    MPI_Finalize();
  }
47 Communication
- Communication inside a node is replaced by direct memory reads/writes
  - Improved throughput and latency
  - Decreased overhead from the MPI library
- Aggregated messages
  - In many (data-parallel) algorithms the messages get larger as the number of MPI processes is decreased
    - Increased throughput for inter-node communication
  - In some algorithms the number of messages is reduced, e.g. all-to-all
- Restrictions on calling MPI routines, depending on the level of support
  - Only allowed outside parallel regions: all other cores are idle
  - MPI_THREAD_FUNNELED: other threads can compute
  - MPI_THREAD_MULTIPLE: best, but often not available
48 Case study: All-to-all on QC XT4
- Collective operations are often performance bottlenecks
  - Especially all-to-all operations
  - A point-to-point implementation can be faster
- Hybrid implementation
  - For all-to-all operations the (maximum) number of transfers decreases by a factor of #threads^2
  - The size of a message increases by a factor of #threads
  - Allows overlapping communication and computation
51 Case study: All-to-all
[Figures: all-to-all performance with 40 Kbytes and 400 Kbytes of data per node]
52 Algorithmic issues
- The benefits of the hybrid approach are algorithm dependent; some examples:
- Limited parallelism in the MPI parallelization
  - Additional levels of parallelism can be easier to implement with a hybrid approach
  - E.g. a grid-based algorithm only parallelized in one dimension
  - E.g. master-slave algorithms
- Embarrassingly parallel algorithms
  - Can be used to speed up single tasks
  - Can be used to increase system size
- Domain decomposition (see next case study)
53 Case study: Domain decomposition
- The number of atoms per cell is proportional to the number of threads
- The relative number of ghost particles is proportional to #threads^(-1/3)
- We can reduce communication by hybridizing the algorithm
- With four threads per process the number of ghost particles decreases by about 40%
55 Case study: Domain decomposition
- Fine-grained hybridization of an MD code
- A parallel region is entered each time the potential is evaluated
- The loop over atoms is parallelized with a static for schedule
- Temporary array for the forces
  - Shared, with separate space for each thread
  - Avoids the need for synchronization when Newton's third law is used
  - Results are added to the real force array at the end of the parallel region

  #pragma omp parallel
  {
    ...
    zero(ptforce[thread][..][..])
    ...
  #pragma omp for schedule(static,10)
    for (ii = 0; ii < atoms; ii++) {
      ...
      ptforce[thread][ii][..] += ...
      ptforce[thread][jj][..] += ...
    }
  }
  ...
  for (t = 0; t < threads; t++)
    force[..][..] += ptforce[t][..][..]
  ...
56 Case study: Domain decomposition
57 Load balance
- Good load balance is harder and harder to achieve as the number of MPI processes increases
  - The hybrid approach decreases the number of processes
- One can dynamically change the number of threads per process
  - Can improve load balance
  - The hardware of SMP clusters restricts the usefulness
  - E.g. a node with two QC processors (XT5): two MPI processes which can have 1-7 threads
58 Case study: Master-slave algorithms
- Matrix multiplication as a demonstration of a master-slave algorithm
- Scaling is improved by going to a coarse-grained hybrid model
- Utilizes the following benefits:
  + Better load balancing due to fewer MPI processes
  + Message aggregation and reduced communication
59 Overlapping communication/computation
- If the level of support is at least MPI_THREAD_FUNNELED there are more options for overlapping
  - Isend/Irecv are available as normal
  - While the master thread communicates, the other threads can compute
- Can be difficult to utilize properly; load balancing is tricky
- With enough threads, the master thread could be a dedicated communication thread
60 Memory issues
- The hybrid programming method can be used to decrease memory requirements
  - Some algorithms have replicated data; hybridization can yield significant savings
  - In domain decomposition algorithms there are fewer boundary data points
- Improved cache usage
  - Many processors have shared caches: L2 in Intel Core2, L3 in AMD QC
  - Shared data can reside in this cache, decreasing cache pressure
61 Parallel I/O
- I/O is expensive and it is difficult to make it optimal
- Some approaches for parallel I/O, applicable in both pure MPI and hybrid programs:
  - MPI-2 I/O
  - Single writer (reduction to one writer)
  - Subset of writers/readers
  - N writers/readers to N files
62 Parallel I/O: a simple hybrid approach
- Every MPI process opens a file
  + Good I/O bandwidth
  + No communication needed
  - Large filesystem stress, slow opens/closes
  - Inconvenient as many files are created
- Hybridization: only one core per processor writes a shared array
  + Achievable bandwidth is similar
  + Decreases the number of files by a factor of #threads
  + Easy to implement
  + Allows overlapping of communication/computation
63 Summary
- The hybrid approach is difficult, but sometimes useful
- Performance of the hybrid approach is a tradeoff between greater overhead and decreased communication costs
- Direct benefits achieved without additional effort:
  - All-to-all collective operations 2-5 times faster
  - Parallel I/O with reduced filesystem stress in the N-writers case
  - Message aggregation
- We expect the potential benefits to be even greater on the XT5
64 Questions!
More informationEPL372 Lab Exercise 5: Introduction to OpenMP
EPL372 Lab Exercise 5: Introduction to OpenMP References: https://computing.llnl.gov/tutorials/openmp/ http://openmp.org/wp/openmp-specifications/ http://openmp.org/mp-documents/openmp-4.0-c.pdf http://openmp.org/mp-documents/openmp4.0.0.examples.pdf
More informationBarbara Chapman, Gabriele Jost, Ruud van der Pas
Using OpenMP Portable Shared Memory Parallel Programming Barbara Chapman, Gabriele Jost, Ruud van der Pas The MIT Press Cambridge, Massachusetts London, England c 2008 Massachusetts Institute of Technology
More informationMango DSP Top manufacturer of multiprocessing video & imaging solutions.
1 of 11 3/3/2005 10:50 AM Linux Magazine February 2004 C++ Parallel Increase application performance without changing your source code. Mango DSP Top manufacturer of multiprocessing video & imaging solutions.
More informationOpenMP Algoritmi e Calcolo Parallelo. Daniele Loiacono
OpenMP Algoritmi e Calcolo Parallelo References Useful references Using OpenMP: Portable Shared Memory Parallel Programming, Barbara Chapman, Gabriele Jost and Ruud van der Pas OpenMP.org http://openmp.org/
More informationIntroduction to OpenMP. Martin Čuma Center for High Performance Computing University of Utah
Introduction to OpenMP Martin Čuma Center for High Performance Computing University of Utah mcuma@chpc.utah.edu Overview Quick introduction. Parallel loops. Parallel loop directives. Parallel sections.
More informationHPC Workshop University of Kentucky May 9, 2007 May 10, 2007
HPC Workshop University of Kentucky May 9, 2007 May 10, 2007 Part 3 Parallel Programming Parallel Programming Concepts Amdahl s Law Parallel Programming Models Tools Compiler (Intel) Math Libraries (Intel)
More informationMPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016
MPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016 Message passing vs. Shared memory Client Client Client Client send(msg) recv(msg) send(msg) recv(msg) MSG MSG MSG IPC Shared
More informationIntroduction to Standard OpenMP 3.1
Introduction to Standard OpenMP 3.1 Massimiliano Culpo - m.culpo@cineca.it Gian Franco Marras - g.marras@cineca.it CINECA - SuperComputing Applications and Innovation Department 1 / 59 Outline 1 Introduction
More informationECE 574 Cluster Computing Lecture 10
ECE 574 Cluster Computing Lecture 10 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 1 October 2015 Announcements Homework #4 will be posted eventually 1 HW#4 Notes How granular
More informationShared memory programming model OpenMP TMA4280 Introduction to Supercomputing
Shared memory programming model OpenMP TMA4280 Introduction to Supercomputing NTNU, IMF February 16. 2018 1 Recap: Distributed memory programming model Parallelism with MPI. An MPI execution is started
More informationOpenMP Application Program Interface
OpenMP Application Program Interface DRAFT Version.1.0-00a THIS IS A DRAFT AND NOT FOR PUBLICATION Copyright 1-0 OpenMP Architecture Review Board. Permission to copy without fee all or part of this material
More informationLab: Scientific Computing Tsunami-Simulation
Lab: Scientific Computing Tsunami-Simulation Session 4: Optimization and OMP Sebastian Rettenberger, Michael Bader 23.11.15 Session 4: Optimization and OMP, 23.11.15 1 Department of Informatics V Linux-Cluster
More informationA brief introduction to OpenMP
A brief introduction to OpenMP Alejandro Duran Barcelona Supercomputing Center Outline 1 Introduction 2 Writing OpenMP programs 3 Data-sharing attributes 4 Synchronization 5 Worksharings 6 Task parallelism
More informationMPI & OpenMP Mixed Hybrid Programming
MPI & OpenMP Mixed Hybrid Programming Berk ONAT İTÜ Bilişim Enstitüsü 22 Haziran 2012 Outline Introduc/on Share & Distributed Memory Programming MPI & OpenMP Advantages/Disadvantages MPI vs. OpenMP Why
More informationOpenMP and MPI. Parallel and Distributed Computing. Department of Computer Science and Engineering (DEI) Instituto Superior Técnico.
OpenMP and MPI Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico November 16, 2011 CPD (DEI / IST) Parallel and Distributed Computing 18
More informationOpenMP Shared Memory Programming
OpenMP Shared Memory Programming John Burkardt, Information Technology Department, Virginia Tech.... Mathematics Department, Ajou University, Suwon, Korea, 13 May 2009.... http://people.sc.fsu.edu/ jburkardt/presentations/
More informationIntroduction to Parallel Computing
Portland State University ECE 588/688 Introduction to Parallel Computing Reference: Lawrence Livermore National Lab Tutorial https://computing.llnl.gov/tutorials/parallel_comp/ Copyright by Alaa Alameldeen
More informationShared memory programming
CME342- Parallel Methods in Numerical Analysis Shared memory programming May 14, 2014 Lectures 13-14 Motivation Popularity of shared memory systems is increasing: Early on, DSM computers (SGI Origin 3000
More informationIntroduction to OpenMP
Introduction to OpenMP Le Yan Objectives of Training Acquaint users with the concept of shared memory parallelism Acquaint users with the basics of programming with OpenMP Memory System: Shared Memory
More informationIntroduction to OpenMP. Martin Čuma Center for High Performance Computing University of Utah
Introduction to OpenMP Martin Čuma Center for High Performance Computing University of Utah m.cuma@utah.edu Overview Quick introduction. Parallel loops. Parallel loop directives. Parallel sections. Some
More informationOpenMP and MPI. Parallel and Distributed Computing. Department of Computer Science and Engineering (DEI) Instituto Superior Técnico.
OpenMP and MPI Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico November 15, 2010 José Monteiro (DEI / IST) Parallel and Distributed Computing
More information[Potentially] Your first parallel application
[Potentially] Your first parallel application Compute the smallest element in an array as fast as possible small = array[0]; for( i = 0; i < N; i++) if( array[i] < small ) ) small = array[i] 64-bit Intel
More informationA common scenario... Most of us have probably been here. Where did my performance go? It disappeared into overheads...
OPENMP PERFORMANCE 2 A common scenario... So I wrote my OpenMP program, and I checked it gave the right answers, so I ran some timing tests, and the speedup was, well, a bit disappointing really. Now what?.
More informationOverview: The OpenMP Programming Model
Overview: The OpenMP Programming Model motivation and overview the parallel directive: clauses, equivalent pthread code, examples the for directive and scheduling of loop iterations Pi example in OpenMP
More informationParallel Computing Using OpenMP/MPI. Presented by - Jyotsna 29/01/2008
Parallel Computing Using OpenMP/MPI Presented by - Jyotsna 29/01/2008 Serial Computing Serially solving a problem Parallel Computing Parallelly solving a problem Parallel Computer Memory Architecture Shared
More informationProgramming with Shared Memory PART II. HPC Fall 2012 Prof. Robert van Engelen
Programming with Shared Memory PART II HPC Fall 2012 Prof. Robert van Engelen Overview Sequential consistency Parallel programming constructs Dependence analysis OpenMP Autoparallelization Further reading
More informationCSL 860: Modern Parallel
CSL 860: Modern Parallel Computation Hello OpenMP #pragma omp parallel { // I am now thread iof n switch(omp_get_thread_num()) { case 0 : blah1.. case 1: blah2.. // Back to normal Parallel Construct Extremely
More informationSession 4: Parallel Programming with OpenMP
Session 4: Parallel Programming with OpenMP Xavier Martorell Barcelona Supercomputing Center Agenda Agenda 10:00-11:00 OpenMP fundamentals, parallel regions 11:00-11:30 Worksharing constructs 11:30-12:00
More informationParallel Programming using OpenMP
1 OpenMP Multithreaded Programming 2 Parallel Programming using OpenMP OpenMP stands for Open Multi-Processing OpenMP is a multi-vendor (see next page) standard to perform shared-memory multithreading
More informationParallel Programming using OpenMP
1 Parallel Programming using OpenMP Mike Bailey mjb@cs.oregonstate.edu openmp.pptx OpenMP Multithreaded Programming 2 OpenMP stands for Open Multi-Processing OpenMP is a multi-vendor (see next page) standard
More informationOpenMP. A parallel language standard that support both data and functional Parallelism on a shared memory system
OpenMP A parallel language standard that support both data and functional Parallelism on a shared memory system Use by system programmers more than application programmers Considered a low level primitives
More informationUsing OpenMP. Rebecca Hartman-Baker Oak Ridge National Laboratory
Using OpenMP Rebecca Hartman-Baker Oak Ridge National Laboratory hartmanbakrj@ornl.gov 2004-2009 Rebecca Hartman-Baker. Reproduction permitted for non-commercial, educational use only. Outline I. About
More informationAn Introduction to OpenMP
Dipartimento di Ingegneria Industriale e dell'informazione University of Pavia December 4, 2017 Recap Parallel machines are everywhere Many architectures, many programming model. Among them: multithreading.
More informationUvA-SARA High Performance Computing Course June Clemens Grelck, University of Amsterdam. Parallel Programming with Compiler Directives: OpenMP
Parallel Programming with Compiler Directives OpenMP Clemens Grelck University of Amsterdam UvA-SARA High Performance Computing Course June 2013 OpenMP at a Glance Loop Parallelization Scheduling Parallel
More informationDistributed Systems + Middleware Concurrent Programming with OpenMP
Distributed Systems + Middleware Concurrent Programming with OpenMP Gianpaolo Cugola Dipartimento di Elettronica e Informazione Politecnico, Italy cugola@elet.polimi.it http://home.dei.polimi.it/cugola
More informationParallel Programming: OpenMP
Parallel Programming: OpenMP Xianyi Zeng xzeng@utep.edu Department of Mathematical Sciences The University of Texas at El Paso. November 10, 2016. An Overview of OpenMP OpenMP: Open Multi-Processing An
More informationAmdahl s Law. AMath 483/583 Lecture 13 April 25, Amdahl s Law. Amdahl s Law. Today: Amdahl s law Speed up, strong and weak scaling OpenMP
AMath 483/583 Lecture 13 April 25, 2011 Amdahl s Law Today: Amdahl s law Speed up, strong and weak scaling OpenMP Typically only part of a computation can be parallelized. Suppose 50% of the computation
More informationCOSC 6385 Computer Architecture - Multi Processor Systems
COSC 6385 Computer Architecture - Multi Processor Systems Fall 2006 Classification of Parallel Architectures Flynn s Taxonomy SISD: Single instruction single data Classical von Neumann architecture SIMD:
More informationAn Introduction to OpenMP
An Introduction to OpenMP U N C L A S S I F I E D Slide 1 What Is OpenMP? OpenMP Is: An Application Program Interface (API) that may be used to explicitly direct multi-threaded, shared memory parallelism
More informationOpenMP. Application Program Interface. CINECA, 14 May 2012 OpenMP Marco Comparato
OpenMP Application Program Interface Introduction Shared-memory parallelism in C, C++ and Fortran compiler directives library routines environment variables Directives single program multiple data (SPMD)
More informationIntroduction to OpenMP. Rogelio Long CS 5334/4390 Spring 2014 February 25 Class
Introduction to OpenMP Rogelio Long CS 5334/4390 Spring 2014 February 25 Class Acknowledgment These slides are adapted from the Lawrence Livermore OpenMP Tutorial by Blaise Barney at https://computing.llnl.gov/tutorials/openmp/
More informationParallel Programming with OpenMP. CS240A, T. Yang, 2013 Modified from Demmel/Yelick s and Mary Hall s Slides
Parallel Programming with OpenMP CS240A, T. Yang, 203 Modified from Demmel/Yelick s and Mary Hall s Slides Introduction to OpenMP What is OpenMP? Open specification for Multi-Processing Standard API for
More informationLecture 14: Mixed MPI-OpenMP programming. Lecture 14: Mixed MPI-OpenMP programming p. 1
Lecture 14: Mixed MPI-OpenMP programming Lecture 14: Mixed MPI-OpenMP programming p. 1 Overview Motivations for mixed MPI-OpenMP programming Advantages and disadvantages The example of the Jacobi method
More informationProgramming with Shared Memory PART II. HPC Fall 2007 Prof. Robert van Engelen
Programming with Shared Memory PART II HPC Fall 2007 Prof. Robert van Engelen Overview Parallel programming constructs Dependence analysis OpenMP Autoparallelization Further reading HPC Fall 2007 2 Parallel
More informationOpenMP Overview. in 30 Minutes. Christian Terboven / Aachen, Germany Stand: Version 2.
OpenMP Overview in 30 Minutes Christian Terboven 06.12.2010 / Aachen, Germany Stand: 03.12.2010 Version 2.3 Rechen- und Kommunikationszentrum (RZ) Agenda OpenMP: Parallel Regions,
More informationTopics. Introduction. Shared Memory Parallelization. Example. Lecture 11. OpenMP Execution Model Fork-Join model 5/15/2012. Introduction OpenMP
Topics Lecture 11 Introduction OpenMP Some Examples Library functions Environment variables 1 2 Introduction Shared Memory Parallelization OpenMP is: a standard for parallel programming in C, C++, and
More informationCOSC 6374 Parallel Computation. Introduction to OpenMP. Some slides based on material by Barbara Chapman (UH) and Tim Mattson (Intel)
COSC 6374 Parallel Computation Introduction to OpenMP Some slides based on material by Barbara Chapman (UH) and Tim Mattson (Intel) Edgar Gabriel Fall 2015 OpenMP Provides thread programming model at a
More informationCOMP4300/8300: The OpenMP Programming Model. Alistair Rendell. Specifications maintained by OpenMP Architecture Review Board (ARB)
COMP4300/8300: The OpenMP Programming Model Alistair Rendell See: www.openmp.org Introduction to High Performance Computing for Scientists and Engineers, Hager and Wellein, Chapter 6 & 7 High Performance
More informationCOMP4300/8300: The OpenMP Programming Model. Alistair Rendell
COMP4300/8300: The OpenMP Programming Model Alistair Rendell See: www.openmp.org Introduction to High Performance Computing for Scientists and Engineers, Hager and Wellein, Chapter 6 & 7 High Performance
More informationOpenMP 4. CSCI 4850/5850 High-Performance Computing Spring 2018
OpenMP 4 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning Objectives
More informationOpenMP examples. Sergeev Efim. Singularis Lab, Ltd. Senior software engineer
OpenMP examples Sergeev Efim Senior software engineer Singularis Lab, Ltd. OpenMP Is: An Application Program Interface (API) that may be used to explicitly direct multi-threaded, shared memory parallelism.
More informationProgramming Shared-memory Platforms with OpenMP. Xu Liu
Programming Shared-memory Platforms with OpenMP Xu Liu Introduction to OpenMP OpenMP directives concurrency directives parallel regions loops, sections, tasks Topics for Today synchronization directives
More informationOpenMP: Open Multiprocessing
OpenMP: Open Multiprocessing Erik Schnetter May 20-22, 2013, IHPC 2013, Iowa City 2,500 BC: Military Invents Parallelism Outline 1. Basic concepts, hardware architectures 2. OpenMP Programming 3. How to
More informationCOMP Parallel Computing. SMM (2) OpenMP Programming Model
COMP 633 - Parallel Computing Lecture 7 September 12, 2017 SMM (2) OpenMP Programming Model Reading for next time look through sections 7-9 of the Open MP tutorial Topics OpenMP shared-memory parallel
More informationShared Memory Programming Model
Shared Memory Programming Model Ahmed El-Mahdy and Waleed Lotfy What is a shared memory system? Activity! Consider the board as a shared memory Consider a sheet of paper in front of you as a local cache
More informationHybrid MPI/OpenMP parallelization. Recall: MPI uses processes for parallelism. Each process has its own, separate address space.
Hybrid MPI/OpenMP parallelization Recall: MPI uses processes for parallelism. Each process has its own, separate address space. Thread parallelism (such as OpenMP or Pthreads) can provide additional parallelism
More informationOpenMP Application Program Interface
OpenMP Application Program Interface Version.0 - RC - March 01 Public Review Release Candidate Copyright 1-01 OpenMP Architecture Review Board. Permission to copy without fee all or part of this material
More informationReview. 35a.cpp. 36a.cpp. Lecture 13 5/29/2012. Compiler Directives. Library Functions Environment Variables
Review Lecture 3 Compiler Directives Conditional compilation Parallel construct Work-sharing constructs for, section, single Work-tasking Synchronization Library Functions Environment Variables 2 35a.cpp
More information