Computational Mathematics


1 Computational Mathematics
Hamid Sarbazi-Azad
Department of Computer Engineering, Sharif University of Technology

2 OpenMP Work-sharing
Instructor: PanteA Zardoshti
Department of Computer Engineering, Sharif University of Technology

3 Work-sharing
A worksharing construct distributes the execution of the associated region among the members of the team that encounters it.
A worksharing region has no barrier on entry; however, an implied barrier exists at the end of the worksharing region unless a nowait clause is specified.

    #pragma omp parallel for
    for (i = 0; i < 100; i++)
        A[i] = A[i] + B;
    /* implied barrier here */
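As a minimal, compilable sketch of the combined parallel for construct above: the function name add_scalar and the length parameter n are illustrative and do not come from the slides.

```c
/* Adds the scalar B to every element of A.
 * The loop iterations are divided among the threads of the team;
 * all threads meet at the implied barrier at the end of the loop. */
void add_scalar(double *A, int n, double B)
{
    int i;
#pragma omp parallel for
    for (i = 0; i < n; i++)
        A[i] = A[i] + B;
}
```

With GCC this is typically built with the -fopenmp flag. Without OpenMP support the pragma is ignored and the loop runs serially with the same result.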

5 Constructs
The OpenMP API defines the following worksharing constructs, which are described in the sections that follow:
    loop
    sections
    single

6 LOOP CONSTRUCT

7 Loop Construct
The loop construct specifies that the iterations of one or more associated loops will be executed in parallel by threads in the team in the context of their implicit tasks.
The iterations are distributed across threads that already exist in the team executing the parallel region to which the loop region binds.

    #pragma omp for [clause[[,] clause]...]
        for-loops

8 Clauses
where clause is one of the following:
    private(list)
    firstprivate(list)
    lastprivate(list)
    schedule(kind[, chunk_size])
    collapse(n)
    ordered
    nowait
    reduction(reduction-identifier: list)

9 Schedule Clause
How does OpenMP schedule iterations?
Although the OpenMP standard does not specify how a loop should be partitioned, most compilers by default split the loop into p chunks of about N/p iterations each (N = number of iterations, p = number of threads).
This is called a static schedule (with chunk size N/p).
For example, suppose we have a loop with 1000 iterations and 4 OpenMP threads. The loop is partitioned as follows:
    Thread 0: iterations 0-249
    Thread 1: iterations 250-499
    Thread 2: iterations 500-749
    Thread 3: iterations 750-999
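The arithmetic behind that partitioning can be sketched as a small helper. This is an illustration of one common block-partitioning rule, not something the OpenMP standard mandates; the name static_block and its parameters are assumptions made for this example.

```c
/* Computes the half-open iteration range [*lo, *hi) that thread `tid`
 * would receive if n iterations were split into p contiguous blocks
 * of size ceil(n / p). This mirrors a common default, but the OpenMP
 * standard leaves the default schedule to the implementation. */
void static_block(int n, int p, int tid, int *lo, int *hi)
{
    int chunk = (n + p - 1) / p;   /* ceil(n / p) */
    int start = tid * chunk;
    int end = start + chunk;

    if (start > n) start = n;      /* threads past the end get nothing */
    if (end > n) end = n;          /* the last block may be shorter */
    *lo = start;
    *hi = end;
}
```

For n = 1000 and p = 4 the chunk is 250, so thread 3 receives the range [750, 1000).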

14 Schedule Clause
Static: schedule(static[, chunk])
    Blocks of iterations of size chunk are assigned to threads.
    Blocks are distributed round-robin.
Dynamic: schedule(dynamic[, chunk])
    Each thread grabs chunk iterations.
    When done with its iterations, the thread requests the next set.

16 Schedule Clause (cont'd)
Guided: schedule(guided[, chunk])
    Dynamic schedule starting with large blocks.
    The block size shrinks, but is never smaller than chunk.
Runtime: schedule(runtime)
    Indicates that the schedule type and chunk are specified by the environment variable OMP_SCHEDULE.
    Example of run-time specified scheduling: OMP_SCHEDULE set to "dynamic,2"
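A hedged sketch of when a non-static schedule can help: the cost of each iteration below grows with i, so equal-sized static blocks would be unbalanced. The functions is_prime and mark_primes and the chunk size of 16 are arbitrary choices for illustration only.

```c
/* Returns 1 if k is prime, 0 otherwise (trial division). */
static int is_prime(int k)
{
    int d;
    if (k < 2) return 0;
    for (d = 2; d * d <= k; d++)
        if (k % d == 0) return 0;
    return 1;
}

/* Sets flag[i] to 1 when i is prime, for every i in [0, n).
 * Larger i costs more to test, so threads grab 16 iterations
 * at a time and come back for more when they finish. */
void mark_primes(int n, int *flag)
{
    int i;
#pragma omp parallel for schedule(dynamic, 16)
    for (i = 0; i < n; i++)
        flag[i] = is_prime(i);
}
```

Replacing the clause with schedule(runtime) would defer the choice to the OMP_SCHEDULE environment variable, so different schedules can be compared without recompiling.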

17 The Experiment

18 Collapse Clause
Allows parallelization of perfectly nested loops without using nested parallelism.
The collapse clause on a for/do loop indicates how many loops should be collapsed.
The compiler forms a single loop and then parallelizes it.

    #pragma omp for collapse(2)
    for (k = 1; k <= 100; k++)
        for (j = 1; j <= 200; j++)
            ...
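A small runnable sketch of collapse(2) on a perfectly nested loop pair. The table dimensions reuse the 100 and 200 from the slide; the function name fill_table and the stored values are assumptions for illustration.

```c
#define ROWS 100
#define COLS 200

/* Fills a ROWS x COLS table. collapse(2) merges both loops into one
 * iteration space of ROWS * COLS iterations before distributing it,
 * and makes both k and j private to each thread. */
void fill_table(int table[ROWS][COLS])
{
    int k, j;
#pragma omp parallel for collapse(2)
    for (k = 0; k < ROWS; k++)
        for (j = 0; j < COLS; j++)
            table[k][j] = k * COLS + j;
}
```

Collapsing is mostly useful when the outer loop alone has too few iterations to keep every thread busy.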

19 Ordered Clause
The ordered region executes in sequential loop order.

    #pragma omp parallel for ordered
    for (i = 0; i < nproc; i++) {
        do_lots_of_work(result[i]);
        #pragma omp ordered
        fprintf(fid, "%d %f\n", i, result[i]);
    }

Since do_lots_of_work takes a lot of time, most of the parallel benefit is still realized.
ordered is helpful for debugging.
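The same pattern can be sketched in a form that is easy to check: the work runs concurrently, while the ordered region appends results strictly in iteration order. The names expensive_square and collect_in_order are hypothetical, and schedule(static, 1) is just one choice that lets neighbouring iterations overlap.

```c
/* Stand-in for an expensive, independent computation (hypothetical). */
static int expensive_square(int i)
{
    return i * i;
}

/* Computes n results in parallel but appends them to `out` strictly in
 * iteration order. Only the ordered region is serialized, so the shared
 * counter `pos` needs no further protection. Returns the number stored. */
int collect_in_order(int n, int *out)
{
    int i, pos = 0;
#pragma omp parallel for ordered schedule(static, 1)
    for (i = 0; i < n; i++) {
        int r = expensive_square(i);   /* may run concurrently */
#pragma omp ordered
        {
            out[pos] = r;              /* runs in order i = 0, 1, 2, ... */
            pos++;
        }
    }
    return pos;
}
```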

20 Nowait Clause
To minimize synchronization, some OpenMP pragmas support the optional nowait clause.
If present, threads do not synchronize/wait at the end of that particular construct.

    #pragma omp for nowait
    for (k = 1; k <= 100; k++)
        ...

21 Example

    #pragma omp parallel shared(n,a,b,c,x,y,z) private(f,i,scale)
    {                                   /* parallel region */
        f = 1.0;                        /* statement is executed by all threads */

        #pragma omp for nowait          /* parallel loop (work is distributed) */
        for (i = 0; i < n; i++)
            z[i] = x[i] + y[i];

        #pragma omp for nowait          /* parallel loop (work is distributed) */
        for (i = 0; i < n; i++)
            a[i] = b[i] + c[i];
        ...
        #pragma omp barrier             /* synchronization */

        scale = sum(a,0,n) + sum(z,0,n) + f;   /* statement is executed by all threads */
    } /*-- End of parallel region --*/
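A self-contained version of this example might look as follows. The helper sum is only called, not defined, on the slide, so its definition here is an assumption; the wrapper scaled_total and the shared variable result, used to return one thread's private scale, are additions for illustration.

```c
/* Sums v[lo..hi-1]; assumed meaning of the slide's call sum(a,0,n). */
static double sum(const double *v, int lo, int hi)
{
    double s = 0.0;
    int k;
    for (k = lo; k < hi; k++)
        s += v[k];
    return s;
}

/* Fills z = x + y and a = b + c, then returns sum(a) + sum(z) + f. */
double scaled_total(int n, double *a, const double *b, const double *c,
                    const double *x, const double *y, double *z)
{
    double result = 0.0;
    double f, scale;
    int i;
#pragma omp parallel shared(n, a, b, c, x, y, z, result) private(f, i, scale)
    {
        f = 1.0;   /* executed by every thread */
        /* The two loops touch different arrays, so nowait is safe here. */
#pragma omp for nowait
        for (i = 0; i < n; i++)
            z[i] = x[i] + y[i];
#pragma omp for nowait
        for (i = 0; i < n; i++)
            a[i] = b[i] + c[i];
        /* a[] and z[] are complete only after this barrier. */
#pragma omp barrier
        scale = sum(a, 0, n) + sum(z, 0, n) + f;
        /* One thread publishes its private copy of scale. */
#pragma omp single
        result = scale;
    }
    return result;
}
```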

28 Barrier
[Figure: Thread 1, Thread 2, and Thread 3 arriving at a barrier]
Use OMP_WAIT_POLICY to control the behaviour of idle threads.

30 Example
Suppose we run each of these two loops in parallel over i:

    for (i = 0; i < N; i++)
        a[i] = b[i] + c[i];

    for (i = 0; i < N; i++)
        d[i] = a[i] + b[i];

This may give us a wrong answer. Why?

32 Example (cont'd)
We need to have updated all of a[] before using a[]:

    for (i = 0; i < N; i++)
        a[i] = b[i] + c[i];

    /* wait! barrier */

    for (i = 0; i < N; i++)
        d[i] = a[i] + b[i];

All threads wait at the barrier point and continue only when all threads have reached it.
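In OpenMP the required barrier comes for free when both loops are worksharing loops without nowait, as in this sketch. The function name two_phase and its parameter list are illustrative.

```c
/* Phase 1 writes a[], phase 2 reads it. The implied barrier at the end
 * of the first worksharing loop guarantees that every a[i] is written
 * before any thread starts the second loop. */
void two_phase(int n, double *a, const double *b, const double *c, double *d)
{
    int i;
#pragma omp parallel private(i)
    {
#pragma omp for
        for (i = 0; i < n; i++)
            a[i] = b[i] + c[i];
        /* implied barrier here (no nowait clause) */
#pragma omp for
        for (i = 0; i < n; i++)
            d[i] = a[i] + b[i];
    }
}
```

Adding nowait to the first loop would remove that guarantee, which is the situation the slide warns about.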

34 Barrier

35 SECTIONS CONSTRUCT

36 Sections Construct
Independent sections of code can execute concurrently.

    #pragma omp parallel sections [clause[[,] clause]...]
    {
        #pragma omp section
        phase1();
        #pragma omp section
        phase2();
        #pragma omp section
        phase3();
    }

37 Clauses
where clause is one of the following:
    private(list)
    firstprivate(list)
    lastprivate(list)
    nowait
    reduction(reduction-identifier: list)

38 Example

    #pragma omp parallel default(none) shared(n,a,b,c,d) private(i)
    {
        #pragma omp sections nowait
        {
            #pragma omp section
            for (i = 0; i < n-1; i++)
                b[i] = (a[i] + a[i+1])/2;

            #pragma omp section
            for (i = 0; i < n; i++)
                d[i] = 1.0/c[i];
        } /*-- End of sections --*/
    } /*-- End of parallel region --*/

[Figure: Section #1 and Section #2 inside the parallel region, plotted against time]
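The sections example can be wrapped into a compilable function roughly like this. The name smooth_and_invert is an assumption, and the loop counters are declared inside each section instead of being listed in a private clause.

```c
/* Runs two independent loops as separate sections, so two threads
 * can work on them at the same time. */
void smooth_and_invert(int n, const double *a, double *b,
                       const double *c, double *d)
{
#pragma omp parallel sections
    {
#pragma omp section
        {
            int i;   /* local to this section, hence private */
            for (i = 0; i < n - 1; i++)
                b[i] = (a[i] + a[i + 1]) / 2;
        }
#pragma omp section
        {
            int i;
            for (i = 0; i < n; i++)
                d[i] = 1.0 / c[i];
        }
    }
}
```

With only two sections, at most two threads do useful work regardless of the team size.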

39 Example

40 Example

41 SINGLE CONSTRUCT

42 Single Construct
Denotes a block of code to be executed by only one thread.
The thread chosen is implementation dependent.
Implicit barrier at the end.

    #pragma omp parallel
    {
        DoManyThings();
        #pragma omp single
        {
            ExchangeBoundaries();
        }   /* threads wait here for single */
        DoManyMoreThings();
    }
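A checkable sketch of the same structure: one thread computes a shared value inside single, and the implicit barrier makes it visible before the whole team uses it. The function normalize_by_first is a made-up example, not part of the slides.

```c
/* Divides every element of v by the original v[0].
 * Exactly one thread computes the scaling factor; the implicit barrier
 * at the end of single ensures all threads see it before the loop. */
void normalize_by_first(int n, double *v)
{
    double factor = 1.0;
    int i;
#pragma omp parallel shared(factor) private(i)
    {
#pragma omp single
        {
            if (n > 0 && v[0] != 0.0)
                factor = 1.0 / v[0];
        }
        /* threads wait here for single */
#pragma omp for
        for (i = 0; i < n; i++)
            v[i] = v[i] * factor;
    }
}
```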

43 Single Construct

44 CRITICAL SECTION

45 Critical Section

    float dot_prod(float* a, float* b, int N)
    {
        float sum = 0.0;
        #pragma omp parallel for shared(sum)
        for (int i = 0; i < N; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

What is wrong?

46 Critical Construct
Defines a critical region on a structured block.

    #pragma omp critical [(lock_name)]

    float dot_prod(float* a, float* b, int N)
    {
        float sum = 0.0;
        #pragma omp parallel for shared(sum)
        for (int i = 0; i < N; i++) {
            #pragma omp critical
            sum += a[i] * b[i];
        }
        return sum;
    }

Naming the critical constructs is optional, but may increase performance: unnamed critical regions all share one lock, while regions with different names can execute concurrently.
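To illustrate why names can matter, here is a sketch with two independent counters protected by differently named critical regions, so an update to one never blocks an update to the other. The function count_signs and the names neg_count and pos_count are invented for this example, and it assumes neg and pos point to distinct integers.

```c
/* Counts negative and positive entries of v.
 * Each counter has its own named critical region. */
void count_signs(int n, const int *v, int *neg, int *pos)
{
    int i;
    *neg = 0;
    *pos = 0;
#pragma omp parallel for
    for (i = 0; i < n; i++) {
        if (v[i] < 0) {
#pragma omp critical(neg_count)
            (*neg)++;
        } else if (v[i] > 0) {
#pragma omp critical(pos_count)
            (*pos)++;
        }
    }
}
```

For simple counters like these a reduction would normally be faster; the critical regions are used here only to show the naming.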

47 Reduction Clause

    reduction (op : list)

The variables in list must be shared in the enclosing parallel region.
Inside a parallel or work-sharing construct:
    A PRIVATE copy of each list variable is created and initialized depending on the op.
    These copies are updated locally by threads.
    At the end of the construct, local copies are combined through op into a single value and combined with the value in the original SHARED variable.

48 Reduction Clause

    float dot_prod(float* a, float* b, int N)
    {
        float sum = 0.0;
        #pragma omp parallel for reduction(+:sum)   /* local copy of sum for each thread */
        for (int i = 0; i < N; i++) {
            sum += a[i] * b[i];
        }
        return sum;   /* all local copies of sum added together and stored in the global variable */
    }
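What the reduction clause does can be approximated by hand, which may make the description above more concrete. This is a sketch of the idea only; real implementations are free to combine the partial results differently. The name dot_prod_manual and the variable local are assumptions.

```c
/* Dot product with the reduction written out manually:
 * each thread accumulates into its own private `local`,
 * then adds it once to the shared `sum`. */
float dot_prod_manual(const float *a, const float *b, int N)
{
    float sum = 0.0f;
#pragma omp parallel
    {
        float local = 0.0f;   /* private copy, initialized to the identity of + */
        int i;
#pragma omp for nowait
        for (i = 0; i < N; i++)
            local += a[i] * b[i];
        /* one protected update per thread instead of one per iteration */
#pragma omp critical
        sum += local;
    }
    return sum;
}
```

Because floating-point addition is not associative, the result of a parallel sum can differ in the last bits from the serial one.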

49 Reduction Clause (cont.)

    Operator    Meaning
    +           Sum
    *           Product
    &           Bitwise and
    |           Bitwise or
    ^           Bitwise exclusive or
    &&          Logical and
    ||          Logical or

(c) Hamid Sarbazi-Azad, Parallel Programming: OpenMP

50 Atomic Construct
Special case of a critical section.
Applies only to a simple update of a memory location.

    #pragma omp parallel for shared(x, y, index, n)
    for (i = 0; i < n; i++) {
        #pragma omp atomic
        x[index[i]] += work1(i);
        y[i] += work2(i);
    }
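A histogram is a typical case for atomic: several iterations may hit the same bin, but each update is a single increment of one memory location. The sketch below uses an invented function name, histogram, and assumes every index[i] is a valid bin number.

```c
/* Increments bins[index[i]] for every i in [0, n).
 * Different iterations may target the same bin, so the update
 * is protected with atomic. */
void histogram(int n, const int *index, int *bins)
{
    int i;
#pragma omp parallel for
    for (i = 0; i < n; i++) {
#pragma omp atomic
        bins[index[i]] += 1;
    }
}
```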

51 Lock Construct
Protect resources with locks.

    void omp_init_lock(omp_lock_t *lock_p);
    void omp_set_lock(omp_lock_t *lock_p);
    void omp_unset_lock(omp_lock_t *lock_p);
    void omp_destroy_lock(omp_lock_t *lock_p);

52 Lock Construct

    omp_lock_t lck;
    omp_init_lock(&lck);
    #pragma omp parallel for
    for (i = 0; i <= n; i++) {
        omp_set_lock(&lck);       /* wait here for your turn */
        result += w[i] * y[i];
        omp_unset_lock(&lck);     /* release the lock so the next thread gets a turn */
    }
    omp_destroy_lock(&lck);       /* free up storage when done */
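A self-contained variant of the lock example is sketched below. It assumes the arrays hold exactly n elements and therefore loops with i < n; the function name weighted_sum and the serial fallback definitions, which only exist so the file also builds without OpenMP support, are additions for illustration.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback (assumption for this sketch): with one thread,
 * the lock routines have nothing to do. */
typedef int omp_lock_t;
static void omp_init_lock(omp_lock_t *l)    { *l = 0; }
static void omp_set_lock(omp_lock_t *l)     { *l = 1; }
static void omp_unset_lock(omp_lock_t *l)   { *l = 0; }
static void omp_destroy_lock(omp_lock_t *l) { *l = 0; }
#endif

/* Returns the sum of w[i] * y[i] for i in [0, n). */
double weighted_sum(int n, const double *w, const double *y)
{
    double result = 0.0;
    omp_lock_t lck;
    int i;

    omp_init_lock(&lck);
#pragma omp parallel for
    for (i = 0; i < n; i++) {
        double term = w[i] * y[i];   /* computed outside the lock */
        omp_set_lock(&lck);          /* wait here for your turn */
        result += term;
        omp_unset_lock(&lck);        /* let the next thread in */
    }
    omp_destroy_lock(&lck);          /* free storage when done */
    return result;
}
```

Keeping the multiplication outside the locked region shortens the time each thread holds the lock.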


COSC 6374 Parallel Computation. Introduction to OpenMP(I) Some slides based on material by Barbara Chapman (UH) and Tim Mattson (Intel)

COSC 6374 Parallel Computation. Introduction to OpenMP(I) Some slides based on material by Barbara Chapman (UH) and Tim Mattson (Intel) COSC 6374 Parallel Computation Introduction to OpenMP(I) Some slides based on material by Barbara Chapman (UH) and Tim Mattson (Intel) Edgar Gabriel Fall 2014 Introduction Threads vs. processes Recap of

More information

OpenMP C and C++ Application Program Interface Version 1.0 October Document Number

OpenMP C and C++ Application Program Interface Version 1.0 October Document Number OpenMP C and C++ Application Program Interface Version 1.0 October 1998 Document Number 004 2229 001 Contents Page v Introduction [1] 1 Scope............................. 1 Definition of Terms.........................

More information

Computing on Graphs An Overview

Computing on Graphs An Overview Computing on Graphs An Overview Lecture 2 CSCI 4974/6971 1 Sep 2016 1 / 16 Today s Learning 1. Computations of Graphs 2. OpenMP refresher 3. Hands-on: Breadth-First Search 2 / 16 Computations of Graphs

More information

CSL 730: Parallel Programming. OpenMP

CSL 730: Parallel Programming. OpenMP CSL 730: Parallel Programming OpenMP int sum2d(int data[n][n]) { int i,j; #pragma omp parallel for for (i=0; i

More information

Overview: The OpenMP Programming Model

Overview: The OpenMP Programming Model Overview: The OpenMP Programming Model motivation and overview the parallel directive: clauses, equivalent pthread code, examples the for directive and scheduling of loop iterations Pi example in OpenMP

More information

Parallel Programming: OpenMP

Parallel Programming: OpenMP Parallel Programming: OpenMP Xianyi Zeng xzeng@utep.edu Department of Mathematical Sciences The University of Texas at El Paso. November 10, 2016. An Overview of OpenMP OpenMP: Open Multi-Processing An

More information

OpenMP 4. CSCI 4850/5850 High-Performance Computing Spring 2018

OpenMP 4. CSCI 4850/5850 High-Performance Computing Spring 2018 OpenMP 4 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning Objectives

More information

Lecture 2 A Hand-on Introduction to OpenMP

Lecture 2 A Hand-on Introduction to OpenMP CS075 1896 1920 1987 2006 Lecture 2 A Hand-on Introduction to OpenMP, 2,1 01 1 2 Outline Introduction to OpenMP Creating Threads Synchronization between variables Parallel Loops Synchronize single masters

More information

Parallel and Distributed Computing

Parallel and Distributed Computing Concurrent Programming with OpenMP Rodrigo Miragaia Rodrigues MSc in Information Systems and Computer Engineering DEA in Computational Engineering CS Department (DEI) Instituto Superior Técnico October

More information

Parallel Programming with OpenMP. CS240A, T. Yang, 2013 Modified from Demmel/Yelick s and Mary Hall s Slides

Parallel Programming with OpenMP. CS240A, T. Yang, 2013 Modified from Demmel/Yelick s and Mary Hall s Slides Parallel Programming with OpenMP CS240A, T. Yang, 203 Modified from Demmel/Yelick s and Mary Hall s Slides Introduction to OpenMP What is OpenMP? Open specification for Multi-Processing Standard API for

More information

Parallel Computing on Multi-Core Systems

Parallel Computing on Multi-Core Systems Parallel Computing on Multi-Core Systems Instructor: Arash Tavakkol Department of Computer Engineering Sharif University of Technology Spring 2016 Optimization Techniques in OpenMP programs Some slides

More information

Introduction to OpenMP. Lecture 4: Work sharing directives

Introduction to OpenMP. Lecture 4: Work sharing directives Introduction to OpenMP Lecture 4: Work sharing directives Work sharing directives Directives which appear inside a parallel region and indicate how work should be shared out between threads Parallel do/for

More information

CS4961 Parallel Programming. Lecture 13: Task Parallelism in OpenMP 10/05/2010. Programming Assignment 2: Due 11:59 PM, Friday October 8

CS4961 Parallel Programming. Lecture 13: Task Parallelism in OpenMP 10/05/2010. Programming Assignment 2: Due 11:59 PM, Friday October 8 CS4961 Parallel Programming Lecture 13: Task Parallelism in OpenMP 10/05/2010 Mary Hall October 5, 2010 CS4961 1 Programming Assignment 2: Due 11:59 PM, Friday October 8 Combining Locality, Thread and

More information

Review. Tasking. 34a.cpp. Lecture 14. Work Tasking 5/31/2011. Structured block. Parallel construct. Working-Sharing contructs.

Review. Tasking. 34a.cpp. Lecture 14. Work Tasking 5/31/2011. Structured block. Parallel construct. Working-Sharing contructs. Review Lecture 14 Structured block Parallel construct clauses Working-Sharing contructs for, single, section for construct with different scheduling strategies 1 2 Tasking Work Tasking New feature in OpenMP

More information

CS4961 Parallel Programming. Lecture 9: Task Parallelism in OpenMP 9/22/09. Administrative. Mary Hall September 22, 2009.

CS4961 Parallel Programming. Lecture 9: Task Parallelism in OpenMP 9/22/09. Administrative. Mary Hall September 22, 2009. Parallel Programming Lecture 9: Task Parallelism in OpenMP Administrative Programming assignment 1 is posted (after class) Due, Tuesday, September 22 before class - Use the handin program on the CADE machines

More information

Mango DSP Top manufacturer of multiprocessing video & imaging solutions.

Mango DSP Top manufacturer of multiprocessing video & imaging solutions. 1 of 11 3/3/2005 10:50 AM Linux Magazine February 2004 C++ Parallel Increase application performance without changing your source code. Mango DSP Top manufacturer of multiprocessing video & imaging solutions.

More information

Massimo Bernaschi Istituto Applicazioni del Calcolo Consiglio Nazionale delle Ricerche.

Massimo Bernaschi Istituto Applicazioni del Calcolo Consiglio Nazionale delle Ricerche. Massimo Bernaschi Istituto Applicazioni del Calcolo Consiglio Nazionale delle Ricerche massimo.bernaschi@cnr.it OpenMP by example } Dijkstra algorithm for finding the shortest paths from vertex 0 to the

More information

Parallel Programming. OpenMP Parallel programming for multiprocessors for loops

Parallel Programming. OpenMP Parallel programming for multiprocessors for loops Parallel Programming OpenMP Parallel programming for multiprocessors for loops OpenMP OpenMP An application programming interface (API) for parallel programming on multiprocessors Assumes shared memory

More information

OpenMP Programming. Prof. Thomas Sterling. High Performance Computing: Concepts, Methods & Means

OpenMP Programming. Prof. Thomas Sterling. High Performance Computing: Concepts, Methods & Means High Performance Computing: Concepts, Methods & Means OpenMP Programming Prof. Thomas Sterling Department of Computer Science Louisiana State University February 8 th, 2007 Topics Introduction Overview

More information

Parallel Programming with OpenMP. CS240A, T. Yang

Parallel Programming with OpenMP. CS240A, T. Yang Parallel Programming with OpenMP CS240A, T. Yang 1 A Programmer s View of OpenMP What is OpenMP? Open specification for Multi-Processing Standard API for defining multi-threaded shared-memory programs

More information

ECE/ME/EMA/CS 759 High Performance Computing for Engineering Applications

ECE/ME/EMA/CS 759 High Performance Computing for Engineering Applications ECE/ME/EMA/CS 759 High Performance Computing for Engineering Applications Work Sharing in OpenMP November 2, 2015 Lecture 21 Dan Negrut, 2015 ECE/ME/EMA/CS 759 UW-Madison Quote of the Day Success consists

More information

!OMP #pragma opm _OPENMP

!OMP #pragma opm _OPENMP Advanced OpenMP Lecture 12: Tips, tricks and gotchas Directives Mistyping the sentinel (e.g.!omp or #pragma opm ) typically raises no error message. Be careful! The macro _OPENMP is defined if code is

More information

EE/CSCI 451 Introduction to Parallel and Distributed Computation. Discussion #4 2/3/2017 University of Southern California

EE/CSCI 451 Introduction to Parallel and Distributed Computation. Discussion #4 2/3/2017 University of Southern California EE/CSCI 451 Introduction to Parallel and Distributed Computation Discussion #4 2/3/2017 University of Southern California 1 USC HPCC Access Compile Submit job OpenMP Today s topic What is OpenMP OpenMP

More information

Parallel Programming in C with MPI and OpenMP

Parallel Programming in C with MPI and OpenMP Parallel Programming in C with MPI and OpenMP Michael J. Quinn Chapter 17 Shared-memory Programming 1 Outline n OpenMP n Shared-memory model n Parallel for loops n Declaring private variables n Critical

More information

ECE/ME/EMA/CS 759 High Performance Computing for Engineering Applications

ECE/ME/EMA/CS 759 High Performance Computing for Engineering Applications ECE/ME/EMA/CS 759 High Performance Computing for Engineering Applications Variable Sharing in OpenMP OpenMP synchronization issues OpenMP performance issues November 4, 2015 Lecture 22 Dan Negrut, 2015

More information

EPL372 Lab Exercise 5: Introduction to OpenMP

EPL372 Lab Exercise 5: Introduction to OpenMP EPL372 Lab Exercise 5: Introduction to OpenMP References: https://computing.llnl.gov/tutorials/openmp/ http://openmp.org/wp/openmp-specifications/ http://openmp.org/mp-documents/openmp-4.0-c.pdf http://openmp.org/mp-documents/openmp4.0.0.examples.pdf

More information

Programming Shared-memory Platforms with OpenMP

Programming Shared-memory Platforms with OpenMP Programming Shared-memory Platforms with OpenMP John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 422/534 Lecture 7 31 February 2017 Introduction to OpenMP OpenMP

More information

OpenMP Introduction. CS 590: High Performance Computing. OpenMP. A standard for shared-memory parallel programming. MP = multiprocessing

OpenMP Introduction. CS 590: High Performance Computing. OpenMP. A standard for shared-memory parallel programming. MP = multiprocessing CS 590: High Performance Computing OpenMP Introduction Fengguang Song Department of Computer Science IUPUI OpenMP A standard for shared-memory parallel programming. MP = multiprocessing Designed for systems

More information

Parallel Programming Principle and Practice. Lecture 6 Shared Memory Programming OpenMP. Jin, Hai

Parallel Programming Principle and Practice. Lecture 6 Shared Memory Programming OpenMP. Jin, Hai Parallel Programming Principle and Practice Lecture 6 Shared Memory Programming OpenMP Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology Outline OpenMP Overview

More information

OPENMP TIPS, TRICKS AND GOTCHAS

OPENMP TIPS, TRICKS AND GOTCHAS OPENMP TIPS, TRICKS AND GOTCHAS OpenMPCon 2015 2 Directives Mistyping the sentinel (e.g.!omp or #pragma opm ) typically raises no error message. Be careful! Extra nasty if it is e.g. #pragma opm atomic

More information

Distributed Systems + Middleware Concurrent Programming with OpenMP

Distributed Systems + Middleware Concurrent Programming with OpenMP Distributed Systems + Middleware Concurrent Programming with OpenMP Gianpaolo Cugola Dipartimento di Elettronica e Informazione Politecnico, Italy cugola@elet.polimi.it http://home.dei.polimi.it/cugola

More information

Parallel Programming in C with MPI and OpenMP

Parallel Programming in C with MPI and OpenMP Parallel Programming in C with MPI and OpenMP Michael J. Quinn Chapter 17 Shared-memory Programming 1 Outline n OpenMP n Shared-memory model n Parallel for loops n Declaring private variables n Critical

More information

CS4961 Parallel Programming. Lecture 5: More OpenMP, Introduction to Data Parallel Algorithms 9/5/12. Administrative. Mary Hall September 4, 2012

CS4961 Parallel Programming. Lecture 5: More OpenMP, Introduction to Data Parallel Algorithms 9/5/12. Administrative. Mary Hall September 4, 2012 CS4961 Parallel Programming Lecture 5: More OpenMP, Introduction to Data Parallel Algorithms Administrative Mailing list set up, everyone should be on it - You should have received a test mail last night

More information

Allows program to be incrementally parallelized

Allows program to be incrementally parallelized Basic OpenMP What is OpenMP An open standard for shared memory programming in C/C+ + and Fortran supported by Intel, Gnu, Microsoft, Apple, IBM, HP and others Compiler directives and library support OpenMP

More information

Introduction to Programming with OpenMP

Introduction to Programming with OpenMP Introduction to Programming with OpenMP Kent Milfeld; Lars Koesterke Yaakoub El Khamra (presenting) milfeld lars yye00@tacc.utexas.edu October 2012, TACC Outline What is OpenMP? How does OpenMP work? Architecture

More information

Introduction to OpenMP

Introduction to OpenMP Introduction to OpenMP Lecture 4: Work sharing directives Work sharing directives Directives which appear inside a parallel region and indicate how work should be shared out between threads Parallel do/for

More information

A Short Introduction to OpenMP. Mark Bull, EPCC, University of Edinburgh

A Short Introduction to OpenMP. Mark Bull, EPCC, University of Edinburgh A Short Introduction to OpenMP Mark Bull, EPCC, University of Edinburgh Overview Shared memory systems Basic Concepts in Threaded Programming Basics of OpenMP Parallel regions Parallel loops 2 Shared memory

More information

OPENMP OPEN MULTI-PROCESSING

OPENMP OPEN MULTI-PROCESSING OPENMP OPEN MULTI-PROCESSING OpenMP OpenMP is a portable directive-based API that can be used with FORTRAN, C, and C++ for programming shared address space machines. OpenMP provides the programmer with

More information