Parallel Programming


1 Parallel Programming

2 Some Definitions: Computational Models and Models of Computation. A real-world system is captured in a domain model (mathematical, organizational, ...), which in turn is expressed as a computational model.

3 Models. A computational model is a model of a (real) system that can be solved by a computer. A model of computation is the way a computational model (problem) is solved by a computer; it defines the functionality of a virtual machine.

4 Elements of Models of Concurrent Computation. Schedulable units of computation and data: operators, functions, programs, arrays, objects, ... Bindings between schedulable units of computation (values to names): global variables, communicated variables [Brown 86].

5 Example.
program T1: var A, global; A := 5; fork(T2, T3); ...
program T2: var B, local; B := A; ...
program T3: var C, global; C := 1; A := C; ...
Schedulable computations: T1, T2, T3. Schedulable data: local variables are aligned with the Ts, global variables are not. Binding of computation: control flow. Name-to-value binding: may lead to non-determinism (T2 may read A as 5 or as 1, depending on whether T3 has already executed A := C).

6 From Application to Machine. A (concurrent) computation can be hierarchically specified by selecting successively refined models of concurrent computation. The refinement is done by specifying the elements of each model of computation and the mapping of those elements between successive virtual machines. For each element in the model of computation there may exist alternative choices.

7 Major Levels: Specification, Design, Implementation, System, Machines.

8 Example. Application level: element assembly, node ordering, linear system solver. Design level: element assembly, e.g. Cholesky with nested dissection.

9 Example (cont'd). Implementation level: sequential programs with message passing, explicit mapping. System level: MPI.

10 Another view. Decompose: find schedulable units of computation (i.e. tasks) and data. Cluster: cluster tasks and data into sequential, allocatable units of work; the goal is establishing the right level of granularity, e.g. defining processes or threads of execution. Embed in a programming environment: select a parallel programming language, add communication and synchronization constructs. Map allocatable units of work to processors: bind processes or threads to physical processors.

11 Decompose. Decide what level of parallelism to use; parallelism may be static or dynamic, and the level of parallelism may depend on the problem size. Determine the number of processors that can effectively be used for a given problem size. This step is more problem dependent than architecture dependent.

12 Cluster. Bring tasks (and sometimes data) together into conceptually executable programs (processes or threads). Various mechanisms: cluster iterations, data-driven decomposition, heuristic clustering algorithms. More problem dependent than architecture dependent.

13 Embed in programming environment. Select a parallel programming method; add communication and synchronization constructs; add access mechanisms to shared data; design distributed data structures. Optimize the program to reduce the cost of parallelism: combine communication messages, remove redundant synchronizations, reduce parallel loop overhead. Partly dependent on the architecture.

14 Map. Assign processes or threads to processors; try to schedule processes that communicate a lot onto the same processor. Dependent on the architecture and the run-time system.

15 Languages and Machines: (parallel) language (or: sequential language + library) -> compiler -> run-time system -> machines.

16 Basic Notions. [Figure: task dependency graph over tasks T1..T5 with weighted nodes; critical path length 18, maximum concurrency 3.] A task dependency graph captures ordering constraints, but data handling is not included.

17 How to find parallelism? Start with the sequential code base. Do profiling to identify computationally intensive kernels; inspect the code (identify computationally intensive loops); inspect data structures and, if necessary, unbundle them. Rewrite parts of the code using parallel constructs or library calls. Change the algorithm(s): sometimes algorithms are too sequential, so find parallel alternatives. Or start all over again: working with an existing code base is sometimes inefficient, so recode from the specification.

18 Algorithm Change [1]. Stencil operation over the (i,j) grid: nearest-neighbor Gauss-Seidel solver.
for(i=1;i<dim;i++){
  for(j=1;j<dim;j++){
    a(i,j)=0.2*(a(i,j)+a(i,j-1)+a(i-1,j)+a(i,j+1)+a(i+1,j))
}}

19 Algorithm Change [2]. Do a coordinate transform on the iteration space: there is parallelism along the anti-diagonals, but this gives a load balancing problem.
for(i=1;i<dim-1;i++){
  parfor(j=??){
    a(i,j)=0.2*(a(i,j)+a(i,j-1)+a(i-1,j)+a(i,j+1)+a(i+1,j))
}}

20 Algorithm Change [3]. If all (i,j) iterations are done simultaneously, this is a Jacobi solver (which needs a duplicate data structure). If not, the program may become nondeterministic!
parallel for(i=1;i<dim;i++){
  parallel for(j=1;j<dim;j++){
    a(i,j)=0.2*(a(i,j)+a(i,j-1)+a(i-1,j)+a(i,j+1)+a(i+1,j))
}}

21 Algorithm Change [4]. Alternative scheme: red-black ordering. All red elements can be done in parallel, then all black elements. Note that different schemes have different numerical convergence!
parallel for( red elements ){
  a(i,j)=0.2*(a(i,j)+a(i,j-1)+a(i-1,j)+a(i,j+1)+a(i+1,j))
}
parallel for( black elements ){
  a(i,j)=0.2*(a(i,j)+a(i,j-1)+a(i-1,j)+a(i,j+1)+a(i+1,j))
}
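A possible concrete rendering of the red-black sweep in C with OpenMP (a sketch, not from the slides): it assumes a C99 array a[dim][dim] and the same 5-point update with weight 0.2; points with i+j even are treated as "red", points with i+j odd as "black".

#include <omp.h>

void red_black_sweep(int dim, double a[dim][dim])
{
    /* "red" points: i+j even; no dependences within this sweep */
    #pragma omp parallel for
    for (int i = 1; i < dim - 1; i++) {
        int jstart = (i % 2 == 0) ? 2 : 1;          /* keep i+j even */
        for (int j = jstart; j < dim - 1; j += 2)
            a[i][j] = 0.2 * (a[i][j] + a[i][j-1] + a[i-1][j] + a[i][j+1] + a[i+1][j]);
    }
    /* "black" points: i+j odd */
    #pragma omp parallel for
    for (int i = 1; i < dim - 1; i++) {
        int jstart = (i % 2 == 0) ? 1 : 2;          /* keep i+j odd */
        for (int j = jstart; j < dim - 1; j += 2)
            a[i][j] = 0.2 * (a[i][j] + a[i][j-1] + a[i-1][j] + a[i][j+1] + a[i+1][j]);
    }
}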

22 Important Machine Classes. Shared memory: physically shared address space. Distributed memory: physically local address spaces.

23 Shared Memory. [Figure: processors P connected through an interconnection network to shared memory holding a, b, c; the assignments a(1)=.., a(2)=.., a(3)=.. run on different processors.] All processors have equal access to all memory locations.
for(i=1;i<dim+1;i++){ a(i)=b(i)+c(i) }
translates to:
parallel for(i=1;i<dim+1;i++){ a(i)=b(i)+c(i) }
(or is written directly in that form, if possible)

24 Distributed Memory. [Figure: processors P with local memories holding parts of a, b, c, connected by an interconnection network; a processor needing b(1) or b(3) must request it from the owner.] Every processor has its own local address space. Non-local data can only be accessed through a request to the processor owning the data. Access times may depend on distance.

25 Virtual Shared Memory. Global address space; the memories are physically distributed; local access is faster than remote access. Hardware (or software) hides the memory distribution.

26 Problems. Shared memory model: machines have scalability problems; automatic parallel program generation is difficult. Distributed memory model: data distribution is important; automatic parallel program generation is very difficult.

27 Models of parallel computation. For sequential programming, structured programming techniques have evolved to aid the programmer. What do we have for parallel programming? At the conceptual level: farmer/worker, divide and conquer, data parallelism, function parallelism, bulk synchronous. At the system (implementation) level: (logical) data spaces and programs, communication and synchronization.

28 Farmer/Worker. [Figure: farmer process serving a pool of workers.] A pool of identical worker processes is formed, each of which is given a piece of work by the farmer process. When finished, a worker asks the farmer for another job, until nothing is left.
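A minimal shared-memory rendering of the farmer/worker idea (a sketch, not from the slides): the OpenMP runtime plays the farmer and hands out one job at a time to whichever worker thread becomes idle. The job kernel do_work() is a hypothetical placeholder.

#include <omp.h>

extern double do_work(int job);      /* hypothetical job kernel */

void process_jobs(int n_jobs, double *result)
{
    /* schedule(dynamic,1): each idle thread asks for the next job */
    #pragma omp parallel for schedule(dynamic, 1)
    for (int job = 0; job < n_jobs; job++)
        result[job] = do_work(job);
}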

29 Divide and Conquer. A task is recursively split into smaller tasks, which at some level of granularity are handed to workers. When finished, the results are recursively combined on the way back up, until the root task is reached.
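A small divide-and-conquer sketch with OpenMP tasks (an assumed example, not from the slides): the range is split recursively, leaves below a grain size are computed sequentially by one worker, and partial results are combined on the way back up.

#include <omp.h>

static double sum_range(const double *a, int lo, int hi)   /* sums a[lo..hi) */
{
    if (hi - lo < 1024) {                 /* grain size: leaf work, done sequentially */
        double s = 0.0;
        for (int i = lo; i < hi; i++) s += a[i];
        return s;
    }
    int mid = lo + (hi - lo) / 2;
    double left, right;
    #pragma omp task shared(left)         /* child task takes the left half */
    left = sum_range(a, lo, mid);
    right = sum_range(a, mid, hi);        /* current task takes the right half */
    #pragma omp taskwait                  /* wait for the child before combining */
    return left + right;
}

double parallel_sum(const double *a, int n)
{
    double s;
    #pragma omp parallel
    #pragma omp single                    /* one thread starts the recursion */
    s = sum_range(a, 0, n);
    return s;
}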

30 Data Parallelism. [Figure: a data structure block-distributed over processor-memory pairs connected by a communication network.] The data structures are subdivided among the memories of the distributed processors. Computations follow the given data distribution.

31 Function or Task Parallelism. [Figure: functions f, g, h applied in parallel.] Functions act on arguments. There are a number of subclasses in this model, due to differences in execution semantics: pipelining, data flow, demand driven / control driven.

32 Bulk Synchronous. Global synchronization between phases. The parallel computation consists of repeated cycles of: a computation phase, where all calculations are done locally; a communication phase; a decision phase (e.g. convergence); and another communication phase.
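One way the cycle can look in OpenMP (a sketch under assumed names; the phase functions are hypothetical placeholders): each thread computes locally, a barrier separates the phases, one thread takes the convergence decision, and the implicit barrier of the single construct makes that decision visible to all threads.

#include <omp.h>
#include <stdbool.h>

extern void compute_local_part(void);       /* hypothetical computation phase   */
extern void exchange_boundary_data(void);   /* hypothetical communication phase */
extern bool check_convergence(void);        /* hypothetical decision phase      */

void bsp_style_solver(int max_steps)
{
    bool converged = false;                 /* shared decision variable */
    #pragma omp parallel
    {
        for (int step = 0; step < max_steps && !converged; step++) {
            compute_local_part();           /* all calculations done locally */
            #pragma omp barrier             /* global sync before communication */
            exchange_boundary_data();       /* update shared / communicated data */
            #pragma omp barrier
            #pragma omp single
            converged = check_convergence();  /* one thread decides */
            /* implicit barrier at end of single: all threads see the decision */
        }
    }
}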

33 Communication. Shared memory: a resides in global memory; thread-1 executes d := e + a while thread-2 executes a := 5; typical mechanisms are semaphores and critical sections. Distributed memory: a resides in local memory; the typical mechanism is message passing.

34 Communication in shared memory. s resides in global memory; thread-1 executes s := s+4 while thread-2 executes s := s+5, so mutual exclusion is required and the locked update must be atomic.
/code thread-1: ... lock s; s=s+4; unlock s; ...
/code thread-2: ... lock s; s=s+5; unlock s; ...
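In OpenMP the lock/unlock pair from the slide is typically expressed with a critical section or an atomic update (a sketch, not the slide's own notation):

#include <omp.h>

double s = 0.0;                  /* shared variable, as in the slide */

void add_to_s(double v)
{
    #pragma omp critical
    s = s + v;                   /* only one thread at a time executes this */

    /* for a simple read-modify-write like this, an atomic is cheaper:
       #pragma omp atomic
       s += v;
    */
}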

35 Synchronization in shared memory. Some parallel models, like the bulk synchronous model, require global synchronization. Barrier construct: barrier(name, processor_group).
/code thread-1: ... /compute code ... barrier(5,procs) /communication or updating shared variables ... barrier(6,procs) /decision code ... barrier(7,procs) ...
/code thread-2: ... /compute code ... barrier(5,procs) /communication or updating shared variables ... barrier(6,procs) /decision code ... barrier(7,procs) ...

36 Message Passing. Basic building blocks (not all parameters shown, for convenience; cf. MPI):
send(void *sendbuf, int n_elems, int p_dest)
receive(void *recbuf, int n_elems, int p_source)
/code P0: ... a=100; send(&a,1,1); a=0; ...
/code P1: ... receive(&a,1,0); printf("%d\n", a); ...
Note that the a on P0 and the a on P1 are not the same variable.

37 Message Passing Discipline.
/code P0: ... send(&a,1,1); receive(&b,1,1); ...
/code P1: ... send(&b,1,0); receive(&a,1,0); ...
For this symmetric exchange to work, send must be non-blocking and receive must be blocking; if both sends blocked, the two processes would wait for each other.
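The same pairwise exchange written with real MPI calls (a sketch, assuming two processes exchanging one double each). MPI_Send may block until a matching receive is posted, so having both ranks send first and receive second can deadlock for large messages; MPI_Sendrecv (or a non-blocking MPI_Isend) avoids that ordering problem.

#include <mpi.h>

void exchange(double *a, double *b, int myrank)
{
    int other = 1 - myrank;                   /* partner rank: 0 <-> 1 */
    MPI_Sendrecv(a, 1, MPI_DOUBLE, other, 0,  /* send a to the partner      */
                 b, 1, MPI_DOUBLE, other, 0,  /* receive b from the partner */
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}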

38 Matrix-vector example: y = A x, shown for a 4x4 system, i.e. y(i) = sum over j of a(i,j)*x(j).
for(i=1;i<dim+1;i++){
  y(i)=0;
  for(j=1;j<dim+1;j++){
    y(i)=y(i) + a(i,j) * x(j)
  }
}

39 Shared memory model: OpenMP. An extension to Fortran, C, and C++. Basic features: parallel loops and parallel sections; synchronization via critical sections; communication via shared variables.

40 The OpenMP execution model. [Figure: a base process forks n threads at a parallel construct; work is handed out at a worksharing construct; threads go idle at the end of the parallel construct.] The parallel construct creates a number of worker processes (or threads). The worksharing construct hands out work to the available workers (threads).

41 OpenMP format.
#pragma omp directive [clause list]
Example:
#pragma omp parallel [clause list]
/* structured block */

42 Parallel DO/for.
#pragma omp parallel
#pragma omp for schedule(static,2)
for(i=1;i<dim+1;i++){
  ... /* loop body, e.g. a(i)=b(i)+c(i) */
}
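A compilable C version of this worksharing loop (a sketch): the combined parallel-for directive creates the thread team and hands out chunks of two iterations statically.

#include <omp.h>

void vec_add(int dim, const double *b, const double *c, double *a)
{
    #pragma omp parallel for schedule(static, 2)
    for (int i = 0; i < dim; i++)
        a[i] = b[i] + c[i];       /* each chunk of 2 iterations goes to one thread */
}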

43 Parallel Sections. PARALLEL SECTIONS combines the parallel and worksharing constructs and allows independent execution of subprograms.
#pragma omp parallel sections
{
  #pragma omp section
  { task_a(); }
  #pragma omp section
  { task_b(); }
}

44 Locks and Critical Sections. Sequential version:
for(i=1;i<dim+1;i++){ sum = sum + b(i)**2 }
Parallel version, using a reduction so that the updates of sum do not race:
#pragma omp parallel for private(i) reduction(+:sum) schedule(dynamic,1)
for(i=1;i<dim+1;i++){
  sum = sum + b(i)**2
}
Each thread computes its b(i)**2 contributions into a private copy of sum, and the partial sums are combined into the shared sum at the end.

45 MVP example.
#pragma omp parallel for private(j)
for(i=1;i<dim+1;i++){
  y(i)=0;
  for(j=1;j<dim+1;j++){
    y(i)= y(i) + a(i,j) * x(j)
  }
}
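A compilable C sketch of the same matrix-vector product, assuming a row-major m x n matrix stored as a flat array (names and storage layout are assumptions): the outer loop over rows is parallel, and j and the row sum are private to each thread.

#include <omp.h>

void mvp(int m, int n, const double *a, const double *x, double *y)
{
    #pragma omp parallel for
    for (int i = 0; i < m; i++) {
        double s = 0.0;                    /* private accumulator per row */
        for (int j = 0; j < n; j++)
            s += a[i * n + j] * x[j];
        y[i] = s;
    }
}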

46 Distributed memory model: HPF. High Performance Fortran (HPF) is based on adding annotations (directives) to a sequential Fortran 90 base program. Main features: parallel constructs, and worksharing based on data distribution statements.

47 Mapping model. An array is (optionally) aligned with a template, distributed onto an arrangement of abstract processors P, and the mapping of abstract processors to real processors Real_P is done by the compiler.
REAL a(n,n)
!HPF$ TEMPLATE tem(n,n)
!HPF$ ALIGN a(:,:) WITH tem(:,:)
!HPF$ PROCESSORS p(5)
!HPF$ DISTRIBUTE a(block,*) ONTO p

48 Virtual processors. Create a grid of virtual processors. Examples:
!HPF$ PROCESSORS p(3,3)
!HPF$ PROCESSORS q(m)
!HPF$ PROCESSORS powerpc

49 Distribution. Distributes the array elements over the virtual processors. Various predefined distribution functions: block, block(m); cyclic, cyclic(m); * (all elements in that dimension go to the same processor). Examples:
!HPF$ DISTRIBUTE a(block,block) ONTO p
!HPF$ DISTRIBUTE a(cyclic,*) ONTO q
!HPF$ DISTRIBUTE a(*,*) ONTO powerpc

50 Block distribution. BLOCK means: give equal-sized contiguous blocks of array elements to the processors. Suited for problems with equal computational load per element.
REAL a(16)
!HPF$ PROCESSORS p(4)
!HPF$ DISTRIBUTE a(block) ONTO p
or
!HPF$ DISTRIBUTE a(block(8)) ONTO p
With block, the 16 elements go in blocks of 4 to P0..P3; with block(8), blocks of 8 go to P0 and P1 only.

51 Alignment. Arrays can be ALIGNed with each other; aligned arrays are collectively distributed through the top array. The top array can be specified as a fictitious array, called a TEMPLATE. Purpose: specification of locality (two aligned array elements reside on the same processor), support for replication and collapsing of dimensions, and a convenient means to distribute a group of arrays with a single DISTRIBUTE statement.

52 Alignment, example (1). One-to-one alignment:
REAL a(16), b(16)
!HPF$ ALIGN b(:) WITH a(:)
!HPF$ DISTRIBUTE a(block) ONTO p
Corresponding blocks of a and b end up on the same processors P0..P3.

53 Alignment, example (2). Collapsed alignment:
REAL a(16), b(16,3)
!HPF$ ALIGN b(:,*) WITH a(:)
!HPF$ DISTRIBUTE a(block) ONTO p
More alignment functions are available.

54 Parallelism. F90 array assignment statements and the FORALL statement. FORALL is a new language feature, not a directive.

55 Examples. Array statements:
REAL, DIMENSION(N) :: a, b, c
a = b + c
a(1:10:2) = b(6:10) + c(1:5)
FORALL statements:
FORALL (i=1:n, j=1:m, a(i,j).NE.0) a(i,j) = 1.0 / a(i,j)
FORALL has copy-in copy-out semantics.

56 MVP in HPF (1). Rows of A and the corresponding elements of y and x are block-distributed over P0..P3.
REAL a(m,n), y(m), x(n)
!HPF$ PROCESSORS p(4)
!HPF$ ALIGN a(:,*) WITH y(:)
!HPF$ ALIGN x(:) WITH y(:)
!HPF$ DISTRIBUTE y(block) ONTO p
!HPF$ INDEPENDENT
DO j=1,n
  FORALL(i=1:m)
    y(i) = y(i) + a(i,j)*x(j)
  END FORALL
END DO
or
FORALL(i=1:m) y(i) = DOT_PRODUCT(a(i,:),x(:))

57 MVP in HPF (2). Here x is replicated on every processor (aligned with every block of y), so each processor has all of x locally.
REAL a(m,n), y(m), x(n)
!HPF$ PROCESSORS p(4)
!HPF$ ALIGN a(:,*) WITH y(:)
!HPF$ ALIGN x(*) WITH y(*)
!HPF$ DISTRIBUTE y(block) ONTO p
!HPF$ INDEPENDENT
FORALL(i=1:m) y(i) = DOT_PRODUCT(a(i,:),x(:))

58 Distributed Memory Model: Message Passing. There is no notion of global data: every processor has its own private arrays and variables. There are no separate synchronization constructs; synchronization is implicit in message transfer, and barrier synchronization is done by an all-to-all broadcast. A program can only identify itself by (locally) calling a function that returns the processor number it is executing on. The usual way of programming is SPMD (Single Program, Multiple Data): each processor runs a copy of the same program and parameterizes itself according to the processor identifier. It is difficult to get the code error free (lots of details to handle in code).
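A minimal SPMD skeleton in MPI (a sketch): every process runs this same program and parameterizes itself through its rank, exactly as described above.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* who am I?       */
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);  /* how many of us? */

    /* each process now works on its own slice of the data,
       e.g. the rows it owns in a block distribution            */
    printf("process %d of %d\n", rank, nprocs);

    MPI_Finalize();
    return 0;
}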

59 Local data structures. [Figure: y, A, x block-distributed over P0..P3; each processor holds y_local, a_local, x_local. First loop: send x_local to the other processors; second loop: receive the x pieces from the other processors into temp; third loop: compute y_local = a_local * temp.]

60 MVP in Message Passing.
float a_local[m/procs][n], y_local[m/procs], x_local[n/procs], temp[n];
int mypid = proc();
k = n/procs;
/* (1) send my piece of x to every other processor */
for(i=0; i<procs; i++){
  if(mypid != i) send(&x_local[0], k, i);
}
/* (2) collect all pieces of x into temp */
for(i=0; i<procs; i++){
  if(mypid != i) receive(&temp[i*k], k, i);
  else {
    for(j=0; j<k; j++) temp[i*k+j] = x_local[j];
  }
}
/* (3) local matrix-vector product */
for(i=0; i<m/procs; i++){
  for(j=0; j<n; j++){
    y_local[i] = y_local[i] + a_local[i][j]*temp[j];
  }
}

61 MVP in Message Passing [2]. We need a local buffer (temp[]) of global size n anyway! Whether the previous message passing scheme is efficient is determined by how x_local[] changes during the rest of the computation the MVP is embedded in. If it changes locally on the processors, we need to resend it each cycle. But in that case we might as well receive all elements directly into a buffer x_local[] of global size n, saving the local copy loop.

62 Local data structures (revised). [Figure: as before, but x_local now has global size n; first loop: send my piece of x_local to the other processors; second loop: receive their pieces directly into x_local; third loop: compute y_local = a_local * x_local.]

63 MVP in Message Passing [3]. Resulting in:
float a_local[m/procs][n], y_local[m/procs], x_local[n];
int mypid = proc();
k = n/procs;
/* (1) send my piece of x to every other processor */
for(i=0; i<procs; i++){
  if(mypid != i) send(&x_local[mypid*k], k, i);
}
/* (2) receive the other pieces directly into x_local */
for(i=0; i<procs; i++){
  if(mypid != i) receive(&x_local[i*k], k, i);
}
/* (3) local matrix-vector product */
for(i=0; i<m/procs; i++){
  for(j=0; j<n; j++){
    y_local[i] = y_local[i] + a_local[i][j]*x_local[j];
  }
}
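The first two loops above implement an all-gather by hand. With MPI, the same data movement is a single collective call (a sketch, assuming n is divisible by the number of processes and that each process's own piece is passed in separately):

#include <mpi.h>

void gather_x(int n, int nprocs, const double *x_piece, double *x_full)
{
    int k = n / nprocs;                       /* elements owned per process */
    MPI_Allgather((void *)x_piece, k, MPI_DOUBLE,
                  x_full,          k, MPI_DOUBLE, MPI_COMM_WORLD);
    /* after the call, every process holds all n elements in x_full */
}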

64 Message Passing Observations. A general observation is that in message passing a local copy of remote data often needs to be created. The only way to avoid that is element-wise transfers, but those are very inefficient. An HPF compiler also has to create such copies, but there it is hidden from the programmer. For example, in stencil-type operations we need to extend the local array with so-called ghost rows.

65 Ghost rows. [Figure: each processor's local block extended with ghost rows holding copies of the neighbouring processors' boundary rows.] Tedious hand-coding is required to get the details right.
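A hedged sketch of a ghost-row exchange with MPI_Sendrecv: it assumes a 1-D row-block decomposition in which each process owns rows 1..local_m of a (local_m+2) x n block stored row-major, with rows 0 and local_m+1 reserved as ghost rows; the names and the decomposition are assumptions, not from the slides. Edge processes pass MPI_PROC_NULL as the missing neighbour.

#include <mpi.h>

void exchange_ghost_rows(double *a, int local_m, int n,
                         int up, int down /* neighbour ranks or MPI_PROC_NULL */)
{
    /* send my first real row up, receive the lower ghost row from below */
    MPI_Sendrecv(&a[1 * n],             n, MPI_DOUBLE, up,   0,
                 &a[(local_m + 1) * n], n, MPI_DOUBLE, down, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    /* send my last real row down, receive the upper ghost row from above */
    MPI_Sendrecv(&a[local_m * n],       n, MPI_DOUBLE, down, 1,
                 &a[0],                 n, MPI_DOUBLE, up,   1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}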
