Fundamentals of OmpSs
1 Fundamentals of OmpSs: Tasks and Dependences. Xavier Teruel. New York, June 2013
2 AGENDA: Fundamentals of OmpSs
Tasking and Synchronization: Data Sharing Attributes; Dependence Model; Other Tasking Directive Clauses; Taskwait Synchronization; Outlined Task Syntax.
Memory Regions, Nesting and Dependences: memory regions and dependences; nested tasks and dependences; using dependence sentinels.
Parallel Programming Methodology.
3 TASKING AND SYNCHRONIZATION
4 Task Directive
Task description: a computation unit. The amount of work (granularity) may vary in a wide range (μsecs to msecs or even seconds) and may depend on input arguments. Once started, a task can execute to completion independently of other tasks. Tasks can be declared inlined or outlined.
Creating tasks: the task directive

#pragma omp task [ default(...) ] [ shared(...) ] [ private(...) ] [ firstprivate(...) ] \
                 [ in(...) ] [ out(...) ] [ inout(...) ] [ concurrent(...) ] [ commutative(...) ] \
                 [ priority(...) ] [ label(...) ] [ if(...) ] [ final(...) ] [ untied ]
{ structured-block }
5 Data Sharing Attributes
Task directive (and data sharing clauses)

#pragma omp task [ default(...) ] [ shared(...) ] [ private(...) ] [ firstprivate(...) ]
{ structured-block }

Data sharing attributes:
default( private | firstprivate | shared | none )
shared(var-list): shares the original symbol
private(var-list): privatizes the original symbol
firstprivate(var-list): privatizes the symbol and captures its value
6 Data Sharing Attributes (example)
Vector initialization using a task:

int main( int argc, char *argv[] ) {
  int i, N = 100;
  int X[100];
  #pragma omp task private(i) shared(X) \
                   firstprivate(N) out(X)
  for (i = 0; i < N; i++) X[i] = i;
  #pragma omp taskwait on(X)
  for (i = 0; i < N; i++) printf("(%d)", X[i]);
}
7 Dependences: in( ), out( ), inout( ) ~ 1
Task directive (and dependence clauses)

#pragma omp task [ in(...) ] [ out(...) ] [ inout(...) ]
{ structured-block }

Data dependences:
in(var-list): the task will read the contents of the variables
out(var-list): the task will write into the variables' contents
inout(var-list): the task will read first and then write

#pragma omp task out(x)
x = 5;
#pragma omp task in(x)
printf("%d\n", x);
#pragma omp task inout(x)
x = x + 1;

(task graph: x=5 feeds printf(x); x=x+1 must wait for printf(x) due to the antidependence)
8 Dependences: in( ), out( ), inout( ) ~ 2
Tasks and non-taskified code:
Before the task directives: the code is already executed, no problem.
After the task directives: synchronization is required (task / taskwait on).

int foo () {
  int x = 0;                  /* already executed before creating any task */
  #pragma omp task out(x)
  x = 5;
  #pragma omp task in(x)
  printf("first : %d\n", x);
  #pragma omp task inout(x)   /* antidependence with the previous printf */
  x = x + 1;
  #pragma omp task in(x)
  printf("second : %d\n", x);
}

Synchronization with the previous tasks is needed: without it, a later read of x could observe 0, 5 or 6.
9 Dependences: in( ), out( ), inout( ) ~ 3
Removing the user program's (anti)-dependences: the user can rename some variables to break dependences. Only possible with WaR and WaW dependences.

int foo () {
  int x = 0, y;
  #pragma omp task out(x)
  x = 5;
  #pragma omp task in(x)
  printf("first : %d\n", x);
  #pragma omp task in(x) out(y)   /* was inout(x): x renamed to y */
  y = x + 1;
  #pragma omp task in(y)
  printf("second : %d\n", y);
}

Renaming x to y removes the antidependence, so both printf tasks can run as soon as x is produced.
10 Dependences: concurrent ~ 1
Task directive (and concurrent clause)

#pragma omp task [ concurrent(...) ]
{ structured-block }

concurrent(var-list): the task can run in parallel with other concurrent tasks.
Usually employed for concurrent updates (inout) on the var-list items.
Following dependences on a var-list item will wait for all concurrent tasks.
The task may require additional synchronization (atomic, mutex, ...).
11 Dependences: concurrent ~ 2
Concurrent example:

void foo() {
  int local, total = 0;
  for (int j = 0; j < N; j += BS) {
    #pragma omp task private(local) \
                     in(vec[j;BS]) concurrent(total)
    {
      local = 0;
      for (int i = 0; i < BS; i++) local += vec[j+i];
      #pragma omp atomic
      total += local;
    }
  }
  #pragma omp task in(total)
  printf("TOTAL is %d\n", total);
}

(figure: N=12; the block tasks update total concurrently, then the printf task runs)
12 Dependences: commutative ~ 1
Task directive (and commutative clause)

#pragma omp task [ commutative(...) ]
{ structured-block }

commutative(var-list): tasks can run in any order (but not in parallel).
Usually employed for non-ordered updates (inout) on the var-list items.
Following dependences on a var-list item will wait for all commutative tasks.
Tasks DO NOT require additional synchronization (atomic, mutex, ...).
13 Dependences: commutative ~ 2
Commutative example:

void foo() {
  int local, total = 0;
  for (int j = 0; j < N; j += BS) {
    #pragma omp task private(local) \
                     in(vec[j;BS]) commutative(total)
    {
      local = 0;
      for (int i = 0; i < BS; i++) local += vec[j+i];
      total += local;
    }
  }
  #pragma omp task in(total)
  printf("TOTAL is %d\n", total);
}

(figure: the block tasks update total one at a time, in any order, before the printf task runs)
14 Task Directive: task priorities
Task directive (and priority clause)

#pragma omp task [ priority(...) ]
{ structured-block }

priority(value): the higher the value, the higher the priority.
All ready tasks are inserted in an ordered ready queue.
Once a thread becomes idle, it gets one of the highest-priority tasks.
The user must choose a priority scheduler (in order to honor priorities).
15 Task Directive: labeling tasks
Task directive (and label clause)

#pragma omp task [ label(...) ]
{ structured-block }

label(identifier): used for instrumentation purposes.
Tasks will be named by this identifier instead of the compiler-given name.
16 Task Directive: if and final clauses
Task directive (and if clause)

#pragma omp task [ if(...) ]
{ structured-block }

if(expr): if the expression evaluates to false, do not create the task.
A throttle (cutoff) mechanism to control task creation.
The task code will be executed immediately (not deferred).

Task directive (and final clause)

#pragma omp task [ final(...) ]
{ structured-block }

final(expr): if the expression evaluates to true, the task becomes final: no more tasks are created in the enclosed task context (nested task code executes immediately).
17 Task Directive: if and final clauses (example)
Creating (or not) a task using the if clause:

void foo(int *size, int **vector, int N) {
  for (int i = 0; i < N; i++) {
    #pragma omp task if ( size[i] > MIN_SIZE )
    compute(vector[i], size[i]);
  }
}

(figure: a vector of blocks of different sizes; only the large blocks become deferred tasks)

Using final tasks as a cutoff mechanism to control granularity:

int fibonacci ( int n ) {
  if ( n < 2 ) return n;
  int x, y;
  #pragma omp task final ( n < CUTOFF )
  x = fibonacci( n - 1 );
  #pragma omp task final ( n < CUTOFF )
  y = fibonacci( n - 2 );
  #pragma omp taskwait
  return x + y;
}
18 Task Directive: untied
Task directive (and untied clause)

#pragma omp task [ untied ]
{ structured-block }

untied: the task will not be tied to any thread.
Tied tasks (the default): once the task is executed (for the first time) on one thread, it is tied to that thread and, if suspended, will only be resumed on that same thread.
Untied tasks: can be suspended and resumed on any thread (thread switch).
19 Taskwait Synchronization ~ 1
Taskwait directive: suspends the current task until children / dependences are completed.

#pragma omp taskwait [ on(...) ]

Taskwait with no clause (wait for all created/children tasks):

void traverse_queue ( Queue q ) {
  Element e;
  for ( e = q->first; e; e = e->next ) {
    #pragma omp task
    process( e );
  }
  #pragma omp taskwait
  return;
}

Without the taskwait, the subroutine would return immediately after spawning the tasks, allowing the calling function to continue spawning tasks.
20 Taskwait Synchronization ~ 2a
Taskwait on data dependences:
on(var-list): list of variables to wait on before proceeding.

int foo () {
  int x = 0, y;
  #pragma omp task out(x)
  x = 5;
  #pragma omp task in(x)
  printf("first : %d\n", x);
  #pragma omp task in(x) out(y)   /* we do not want to create a task here */
  y = x + 1;
  #pragma omp task in(y)
  printf("second : %d\n", y);
  #pragma omp taskwait
  return 0;
}
21 Taskwait Synchronization ~ 2b
Taskwait on data dependences:
on(var-list): list of variables to wait on before proceeding.

int foo () {
  int x = 0, y;
  #pragma omp task out(x)
  x = 5;
  #pragma omp task in(x)
  printf("first : %d\n", x);
  #pragma omp taskwait on(x)   /* wait only for the tasks producing x */
  y = x + 1;
  #pragma omp task in(y)
  printf("second : %d\n", y);
  #pragma omp taskwait
  return 0;
}
22 Task Directive in Functions
Task directive attached to a function definition/declaration:

#pragma omp task [ default(...) ] [ shared(...) ] [ private(...) ] [ firstprivate(...) ] \
                 [ in(...) ] [ out(...) ] [ inout(...) ] [ concurrent(...) ] [ commutative(...) ] \
                 [ priority(...) ] [ label(...) ] [ if(...) ] [ final(...) ] [ untied ]
{ function-definition | function-declaration }

All invocations of the function become tasks.
Data sharing attribute clauses are not needed anymore:
scalar parameters are firstprivate-equivalent;
memory pointer parameters are shared-equivalent;
function local variables are private-equivalent.
The remaining clauses keep their meaning.
23 Task Directive in Functions (example)
Outlined code's tasks: local variables are private; pointers and arrays are shared; arguments are firstprivate.

Inlined task:

int main( int argc, char *argv[] ) {
  int i, N = 100;
  int X[100];
  #pragma omp task private(i) shared(X) \
                   firstprivate(N) out(X)
  for (i = 0; i < N; i++) X[i] = i;
  #pragma omp taskwait on(X)
  for (i = 0; i < N; i++) printf("(%d)", X[i]);
}

The same work as a function task:

#pragma omp task out(Y)
void foo ( int Y[], int size ) {
  int j;
  for (j = 0; j < size; j++) Y[j] = j;
}

int main( int argc, char *argv[] ) {
  int i, N = 100;
  int X[100];
  foo(X, N);
  #pragma omp taskwait on(X)
  for (i = 0; i < N; i++) printf("(%d)", X[i]);
}
24 MEMORY REGIONS, NESTING AND DEPENDENCES
25 Region Dependences: in( ), out( ), inout( ) ~ 1
Task directive (and region dependence clauses)

#pragma omp task [ in(...) ] [ out(...) ] [ inout(...) ]
{ structured-block }

Data dependences:
{in|out|inout}(region-list): the task will read/write the specified region.

int A[8][8];
#pragma omp task out(A[0;4][0;3])
m_init_1(A, 0, 0, 3, 2, 8);
#pragma omp task out(A[2;5][4;3])
m_init_2(A, 2, 4, 6, 6, 8);
#pragma omp task in(A[5;2][1;4])
m_print(A, 5, 1, 6, 4, 8);
#pragma omp taskwait
return;
26 Region Dependences: in( ), out( ), inout( ) ~ 2
Indicating a 1-dim matrix region:
Single element region: in/out(A[i]). The argument is the element A[i].
First element and size: in/out(A[i;size]). A[i;size] is a block of size elements starting at element A[i]. The first element can be omitted (default is 0); the size can be omitted if it is known.
Range of elements: in/out(A[i:k]). A[i:k] is a block from element A[i] to element A[k] (both included). The lower bound can be omitted (default is 0); the upper bound can be omitted (default is size-1).
There are multiple ways to specify the same region: the clause in/out(A[i:i+BS-1]) is equivalent to in/out(A[i;BS]).
27 Region Dependences: in( ), out( ), inout( ) ~ 3
Examples using 1-dim regions (whole array):

int A[N];
#pragma omp task in(A)        // whole array used to compute dependences

int A[N];
#pragma omp task in(A[0:N-1]) // whole array used to compute dependences

int A[N];
#pragma omp task in(A[0;N])   // whole array used to compute dependences

Examples using 1-dim regions (array section):

int A[N];
#pragma omp task in(A[0:3])   // first 4 elements used to compute dependences

int A[N];
#pragma omp task in(A[0;4])   // first 4 elements used to compute dependences
28 Region Dependences: in( ), out( ), inout( ) ~ 4
Indicating n-dim matrix regions:
Single element region: in/out(A[i][j]).
First element and size: in/out(A[i;s1][j;s2]). A[i;s1][j;s2] is a 2-dim block of s1 x s2 elements starting at element A[i][j].
Range of elements: in/out(A[i:k][j:l]). A[i:k][j:l] is a 2-dim block from element A[i][j] to element A[k][l] (whole area included).
Examples using n-dim regions:

int A[N][M];
#pragma omp task in(A[2:4][3:6]) // 3 x 4 block of A starting at A[2][3]

int A[N][M];
#pragma omp task in(A[2;3][3;4]) // 3 x 4 block of A starting at A[2][3]
29 Nested Tasks and Dependences
Nested tasks: tasks creating tasks themselves.
Hierarchical dependence task graph:
a specific dependence domain per task, so several dependence task graphs exist at the same time;
task creation dependences (in, out, inout) are registered in the parent domain;
task synchronization (taskwait on) is registered in the current domain;
the parent task probably needs to wait for its children (taskwait).
Different-level tasks share the same resources:
when ready, they are queued into the same ready queues;
there are no priority differences between parent tasks and their children.
30 Nested Tasks and Dependences (example)
Using nested tasks: Matrix Multiply

#pragma omp task in([BS][BS]A, [BS][BS]B) inout([BS][BS]C)
void block_dgemm (float *A, float *B, float *C);

#pragma omp task in([N]A, [N]B) inout([N]C)
void dgemm ( float (*A)[N], float (*B)[N], float (*C)[N] ) {
  for (int i = 0; i < N; i += BS)
    for (int j = 0; j < N; j += BS)
      for (int k = 0; k < N; k += BS)
        block_dgemm( &A[i][k*BS], &B[k][j*BS], &C[i][j*BS] );
  #pragma omp taskwait
  return;
}

main() {
  dgemm(A, B, C);
  dgemm(D, E, F);
  dgemm(C, F, G);
  #pragma omp taskwait
  return;
}

(task graph: the third dgemm waits for C and F, produced by the first two)
31 Dependence Sentinels
A mechanism to handle complex dependences:
when it is difficult to specify proper in/out clauses;
when trying to avoid using memory regions (larger data structures);
when we want to specify artificial dependences (introduced by users).
To be avoided if possible!

#pragma omp task out(*sentinel)
void foo (..., int *sentinel);
#pragma omp task in(*sentinel)
void bar (..., int *sentinel);

void main () {
  int dummy;
  foo(..., &dummy);
  bar(..., &dummy);
}

(task graph: foo precedes bar through the sentinel)
32 Parallel Programming Methodology
Correctness in the sequential program (starting point).
Detect program hotspots (profile): these are the taskification targets.
Gradually increment taskification (concurrency level):
not too many changes at the same time;
test every included task with forced single-thread, in-order execution;
use extra taskwaits to force certain levels of serialization.
Gradually increase execution complexity:
single-thread in-order execution: --schedule-queue=fifo
single-thread out-of-order execution: --schedule-queue=lifo
increment the number of threads (parallelism).
Test performance.
33 Hands-on Exercises: Machine Description
Minotauro (system overview):
2 Intel E5649 6C at 2.53 GHz;
2 M2090 NVIDIA GPU cards;
24 GB of main memory;
Peak Performance: TFlops;
250 GB SSD as local storage;
2 Infiniband QDR (40 Gbit each);
Top 500 Ranking (11/2012):
34 Hands-on Exercises: Methodology
Exercises in: ompss-exercises-mt.tar.gz (02-basics_on_ompss)
dot_product: complete the dependence clauses, synchronization.
multisort: debugging errors on task execution.
matmul: parallelize the kernel.
partition = debug, reservation = summer-school (in the job scripts: run-x.sh).
Paraver configuration files, organized in directories:
tasks: task-related information;
runtime: runtime internals events;
scheduling: task graph and scheduling;
cuda: specific to CUDA implementations (GPU devices);
data-transfer: specific to data send and receive operations;
mpi-ompss: specific to MPI + OmpSs hybrid parallelization.
35 Thank you! For further information please contact
More informationIntroduction to OpenMP
Introduction to OpenMP Christian Terboven 10.04.2013 / Darmstadt, Germany Stand: 06.03.2013 Version 2.3 Rechen- und Kommunikationszentrum (RZ) History De-facto standard for
More informationIntroduction to OpenMP
Introduction to OpenMP Ricardo Fonseca https://sites.google.com/view/rafonseca2017/ Outline Shared Memory Programming OpenMP Fork-Join Model Compiler Directives / Run time library routines Compiling and
More informationECE/ME/EMA/CS 759 High Performance Computing for Engineering Applications
ECE/ME/EMA/CS 759 High Performance Computing for Engineering Applications Work Sharing in OpenMP November 2, 2015 Lecture 21 Dan Negrut, 2015 ECE/ME/EMA/CS 759 UW-Madison Quote of the Day Success consists
More informationECE 574 Cluster Computing Lecture 10
ECE 574 Cluster Computing Lecture 10 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 1 October 2015 Announcements Homework #4 will be posted eventually 1 HW#4 Notes How granular
More informationParallel Programming. Exploring local computational resources OpenMP Parallel programming for multiprocessors for loops
Parallel Programming Exploring local computational resources OpenMP Parallel programming for multiprocessors for loops Single computers nowadays Several CPUs (cores) 4 to 8 cores on a single chip Hyper-threading
More informationSynchronisation in Java - Java Monitor
Synchronisation in Java - Java Monitor -Every object and class is logically associated with a monitor - the associated monitor protects the variable in the object/class -The monitor of an object/class
More informationCellSs Making it easier to program the Cell Broadband Engine processor
Perez, Bellens, Badia, and Labarta CellSs Making it easier to program the Cell Broadband Engine processor Presented by: Mujahed Eleyat Outline Motivation Architecture of the cell processor Challenges of
More informationOpenMP 4. CSCI 4850/5850 High-Performance Computing Spring 2018
OpenMP 4 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning Objectives
More informationParallel Programming in C with MPI and OpenMP
Parallel Programming in C with MPI and OpenMP Michael J. Quinn Chapter 17 Shared-memory Programming 1 Outline n OpenMP n Shared-memory model n Parallel for loops n Declaring private variables n Critical
More informationDPHPC: Introduction to OpenMP Recitation session
SALVATORE DI GIROLAMO DPHPC: Introduction to OpenMP Recitation session Based on http://openmp.org/mp-documents/intro_to_openmp_mattson.pdf OpenMP An Introduction What is it? A set of compiler directives
More informationIntroduction to OpenMP
Introduction to OpenMP Ekpe Okorafor School of Parallel Programming & Parallel Architecture for HPC ICTP October, 2014 A little about me! PhD Computer Engineering Texas A&M University Computer Science
More informationCOMP4300/8300: The OpenMP Programming Model. Alistair Rendell. Specifications maintained by OpenMP Architecture Review Board (ARB)
COMP4300/8300: The OpenMP Programming Model Alistair Rendell See: www.openmp.org Introduction to High Performance Computing for Scientists and Engineers, Hager and Wellein, Chapter 6 & 7 High Performance
More informationCOMP4300/8300: The OpenMP Programming Model. Alistair Rendell
COMP4300/8300: The OpenMP Programming Model Alistair Rendell See: www.openmp.org Introduction to High Performance Computing for Scientists and Engineers, Hager and Wellein, Chapter 6 & 7 High Performance
More informationShared Memory programming paradigm: openmp
IPM School of Physics Workshop on High Performance Computing - HPC08 Shared Memory programming paradigm: openmp Luca Heltai Stefano Cozzini SISSA - Democritos/INFM
More informationHybrid MPI + OmpSs Advanced concept of OmpSs
www.bsc.es Hybrid MPI + OmpSs Advanced concept of OmpSs Rosa Badia, Xavier Martorell Session 2: Hybrid MPI + OmpSs and Load Balance! Hybrid programming Issues and basic examples! Load Balance! OmpSs advanced
More informationProgramming Shared-memory Platforms with OpenMP
Programming Shared-memory Platforms with OpenMP John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 422/534 Lecture 7 31 February 2017 Introduction to OpenMP OpenMP
More information15-418, Spring 2008 OpenMP: A Short Introduction
15-418, Spring 2008 OpenMP: A Short Introduction This is a short introduction to OpenMP, an API (Application Program Interface) that supports multithreaded, shared address space (aka shared memory) parallelism.
More informationOpenMP. A parallel language standard that support both data and functional Parallelism on a shared memory system
OpenMP A parallel language standard that support both data and functional Parallelism on a shared memory system Use by system programmers more than application programmers Considered a low level primitives
More informationBest Practice Guide for Writing MPI + OmpSs Interoperable Programs. Version 1.0, 3 rd April 2017
Best Practice Guide for Writing MPI + OmpSs Interoperable Programs Version 1.0, 3 rd April 2017 Copyright INTERTWinE Consortium 2017 Table of Contents 1 INTRODUCTION... 1 1.1 PURPOSE... 1 1.2 GLOSSARY
More informationTowards task-parallel reductions in OpenMP
www.bsc.es Towards task-parallel reductions in OpenMP J. Ciesko, S. Mateo, X. Teruel, X. Martorell, E. Ayguadé, J. Labarta, A. Duran, B. De Supinski, S. Olivier, K. Li, A. Eichenberger IWOMP - Aachen,
More informationParallel Programming. OpenMP Parallel programming for multiprocessors for loops
Parallel Programming OpenMP Parallel programming for multiprocessors for loops OpenMP OpenMP An application programming interface (API) for parallel programming on multiprocessors Assumes shared memory
More informationLittle Motivation Outline Introduction OpenMP Architecture Working with OpenMP Future of OpenMP End. OpenMP. Amasis Brauch German University in Cairo
OpenMP Amasis Brauch German University in Cairo May 4, 2010 Simple Algorithm 1 void i n c r e m e n t e r ( short a r r a y ) 2 { 3 long i ; 4 5 for ( i = 0 ; i < 1000000; i ++) 6 { 7 a r r a y [ i ]++;
More informationLeveraging OpenMP Infrastructure for Language Level Parallelism Darryl Gove. 15 April
1 Leveraging OpenMP Infrastructure for Language Level Parallelism Darryl Gove Senior Principal Software Engineer Outline Proposal and Motivation Overview of
More informationOpenMP: Open Multiprocessing
OpenMP: Open Multiprocessing Erik Schnetter May 20-22, 2013, IHPC 2013, Iowa City 2,500 BC: Military Invents Parallelism Outline 1. Basic concepts, hardware architectures 2. OpenMP Programming 3. How to
More informationMore Advanced OpenMP. Saturday, January 30, 16
More Advanced OpenMP This is an abbreviated form of Tim Mattson s and Larry Meadow s (both at Intel) SC 08 tutorial located at http:// openmp.org/mp-documents/omp-hands-on-sc08.pdf All errors are my responsibility
More informationAdvanced OpenMP Features
Christian Terboven, Dirk Schmidl IT Center, RWTH Aachen University Member of the HPC Group {terboven,schmidl@itc.rwth-aachen.de IT Center der RWTH Aachen University Vectorization 2 Vectorization SIMD =
More informationOpenMP. Dr. William McDoniel and Prof. Paolo Bientinesi WS17/18. HPAC, RWTH Aachen
OpenMP Dr. William McDoniel and Prof. Paolo Bientinesi HPAC, RWTH Aachen mcdoniel@aices.rwth-aachen.de WS17/18 Loop construct - Clauses #pragma omp for [clause [, clause]...] The following clauses apply:
More informationOpenMP 2. CSCI 4850/5850 High-Performance Computing Spring 2018
OpenMP 2 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning Objectives
More informationME759 High Performance Computing for Engineering Applications
ME759 High Performance Computing for Engineering Applications Parallel Computing on Multicore CPUs October 25, 2013 Dan Negrut, 2013 ME964 UW-Madison A programming language is low level when its programs
More informationIntroduction to OpenMP. Lecture 4: Work sharing directives
Introduction to OpenMP Lecture 4: Work sharing directives Work sharing directives Directives which appear inside a parallel region and indicate how work should be shared out between threads Parallel do/for
More informationOpenMP Algoritmi e Calcolo Parallelo. Daniele Loiacono
OpenMP Algoritmi e Calcolo Parallelo References Useful references Using OpenMP: Portable Shared Memory Parallel Programming, Barbara Chapman, Gabriele Jost and Ruud van der Pas OpenMP.org http://openmp.org/
More informationCS4961 Parallel Programming. Lecture 9: Task Parallelism in OpenMP 9/22/09. Administrative. Mary Hall September 22, 2009.
Parallel Programming Lecture 9: Task Parallelism in OpenMP Administrative Programming assignment 1 is posted (after class) Due, Tuesday, September 22 before class - Use the handin program on the CADE machines
More information