Make the Most of OpenMP Tasking. Sergi Mateo Bellido, Compiler Engineer. 14/11/2017


Outline

o Intro
o Data-sharing clauses
o Cutoff clauses
o Scheduling clauses

Intro: what's a task?

A task is a piece of code, plus its data environment, plus its ICVs.

    int foo(int x) {
        int res = 0;
        #pragma omp task shared(res) firstprivate(x)
        res += x;
        #pragma omp taskwait
        return res;
    }

    int main() {
        int var = 0;
        #pragma omp parallel
        #pragma omp master
        var = foo(4);
    }

A task's execution may be deferred, so we have to take care of:
o Liveness of variables (res lives on foo's stack)
o Guaranteeing task completeness (hence the taskwait before returning res)

[Figure: threads in a parallel region pushing tasks into a task pool]

Intro: the task construct

Syntax:

  C/C++:

    #pragma omp task [clauses]
    structured-block

  Fortran:

    !$omp task [clauses]
    structured-block
    !$omp end task

Where clauses can be:
o Data-sharing clauses: private, firstprivate, shared, default(shared | none)
o Cutoff clauses: if(expr), final(expr), mergeable
o Scheduling clauses: priority(expr), untied, depend(dep-type: expr)
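
As a quick illustration (my example, not the slides'), the sketch below combines one clause from each family on a single task construct; all variable names are made up:

    #include <stdio.h>

    int main(void) {
        int a = 1, b = 0;
        #pragma omp parallel
        #pragma omp single
        {
            // default(none) forces every referenced variable to be listed.
            #pragma omp task default(none) firstprivate(a) shared(b) \
                             if(a > 0) priority(1) depend(inout: b)
            b += a;                  // a is a private copy; b is the original
            #pragma omp taskwait
            printf("b = %d\n", b);   // prints "b = 1"
        }
        return 0;
    }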

Intro: other task-generating constructs

The taskloop construct (OpenMP 4.5)
o Specifies that the iterations of a loop will be executed in parallel using tasks.

  Worksharing loop:

    void foo(int n, int *v) {
        #pragma omp parallel for
        for (int i = 0; i < n; ++i)
            compute(v[i]);
    }

  Manual blocking with tasks (OpenMP 3.0):

    void foo(int n, int *v) {
        int NB = n / BS;             // BS = block size
        #pragma omp parallel
        #pragma omp single
        for (int i = 0; i < NB; ++i)
            #pragma omp task
            for (int ii = i*BS; ii < (i+1)*BS; ++ii)
                compute(v[ii]);
    }

  With taskloop (OpenMP 4.5):

    void foo(int n, int *v) {
        #pragma omp parallel
        #pragma omp single
        #pragma omp taskloop
        for (int i = 0; i < n; ++i)
            compute(v[i]);
    }

Several target constructs are also task-generating constructs!
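
taskloop also takes granularity clauses; a hedged sketch using the OpenMP 4.5 grainsize clause (compute is the same placeholder as in the slide):

    void compute(int x);             // assumed to exist, as in the slide

    void foo(int n, int *v) {
        #pragma omp parallel
        #pragma omp single
        // Each generated task gets at least 64 and fewer than 128
        // consecutive iterations (except possibly the last task),
        // amortizing task-creation overhead.
        #pragma omp taskloop grainsize(64)
        for (int i = 0; i < n; ++i)
            compute(v[i]);
    }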

Intro: task-synchronization constructs

The taskwait construct
o Waits on the completion of the child tasks of the current task
o Implicit task scheduling point

The taskgroup construct
o Waits on the completion of all descendant tasks of the current task created in the taskgroup region
o Implicit task scheduling point

  With taskwait (waits for the child only):

    #pragma omp task
    {
        printf("child task\n");
        #pragma omp task
        printf("grandchild task\n");
    }
    #pragma omp taskwait

  With taskgroup (waits for the grandchild too):

    #pragma omp taskgroup
    {
        #pragma omp task
        {
            printf("Child task\n");
            #pragma omp task
            printf("Grandchild task\n");
        }
    }
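
To make the difference concrete, here is a self-contained sketch (my example, not the slides'):

    #include <stdio.h>

    int main(void) {
        #pragma omp parallel
        #pragma omp single
        {
            #pragma omp taskgroup
            {
                #pragma omp task
                {
                    printf("child task\n");
                    #pragma omp task
                    printf("grandchild task\n");   // taskgroup waits for this too
                }
            }
            // A plain taskwait here would only guarantee that the child had
            // finished; the grandchild could still be pending.
            printf("all descendants done\n");
        }
        return 0;
    }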

Data-sharing clauses

Data-sharing clauses

Why are the default data-sharings of the task construct different from those of the parallel construct?
o Synchronous vs. asynchronous execution

    void foo_parallel() {
        int r = 4;
        printf("1. &r=%p\n", &r);
        #pragma omp parallel
        #pragma omp single
        printf("2. &r=%p\n", &r);
    }

    // 1. &r=0x7ffd2b38010c
    // 2. &r=0x7ffd2b38010c

    void foo_task() {
        int r = 4;
        printf("1. &r=%p\n", &r);
        #pragma omp task
        printf("2. &r=%p\n", &r);
    }

    // 1. &r=0x7ffd2b38010c
    // 2. &r=0x7ffeae74ff8c

Variables that don't have a previously determined data-sharing become firstprivate on the task.
o Be careful with arrays! I recommend using default(none). (See the sketch below.)
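
A minimal sketch of the array pitfall (my example): a function-local array referenced inside a task becomes firstprivate by default, so the whole array is copied and the task's writes are lost:

    #include <stdio.h>

    void work(void) {
        int buf[4] = {0};
        #pragma omp task              // buf implicitly firstprivate:
        buf[0] = 1;                   //   writes a private copy, lost on exit
        #pragma omp task shared(buf)
        buf[1] = 2;                   // writes the original array
        #pragma omp taskwait          // buf must outlive both tasks
        printf("%d %d\n", buf[0], buf[1]);   // prints "0 2"
    }

    int main(void) {
        #pragma omp parallel
        #pragma omp single
        work();
        return 0;
    }

With default(none), the compiler would have forced us to make this choice explicit.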

Cutoff clauses

Cutoff clauses

Cutoff clauses are a way to avoid creating small tasks!

The if(expr) clause
o If expr evaluates to false, the task is not ignored: it becomes an undeferred task, executed immediately by the thread that creates it.

    int foo(int x) {
        printf("entering foo function\n");
        int res = 0;
        #pragma omp task shared(res) if(false)
        res += x;
        printf("leaving foo function\n");
        return res;
    }

o Really useful to debug tasking applications!
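
In real code the expression is usually a size threshold rather than a constant; a hedged sketch (CUTOFF and process are illustrative names, not from the slides):

    #define CUTOFF 256

    void process(double *a, int n);   // hypothetical worker function

    void visit(double *a, int n) {
        // Defer only chunks large enough to amortize tasking overhead;
        // small chunks run immediately in the creating thread.
        #pragma omp task if(n > CUTOFF)
        process(a, n);
    }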

Cutoff strategies: clauses (2)

The final(expr) clause
o For recursive & nested applications: it stops task creation at a certain depth, once we already have enough parallelism. Reduces overhead!

    int fib(int n) {
        if (n < 2) return n;
        int r1, r2;
        #pragma omp task final(n < 10) shared(r1)
        r1 = fib(n - 1);
        #pragma omp task final(n < 10) shared(r2)
        r2 = fib(n - 2);
        #pragma omp taskwait
        return r1 + r2;
    }

[Figure: recursion tree of fib(12); tasks with n < 10 are final, so they create no further tasks]

The mergeable clause
o The implementation might merge the task's data environment if the generated task is included or undeferred.

No commercial implementation takes advantage of them :(
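
The runtime routine omp_in_final() (OpenMP 3.1) lets the code detect that it is already running inside a final task; a sketch (my variant of the slide's fib) that drops to a serial version below the cutoff:

    #include <omp.h>

    int fib_seq(int n) { return n < 2 ? n : fib_seq(n - 1) + fib_seq(n - 2); }

    int fib(int n) {
        if (n < 2) return n;
        if (omp_in_final())        // below the cutoff no tasks are created
            return fib_seq(n);     //   anyway, so skip the tasking machinery
        int r1, r2;
        #pragma omp task final(n < 10) shared(r1)
        r1 = fib(n - 1);
        #pragma omp task final(n < 10) shared(r2)
        r2 = fib(n - 2);
        #pragma omp taskwait
        return r1 + r2;
    }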

Scheduling clauses

Scheduling clauses

Scheduling hints & constraints.

The untied clause
o Tasks are tied by default; with the untied clause, a suspended task can potentially resume on another thread.

The priority(expr) clause
o Provides a HINT of the task's relevance
o Prioritize critical tasks
o Balance unbalanced executions

    void foo(int N, int **elems, int *sizes) {
        for (int i = 0; i < N; ++i)
            #pragma omp task priority(sizes[i])
            compute_elem(elems[i]);
    }

[Figure: elems array whose elements have sizes 1, 2, 1, 1, 4; the priority hint gets the largest element processed first]
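
One caveat worth knowing (my addition, not on the slide): priority hints are bounded by the max-task-priority ICV, which defaults to 0, so they typically only take effect once the OMP_MAX_TASK_PRIORITY environment variable is set. A sketch using the OpenMP 4.5 query routine:

    #include <omp.h>
    #include <stdio.h>

    void submit(void) {
        // Value of the max-task-priority ICV (set via OMP_MAX_TASK_PRIORITY);
        // priority hints above this value are clamped by the runtime.
        printf("max task priority: %d\n", omp_get_max_task_priority());
        for (int i = 0; i < 8; ++i) {
            #pragma omp task priority(i)   // later tasks hinted as more urgent
            printf("task %d\n", i);
        }
    }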

Scheduling clauses: the depend clause

Task dependences as a way to define task-execution constraints.

  OpenMP 3.1:

    int x = 0;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task
        std::cout << x << std::endl;
        #pragma omp taskwait
        #pragma omp task
        x++;
    }

  OpenMP 4.0:

    int x = 0;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task depend(in: x)
        std::cout << x << std::endl;
        #pragma omp task depend(inout: x)
        x++;
    }

Task dependences can help us remove strong synchronizations, increasing the look-ahead!

[Figure: timelines for tasks t1 and t2 under OpenMP 3.1 vs 4.0, marking each task's creation and execution time; with dependences, t2 can be created before t1 finishes]

Scheduling clauses: the depend clause (2)

depend(dep-type: list), where:
o dep-type may be in, out or inout
o list may be:
  - a variable, e.g. depend(inout: x)
  - an array section, e.g. depend(inout: a[10:20])

A task cannot be executed until all its predecessor tasks have completed.

If a task defines an in dependence over a variable:
o it will depend on all previously generated sibling tasks that reference at least one of the list items in an out or inout dependence.

If a task defines an out/inout dependence over a variable:
o it will depend on all previously generated sibling tasks that reference at least one of the list items in an in, out or inout dependence.

(An array-section sketch follows below.)
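
A hedged sketch of array-section dependences (my example; init and scale are illustrative helpers): disjoint sections create no dependence, while overlapping ones serialize:

    void init(double *a, int n);     // hypothetical helpers
    void scale(double *a, int n);

    void blocks(double *a) {
        #pragma omp parallel
        #pragma omp single
        {
            #pragma omp task depend(out: a[0:100])    // writes block [0,100)
            init(a, 100);
            #pragma omp task depend(out: a[100:100])  // disjoint block: may run concurrently
            init(a + 100, 100);
            #pragma omp task depend(inout: a[0:100])  // waits only for the first init
            scale(a, 100);
        }
    }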

Scheduling clauses: the depend clause (2), example

    // test1_raw.cc
    int x = 0;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task depend(inout: x)   // T1
        ...
        #pragma omp task depend(in: x)      // T2
        ...
        #pragma omp task depend(in: x)      // T3
        ...
        #pragma omp task depend(inout: x)   // T4
        ...
    }

Resulting dependence graph: T2 and T3 both depend on T1 and may run concurrently with each other; T4 depends on both T2 and T3.

Scheduling clauses: the depend clause (3)

Properly combining dependences and data-sharings allows us to define a task data-flow model:
o Enhances composability
o Eases the parallelization of new regions of your code

  With a taskwait:

    int x = 0, y = 0;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task depend(inout: x)
        { x++; y++; }   // !!! y is written but not annotated
        #pragma omp task depend(in: x)
        std::cout << x << std::endl;
        #pragma omp taskwait
        std::cout << y << std::endl;
    }

  Fully annotated:

    int x = 0, y = 0;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task depend(inout: x, y)
        { x++; y++; }
        #pragma omp task depend(in: x)
        std::cout << x << std::endl;
        #pragma omp task depend(in: y)
        std::cout << y << std::endl;
    }

If all tasks are properly annotated, we only have to worry about the dependences & data-sharings of the new task!
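
The composability point in practice (my extension of the slide's example): a newly parallelized region slots in by declaring only its own reads and writes:

    #include <iostream>

    int main() {
        int x = 0, y = 0;
        #pragma omp parallel
        #pragma omp single
        {
            #pragma omp task depend(inout: x, y)
            { x++; y++; }
            #pragma omp task depend(in: x)
            std::cout << x << std::endl;
            #pragma omp task depend(in: y)
            std::cout << y << std::endl;
            // Newly added task: it reads x and y, so it declares in
            // dependences on both; the existing tasks are untouched.
            #pragma omp task depend(in: x, y)
            std::cout << x + y << std::endl;
        }
        return 0;
    }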

Thank you! sergi.mateo@bsc.es