CS 470, Spring 2017
Mike Lam, Professor

Advanced OpenMP

Atomics

OpenMP provides access to highly-efficient hardware synchronization mechanisms.
- Use the atomic pragma to annotate a single statement
- The statement must be a single increment/decrement or in the following form:
      x <op>= <expr>;   // <op> can be +, -, *, /, &, |, ^, <<, >>
- Many processors provide a load/modify/store instruction (in x86-64, specified using the LOCK prefix)
- Far more efficient than using a mutex (i.e., critical), which requires multiple function calls!
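As a quick illustration (a minimal sketch, not from the original slides; the counter and loop bound are invented for the example):

    #include <stdio.h>

    int main(void)
    {
        long count = 0;

        #pragma omp parallel for
        for (int i = 0; i < 1000000; i++) {
            // each update uses the hardware's atomic read-modify-write
            // support; no critical section or lock variable is needed
            #pragma omp atomic
            count += 1;
        }

        printf("count = %ld\n", count);   // always prints 1000000
        return 0;
    }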

Locks

OpenMP provides a basic locking system, useful for protecting a data structure rather than a region of code.
- omp_lock_t: lock variable (similar to pthread_mutex_t)
- omp_init_lock: initialize lock (similar to pthread_mutex_init)
- omp_set_lock: acquire lock (similar to pthread_mutex_lock)
- omp_unset_lock: release lock (similar to pthread_mutex_unlock)
- omp_destroy_lock: clean up a lock (similar to pthread_mutex_destroy)
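A minimal sketch of these routines in use, protecting a shared total (the total is just a stand-in for a real data structure):

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        omp_lock_t lock;
        omp_init_lock(&lock);          // initialize before use
        long total = 0;

        #pragma omp parallel for
        for (int i = 0; i < 100000; i++) {
            omp_set_lock(&lock);       // acquire (blocks until available)
            total += i;                // protected update
            omp_unset_lock(&lock);     // release
        }

        omp_destroy_lock(&lock);       // clean up after the parallel region
        printf("total = %ld\n", total);
        return 0;
    }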

Thread safety

Don't mix mutual exclusion mechanisms:
- #pragma omp critical
- #pragma omp atomic
- omp_set_lock()

Don't nest mutual exclusion mechanisms:
- Nesting unnamed critical sections guarantees deadlock! The thread cannot enter the second section because it is still in the first section, and unnamed sections share a name.
- If you must nest, use named critical sections or nested locks.
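For example (a sketch of the named-section workaround; the section names and variables are invented for illustration):

    int shared_a = 0, shared_b = 0;

    void update_both(void)
    {
        // distinct names mean distinct underlying mutexes, so this nesting
        // does not self-deadlock the way two unnamed sections would
        // (all threads must still nest the names in the same order)
        #pragma omp critical(update_a)
        {
            shared_a++;
            #pragma omp critical(update_b)
            {
                shared_b++;
            }
        }
    }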

Nested locks

Simple vs. nested locks:
- Use omp_nest_lock_* instead of omp_lock_*
- A nested lock may be acquired multiple times
  - Must be in the same thread
  - Must be released the same number of times
- Allows you to write functions that call each other but need to acquire the same lock
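A minimal sketch, assuming two functions where one calls the other and both need the same lock (the function and variable names are invented):

    #include <omp.h>

    omp_nest_lock_t nlock;   // initialize once with omp_init_nest_lock(&nlock)
    int value = 0;

    void increment(void)
    {
        omp_set_nest_lock(&nlock);     // OK even if this thread already holds it
        value++;
        omp_unset_nest_lock(&nlock);   // each set must be matched by an unset
    }

    void increment_twice(void)
    {
        omp_set_nest_lock(&nlock);     // acquired once here...
        increment();                   // ...and again inside increment()
        increment();
        omp_unset_nest_lock(&nlock);   // fully released only after the last unset
    }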

Tasks

OpenMP is most often used for data parallelism (parallel for), but newer versions (3.1+) have explicit task parallelism:
- #pragma omp parallel: spawn worker threads
- #pragma omp task: create a new task (should be in a parallel block); the task is assigned to an available worker by the runtime (may be deferred)
- #pragma omp taskwait: waits for all created tasks to finish (but doesn't destroy workers)

main:

    # pragma omp parallel
    # pragma omp single nowait
    quick_sort(items, n);

quick_sort:

    <select pivot and partition>

    // recursively sort each partition
    # pragma omp task
    quick_sort(items, p+1);
    # pragma omp task
    quick_sort(items+q, n-q);
    # pragma omp taskwait
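Filling in the slide's sketch as a complete program (the Lomuto-style partition, array setup, and sortedness check are my additions, so the index arithmetic differs slightly from the slide's p/q sketch):

    #include <stdio.h>
    #include <stdlib.h>

    void quick_sort(int *items, int n)
    {
        if (n < 2) return;

        // Lomuto partition: pivot ends up at index p
        int pivot = items[n-1];
        int p = -1;
        for (int i = 0; i < n-1; i++) {
            if (items[i] <= pivot) {
                p++;
                int t = items[p]; items[p] = items[i]; items[i] = t;
            }
        }
        p++;
        int t = items[p]; items[p] = items[n-1]; items[n-1] = t;

        // recursively sort each partition as independent tasks;
        // pointer and size arguments are firstprivate by default
        #pragma omp task
        quick_sort(items, p);
        #pragma omp task
        quick_sort(items + p + 1, n - p - 1);
        #pragma omp taskwait   // wait for both halves before returning
    }

    int main(void)
    {
        int n = 1000;
        int *items = malloc(n * sizeof(int));
        for (int i = 0; i < n; i++) items[i] = rand() % 10000;

        // one thread starts the recursion; the others (freed early by
        // nowait) pick up the tasks it spawns
        #pragma omp parallel
        #pragma omp single nowait
        quick_sort(items, n);

        for (int i = 1; i < n; i++)
            if (items[i-1] > items[i]) { printf("NOT SORTED\n"); return 1; }
        printf("sorted\n");
        free(items);
        return 0;
    }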

Aside

Often useful: multiple for-loops inside a parallel region.
- Many pragmas bind dynamically to any active parallel region
- Less thread creation/joining overhead
- Private variables can be re-used across multiple loops

    # pragma omp parallel default(none) shared(n,m)
    {
        int tid = omp_get_thread_num();

        # pragma omp for
        for (int i = 0; i < n; i++) {
            // do something that requires tid
        }

        # pragma omp for
        for (int j = 0; j < m; j++) {
            // do something else that requires tid
        }
    }

Loop scheduling

Use the schedule clause to control how parallel for-loop iterations are allocated to threads. Each policy may be modified by a chunksize parameter:
- static: split into chunks before the loop is executed
- dynamic: split into chunks, dynamically allocated to threads (similar to a thread pool or tasks)
- guided: like dynamic, but chunk sizes decrease (the specified chunksize is the minimum)
- auto: allows the compiler or runtime to choose
- runtime: allows specification at run time using the OMP_SCHEDULE environment variable
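A small driver of the sort that produced the traces on the next slide (the loop bound, thread count, and output format are assumptions chosen to match that output):

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        // try: schedule(static), schedule(static,1), schedule(static,2),
        //      schedule(dynamic,2), schedule(guided)
        #pragma omp parallel for num_threads(4) schedule(static)
        for (int i = 0; i < 32; i++) {
            printf("Iteration %02d on thread %d\n", i, omp_get_thread_num());
        }
        return 0;
    }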

Loop scheduling (examples)

[Slide: traces of 32 iterations over 4 threads under (static), (static, 1), (static, 2), (dynamic, 2), and (guided), printed as "Iteration NN on thread T".]
- (static): each thread gets one contiguous block of 8 iterations
- (static, 1): iterations are dealt to threads round-robin, one at a time
- (static, 2): iterations are dealt round-robin in chunks of 2
- (dynamic, 2): chunks of 2 go to whichever thread is free, so the assignment is irregular
- (guided): chunks start large (8 iterations) and shrink toward the end of the loop