Advanced OpenMP. Memory model, flush and atomics

Similar documents
Shared Memory Programming with OpenMP. Lecture 8: Memory model, flush and atomics

Advanced OpenMP. OpenMP Basics

Shared Memory Programming Paradigm!

CS510 Advanced Topics in Concurrency. Jonathan Walpole

Introduc4on to OpenMP and Threaded Libraries Ivan Giro*o

1 of 6 Lecture 7: March 4. CISC 879 Software Support for Multicore Architectures Spring Lecture 7: March 4, 2008

Parallelising Scientific Codes Using OpenMP. Wadud Miah Research Computing Group

OpenMP. OpenMP. Portable programming of shared memory systems. It is a quasi-standard. OpenMP-Forum API for Fortran and C/C++

OpenMP Fundamentals Fork-join model and data environment

Parallel Programming Libraries and implementations

CS 5220: Shared memory programming. David Bindel

Parallel Programming. Libraries and Implementations

Lecture 24: Multiprocessing Computer Architecture and Systems Programming ( )

Parallel Numerical Algorithms

Introduction to OpenMP

C++ Memory Model. Don t believe everything you read (from shared memory)

OpenMP C and C++ Application Program Interface Version 1.0 October Document Number

Data Environment: Default storage attributes

The OpenMP Memory Model

CME 213 S PRING Eric Darve

Parallel Programming with OpenMP. CS240A, T. Yang

Allows program to be incrementally parallelized

Questions from last time

OpenMP Application Program Interface

OpenMP Tutorial. Dirk Schmidl. IT Center, RWTH Aachen University. Member of the HPC Group Christian Terboven

Message-Passing Programming with MPI. Message-Passing Concepts

15-418, Spring 2008 OpenMP: A Short Introduction

Barbara Chapman, Gabriele Jost, Ruud van der Pas

Introduction to. Slides prepared by : Farzana Rahman 1

Module 10: Open Multi-Processing Lecture 19: What is Parallelization? The Lecture Contains: What is Parallelization? Perfectly Load-Balanced Program

Concurrent Programming in the D Programming Language. by Walter Bright Digital Mars

Introduction to Standard OpenMP 3.1

OPENMP TIPS, TRICKS AND GOTCHAS

Introduction to OpenMP

Introduction to OpenMP

More Advanced OpenMP. Saturday, January 30, 16

OPENMP TIPS, TRICKS AND GOTCHAS

Fractals. Investigating task farms and load imbalance

Concurrency, Thread. Dongkun Shin, SKKU

Building Blocks. Operating Systems, Processes, Threads

Programming with Shared Memory PART II. HPC Fall 2007 Prof. Robert van Engelen

A Short Introduction to OpenMP. Mark Bull, EPCC, University of Edinburgh

Parallel Programming with OpenMP

Introduction to OpenMP

Parallel Programming with OpenMP. CS240A, T. Yang, 2013 Modified from Demmel/Yelick s and Mary Hall s Slides

Parallel Programming. Libraries and implementations

Fractals exercise. Investigating task farms and load imbalance

Shared Memory Consistency Models: A Tutorial

Parallel Computing. Lecture 16: OpenMP - IV

Overview: The OpenMP Programming Model

Shared Memory Programming with OpenMP

A brief introduction to OpenMP

Implementation of Parallelization

OpenMP Tasking Model Unstructured parallelism

Introduction to OpenMP

CSL 860: Modern Parallel

Scientific Programming in C XIV. Parallel programming

CMSC 330: Organization of Programming Languages

Using Weakly Ordered C++ Atomics Correctly. Hans-J. Boehm

The OpenMP Memory Model

CS 470 Spring Mike Lam, Professor. Advanced OpenMP

Using OpenMP. Shaohao Chen Research Boston University

Introduction to OpenMP

Introduction to OpenMP. Lecture 4: Work sharing directives

OpenMP. Application Program Interface. CINECA, 14 May 2012 OpenMP Marco Comparato

Shared Memory Programming Model

Parallel Programming: OpenMP

Introduction to OpenMP

Programming with Shared Memory PART II. HPC Fall 2012 Prof. Robert van Engelen

Introduction to Object- Oriented Programming

Lecture 16: Recapitulations. Lecture 16: Recapitulations p. 1

Introduction to OpenMP.

CMSC 714 Lecture 4 OpenMP and UPC. Chau-Wen Tseng (from A. Sussman)

Tasking in OpenMP 4. Mirko Cestari - Marco Rorro -

Jukka Julku Multicore programming: Low-level libraries. Outline. Processes and threads TBB MPI UPC. Examples

Java Memory Model. Jian Cao. Department of Electrical and Computer Engineering Rice University. Sep 22, 2016

Reusing this material

Multithreading in C with OpenMP

CS 470 Spring Mike Lam, Professor. OpenMP

POSIX Threads and OpenMP tasks

Introduction to OpenMP. Lecture 2: OpenMP fundamentals

Open Multi-Processing: Basic Course

Introduction to OpenMP

Advanced OpenMP. Tasks

CS 470 Spring Mike Lam, Professor. OpenMP

Parallel Computer Architecture Spring Memory Consistency. Nikos Bellas

CS4961 Parallel Programming. Lecture 12: Advanced Synchronization (Pthreads) 10/4/11. Administrative. Mary Hall October 4, 2011

A common scenario... Most of us have probably been here. Where did my performance go? It disappeared into overheads...

Concurrent Programming Lecture 10

OpenMP. António Abreu. Instituto Politécnico de Setúbal. 1 de Março de 2013

Overview: Memory Consistency

High-level languages

Review. Tasking. 34a.cpp. Lecture 14. Work Tasking 5/31/2011. Structured block. Parallel construct. Working-Sharing contructs.

CS4961 Parallel Programming. Lecture 5: More OpenMP, Introduction to Data Parallel Algorithms 9/5/12. Administrative. Mary Hall September 4, 2012

An OpenMP Validation Suite

Using OpenMP. Shaohao Chen High performance Louisiana State University

Shared Memory Programming Models I

Comparing OpenACC 2.5 and OpenMP 4.1 James C Beyer PhD, Sept 29 th 2015

GPU Concurrency: Weak Behaviours and Programming Assumptions

!OMP #pragma opm _OPENMP

Transcription:

Advanced OpenMP Memory model, flush and atomics

Why do we need a memory model? On modern computers code is rarely executed in the same order as it was specified in the source code. Compilers, processors and memory systems reorder code to achieve maximum performance. Individual threads, when considered in isolation, exhibit as-if-serial semantics. Programmer s assumptions based on the memory model hold even in the face of code reordering performed by the compiler, the processors and the memory. 2

Example Reasoning about multithreaded execution is not that simple. Thread 1 Thread 2 x=1; y=1; int r1=y; int r2=x; If there is no reordering and T2 sees value of y on read to be 1 then the following read of x should also return the value 1. If code in T1 is reordered we can no longer make this assumption. 3

OpenMP Memory Model OpenMP supports a relaxed-consistency shared memory model. Threads can maintain a temporary view of shared memory which is not consistent with that of other threads. These temporary views are made consistent only at certain points in the program. The operation which enforces consistency is called the flush operation 4

Flush operation Defines a sequence point at which a thread is guaranteed to see a consistent view of memory - All previous read/writes by this thread have completed and are visible to other threads - No subsequent read/writes by this thread have occurred - A flush operation is analogous to a fence in other shared memory API s 5

Flush and synchronization A flush operation is implied by OpenMP synchronizations, e.g. - at entry/exit of parallel regions - at implicit and explicit barriers - at entry/exit of critical regions - whenever a lock is set or unset. (but not at entry to worksharing regions or entry/exit of master regions) Note: using the volatile qualifier in C/C++ does not give sufficient guarantees about multithreaded execution. 6

Example: producer-consumer pattern Thread 0 Thread 1 a = foo(); flag = 1; while (!flag); b = a; This is incorrect code The compiler and/or hardware may re-order the reads/writes to a and flag, or flag may be held in a register. OpenMP has a flush directive which specifies an explicit flush operation -can be used to make the above example work!$omp flush 7

Using flush In order for a write of a variable on one thread to be guaranteed visible and valid on a second thread, the following operations must occur in the following order: 1. Thread A writes the variable 2. Thread A executes a flush operation 3. Thread B executes a flush operation 4. Thread B reads the variable 8

Example: producer-consumer pattern Thread 0 a = foo(); flag = 1; First flush ensures flag is written after a Second flush ensures flag is written to memory Thread 1 while (!flag){ } b = a; 9 First and second flushes ensure flag is read from memory Third flush ensures correct ordering of flushes

Using flush Using flush correctly is difficult and prone to subtle bugs - extremely hard to test whether code is correct - may execute correctly on one platform/compiler but not on another - bugs can be triggered by changing the optimisation level on the compiler Don t use it unless you are 100% confident you know what you are doing! - and even then 10

ATOMIC directive Used to protect a single update to a shared variable. Applies only to a single statement. Syntax: Fortran:!$OMP ATOMIC statement where statement must have one of these forms: x = x op expr, x = expr op x, x = intr (x, expr) or x = intr(expr, x) op is one of +, *, -, /,.and.,.or.,.eqv., or.neqv. intr is one of MAX, MIN, IAND, IOR or IEOR 11

ATOMIC directive (cont) C/C++: #pragma omp atomic statement where statement must have one of the forms: x binop = expr, x++, ++x, x--, or --x and binop is one of +, *, -, /, &, ^, <<, or >> Note that the evaluation of expr is not atomic. May be more efficient than using CRITICAL directives, e.g. if different array elements can be protected separately. No interaction with CRITICAL directives 12

ATOMIC directive (cont) Example (compute degree of each vertex in a graph): #pragma omp parallel for for (j=0; j<nedges; j++){ #pragma omp atomic degree[edge[j].vertex1]++; #pragma omp atomic degree[edge[j].vertex2]++; } 13

Other atomic forms Sometimes we may wish to enforce atomic behaviour for operations other than updates #pragma omp atomic read v = x; #pragma omp atomic write x = expr; #pragma omp atomic capture {v = x; x binop= expr;}!$omp atomic read v = x!$omp atomic write x = expr!$omp atomic capture v = x x = x op expr!$omp end atomic 14

Example: producer-consumer pattern Thread 0 a = foo(); #pragma omp atomic write flag = 1; Thread 1 while (!myflag){ #pragma omp atomic read myflag = flag; } b = a; To be strictly correct we should use atomics to avoid the race condition on flag. 15

Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us This means you are free to copy and redistribute the material and adapt and build on the material under the following terms: You must give appropriate credit, provide a link to the license and indicate if changes were made. If you adapt or build on the material you must distribute your work under the same license as the original. Note that this presentation contains images owned by others. Please seek their permission before reusing these images. 16