Data Handling in OpenMP


Threads can manipulate data in several ways:

- private: a thread initializes and uses the variable alone. Each thread keeps its own local copy (e.g., loop indices).
- firstprivate: a thread repeatedly reads a variable that was initialized earlier in the program. Each thread makes a copy and inherits the value at the time of thread creation. When a thread is scheduled on a processor, the data can reside at that processor (in its cache), so no interprocessor communication is required.
- reduction: multiple threads manipulate a single piece of data. The manipulation is broken into local operations followed by a global combining operation (e.g., counting, summation).
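
A minimal C sketch of the three clauses above; the variable names (scratch, offset, sum) are illustrative, not taken from the slides:

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int scratch = 0;   /* private: each thread gets its own copy */
        int offset  = 10;  /* firstprivate: each thread's copy starts at 10 */
        int sum     = 0;   /* reduction: per-thread partial sums combined at the end */

        #pragma omp parallel for private(scratch) firstprivate(offset) reduction(+:sum)
        for (int i = 0; i < 100; i++) {
            scratch = i + offset;  /* work on the thread-local copy */
            sum += scratch;        /* accumulates into the thread-local sum */
        }

        printf("sum = %d\n", sum);  /* 0+1+...+99 plus 100*10 = 5950 */
        return 0;
    }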

Data Handling in OpenMP (cont'd)

- Localization: if multiple threads manipulate different parts of a large data structure, the programmer should break it into smaller data structures and make them private to the threads.
- shared: to be used only after all the above techniques have been explored.
- The threadprivate directive:
  - Lets objects persist through parallel and serial blocks, provided the number of threads remains the same.
  - Avoids copying into the master thread's data space and reinitializing at the next parallel block.
  - Threadprivate objects are initialized once, before they are first accessed in a parallel region.
- The copyin(variable_list) clause: assigns the master thread's value to the threadprivate variables of all threads in a parallel region.

Data Handling in OpenMP (cont'd)

- Data in threadprivate objects is guaranteed to persist only if the dynamic threads mechanism is turned off and the number of threads in different parallel regions remains constant.
- The default setting of dynamic threads is implementation defined.
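
A minimal sketch of threadprivate with copyin, using a hypothetical file-scope variable counter; dynamic threads are turned off so that, per the rule above, the per-thread copies persist across regions:

    #include <stdio.h>
    #include <omp.h>

    int counter = 0;                 /* one copy of this object per thread */
    #pragma omp threadprivate(counter)

    int main(void) {
        omp_set_dynamic(0);          /* dynamic threads off: copies may persist */
        counter = 100;               /* master thread's copy, set in serial code */

        #pragma omp parallel copyin(counter)  /* every thread starts from 100 */
        {
            counter += omp_get_thread_num();
        }

        #pragma omp parallel  /* same team size: each thread finds its own value again */
        {
            printf("thread %d: counter = %d\n", omp_get_thread_num(), counter);
        }
        return 0;
    }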

Clauses/Directives Summary

The following OpenMP directives do NOT accept clauses: master, critical, barrier, atomic, flush, ordered, and threadprivate.

Controlling Number of Threads and Processors

- omp_set_num_threads: set the default number of threads for subsequent parallel regions.
  - Must be called outside the scope of a parallel region.
  - Dynamic adjustment of threads must be enabled (via OMP_DYNAMIC or omp_set_dynamic()).
- omp_get_num_threads: return the number of threads in the team of the closest enclosing parallel directive.
- omp_get_max_threads: return the maximum number of threads that could be created by a parallel directive.
- omp_get_thread_num: return a unique thread id, from 0 to omp_get_num_threads() - 1.
- omp_get_num_procs: return the number of processors available to execute the threaded program.
- omp_in_parallel: return whether execution is currently inside a parallel region.
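
A small illustrative program exercising these query functions (the output depends on the machine and runtime):

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        omp_set_num_threads(4);  /* called in serial code, outside any parallel region */

        printf("processors available: %d\n", omp_get_num_procs());
        printf("max threads         : %d\n", omp_get_max_threads());
        printf("in parallel region? : %d\n", omp_in_parallel());  /* 0 here */

        #pragma omp parallel
        {
            if (omp_get_thread_num() == 0)  /* report once, from thread 0 */
                printf("team size %d, in parallel? %d\n",
                       omp_get_num_threads(), omp_in_parallel());
        }
        return 0;
    }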

Controlling and Monitoring Thread Creation

- omp_set_dynamic: allow the programmer to dynamically alter the number of threads.
  - To disable dynamic adjustment, pass the argument dynamic_threads as 0.
  - Must be called outside parallel regions.
- omp_get_dynamic: determine whether dynamic adjustment is enabled.
- omp_set_nested: enable nested parallelism if the argument is non-zero. If disabled, any nested parallel regions are serialized.
- omp_get_nested: return the state of nested parallelism.
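
A sketch of these calls; whether the inner region below actually gets its own team depends on the implementation's support for nesting:

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        omp_set_dynamic(1);  /* allow the runtime to adjust team sizes */
        omp_set_nested(1);   /* non-zero argument: enable nested parallelism */
        printf("dynamic = %d, nested = %d\n", omp_get_dynamic(), omp_get_nested());

        #pragma omp parallel num_threads(2)
        {
            int outer = omp_get_thread_num();

            /* with nesting disabled, this inner region would be serialized */
            #pragma omp parallel num_threads(2)
            printf("outer thread %d, inner thread %d\n", outer, omp_get_thread_num());
        }
        return 0;
    }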

Mutual Exclusion

- omp_init_lock: initialize a lock before using it.
- omp_destroy_lock: discard a lock.
- omp_set_lock: acquire a lock (blocking).
- omp_unset_lock: release the lock. The result of a thread attempting to unlock a lock owned by another thread is undefined.
- omp_test_lock: attempt to acquire the lock without blocking. A non-zero return value means the lock was successfully set.
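
An illustrative use of the simple lock API, guarding a hypothetical shared counter:

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        omp_lock_t lock;
        int count = 0;

        omp_init_lock(&lock);            /* initialize before first use */

        #pragma omp parallel
        {
            omp_set_lock(&lock);         /* blocking acquire */
            count++;                     /* critical section */
            omp_unset_lock(&lock);

            if (omp_test_lock(&lock)) {  /* non-blocking: non-zero on success */
                count++;
                omp_unset_lock(&lock);
            }
        }

        omp_destroy_lock(&lock);         /* discard when no longer needed */
        printf("count = %d\n", count);
        return 0;
    }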

Mutual Exclusion (cont'd)

Nestable locks can be locked multiple times by the same thread, similar to recursive mutexes in Pthreads.
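
A sketch of the nestable variant (omp_init_nest_lock and friends), showing the re-acquisition that would deadlock with a simple lock:

    #include <omp.h>

    omp_nest_lock_t nlock;

    void update(void) {
        omp_set_nest_lock(&nlock);    /* owner re-acquires: nesting count becomes 2 */
        /* ... protected work ... */
        omp_unset_nest_lock(&nlock);  /* count back to 1; the lock is still held */
    }

    int main(void) {
        omp_init_nest_lock(&nlock);

        #pragma omp parallel
        {
            omp_set_nest_lock(&nlock);    /* first acquisition: count is 1 */
            update();                     /* safe only because the lock is nestable */
            omp_unset_nest_lock(&nlock);  /* count reaches 0: lock released */
        }

        omp_destroy_nest_lock(&nlock);
        return 0;
    }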

Environment Variables in OpenMP

- OMP_NUM_THREADS: the default number of threads.
  - Can be changed by the omp_set_num_threads function or the num_threads clause.
  - Requirement: the environment variable OMP_DYNAMIC is set to TRUE, or the function omp_set_dynamic has been called with a non-zero argument.
  - Example (in bash): export OMP_NUM_THREADS=8
- OMP_DYNAMIC: allow the number of threads to be adjusted at runtime.
  - Can be disabled by calling the omp_set_dynamic function with a zero argument.
- OMP_NESTED: enable or disable nested parallelism.
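
A small program (the file name env_demo.c is hypothetical) for observing the effect of these variables:

    /* Build: cc -fopenmp env_demo.c -o env_demo
     * Run:   export OMP_NUM_THREADS=8
     *        ./env_demo
     */
    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        #pragma omp parallel
        {
            if (omp_get_thread_num() == 0)
                printf("team size = %d (requested via OMP_NUM_THREADS), dynamic = %d\n",
                       omp_get_num_threads(), omp_get_dynamic());
        }
        return 0;
    }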

Environment Variables in OpenMP (cont'd)

- OMP_SCHEDULE: controls the assignment of iteration spaces for for directives that use runtime scheduling.
  - Supports static, dynamic, and guided, each with an optional chunk size (the default chunk size is 1).
  - Examples:
    export OMP_SCHEDULE=static,4
    export OMP_SCHEDULE=dynamic
    export OMP_SCHEDULE=guided
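
A sketch of runtime scheduling: with the schedule(runtime) clause, the loop below defers to whatever OMP_SCHEDULE specifies:

    /* Run with, e.g.:  export OMP_SCHEDULE=dynamic,4  before launching. */
    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        /* schedule(runtime): the policy and chunk size are read from
           OMP_SCHEDULE when the program runs, not fixed at compile time */
        #pragma omp parallel for schedule(runtime)
        for (int i = 0; i < 16; i++)
            printf("iteration %2d ran on thread %d\n", i, omp_get_thread_num());
        return 0;
    }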

Explicit Threads versus OpenMP-Based Programming

- OpenMP: a layer on top of native threads.
  - Avoids the tasks of initializing attribute objects, setting up arguments to threads, partitioning iteration spaces, etc.
  - Convenient for static and regular problems, where its overheads are minimal.
- Explicit threads (e.g., Pthreads):
  - Data exchange is more apparent, which helps alleviate overheads from data movement, false sharing, and contention.
  - Richer API: condition waits, locks of different types, and increased flexibility for building composite synchronization operations.
  - Better tools and support; used more widely than OpenMP.