Parallel Computing. Lecture 16: OpenMP - IV

CSCI-UA.0480-003 Parallel Computing
Lecture 16: OpenMP - IV
Mohamed Zahran (aka Z)
mzahran@cs.nyu.edu
http://www.mzahran.com

PRODUCERS AND CONSUMERS

Queues A queue is a natural data structure to use in many multithreaded applications. The two main operations are enqueue and dequeue. For example, suppose we have several producer threads and several consumer threads: producer threads might produce requests for data, and consumer threads might consume a request by finding or generating the requested data.
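As a concrete reference, here is a minimal linked-list queue sketch in C; the names (msg_node, msg_queue, Enqueue, Dequeue) and the small main are illustrative, not taken from the lecture's code:

#include <stdio.h>
#include <stdlib.h>

/* One message: the sender's rank and the value it sent. */
struct msg_node {
    int src, mesg;
    struct msg_node *next;
};

/* Dequeue from the front, enqueue at the tail. */
struct msg_queue {
    struct msg_node *front, *tail;
};

void Enqueue(struct msg_queue *q, int src, int mesg) {
    struct msg_node *n = malloc(sizeof *n);
    n->src = src; n->mesg = mesg; n->next = NULL;
    if (q->tail == NULL) q->front = n;      /* queue was empty */
    else                 q->tail->next = n;
    q->tail = n;
}

int Dequeue(struct msg_queue *q, int *src, int *mesg) {
    if (q->front == NULL) return 0;         /* nothing to receive */
    struct msg_node *old = q->front;
    *src = old->src; *mesg = old->mesg;
    q->front = old->next;
    if (q->front == NULL) q->tail = NULL;   /* size was 1: tail changes too */
    free(old);
    return 1;
}

int main(void) {
    struct msg_queue q = { NULL, NULL };
    Enqueue(&q, 0, 42);
    int src, mesg;
    if (Dequeue(&q, &src, &mesg))
        printf("message %d from thread %d\n", mesg, src);
    return 0;
}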

Message-Passing Each thread could have a shared message queue, and when one thread wants to send a message to another thread, it could enqueue the message in the destination thread's queue. A thread could receive a message by dequeuing the message at the head of its own message queue. (Figure: other threads enqueue messages into Thread X's queue; Thread X dequeues them from the head.)

The Big Picture Each thread executes the following: a send loop that enqueues messages into the other threads' queues (trying to receive along the way), followed by a receive loop that keeps dequeuing from its own queue until every thread has finished sending.

Send_msg() Enqueues a message in the destination thread's queue.

Try_receive() Dequeues a message from the head of the calling thread's own queue, if one is available. Note that when the queue size is 1, a dequeue affects the tail pointer as well as the head pointer, so it can conflict with a concurrent enqueue.

Termination Detection A shared counter records how many threads have finished sending: each thread increments it after completing its send (for) loop. A thread can safely quit once its own queue is empty and the counter equals the number of threads.

Startup (1) When the program begins execution, a single thread, the master thread, will get command line arguments and allocate an array of message queues: one for each thread. This array needs to be shared among the threads.

Startup (2) One or more threads may finish allocating their queues before some other threads, and could start sending messages to queues that have not been allocated yet. We therefore need an explicit barrier: when a thread encounters the barrier, it blocks until all the threads in the team have reached the barrier.
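A minimal sketch of this startup pattern, assuming a shared array of queue pointers in which each thread allocates its own entry (all names are illustrative):

#include <stdlib.h>
#include <omp.h>

struct msg_queue { struct msg_node *front, *tail; };   /* as sketched earlier */

int main(void) {
    int thread_count = 4;
    struct msg_queue **msg_queues =
        malloc(thread_count * sizeof *msg_queues);      /* shared array */

    #pragma omp parallel num_threads(thread_count)
    {
        int my_rank = omp_get_thread_num();
        msg_queues[my_rank] = calloc(1, sizeof(struct msg_queue));

        /* No thread may start enqueuing until every queue exists. */
        #pragma omp barrier

        /* ... send and receive messages ... */

        free(msg_queues[my_rank]);
    }
    free(msg_queues);
    return 0;
}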

The Atomic Directive Can give higher performance than critical, but it can only protect critical sections that consist of a single C assignment statement. Further, the statement must have one of the following forms:

x <op>= <expression>;
x++;
++x;
x--;
--x;

where <op> is one of the binary operators +, *, -, /, &, ^, |, <<, >>, and <expression> must not reference x.
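A small self-contained example of atomic protecting one such update (counting even numbers is just an illustration):

#include <stdio.h>
#include <omp.h>

int main(void) {
    const int n = 1000;
    int evens = 0;

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        if (i % 2 == 0) {
            /* The protected statement has the allowed form x <op>= <expression>. */
            #pragma omp atomic
            evens += 1;
        }
    }
    printf("evens = %d (expected %d)\n", evens, n / 2);
    return 0;
}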

Critical Sections OpenMP provides the option of adding a name to a critical directive: #pragma omp critical(name). When we do this, two blocks protected with critical directives with different names can be executed simultaneously. However, the names are set during compilation, and we want a different critical section for each thread's queue.
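A brief example of named critical sections; blocks with different names may run at the same time, while blocks sharing a name exclude each other (the names update_x and update_y are made up):

#include <stdio.h>
#include <omp.h>

int main(void) {
    int x = 0, y = 0;

    #pragma omp parallel num_threads(4)
    {
        /* Different names: these two blocks may execute simultaneously. */
        #pragma omp critical(update_x)
        x++;

        #pragma omp critical(update_y)
        y++;
    }
    printf("x = %d, y = %d\n", x, y);
    return 0;
}

Because the names are fixed at compile time, this cannot give one critical section per queue, which is why the lecture turns to locks next.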

Locks A lock consists of a data structure and functions that allow the programmer to explicitly enforce mutual exclusion in a critical section.

Locks: main actions

void omp_init_lock(omp_lock_t* lock_p);     /* initialize the lock               */
void omp_set_lock(omp_lock_t* lock_p);      /* acquire the lock (blocks if held) */
void omp_unset_lock(omp_lock_t* lock_p);    /* release the lock                  */
void omp_destroy_lock(omp_lock_t* lock_p);  /* free the lock's resources         */
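A minimal usage sketch of these four calls protecting a shared counter (the counter is purely illustrative):

#include <stdio.h>
#include <omp.h>

int main(void) {
    omp_lock_t lock;
    int count = 0;

    omp_init_lock(&lock);                /* create the lock once          */

    #pragma omp parallel num_threads(4)
    {
        omp_set_lock(&lock);             /* enter the critical section    */
        count++;
        omp_unset_lock(&lock);           /* leave the critical section    */
    }

    omp_destroy_lock(&lock);             /* free the lock's resources     */
    printf("count = %d\n", count);
    return 0;
}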

Using Locks in the Message-Passing Program Each message queue gets its own lock: a thread sets the destination queue's lock before enqueuing a message and unsets it afterwards, and does the same with its own queue's lock when dequeuing.
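A compact, self-contained sketch of how locks fit into the message-passing program, under simplifying assumptions: each thread owns a fixed-size mailbox protected by its own lock, every thread sends one message to every thread, and a shared counter provides termination detection. All names (mailbox_t, Send_msg, Try_receive, done_sending, CAP) are illustrative rather than the lecture's:

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define CAP 1024                         /* ample for this toy example */

typedef struct {
    int buf[CAP];
    int head, tail;                      /* dequeue at head, enqueue at tail */
    omp_lock_t lock;                     /* one lock per queue               */
} mailbox_t;

static mailbox_t *mailboxes;
static int done_sending = 0;             /* how many threads finished sending */

static void Send_msg(int dest, int msg) {
    omp_set_lock(&mailboxes[dest].lock);
    mailboxes[dest].buf[mailboxes[dest].tail++] = msg;
    omp_unset_lock(&mailboxes[dest].lock);
}

static int Try_receive(int my_rank, int *msg) {
    mailbox_t *q = &mailboxes[my_rank];
    int got = 0;
    omp_set_lock(&q->lock);
    if (q->head < q->tail) { *msg = q->buf[q->head++]; got = 1; }
    omp_unset_lock(&q->lock);
    return got;
}

int main(void) {
    int thread_count = 4;
    mailboxes = malloc(thread_count * sizeof *mailboxes);  /* master allocates */

    #pragma omp parallel num_threads(thread_count)
    {
        int my_rank = omp_get_thread_num();
        int msg;

        mailboxes[my_rank].head = mailboxes[my_rank].tail = 0;
        omp_init_lock(&mailboxes[my_rank].lock);
        #pragma omp barrier              /* every queue exists before any send */

        for (int dest = 0; dest < thread_count; dest++) {   /* send phase */
            Send_msg(dest, 100 * my_rank + dest);
            if (Try_receive(my_rank, &msg))
                printf("thread %d received %d\n", my_rank, msg);
        }

        #pragma omp atomic               /* termination detection */
        done_sending++;

        int finished = 0;                /* receive phase */
        while (!finished) {
            int senders_done;
            #pragma omp atomic read
            senders_done = done_sending;
            while (Try_receive(my_rank, &msg))
                printf("thread %d received %d\n", my_rank, msg);
            /* Safe to stop only if every thread had already finished
               sending BEFORE this drain of the mailbox. */
            finished = (senders_done == thread_count);
        }

        #pragma omp barrier              /* don't destroy a lock others may need */
        omp_destroy_lock(&mailboxes[my_rank].lock);
    }
    free(mailboxes);
    return 0;
}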

Some Caveats
1. You shouldn't mix the different types of mutual exclusion for a single critical section.
2. There is no guarantee of fairness in mutual exclusion constructs.
3. It can be dangerous to nest mutual exclusion constructs.

Dividing Work Among Threads

Dividing Work Among Threads The for directive divides the iterations of the immediately following loop among the threads in the team:

#pragma omp for
    for_loop
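For example, a minimal parallel for sketch (the array and its size are illustrative):

#include <stdio.h>
#include <omp.h>

int main(void) {
    const int n = 8;
    int a[8];

    #pragma omp parallel num_threads(4)
    {
        /* The iterations of this loop are divided among the 4 threads. */
        #pragma omp for
        for (int i = 0; i < n; i++)
            a[i] = i * i;
    }

    for (int i = 0; i < n; i++)
        printf("a[%d] = %d\n", i, a[i]);
    return 0;
}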

Dividing Work Among Threads The sections directive gives each enclosed section to one thread of the team:

#pragma omp parallel
#pragma omp sections
{
    #pragma omp section
        structured_block
    #pragma omp section
        structured_block
}

#pragma omp parallel
{
    #pragma omp sections
    {
        #pragma omp section
        { a=...; b=...; }
        #pragma omp section
        { c=...; d=...; }
        #pragma omp section
        { e=...; f=...; }
        #pragma omp section
        { g=...; h=...; }
    } /*omp end sections*/
} /*omp end parallel*/

Dividing Work Among Threads The single directive specifies that the enclosed code is to be executed by only one thread in the team:

#pragma omp single [clause ...]
    structured_block
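A short example in which only one thread performs the output (the messages are illustrative); the other threads wait at the implicit barrier that ends single unless a nowait clause is added:

#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel num_threads(4)
    {
        /* Exactly one thread of the team executes this block. */
        #pragma omp single
        printf("printed once, by thread %d\n", omp_get_thread_num());

        /* All threads execute this after the implicit barrier. */
        printf("printed by every thread (%d)\n", omp_get_thread_num());
    }
    return 0;
}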

Conclusions We have seen three mechanisms to enforce mutual exclusion: atomic, critical, and locks. atomic is the fastest but only protects a single assignment of a restricted form; critical sections can be named, but the names are fixed at compile time; locks are the better choice when we need mutual exclusion for a data structure (such as each thread's queue) rather than for a fixed block of code.