OpenMP: a parallel language standard that supports both data and functional parallelism on a shared-memory system.

Explicit thread libraries are used by system programmers more than application programmers; they are considered a low-level primitives package. Application programmers need higher-level constructs and directives that hide the mechanics of manipulating threads. Directive-based languages have been standardized in the form of OpenMP, which provides support for concurrency, synchronization, and data handling while hiding the need for explicit thread management.

OpenMP: an application programming interface (API) for parallel programming on multiprocessors. It consists of compiler directives and a library of support functions, and it works in conjunction with Fortran, C, or C++.

C + OpenMP is sufficient to program multiprocessors. C + MPI + OpenMP is a good way to program multicomputers built out of multiprocessors, such as the IBM RS/6000 SP, the Fujitsu AP3000, and a Dell High Performance Computing Cluster.
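As a minimal sketch of how such programs are typically built (the file names pi.c and hybrid.c are hypothetical; -fopenmp is GCC's flag, and the mpicc wrapper is assumed to sit on top of GCC; other compilers use different options):

gcc -fopenmp pi.c -o pi                # shared-memory OpenMP program
mpicc -fopenmp hybrid.c -o hybrid      # hybrid MPI + OpenMP program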

Data parallelism emphasizes loop parallelism: the elements processed in the loop are handled in parallel. Functional parallelism: OpenMP allows different threads to be assigned different portions of code.

C programs often express data-parallel operations as for loops:

for (i = first; i < size; i += prime)
    marked[i] = 1;

OpenMP makes it easy to indicate when the iterations of a loop may execute in parallel. The compiler takes care of generating code that forks/joins threads and allocates the iterations to threads.

Pragma: a compiler directive in C or C++; the word stands for "pragmatic information". It is a way for the programmer to communicate with the compiler. Syntax:

#pragma omp <rest of pragma> <clauses>

Format:

#pragma omp parallel for
for (i = 0; i < N; i++)
    a[i] = b[i] + c[i];

The compiler must be able to verify that the run-time system will have the information it needs to schedule the loop iterations.
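In practice this means the loop must be a simple counted for loop whose trip count can be determined when the loop starts. As an illustrative sketch (the names update and tol are hypothetical, added only for this example), a loop like the following cannot be handled by parallel for, because the number of iterations is not known in advance and so cannot be divided among threads:

/* not eligible for #pragma omp parallel for: trip count unknown at entry */
while (err > tol)
    err = update(a, n);   /* update and tol are hypothetical names */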

Every thread has its own execution context. Execution context: an address space containing all of the variables a thread may access. The contents of an execution context are: static variables, dynamically allocated data structures in the heap, variables on the run-time stack, and an additional run-time stack for functions invoked by the thread.

Shared variable: has the same address in the execution context of every thread. Private variable: has a different address in the execution context of every thread. A thread cannot access the private variables of another thread.

#include <stdlib.h>

int main (int argc, char *argv[])
{
    int b[3];
    char *cptr;
    int i;

    cptr = malloc(1);
    #pragma omp parallel for
    for (i = 0; i < 3; i++)
        b[i] = i;
}

[Diagram: heap and per-thread stacks. The malloc'd byte lives on the shared heap; b, cptr, and i are on the master thread's (thread 0's) stack; thread 1 gets its own private copy of the loop index i.]

Clause: an optional, additional component to a pragma. The private clause directs the compiler to make one or more variables private: private ( <variable list> )

Consider this C program segment to compute π using the rectangle rule:

double area, pi, x;
int i, n;
...
area = 0.0;
for (i = 0; i < n; i++) {
    x = (i+0.5)/n;
    area += 4.0/(1.0 + x*x);
}
pi = area / n;

If we simply parallelize the loop...

double area, pi, x;
int i, n;
...
area = 0.0;
#pragma omp parallel for private(x)
for (i = 0; i < n; i++) {
    x = (i+0.5)/n;
    area += 4.0/(1.0 + x*x);
}
pi = area / n;

... we set up a race condition, in which one thread may race ahead of another and not see its change to the shared variable area. Race condition time line (value of area): both threads read area = 11.667 while executing area += 4.0/(1.0 + x*x); thread A adds 3.765 and writes 15.432, while thread B adds 3.563 and writes 15.230. The stored result is 15.230, when the answer should be 18.995: one of the updates is lost.

Critical section: a portion of code that only one thread at a time may execute. We denote a critical section by putting the pragma #pragma omp critical in front of a block of C code.

double area, pi, x;
int i, n;
...
area = 0.0;
#pragma omp parallel for private(x)
for (i = 0; i < n; i++) {
    x = (i+0.5)/n;
    #pragma omp critical
    area += 4.0/(1.0 + x*x);
}
pi = area / n;

The update to area is inside a critical section, so only one thread at a time may execute the statement; i.e., it is sequential code. The time to execute this statement is a significant part of the loop, so by Amdahl's Law we know the speedup will be severely constrained.
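As a rough illustration (the 50% figure is an assumption made only for this example): if the serialized update accounts for half of each iteration's work, Amdahl's Law gives a maximum speedup of 1 / (0.5 + 0.5/p), which approaches 2 no matter how many threads p we use.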

Reductions are so common that OpenMP provides support for them. We may add a reduction clause to the parallel for pragma, specifying the reduction operation and the reduction variable. OpenMP takes care of storing partial results in private variables and combining the partial results after the loop.

The reduction clause has this syntax: reduction (<op> : <variable>)

Operators:
+   Sum
*   Product
&   Bitwise and
|   Bitwise or
^   Bitwise exclusive or
&&  Logical and
||  Logical or
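As a small illustrative sketch of one of the other operators (the variable name all_positive is hypothetical, and the array a and length n are assumed from the earlier loop fragments):

int i, all_positive = 1;
#pragma omp parallel for reduction(&&:all_positive)
for (i = 0; i < n; i++)
    all_positive = all_positive && (a[i] > 0);   /* 1 only if every element is positive */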

double area, pi, x;
int i, n;
...
area = 0.0;
#pragma omp parallel for \
    private(x) reduction(+:area)
for (i = 0; i < n; i++) {
    x = (i + 0.5)/n;
    area += 4.0/(1.0 + x*x);
}
pi = area / n;

The parallel pragma precedes a block of code that should be executed by all of the threads. Note: execution is replicated among all threads.

#pragma omp parallel <clauses>
<statement_block>

#pragma omp parallel private(i,j)
for (i = 0; i < m; i++) {
    low = a[i];
    high = b[i];
    if (low > high) {
        printf ("Exiting (%d)\n", i);
        break;
    }
    for (j = low; j < high; j++)
        c[j] = (c[j] - a[i])/b[i];
}

Suppose we only want to see the output once. The single pragma directs the compiler that only a single thread should execute the block of code the pragma precedes. Syntax:

#pragma omp single <clauses>
<statement_block>

#pragma omp parallel private(i,j)
for (i = 0; i < m; i++) {
    low = a[i];
    high = b[i];
    if (low > high) {
        #pragma omp single
        printf ("Exiting (%d)\n", i);
        break;
    }
    #pragma omp for
    for (j = low; j < high; j++)
        c[j] = (c[j] - a[i])/b[i];
}

v = alpha();
w = beta();
x = gamma(v, w);
y = delta();
printf ("%6.2f\n", epsilon(x,y));

[Dependence graph: alpha and beta feed gamma; gamma and delta feed epsilon.] We may execute alpha, beta, and delta in parallel.

The parallel sections pragma precedes a block of k blocks of code that may be executed concurrently by k threads. Syntax: #pragma omp parallel sections

The section pragma precedes each block of code within the encompassing block preceded by the parallel sections pragma. It may be omitted for the first parallel section after the parallel sections pragma. Syntax: #pragma omp section

#pragma omp parallel sections
{
    #pragma omp section   /* Optional */
    v = alpha();
    #pragma omp section
    w = beta();
    #pragma omp section
    y = delta();
}
x = gamma(v, w);
printf ("%6.2f\n", epsilon(x,y));

omp_get_thread_num: returns the calling thread's ID (the OpenMP analogue of MPI_Comm_rank).
omp_get_num_threads: returns the number of threads in the team (the OpenMP analogue of MPI_Comm_size).
omp_get_num_procs: returns the number of processors available in the system.
omp_set_num_threads: specifies the number of threads needed in the computation.

#include <omp.h>
#include <stdio.h>

int main (int argc, char *argv[])
{
    int nt, rank;

    #pragma omp parallel private(nt, rank)
    {
        rank = omp_get_thread_num();
        printf("Hello World from thread = %d\n", rank);
        if (rank == 0) {
            nt = omp_get_num_threads();
            printf("Number of threads = %d\n", nt);
        }
    }
    return 0;
}
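The number of threads used by the parallel region above can be chosen without changing the code; a minimal sketch, assuming a bash-style shell and that the executable was named hello:

OMP_NUM_THREADS=4 ./hello

Inside the program, the same effect can be obtained by calling omp_set_num_threads(4) before the parallel region, or by basing the count on omp_get_num_procs().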