
OpenMP, Part 2
EAS 520 High Performance Scientific Computing
University of Massachusetts Dartmouth
Spring 2015

References

This presentation is almost an exact copy of Dartmouth College's OpenMP tutorial, which can be found at http://www.dartmouth.edu/~rc/classes/intro_openmp/. Changes from the original document are related to compilers and job submission for the UMass Dartmouth clusters.

How to Compile and Run an OpenMP Program

On the UMD cluster, the default gcc and gfortran compilers support OpenMP. Alternatively, you can load a different compiler via a module command:

$ module load eas520/compilers/gcc-4.8.2

Then compile the source code

$ gfortran -fopenmp -o hello-f hello-f.f90

and run the program

$ ./hello-f

The output should look like the following (the order of the threads can differ from run to run):

Hello from thread 1, nthreads 4
Hello from thread 3, nthreads 4
Hello from thread 2, nthreads 4
Hello from thread 0, nthreads 4
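The run above happened to use four threads. By default, most OpenMP runtimes start one thread per CPU core; to request a specific count before a run, one common option is the OMP_NUM_THREADS environment variable (bash syntax, shown here as an assumed usage rather than part of the original slide):

$ export OMP_NUM_THREADS=4
$ ./hello-f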

Approaches to Parallelism Using OpenMP

Two main approaches:
- loop-level
- parallel regions

Loop-Level Parallelism (sometimes called fine-grained parallelism):
- individual loops are parallelized
- each thread is assigned a unique range of the loop index
- execution starts on a single serial thread
- multiple threads are spawned inside a parallel loop
- after the parallel loop, execution is serial again
- relatively easy to implement (we gave simplistic examples of this last time; a sketch follows below)
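As a refresher, a minimal, self-contained C sketch of the loop-level style (array names and sizes are illustrative, not from the original slides); compile with gcc -fopenmp:

#include <stdio.h>

#define N 100

int main(void)
{
    double a[N], b[N], c[N];

    for (int i = 0; i < N; i++) {   /* serial initialization */
        a[i] = i;
        b[i] = 2.0 * i;
    }

    /* the runtime assigns each thread a unique range of i;
       execution is serial before and after the loop */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[%d] = %f\n", N - 1, c[N - 1]);
    return 0;
}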

Continued...

Parallel Regions Parallelism (sometimes called coarse-grained parallelism):
- any section of code can be parallelized (not just loops)
- the thread identifier is used to distribute the work
- execution starts on a single serial thread
- multiple threads are started for parallel regions (not necessarily at a loop)
- execution ends on a single serial thread
(A sketch of this style follows below.)
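A minimal, self-contained C sketch of the parallel-regions style, using the thread identifier to distribute work by hand (the block-partitioning scheme here is one illustrative choice, not from the original slides):

#include <omp.h>
#include <stdio.h>

#define N 1000

int main(void)
{
    double a[N];

    #pragma omp parallel
    {
        /* use the thread identifier to carve out this thread's block */
        int tid      = omp_get_thread_num();
        int nthreads = omp_get_num_threads();
        int chunk    = (N + nthreads - 1) / nthreads;  /* ceiling division */
        int lo       = tid * chunk;
        int hi       = (lo + chunk < N) ? lo + chunk : N;

        for (int i = lo; i < hi; i++)
            a[i] = 2.0 * i;            /* any work, not just a loop body */
    }

    printf("a[%d] = %f\n", N - 1, a[N - 1]);
    return 0;
}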

Shared vs Private Variables

By default, all variables in a loop share the same address space, so all threads can modify and access all variables (except the loop index). This can result in incorrect results. The shared and private clauses can be used with parallel for or parallel do.

Example of Code with No Data Dependencies

Fortran Example

!$omp parallel do private(temp) shared(n,a,b,c)
do i = 1, n
   temp = 2.0*a(i)
   a(i) = temp
   b(i) = c(i)/temp
end do

C/C++ Example

#pragma omp parallel for private(temp) shared(n,a,b,c)
for (i = 1; i <= n; i++) {
   temp = 2.0*a[i];
   a[i] = temp;
   b[i] = c[i]/temp;
}
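To see why private(temp) matters, here is an illustrative C sketch (not from the original slides) of the same loop with temp mistakenly left shared. With more than one thread, a thread can overwrite temp between the assignment and its use, so b may receive wrong values (a data race):

#include <stdio.h>

#define N 1000

int main(void)
{
    double a[N], b[N], c[N], temp;

    for (int i = 0; i < N; i++) {   /* serial initialization */
        a[i] = i + 1;
        c[i] = 1.0;
    }

    /* BUG (for illustration): temp is shared, so another thread may
       overwrite it between the assignment and the division below */
    #pragma omp parallel for shared(temp)
    for (int i = 0; i < N; i++) {
        temp = 2.0 * a[i];
        a[i] = temp;
        b[i] = c[i] / temp;   /* may divide by another thread's temp */
    }

    printf("b[0] = %f (expected %f)\n", b[0], 0.5);
    return 0;
}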

Example of Parallelizing a Loop: Fortran

Example: workshare.f90

program workshare
  use omp_lib
  implicit none

  integer :: nthreads, tid, ncores, i
  integer, parameter :: n = 100
  real, dimension(n) :: a, b, c
  character(len=50), parameter :: fmt = '(A, I2, A, I3, A, F8.2)'

  ! some initializations
  do i = 1, n
     a(i) = i
     b(i) = a(i)
  end do

  ncores = 8
  call omp_set_num_threads(ncores)

  !$omp parallel shared(a,b,c) private(i,tid)
  tid = omp_get_thread_num()
  if (tid .eq. 0) then
     nthreads = omp_get_num_threads()
     write(*,*) 'number of threads =', nthreads
  end if
  write(*,*) 'thread', tid, ' starting...'

  !$omp do
  do i = 1, n
     c(i) = a(i) + b(i)
     write(*,fmt) ' thread', tid, ': c(', i, ')=', c(i)
  end do
  !$omp end do nowait

  write(*,*) 'thread', tid, ' done.'
  !$omp end parallel

end program workshare
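Following the compile recipe from the earlier slide, workshare.f90 would be built and run with (commands assumed to match the hello-f example, not shown on the original slide):

$ gfortran -fopenmp -o workshare workshare.f90
$ ./workshare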

Example of a Parallel Region: C

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char *argv[])
{
  int nthreads, tid;

  /* Fork a team of threads, giving them their own copies of variables */
  #pragma omp parallel private(nthreads, tid)
  {
    /* Obtain thread number */
    tid = omp_get_thread_num();
    printf("Hello World from thread = %d\n", tid);

    /* Only master thread does this */
    if (tid == 0) {
      nthreads = omp_get_num_threads();
      printf("Number of threads = %d\n", nthreads);
    }
  } /* All threads join master thread and disband */

  return 0;
}
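The C example compiles analogously with the -fopenmp flag (the source file name hello-c.c is assumed here):

$ gcc -fopenmp -o hello-c hello-c.c
$ ./hello-c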

Basic OpenMP Functions

omp_get_num_threads() - get the number of threads used in a parallel region
omp_get_thread_num() - get the thread rank in a parallel region (0 to omp_get_num_threads() - 1)
omp_set_num_threads(nthreads) - set the number of threads used in a parallel region

Fortran Example

!$omp parallel
write(*,*) 'Thread rank: ', omp_get_thread_num()
!$omp end parallel

C/C++ Example

#pragma omp parallel
{
   printf("Thread rank: %d\n", omp_get_thread_num());
}
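Putting the three functions together, a minimal, self-contained C sketch (the thread count of 4 is an arbitrary choice for illustration):

#include <omp.h>
#include <stdio.h>

int main(void)
{
    omp_set_num_threads(4);            /* request 4 threads */

    #pragma omp parallel
    {
        printf("thread %d of %d\n",
               omp_get_thread_num(),    /* rank: 0 .. nthreads-1 */
               omp_get_num_threads());  /* team size inside the region */
    }
    return 0;
}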

Other OpenMP Clauses (firstprivate, lastprivate, ordered)

firstprivate:
- initializes each thread's copy of a variable from the serial part of the code
- (the private clause does not initialize the variable)

Fortran Example

j = jstart
!$omp parallel do firstprivate(j)
do i = 1, n
   if (i == 1 .or. i == n) then
      j = j + 1
   end if
   a(i) = a(i) + j
end do

C/C++ Example

j = jstart;
#pragma omp parallel for firstprivate(j)
for (i = 1; i <= n; i++) {
   if (i == 1 || i == n)
      j = j + 1;
   a[i] = a[i] + j;
}
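A self-contained C sketch of what firstprivate buys over plain private (variable names are illustrative, not from the original slides): each thread's copy of j starts from the serial value, and the serial j is untouched afterwards.

#include <omp.h>
#include <stdio.h>

int main(void)
{
    int j = 100;                       /* value set in the serial part */

    #pragma omp parallel firstprivate(j)
    {
        /* each thread begins with its own copy of j, initialized to 100;
           with private(j) instead, the copy would be uninitialized */
        j += omp_get_thread_num();
        printf("thread %d: j = %d\n", omp_get_thread_num(), j);
    }

    /* the private copies are discarded; the serial j is unchanged */
    printf("after the region: j = %d\n", j);
    return 0;
}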

Continued...

lastprivate:
- the thread that executes the final loop iteration copies its value back to the master (serial) thread
- this gives the same result as serial execution

Fortran Example

!$omp parallel do lastprivate(x)
do i = 1, n
   x = sin(pi*dx*real(i))
   a(i) = exp(x)
end do
lastx = x

C/C++ Example

#pragma omp parallel for lastprivate(x)
for (i = 1; i <= n; i++) {
   x = sin( pi * dx * (float)i );
   a[i] = exp(x);
}
lastx = x;

Continued...

ordered:
- used when part of the loop must execute in serial order
- requires the ordered clause plus an ordered directive

Fortran Example

!$omp parallel do private(myval) ordered
do i = 1, n
   myval = do_lots_of_work(i)
   !$omp ordered
   write(*,*) i, myval
   !$omp end ordered
end do

Continued...

C/C++ Example

#pragma omp parallel for private(myval) ordered
for (i = 1; i <= n; i++) {
   myval = do_lots_of_work(i);
   #pragma omp ordered
   printf("%d %d\n", i, myval);
}

Reduction Operations

An example of a reduction operation is a summation:

Fortran Example

do i = 1, n
   sum = sum + a(i)
end do

C/C++ Example

for (i = 1; i <= n; i++)
   sum = sum + a[i];

How reduction works:
- sum is the reduction variable
- it cannot be declared shared: the threads would overwrite the value of sum
- it cannot be declared private: private variables don't persist outside of a parallel region
- the specified reduction operation is performed on the individual values from each thread

Example of the reduction Clause

Fortran Example

!$omp parallel do reduction(+:sum)
do i = 1, n
   sum = sum + a(i)
end do

C/C++ Example

#pragma omp parallel for reduction(+:sum)
for (i = 1; i <= n; i++)
   sum = sum + a[i];

Fortran Reduction Operands

Operator   Initial Value
+          0
*          1
-          0
.AND.      .true.
.OR.       .false.
.IEOR.     0
.IOR.      0
.IAND.     all bits on
.EQV.      .true.
MIN        largest positive number
MAX        most negative number

C/C++ Reduction Operands

Operator   Initial Value
+          0
*          1
-          0
&          all bits on (~0)
|          0
^          0
&&         1
||         0
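For completeness, a minimal, self-contained C sketch of the reduction clause in action (array contents are illustrative): each thread accumulates a private copy of sum initialized to the identity 0, and the partial sums are combined when the loop ends.

#include <stdio.h>

#define N 100

int main(void)
{
    double a[N], sum = 0.0;

    for (int i = 0; i < N; i++)        /* serial initialization */
        a[i] = i + 1;

    /* each thread sums its share of the iterations into a private copy
       of sum; the copies are combined with + after the loop */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum = sum + a[i];

    printf("sum = %f\n", sum);          /* 1 + 2 + ... + 100 = 5050 */
    return 0;
}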