OpenMP Programming
Aiichiro Nakano
Collaboratory for Advanced Computing & Simulations
Department of Computer Science
Department of Physics & Astronomy
Department of Chemical Engineering & Materials Science
Department of Biological Sciences
University of Southern California
Email: anakano@usc.edu

OpenMP
Portable application program interface (API) for shared-memory parallel programming, based on multithreading via compiler directives
OpenMP = Open specifications for Multi Processing
OpenMP homepage: www.openmp.org
OpenMP tutorial: www.llnl.gov/computing/tutorials/openmp
Process: an instance of a running program
Thread: a sequence of instructions being executed, possibly sharing resources with other threads within a process
[Figure: MPI processes on distributed memory communicate via send/receive; OpenMP threads on shared memory communicate by writing to and reading from shared variables]

OpenMP Programming Model
Fork-join parallelism:
Fork: the master thread spawns a team of threads as needed
Join: when the team of threads completes the statements in the parallel section, the threads synchronize and terminate, leaving only the master thread
OpenMP threads communicate by sharing variables

OpenMP Example: omp_example.c

#include <stdio.h>
#include <omp.h>
void main () {
  int nthreads,tid;
  nthreads = omp_get_num_threads();  /* Get the number of threads */
  printf("Sequential section: # of threads = %d\n",nthreads);
  /* Fork multi-threads, each with its own copy of the private variable */
  #pragma omp parallel private(tid)
  {
    /* Obtain & print thread id */
    tid = omp_get_thread_num();  /* Get my thread ID: 0, 1, ... */
    printf("Parallel section: Hello world from thread %d\n",tid);
    /* Only master thread does this */
    if (tid == 0) {
      nthreads = omp_get_num_threads();
      printf("Parallel section: # of threads = %d\n",nthreads);
    }
  }  /* All created threads terminate */
}

omp_get_num_threads() & omp_get_thread_num() obtain the number of threads & my thread ID (cf. MPI_Comm_size & MPI_Comm_rank)
By default, all variables are shared unless their storage attributes are selectively changed using clauses such as private; here each thread gets its own copy of tid
The block following #pragma omp parallel is the parallel section

OpenMP Example: omp_example.c (continued)

Compilation on hpc-login3.usc.edu:
source /usr/usc/openmpi/default/setup.sh (if bash)
gcc -o omp_example omp_example.c -fopenmp

PBS script (the number of threads is set via the environment variable OMP_NUM_THREADS):
#!/bin/bash
#PBS -l nodes=1:ppn=2
#PBS -l walltime=00:00:59
#PBS -o omp_example.out
#PBS -j oe
#PBS -N omp_example
OMP_NUM_THREADS=2
WORK_HOME=/home/rcf-proj/an2/anakano
cd $WORK_HOME
./omp_example

Output:
Sequential section: # of threads = 1
Parallel section: Hello world from thread 1
Parallel section: Hello world from thread 0
Parallel section: # of threads = 2
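
As an aside (not on the slide): outside the batch system, on a machine where you have an interactive shell, you can set the thread count directly and run the same executable; this assumes a bash shell and the binary built above.

export OMP_NUM_THREADS=4   # request 4 threads for subsequent OpenMP runs
./omp_example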

Setting the Number of Threads

#include <stdio.h>
#include <omp.h>
void main () {
  int nthreads,tid;
  omp_set_num_threads(2);  /* Set the number of threads to 2 */
  nthreads = omp_get_num_threads();
  printf("Sequential section: # of threads = %d\n",nthreads);
  /* Fork multi-threads, each with its own copy of the private variable */
  #pragma omp parallel private(tid)
  {
    /* Obtain & print thread id */
    tid = omp_get_thread_num();
    printf("Parallel section: Hello world from thread %d\n",tid);
    /* Only master thread does this */
    if (tid == 0) {
      nthreads = omp_get_num_threads();
      printf("Parallel section: # of threads = %d\n",nthreads);
    }
  }  /* All created threads terminate */
}

omp_set_num_threads() sets the number of threads to be used in subsequent parallel sections within the program (no need to set OMP_NUM_THREADS); see omp_example_set.c
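
A brief aside (not on the slide): OpenMP also provides a num_threads clause that requests a thread count for a single parallel section, overriding both OMP_NUM_THREADS and omp_set_num_threads() for that section only; a minimal sketch:

#pragma omp parallel num_threads(2) private(tid)
{
  /* this parallel section runs with 2 threads regardless of the global setting */
  tid = omp_get_thread_num();
  printf("Hello from thread %d\n", tid);
}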

OpenMP Programming Model (continued)
OpenMP is typically used to parallelize (big) loops
Use synchronization mechanisms to avoid race conditions, i.e., situations where the result changes depending on how the threads are scheduled
Critical section: only one thread at a time can enter

#pragma omp parallel
{
  ...
  #pragma omp critical
  {
    ...
  }
  ...
}

Threads wait their turn; only one thread at a time executes the critical section
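
A minimal self-contained sketch (not from the slides) of a critical section protecting a shared counter; without the critical directive, concurrent increments of count could race and lose updates:

#include <stdio.h>
#include <omp.h>
void main() {
  int count = 0;  /* shared by all threads */
  #pragma omp parallel
  {
    #pragma omp critical
    count++;  /* only one thread at a time updates the shared counter */
  }
  printf("count = %d\n", count);  /* equals the number of threads */
}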

Example: Calculating Pi
Numerical integration:
\pi = \int_0^1 \frac{4}{1+x^2}\,dx \approx \sum_{i=0}^{N-1} \frac{4}{1+x_i^2}\,\Delta
Discretization: \Delta = 1/N (step = 1/NBIN in the program), x_i = (i+0.5)\Delta for i = 0, ..., N-1

#include <stdio.h>
#define NBIN 100000
void main() {
  int i;
  double step,x,sum=0.0,pi;
  step = 1.0/NBIN;
  for (i=0; i<NBIN; i++) {
    x = (i+0.5)*step;
    sum += 4.0/(1.0+x*x);
  }
  pi = sum*step;
  printf("PI = %f\n",pi);
}

OpenMP Program: omp_pi_critical.c

#include <stdio.h>
#include <omp.h>
#define NBIN 100000
void main() {
  double step,sum=0.0,pi;  /* shared variables */
  step = 1.0/NBIN;
  #pragma omp parallel
  {
    int nthreads,tid,i;  /* private (local) variables */
    double x;
    nthreads = omp_get_num_threads();
    tid = omp_get_thread_num();
    for (i=tid; i<NBIN; i+=nthreads) {
      x = (i+0.5)*step;
      #pragma omp critical
      sum += 4.0/(1.0+x*x);  /* this update has to be atomic */
    }
  }
  pi = sum*step;
  printf("PI = %f\n",pi);
}

Thread-private variables: either declare them with a private clause or define them within the parallel section (as done here)
Variables defined outside the parallel section (step, sum, pi) are shared
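
An aside (not on the slide): for a simple scalar update like this one, the lighter-weight atomic directive can replace critical; it protects just the single memory update rather than an arbitrary block:

#pragma omp atomic
sum += 4.0/(1.0+x*x);  /* hardware-assisted atomic update of the shared sum */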

Avoid Critical Section: omp_pi.c
Data privatization: the serialized critical section degrades scalability, so instead give each thread its own accumulator

#include <stdio.h>
#include <omp.h>
#define NBIN 100000
#define MAX_THREADS 8
void main() {
  int nthreads,tid;
  double step,sum[MAX_THREADS]={0.0},pi=0.0;  /* array of partial sums for multi-threads */
  step = 1.0/NBIN;
  #pragma omp parallel private(tid)
  {
    int i;
    double x;
    nthreads = omp_get_num_threads();
    tid = omp_get_thread_num();
    for (i=tid; i<NBIN; i+=nthreads) {
      x = (i+0.5)*step;
      sum[tid] += 4.0/(1.0+x*x);  /* private accumulator */
    }
  }
  for(tid=0; tid<nthreads; tid++)  /* inter-thread reduction */
    pi += sum[tid]*step;
  printf("PI = %f\n",pi);
}
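
An aside (not on the slides shown here): OpenMP's reduction clause expresses this same privatize-then-combine pattern directly, and it also sidesteps false sharing of adjacent sum[] elements that sit on the same cache line; a minimal sketch:

#include <stdio.h>
#include <omp.h>
#define NBIN 100000
void main() {
  int i;
  double step=1.0/NBIN,sum=0.0,pi;
  /* each thread accumulates into its own private copy of sum;
     OpenMP combines the copies with + when the loop ends */
  #pragma omp parallel for reduction(+:sum)
  for (i=0; i<NBIN; i++) {
    double x = (i+0.5)*step;
    sum += 4.0/(1.0+x*x);
  }
  pi = sum*step;
  printf("PI = %f\n",pi);
}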

Avoid Critical Section: the Wrong Way (omp_pi_noncritical.c)

#include <stdio.h>
#include <omp.h>
#define NBIN 100000
void main() {
  double step,sum=0.0,pi;
  step = 1.0/NBIN;
  #pragma omp parallel
  {
    int nthreads,tid,i;
    double x;
    nthreads = omp_get_num_threads();
    tid = omp_get_thread_num();
    for (i=tid; i<NBIN; i+=nthreads) {
      x = (i+0.5)*step;
      // #pragma omp critical
      sum += 4.0/(1.0+x*x);  /* unprotected race on the shared sum */
    }
  }
  pi = sum*step;
  printf("PI = %f\n",pi);
}

16-thread run:
[anakano@hpc-login3 src]$ ./omp_pi_critical
PI = 3.141593
[anakano@hpc-login3 src]$ ./omp_pi_noncritical
PI = 0.558481

cf. Prof. Kunle Olukotun (Stanford), lecture at USC, Sep. 28, 2017; F. Niu et al., NIPS'11 (some lock-free algorithms deliberately tolerate such races, but here the race simply corrupts the result)

Load Balancing
Interleaved assignment of loop-index values to threads balances the loads among the threads:
for (i=tid; i<NBIN; i+=nthreads) ...
[Figure: a bad example — an assignment of iterations that leaves the threads with unequal shares of the work]
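
An aside (not on the slide): with the omp for work-sharing construct, the same interleaved assignment can be requested with schedule(static,1), and schedule(dynamic) lets idle threads grab remaining iterations when the cost per iteration varies; a minimal sketch:

#include <stdio.h>
#include <omp.h>
#define NBIN 100000
void main() {
  int i;
  double step=1.0/NBIN,sum=0.0;
  /* schedule(static,1): round-robin iteration assignment, equivalent to
     the manual i=tid, i+=nthreads loop above; schedule(dynamic) would
     instead hand out iterations to threads as they become idle */
  #pragma omp parallel for schedule(static,1) reduction(+:sum)
  for (i=0; i<NBIN; i++) {
    double x = (i+0.5)*step;
    sum += 4.0/(1.0+x*x);
  }
  printf("PI = %f\n",sum*step);
}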