Shared Memory Programming Paradigm: OpenMP

IPM School of Physics Workshop on High Performance Computing - HPC08
Shared Memory Programming Paradigm: OpenMP
Luca Heltai <luca.heltai@sissa.it>
Stefano Cozzini <cozzini@democritos.it>
SISSA - Democritos/INFM

Recap: Basic Shared Memory Architecture
- Processors are all connected to a large shared memory
- Each processor has its own local cache
- Cost: accessing the cache is much cheaper than accessing main memory
[Diagram: processors P1, P2, ..., Pn, each with a cache ($), connected through a network to the shared memory]

How to program with shared memory?
- Automatic (implicit) parallelization: the compiler does (?) the job for you
- Manual parallelization: insert parallel directives yourself to help the compiler; OpenMP is THE standard
- Multi-threading programming: more complex but more efficient; use a threads library to create the tasks yourself

Parallelization Strategy
How to parallelize a code with automatic parallelization:
1. Profile the code to find the time-consuming parts
2. Use the auto-parallelizing compiler (APC) with default settings to parallelize these parts
3. Profile the parallelized code
4. Study the individual behavior of the parallelized parts
5. If the speed-up is not sufficient, check why
6. Try to find options and directives that achieve your goal
7. If found, go to step 3
8. If not found, implement changes in the code and go to step 2

Help the Compiler: Directives and Pragmas
Special Fortran comments or C pragmas can be inserted in the application sources, directing the compiler to generate the appropriate parallel code.
Features:
- powerful and easy to use
- enabled by compiler options
- the code can still be maintained for portability and serial execution
- can be mixed with message passing to create hybrid programs
What happens at compile time?
- the parallel region is converted to a subroutine/function
- shared variables are passed as arguments
- local/private variables are made local to that subroutine/function
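As an illustration (a minimal C sketch, not from the slides; the file name saxpy.c and the loop body are just placeholders), the same source builds serially, where the pragma is treated as an unknown pragma and ignored, or in parallel when the compiler's OpenMP option is enabled:

    /* saxpy.c: one source, serial or parallel depending on the compile flag */
    #include <stdio.h>

    #define N 1000000

    int main(void)
    {
        static double x[N], y[N];
        double a = 2.0;

        for (int i = 0; i < N; i++) { x[i] = i; y[i] = 1.0; }

        /* directive as a pragma: ignored by a non-OpenMP build */
        #pragma omp parallel for shared(x, y, a)
        for (int i = 0; i < N; i++)
            y[i] = a * x[i] + y[i];

        printf("y[N-1] = %f\n", y[N-1]);
        return 0;
    }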

Which set of directives?
"The OpenMP Application Program Interface (API) supports multi-platform shared-memory parallel programming in C/C++ and Fortran on all architectures, including Unix platforms and Windows NT platforms. Jointly defined by a group of major computer hardware and software vendors, OpenMP is a portable, scalable model that gives shared-memory parallel programmers a simple and flexible interface for developing parallel applications for platforms ranging from the desktop to the supercomputer."
(from www.openmp.org)

OpenMP programming model
- Explicit parallelism: OpenMP is an explicit (not automatic) programming model, offering the programmer full control over parallelization.
- Fork-join model: execution starts with a single master thread; at a parallel region the master forks a team of threads that execute concurrently, and the team joins back into the single master thread at the end of the region.

Terminology
- OpenMP Team := Master + Workers
- A Parallel Region is a block of code executed by all threads simultaneously
- The master thread always has thread ID 0
- Thread adjustment (if enabled) is only done before entering a parallel region
- An "if" clause can be used to guard the parallel region; if the condition evaluates to "false", the code is executed serially
- A work-sharing construct divides the execution of the enclosed code region among the members of the team; in other words, they split the work
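A minimal C sketch (not from the slides; the threshold and loop are illustrative) that ties the terms together: the "if" clause guards the parallel region, thread 0 is the master, and the work-sharing for construct splits the loop among the team:

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        int n = 100000;
        double sum = 0.0;

        /* parallel only if the problem is big enough, otherwise serial */
        #pragma omp parallel if(n > 1000)
        {
            if (omp_get_thread_num() == 0)       /* the master always has ID 0 */
                printf("team size: %d\n", omp_get_num_threads());

            #pragma omp for reduction(+:sum)     /* work-sharing: split the iterations */
            for (int i = 0; i < n; i++)
                sum += 1.0 / (i + 1);
        }

        printf("sum = %f\n", sum);
        return 0;
    }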

Examples: Assertions

      do i=1,n
         A(I+K) = A(I) + B(I)
      end do

If K is zero, or large enough that the written elements A(I+K) never overlap the elements A(I) being read (|K| >= N), there are no dependencies, but the compiler cannot know this, because K is only known at run time. The programmer can state it with an assertion directive:

C$ASSERT NO_DEPENDENCIES
      do i=1,n
         A(I+K) = A(I) + B(I)
      end do

Examples: Pragmas

      do i=1,n
         A(I+K) = A(I) + B(I)
      end do

Make the execution parallel:

C$OMP PARALLEL DO
      do i=1,n
         A(I+K) = A(I) + B(I)
      end do

Fine-Grained vs. Coarse-Grained
Fine-grained parallelism (loop decomposition):
- can be implemented incrementally (one loop at a time)
- does not require a deep knowledge of the code
- many loops have to be parallelized to achieve a decent speedup
- potentially many synchronization points
Coarse-grained parallelism (domain decomposition):
- parallelizes larger loops at higher levels, enclosing many smaller loops
- more code is parallelized
- fewer loops are parallelized, reducing overhead
- requires a deeper knowledge of the code
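A minimal C sketch (not from the slides; the function names are illustrative) contrasting the two approaches: the fine-grained version forks and joins a team for every loop, while the coarse-grained version opens one parallel region and turns the loops into work-sharing constructs:

    #include <stdio.h>
    #define N 1000000
    static double a[N], b[N];

    void fine_grained(void)
    {
        #pragma omp parallel for      /* fork/join for this loop only */
        for (int i = 0; i < N; i++) a[i] = i * 0.5;

        #pragma omp parallel for      /* another fork/join: extra overhead */
        for (int i = 0; i < N; i++) b[i] = a[i] + 1.0;
    }

    void coarse_grained(void)
    {
        #pragma omp parallel          /* one fork/join around both loops */
        {
            #pragma omp for
            for (int i = 0; i < N; i++) a[i] = i * 0.5;

            /* the previous for construct ends with an implicit barrier,
               so a[] is complete before the next loop starts */
            #pragma omp for
            for (int i = 0; i < N; i++) b[i] = a[i] + 1.0;
        }
    }

    int main(void)
    {
        fine_grained();
        coarse_grained();
        printf("b[N-1] = %f\n", b[N-1]);
        return 0;
    }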

Local and shared data/1
Shared memory does not mean that all data is actually shared; there is often a need for local data as well. Consider the following example:

    for (i=0; i<10; i++)
        a[i] = b[i] + c[i];

Let's assume we run this on 2 processors:
- processor 1 handles i = 0, 2, 4, 6, 8
- processor 2 handles i = 1, 3, 5, 7, 9

Local and shared data/2
Processor 1: for i1 = 0,2,4,6,8 do a[i1] = b[i1] + c[i1];  (reads b, c; writes a)
Processor 2: for i2 = 1,3,5,7,9 do a[i2] = b[i2] + c[i2];  (reads b, c; writes a)
[Diagram: each processor keeps its loop index (i1 on P1, i2 on P2) in its private area; the arrays a, b and c live in the shared area]
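Written with OpenMP, the same computation looks as follows (a minimal sketch, not from the slides): the arrays stay in the shared area, while each thread works with a private copy of the loop index:

    #include <stdio.h>

    int main(void)
    {
        double a[10], b[10], c[10];
        int i;

        for (i = 0; i < 10; i++) { b[i] = i; c[i] = 2 * i; }

        #pragma omp parallel for shared(a, b, c) private(i)
        for (i = 0; i < 10; i++)
            a[i] = b[i] + c[i];       /* reads b,c (shared), writes a (shared) */

        printf("a[9] = %f\n", a[9]);
        return 0;
    }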

hello world in OpenMP

    #include <stdio.h>
    #ifdef _OPENMP
    #include <omp.h>        /* only available when compiling with OpenMP */
    #endif

    int main()
    {
        int iam = 0, np = 1;
        #pragma omp parallel private(iam, np)
        {
    #if defined (_OPENMP)
            np  = omp_get_num_threads();
            iam = omp_get_thread_num();
    #endif
            printf("Hello from thread %d out of %d\n", iam, np);
        }
        return 0;
    }

compile & run the code
GNU compiler (only from version 4.3 on):

    gcc -fopenmp openmp.c -o hello
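A possible session (illustrative only; the thread count is arbitrary, and other compilers use their own OpenMP switch):

    gcc -fopenmp openmp.c -o hello
    export OMP_NUM_THREADS=4      (bash/sh; in csh: setenv OMP_NUM_THREADS 4)
    ./hello

Each of the 4 threads prints one "Hello from thread ... out of 4" line; the order of the lines is not deterministic.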

Example: Manual Parallelization

      program tests
      implicit none
      integer i, n
      parameter (n=10000000)
      real*8 a(n), s

      s = 0.d0                 ! the accumulator must start from zero

!$OMP PARALLEL DO SHARED(a,n) PRIVATE(i)
      do i = 1, n
         a(i) = dcos(dfloat(i))**2 + 1.d0
      enddo
!$OMP END PARALLEL DO

!$OMP PARALLEL DO SHARED(a,n) PRIVATE(i) REDUCTION(+:s)
      do i = 1, n
         s = s + a(i)
      end do
!$OMP END PARALLEL DO

      print *, a(1), a(n), s
      stop
      end

Directives
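Only the title of this slide survived the transcription. As a rough orientation (an assumption about what a directives overview typically covers, not a reconstruction of the original slide), here is a small runnable C program exercising the most common directives:

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        int count = 0;

        #pragma omp parallel                  /* parallel: create a team of threads  */
        {
            #pragma omp single                /* single: executed by one thread only */
            printf("team of %d threads\n", omp_get_num_threads());

            #pragma omp for                   /* for: work-sharing loop              */
            for (int i = 0; i < 100; i++) {
                #pragma omp critical          /* critical: mutual exclusion          */
                count++;
            }

            #pragma omp barrier               /* barrier: explicit synchronization   */

            #pragma omp master                /* master: master thread only          */
            printf("count = %d\n", count);
        }
        return 0;
    }

In Fortran the same constructs are written as !$OMP comments (e.g. !$OMP PARALLEL ... !$OMP END PARALLEL).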

A more complex example int main (int argc, char *argv[]) { } unsigned int rank = 0, threads = 1; #pragma omp parallel private(rank, threads) { rank = omp_get_thread_num (); threads = omp_get_num_threads(); #pragma omp master { cout << "I'm the master: " << rank << endl; } #pragma omp for for(int i=0; i<4; ++i) sleep(1); } return 0; 18

Run-time library routines
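Again only the title survived the transcription. As a hedged reminder (not a reconstruction of the original slide), the most frequently used routines of the OpenMP run-time library appear in this small C program:

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        omp_set_num_threads(4);                          /* request a team size     */
        printf("max threads : %d\n", omp_get_max_threads());
        printf("processors  : %d\n", omp_get_num_procs());

        double t0 = omp_get_wtime();                     /* wall-clock timer        */
        #pragma omp parallel
        {
            printf("thread %d of %d (in parallel: %d)\n",
                   omp_get_thread_num(),                 /* this thread's ID        */
                   omp_get_num_threads(),                /* current team size       */
                   omp_in_parallel());                   /* nonzero inside a region */
        }
        printf("elapsed: %f s\n", omp_get_wtime() - t0);
        return 0;
    }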

OpenMP environment variables
OpenMP provides four environment variables for controlling the execution of parallel code. All environment variable names are uppercase.
Most important:
- OMP_NUM_THREADS: sets the maximum number of threads to use during execution.
Others:
- OMP_SCHEDULE: applies only to DO and PARALLEL DO (Fortran) and for and parallel for (C/C++) directives that have their schedule clause set to RUNTIME. The value of this variable determines how the iterations of the loop are scheduled on processors. For example:
    setenv OMP_SCHEDULE "guided, 4"
    setenv OMP_SCHEDULE "dynamic"
- OMP_DYNAMIC: enables or disables dynamic adjustment of the number of threads available for execution of parallel regions. Valid values are TRUE or FALSE.
- OMP_NESTED: enables or disables nested parallelism. Valid values are TRUE or FALSE.
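A minimal C sketch (not from the slides) showing how OMP_SCHEDULE takes effect: with schedule(runtime) the loop schedule is not fixed at compile time but read from the environment when the program starts:

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        double sum = 0.0;

        #pragma omp parallel for schedule(runtime) reduction(+:sum)
        for (int i = 0; i < 1000; i++)
            sum += i;                    /* iteration mapping decided by OMP_SCHEDULE */

        printf("sum = %f\n", sum);
        return 0;
    }

Setting OMP_SCHEDULE to, e.g., "dynamic" or "guided, 4" before running then changes how iterations are distributed, without recompiling.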

OpenMP libraries
- Today many vendor-provided libraries are OpenMP-enabled: the OpenMP parallelism comes for free
- OpenMP libraries can be used jointly with MPI (MPI across nodes) in a mixed MPI/OpenMP programming model
- Examples for AMD ACML and Intel MKL are provided in the afternoon sessions
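A minimal hybrid sketch (not from the slides): MPI between processes, for example one rank per node, and OpenMP threads inside each rank. MPI_THREAD_FUNNELED is requested because only the master thread of each rank makes MPI calls:

    #include <stdio.h>
    #include <mpi.h>
    #include <omp.h>

    int main(int argc, char *argv[])
    {
        int provided, rank;

        /* FUNNELED: MPI calls are made only by the master thread of each rank */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        #pragma omp parallel
        {
            printf("rank %d, thread %d of %d\n",
                   rank, omp_get_thread_num(), omp_get_num_threads());
        }

        MPI_Finalize();
        return 0;
    }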

Conclusions
- Auto-parallelizing compilers can do a lot, but they are limited in their ability to analyze data dependencies
- Manual parallelization with directives is easy to implement; however, care must be taken with the treatment of shared data
- OpenMP is the standard for this approach
- It is important to understand the exact semantics of the parallelization directives
- To obtain scalable performance it is often necessary to parallelize at a coarse-grained level