Tools for OpenMP Programming


Tools for OpenMP Programming
Dieter an Mey
Center for Computing and Communication, Aachen University
anmey@rz.rwth-aachen.de

Tools for OpenMP Programming

Debugging of OpenMP codes
- KAP/Pro Toolset from KAI/Intel
  - Guide compilers
  - Assure
  - GuideView
- TotalView from Etnus

Debugging of OpenMP Programs (1)

Prepare the serial code
- Carefully select a reasonable test case!
- Is the serial program delivering the right results? (use at least -O3)
- How about compiler warnings (lint, f90 -Xlist)?
- Fortran: put all local variables on the stack: f90 -stackvar ... (illustrated in the sketch below)

Now try the OpenMP version
- Check the stacksize limits! export STACKSIZE=...  ulimit -s ...
- Respect compiler messages; f90: USE omp_lib
  f90 -xcommonchk -xvpara -xloopinfo -XlistMP ...
- Try the OpenMP dummy library? (link with -[x]openmp=stubs; guide: execute with KMP_LIBRARY=serial)
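To illustrate why the stack advice matters, here is a minimal, hypothetical sketch (the routine and variable names are not from the slides): a fixed-size local buffer is only thread-safe if every thread gets its own copy on the stack, which is what f90 -stackvar, together with a sufficiently large STACKSIZE / ulimit -s, provides.

    module smooth_mod
      implicit none
    contains
      subroutine smooth(x, n)
        integer, intent(in)    :: n
        real,    intent(inout) :: x(n)
        ! Fixed-size local buffer: if it were statically allocated it would be
        ! shared by all threads (a race); compiled with f90 -stackvar every
        ! thread gets a private copy on its own stack.
        real    :: work(1000)
        integer :: i
        do i = 2, n - 1
           work(i) = 0.5 * (x(i-1) + x(i+1))
        end do
        x(2:n-1) = work(2:n-1)
      end subroutine smooth
    end module smooth_mod

    program stackvar_demo
      use smooth_mod
      implicit none
      integer, parameter :: n = 1000, m = 8
      real :: a(n, m)
      integer :: k
      a = 1.0
      ! Each iteration works on its own column, so the only shared state to
      ! worry about is the local buffer inside smooth().
    !$omp parallel do
      do k = 1, m
         call smooth(a(:, k), n)
      end do
    !$omp end parallel do
      print *, a(n/2, 1)
    end program stackvar_demo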

Debugging of OpenMP Programs (2)

- Is the OpenMP program running with a single thread?
- Is the OpenMP program sometimes running correctly with more than one thread?
  - Race conditions? Thread safety?
  - Use of static variables within a parallel region? (f90: SAVE, DATA, ...; C: static, extern)
- Check your program with Assure! (Intel Thread Checker)
- Compare Sun and Guide compilers! (guidexx ... -WGopt=0)
  When compiling with Guide, compile without optimization and with -g
  -> use the TotalView debugger together with Guide
- Turn single parallel regions on and off!
- Serialize single parts of long parallel regions: omp single ... omp end single
- Introduce additional barriers for testing (see the sketch after this list)
- Do different rounding errors matter? -fsimple=0; don't parallelize reductions
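A hedged sketch of the serialize-and-add-barriers idea (the program and its numbers are illustrative, not from the slides): the suspect block is wrapped in a single construct with an extra barrier in front of it; if the wrong results disappear in this configuration, the race is in or around that block.

    program serialize_for_testing
      implicit none
      integer, parameter :: n = 1000
      real :: x(n), s
      integer :: i
      x = 1.0
      s = 0.0
    !$omp parallel private(i)
    !$omp do
      do i = 1, n
         x(i) = x(i) * 2.0           ! trusted part of the parallel region
      end do
    !$omp end do
      ! Debugging aid: extra barrier plus a single construct around the suspect
      ! part; only one thread executes it, the others wait at the implied barrier.
    !$omp barrier
    !$omp single
      do i = 1, n
         s = s + x(i)                ! suspect block, serialized while bug hunting
      end do
    !$omp end single
    !$omp end parallel
      print *, 's =', s              ! expect 2000.0
    end program serialize_for_testing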

Debugging of OpenMP Programs (3): Data Races

The typical OpenMP programming error: data races
- One thread modifies a memory location that another thread reads or writes within the same region (between two synchronization points).
- Take care: the order in which parallel loop iterations execute is non-deterministic and may change from run to run.
- Test: the serial code should give the same answers when the parallelized loop is run backwards.
- Assure traces all memory references and detects possible data races. It verifies that the OpenMP code gives the same results as a serial program run.
- In many cases private clauses, barriers, or critical regions are missing.
- Assure does not accept OpenMP runtime functions (the Thread Checker does).
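A minimal, self-contained example of such a race (illustrative, not taken from the slides; the missing private clause is the deliberate bug):

    program race_demo
      implicit none
      integer, parameter :: n = 100000
      real :: x(n), y(n), tmp
      integer :: i
      x = 2.0
      ! BUG: tmp is shared, so several threads read and write the same memory
      ! location between two synchronization points -- a data race.
      ! Fix: !$omp parallel do private(tmp)
    !$omp parallel do
      do i = 1, n
         tmp  = x(i) * x(i)
         y(i) = tmp + 1.0
      end do
    !$omp end parallel do
      print *, y(1), y(n)            ! with the race, may differ from the serial run
    end program race_demo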

TotalView: Debugging of OpenMP Programs (1)

See the TotalView User's Guide:
- Each parallel region is outlined into a separate routine.
- Each parallel loop is outlined into a separate routine.
- The names of these outlined routines are based on the name of the original calling routine and the line number of the parallel directive.
- Shared variables are declared in the calling routine and passed to the outlined routine; private variables are declared in the outlined routine.
- The slave threads are created on entry to the parallel region.
- Do not step into a parallel region; instead, run to a previously set breakpoint inside it.
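As a purely conceptual sketch of what that outlining means (the routine and variable names are invented; real compilers generate different names and also pass each thread its loop bounds), a parallel loop in a routine jacobi at line 42 ends up in something like:

    ! The outlined body of the parallel loop: shared variables arrive as
    ! arguments, the private variable tmp becomes a local.  The runtime
    ! library creates the slave threads when jacobi reaches the region and
    ! makes every thread call this routine -- which is why you set a
    ! breakpoint here and run to it instead of stepping into the region.
    subroutine jacobi_42_parloop(a, b, n)
      implicit none
      integer, intent(in) :: n
      real, intent(inout) :: a(n)
      real, intent(in)    :: b(n)
      real    :: tmp                 ! private variable: declared locally
      integer :: i
      do i = 1, n                    ! in reality: only this thread's chunk of 1..n
         tmp  = 2.0 * b(i)
         a(i) = tmp
      end do
    end subroutine jacobi_42_parloop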

TotalView: Debugging of OpenMP Programs (2)

Use the Guide OpenMP compiler, because TotalView does not yet support OpenMP debugging with the Sun compilers.

Compile and link separately, Fortran:

    #!/bin/ksh
    guidef90 -c -WG,-cmpo=i \
             -WGkeepcpp prog.f90
        -or-
    guidef90 -c -WG,-cmpo=i -g prog.f90
    guidef90 -o a.out -WG,-cmpo=i -g prog.o
    export OMP_NUM_THREADS=2
    totalview a.out

C:

    #!/bin/ksh
    guidec -c -g prog.c
    guidec -o a.out -g prog.o
    export OMP_NUM_THREADS=2
    totalview a.out

KAP/Pro Toolset Guide Compilers versus Sun Compilers

Guide compilers
- guidef77 / guidef90 / guidec / guidec++: preprocessors that replace OpenMP constructs by calls to an additional runtime library based on pthreads and then invoke the underlying native Fortran / C compilers
- guide*: any optimization level of the underlying native compiler can be selected => debugging is possible
- guide*: supported by the TotalView parallel debugger
- guidef90: no internal subroutines in parallel regions (see the sketch below)
- guidec++ includes the well-known KCC C++ compiler

Sun compilers
- CC: automatically turns on -xO3 => debugging is impossible
- cc / f90 / f95: new option for debugging: -xopenmp=noopt
- f90 / f95 / cc: combining OpenMP and automatic parallelization is supported

Attention: different performance characteristics, different defaults!
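The guidef90 restriction on internal subroutines refers to code like the following sketch (illustrative only): an internal procedure, called inside a parallel region, that reaches its host's variables by host association, which is presumably hard for a source-to-source preprocessor to outline.

    program host_association
      implicit none
      integer, parameter :: n = 100
      real :: a(n)
      integer :: i
      a = 0.0
    !$omp parallel do
      do i = 1, n
         call fill(i)                ! internal subroutine used inside a parallel region
      end do
    !$omp end parallel do
      print *, a(1), a(n)
    contains
      subroutine fill(i)
        integer, intent(in) :: i
        a(i) = real(i)               ! host-associated access to the array "a"
      end subroutine fill
    end program host_association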

Assure Usage

Like the Guide compilers, Assure is a preprocessor which
- instruments the source code,
- collects additional information about the code,
- invokes the native compiler.

    assurec | assurec++ | assuref77 | assuref90 -WGpname=project \
        -fast ... sourcefiles -o a.out

The executable is run in serial mode (and takes a lot of memory and run time):
- all memory references are traced,
- possible data races are detected in a postprocessing phase (for the given dataset!).

    a.out

The results of the analysis can be reported in line mode or presented with a GUI:

    assureview -pname=project -txt
    assureview -pname=project

Assure Example: Jacobi (1)

!$omp parallel private(resid,k_local)
      k_local = 1
      do while (k_local.le.maxit .and. error.gt.tol)
!$omp do
         do j=1,m; do i=1,n; uold(i,j) = u(i,j); enddo; enddo
!$omp single
         error = 0.0
!$omp end single
!$omp do reduction(+:error)
         do j..; do i..; resid=..; u(i,j)=..; error=..; enddo; enddo
!$omp single
         error = sqrt(error)/dble(n*m)
!$omp end single
         k_local = k_local + 1
      enddo
!$omp master
      k = k_local
!$omp end master
!$omp end parallel

Assure Example: Jacobi (2)

(The same code as before, but without the single construct around "error = 0.0".)

!$omp parallel private(resid,k_local)
      k_local = 1
      do while (k_local.le.maxit .and. error.gt.tol)
!$omp do
         do j=1,m; do i=1,n; uold(i,j) = u(i,j); enddo; enddo
         error = 0.0
!$omp do reduction(+:error)
         do j..; do i..; resid=..; u(i,j)=..; error=..; enddo; enddo
!$omp single
         error = sqrt(error)/dble(n*m)
!$omp end single
         k_local = k_local + 1
      enddo
!$omp master
      k = k_local
!$omp end master
!$omp end parallel

Assure Example: Jacobi (3)

Assure Example: Jacobi (3)

Assure Example: Jacobi (4)

Assure Example: Thermoflow (1)

c$omp parallel
      ...
c$omp do private(l,tmp)
      DO I=1,N
         L = ind(i)
         tmp = X(L)*a(I)+Y(L)*b(I)
         X(L) = X(L)-tmp*a(I)
         Y(L) = Y(L)-tmp*b(I)
      END DO
c$omp end do
      ...
c$omp end parallel

User: The values of the index array IND are certainly disjoint!
But: Assure complains.

Check:

c$omp single
      open (unit=99,file="ind.dat")
      do i = 1,n
         write(99,*) ind(i)
      end do
      close (99)
c$omp end single

    sort ind.dat > ind.sort
    sort -u ind.dat > ind.usort
    diff ind.sort ind.usort
    98d97
    < 1085
    1619d1617
    < 505

2 values out of 2000 occurred twice!
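The slides do not show a fix for the colliding indices; one possible, conservative remedy (a sketch under the assumption that the collisions cannot be removed from the data) is to make the whole read-modify-write per index mutually exclusive, at the price of serializing those updates:

    program indirect_update
      implicit none
      integer, parameter :: n = 2000
      integer :: ind(n), i, l
      real    :: x(n), y(n), a(n), b(n), tmp
      do i = 1, n                     ! illustrative data; ind() should be disjoint ...
         ind(i) = i
         a(i) = 0.1; b(i) = 0.2; x(i) = 1.0; y(i) = 1.0
      end do
      ind(98) = ind(97)               ! ... but one duplicate index slips in

    !$omp parallel do private(l, tmp)
      do i = 1, n
         l = ind(i)
         ! Critical section around the complete read-modify-write: correct even
         ! when two iterations hit the same l, but the updates now run one at a time.
    !$omp critical (update_xy)
         tmp  = x(l)*a(i) + y(l)*b(i)
         x(l) = x(l) - tmp*a(i)
         y(l) = y(l) - tmp*b(i)
    !$omp end critical (update_xy)
      end do
    !$omp end parallel do
      print *, x(97), y(97)
    end program indirect_update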

Assure Example: Thermoflow (2)

Assure complains! What is wrong?

C$omp parallel
      ...
      DO iter = 1,maxiter
c$omp do
         DO I = 3,n-2
            y(i) = (x(i-1) + x(i) + x(i+1)) / 3.0d0
         END DO
c$omp end do
c$omp do
         DO I = 3,n-2
            x(i) = y(i)
         END DO
c$omp end do
         x(2) = y(3)
         x(n-1) = y(n-2)
      END DO
      ...
C$omp end parallel

Assure Example: Thermoflow (3)

C$omp parallel
      ...
      DO iter = 1,maxiter
c$omp do
         DO I = 3,n-2
            y(i) = (x(i-1) + x(i) + x(i+1)) / 3.0d0
         END DO
c$omp end do
c$omp do
         DO I = 3,n-2
            x(i) = y(i)
         END DO
c$omp end do nowait        <- this barrier can be omitted
c$omp single
         x(2) = y(3)
         x(n-1) = y(n-2)
c$omp end single           <- this barrier was missing
      END DO
      ...
C$omp end parallel

Assure: My Advice

Never put an OpenMP code into production ...
... without using Assure ...

Intel Thread Checker

... or the Intel Thread Checker, which is the successor of Assure since Intel bought KAI. Currently the Thread Checker only runs on the MS Windows platform.

GuideView Usage

Compile with the Guide compiler:

    guidec | guidec++ | guidef77 | guidef90 \
        -c -fast ... sourcefiles

Link with the Guide compiler driver and add the -WGstats option:

    guidec | guidec++ | guidef77 | guidef90 -WGstats \
        -fast ... objectfiles -o a.out

Execute the program; at the end a statistics file is written:

    OMP_NUM_THREADS=4 a.out

Visualize the statistics file with the GuideView GUI:

    guideview

GuideView Example: Jacobi (1)

Barriers 1-4 are the implicit barriers at the ends of the two worksharing loops and of the two single constructs:

!$omp parallel private(resid,k_local)
      k_local = 1
      do while (k_local.le.maxit .and. error.gt.tol)
!$omp do
         do j=1,m; do i=1,n; uold(i,j) = u(i,j); enddo; enddo          ! Barrier 1
!$omp single
         error = 0.0
!$omp end single                                                      ! Barrier 2
!$omp do reduction(+:error)
         do j..; do i..; resid=..; u(i,j)=..; error=..; enddo; enddo  ! Barrier 3
!$omp single
         error = sqrt(error)/dble(n*m)
!$omp end single                                                      ! Barrier 4
         k_local = k_local + 1
      enddo
!$omp master
      k = k_local
!$omp end master
!$omp end parallel

GuideView Example: Jacobi (2)

GuideView Example: Jacobi (2)

(The partially visible source and barrier labels on these two slides repeat the Jacobi code and Barriers 1-4 from the previous slide.)

GuideView Example: Jacobi (3)

The GuideView display distinguishes:
- wait at a barrier
- wait at the end of a parallel region
- overhead when entering a parallel region
- parallel time
- waiting at a critical region
- waiting for a lock

GuideView Example: TFS (1)

GuideView Example: TFS (1)

Loop Scheduling Example Matrix Transpose (1) export OMP_NUM_THREADS=8 ulimit -s 300000 export STACKSIZE=300000 guidef90 -WGstats -fast transpose.f90 export KMP_STATSFILE=static8.gvs export OMP_SCHEDULE=static,8 a.out guideview!$omp parallel do schedule(runtime) private(h) do i = 1, n-1 do j = i+1, n h = a(j,i) a(j,i) = a(i,j) a(i,j) = h end do end do end do 27

Loop Scheduling Example: Matrix Transpose (2)

Matrix size: 5000x5000, 11 repetitions

    schedule     time
    static        6.30 sec
    static,1     10.41 sec
    static,8      4.12 sec
    dynamic,1    10.24 sec
    dynamic,8     4.96 sec
    guided,1      3.35 sec
    guided,8      3.31 sec

Because the inner loop gets shorter as i grows, the triangular loop is load-imbalanced under a plain static schedule; guided scheduling hands out decreasing chunks and gives the best times here.

Loop Scheduling Example: Matrix Transpose (3)

(Bar chart of time in seconds, matrix size 5000x5000, for the schedules static, static,1, static,8, dynamic,1, dynamic,8, guided,1, and guided,8, compiled with guidef90 and with f90 -openmp; the best version with the Guide compiler and the best version with the Sun compiler are marked.)

Summary

Debugging of OpenMP codes:
- Parallelize carefully!
- Watch out for compiler messages (-XlistMP).
- Use Assure (or the Thread Checker).
- Most likely, using a debugger on OpenMP codes is not necessary. If it is, you can use TotalView in combination with Guide.

Runtime analysis of OpenMP codes:
- Sun's Analyzer is an excellent and very powerful tool.
- On the OpenMP directive level, GuideView statistics are sometimes easier to understand.