Tools for OpenMP Programming

Dieter an Mey
Center for Computing and Communication, Aachen University
rz.rwth-aachen.de

Tools for OpenMP Programming

- Debugging of OpenMP codes
- KAP/Pro Toolset from KAI/Intel
  - Guide compilers
  - Assure
  - GuideView
- TotalView from Etnus

Debugging of OpenMP Programs (1)

Prepare the serial code:
- Carefully select a reasonable test case!
- Is the serial program delivering the right results? (Use at least -O3.)
- How about compiler warnings (lint, f90 -Xlist)?
- Fortran: put all local variables on the stack: f90 -stackvar ...

Now try the OpenMP version:
- Check the stack size limits!

      export STACKSIZE=...
      ulimit -s ...

- Respect compiler messages:

      f90: USE omp_lib
      f90 -xcommonchk -xvpara -xloopinfo -XlistMP ...

- Try the OpenMP dummy library? (Link with -[x]openmp=stubs; Guide: execute with KMP_LIBRARY=serial.)

Debugging of OpenMP Programs (2)

- Is the OpenMP program running with a single thread?
- Is the OpenMP program running correctly only sometimes with more than one thread? Race conditions? Thread safety?
  - Use of static variables within a parallel region? (f90: SAVE, DATA, ...; C: static, extern)
  - Check your program with Assure! (Intel Thread Checker)
- Compare the Sun and Guide compilers! guidexx ... -WGopt=0
- When compiling with Guide, compile without optimization and with -g, then use the TotalView debugger together with Guide.
- Turn single parallel regions on and off!
- Serialise single parts of long parallel regions: omp single ... omp end single
- Introduce additional barriers for testing.
- Do different rounding errors matter? Use -fsimple=0 and don't parallelize reductions.
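The question about rounding errors in reductions can be made concrete with a small C sketch. It is not taken from the slides; the function names and the input values are chosen purely for illustration. Floating-point addition is not associative, so a reduction that combines its terms in a different order (as a parallel reduction may) can round differently from the serial loop.

```c
#include <stddef.h>

/* Sum the array from the first element to the last. */
double sum_forward(const double *v, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += v[i];
    return s;
}

/* Sum the same array from the last element to the first. */
double sum_backward(const double *v, size_t n)
{
    double s = 0.0;
    for (size_t i = n; i > 0; --i)
        s += v[i - 1];
    return s;
}
```

For the three values 1.0, 1e100, -1e100 the forward sum loses the 1.0 when it is added to 1e100, while the backward sum cancels the two large values first and keeps it. Both results are "correct" floating-point sums; they differ only in the order of operations.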

Debugging of OpenMP Programs (3): Data Races

The typical OpenMP programming error is a data race: one thread modifies a memory location that another thread reads or writes in the same region (between two synchronisation points).

Take care: the order in which parallel loop iterations execute is non-deterministic and may change from run to run.

Test: the serial code should give the same answers when the parallelized loop is run backwards.

Assure traces all memory references and detects possible data races. It verifies that the OpenMP code gives the same results as a serial program run. In many cases private clauses, barriers, or critical regions are missing.

Assure does not accept OpenMP runtime functions. (The Thread Checker does.)
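The backwards-loop test can be sketched in C as follows. This is an illustration under assumptions, not code from the slides: the helper `same_when_reversed` and the two loop bodies are hypothetical names. The helper runs the same loop body forwards and backwards on copies of the data and compares the results bit by bit.

```c
#include <stdlib.h>
#include <string.h>

/* A loop body receives the array and the iteration index. */
typedef void (*loop_body)(double *x, long i);

/* Iterations are independent of each other. */
void body_scale(double *x, long i)  { x[i] = 2.0 * x[i]; }

/* Iteration i reads what iteration i-1 wrote: a loop-carried dependence. */
void body_prefix(double *x, long i) { x[i] = x[i] + x[i - 1]; }

/* Returns 1 if iterations lo..n-1 give identical arrays in both orders,
 * 0 otherwise (also 0 if memory could not be allocated). */
int same_when_reversed(loop_body body, const double *init, long lo, long n)
{
    size_t bytes = (size_t)n * sizeof(double);
    double *fwd = malloc(bytes);
    double *bwd = malloc(bytes);
    int same = 0;
    if (fwd && bwd) {
        memcpy(fwd, init, bytes);
        memcpy(bwd, init, bytes);
        for (long i = lo; i < n; ++i)      body(fwd, i);  /* forwards  */
        for (long i = n - 1; i >= lo; --i) body(bwd, i);  /* backwards */
        same = memcmp(fwd, bwd, bytes) == 0;
    }
    free(fwd);
    free(bwd);
    return same;
}
```

A loop that fails this test must not be parallelized as it stands. Passing it is only a necessary condition: a loop can survive one reversed run and still contain a race for other data or other iteration orders.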

TotalView Debugging of OpenMP Programs (1)

See the TotalView User's Guide:
- Each parallel region is outlined into a separate routine.
- Each parallel loop is outlined into a separate routine.
- The names of these outlined routines are based on the original name of the calling routine and the line number of the parallel directive.
- Shared variables are declared in the calling routine and passed to the outlined routine.
- Private variables are declared in the outlined routine.
- The slave threads are generated on entry to the parallel region.
- You cannot step into a parallel region; instead, run to a previously set breakpoint.

TotalView Debugging of OpenMP Programs (2)

Use the Guide OpenMP compiler, because TotalView does not yet support OpenMP debugging with the Sun compilers. Compile and link separately.

Fortran:

    #!/bin/ksh
    guidef90 -c -WG,-cmpo=i \
        -WGkeepcpp prog.f90
    # or:
    guidef90 -c -WG,-cmpo=i -g prog.f90
    guidef90 -o a.out -WG,-cmpo=i -g prog.o
    export OMP_NUM_THREADS=2
    totalview a.out

C:

    #!/bin/ksh
    guidec -c -g prog.c
    guidec -o a.out -g prog.o
    export OMP_NUM_THREADS=2
    totalview a.out

KAP/Pro Toolset: Guide Compilers versus Sun Compilers

Guide compilers (guidef77 / guidef90 / guidec / guidec++):
- Preprocessors that replace OpenMP constructs by calls to an additional runtime library using pthreads and then invoke the underlying native Fortran / C compilers
- guide*: any optimization level of the underlying native compiler can be selected, so debugging is possible
- guide*: supported by the TotalView parallel debugger
- guidef90: no internal subroutines in parallel regions
- guidec++ includes the famous KCC C++ compiler

Sun compilers:
- CC: automatically turns on -xO3, so debugging is impossible
- cc / f90 / f95: new option for debugging: -xopenmp=noopt
- f90 / f95 / cc: the combination of OpenMP and auto-parallelization is supported

Attention: different performance characteristics, different defaults!

Assure Usage

Like the Guide compilers, Assure is a preprocessor. It
- instruments the source code,
- collects additional information about the code,
- invokes the native compiler.

    assurec|assurec++|assuref77|assuref90 -WGpname=project \
        -fast ... sourcefiles -o a.out

The executable is run in serial mode (and takes a lot of memory and run time):
- all memory references are traced,
- possible data races are detected in a postprocessing phase (for the given dataset!).

    a.out

The results of the analysis can be reported in line mode or presented with a GUI:

    assureview -pname=project -txt
    assureview -pname=project

Assure Example: Jacobi (1)

    !$omp parallel private(resid,k_local)
    k_local = 1
    do while (k_local.le.maxit .and. error.gt.tol)
    !$omp do
       do j=1,m; do i=1,n; uold(i,j) = u(i,j); enddo; enddo
    !$omp single
       error = 0.0
    !$omp end single
    !$omp do reduction(+:error)
       do j..; do i..; resid=..; u(i,j)=..; error=..; enddo; enddo
    !$omp single
       error = sqrt(error)/dble(n*m)
    !$omp end single
       k_local = k_local + 1
    enddo
    !$omp master
    k = k_local
    !$omp end master
    !$omp end parallel

Assure Example: Jacobi (2)

    !$omp parallel private(resid,k_local)
    k_local = 1
    do while (k_local.le.maxit .and. error.gt.tol)
    !$omp do
       do j=1,m; do i=1,n; uold(i,j) = u(i,j); enddo; enddo
       error = 0.0
    !$omp do reduction(+:error)
       do j..; do i..; resid=..; u(i,j)=..; error=..; enddo; enddo
    !$omp single
       error = sqrt(error)/dble(n*m)
    !$omp end single
       k_local = k_local + 1
    enddo
    !$omp master
    k = k_local
    !$omp end master
    !$omp end parallel
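The difference between the two Jacobi variants is the `single` construct around `error = 0.0`. The following C sketch mirrors the structure of variant (1) on a one-dimensional array. It is an illustration only: the function name `relax`, the update rule (each value is halved), and the normalisation without a square root are assumptions made to keep the example short, since the slides elide the actual stencil. Compiled without OpenMP, the pragmas are ignored and the function runs serially.

```c
/* Hypothetical 1-D stand-in for the Jacobi iteration on the slides.
 * Assumes tol > 0. Returns the final value of the iteration counter k. */
int relax(double *u, double *uold, int n, int maxit, double tol)
{
    double error = 10.0 * tol;  /* shared; starts above tol so the loop is entered */
    int k = 0;

    #pragma omp parallel
    {
        int k_local = 1;        /* private: declared inside the parallel region */
        while (k_local <= maxit && error > tol) {
            #pragma omp for
            for (int i = 0; i < n; ++i)
                uold[i] = u[i];
            /* implicit barrier: every thread has evaluated the while condition */

            #pragma omp single
            error = 0.0;
            /* implicit barrier: the reset is finished before anyone accumulates */

            #pragma omp for reduction(+:error)
            for (int i = 0; i < n; ++i) {
                double resid = 0.5 * uold[i];   /* assumed update rule */
                u[i] = uold[i] - resid;
                error += resid * resid;
            }
            /* implicit barrier: the reduction result is complete */

            #pragma omp single
            error = error / (double)n;
            /* implicit barrier: all threads test the same value of error */

            k_local++;
        }
        #pragma omp master
        k = k_local;
    }
    return k;
}
```

In variant (2) every thread executes `error = 0.0` on its own, without a barrier before the reduction loop. A thread that arrives late can then reset the shared variable while other threads are already using it, which is the kind of conflict Assure is designed to report.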

Assure Example: Jacobi (3) and (4)

[Three slides consisting of images only; the images are not included in the transcription.]

Assure Example: Thermoflow (1)

    c$omp parallel
    ...
    c$omp do private(l,tmp)
          DO I=1,N
             L = ind(i)
             tmp = X(L)*a(I)+Y(L)*b(I)
             X(L) = X(L)-tmp*a(I)
             Y(L) = Y(L)-tmp*b(I)
          END DO
    c$omp end do
    ...
    c$omp end parallel

User: "The values of the index array IND are certainly disjoint!"

But: Assure complains.

Check:

    c$omp single
          open (unit=99,file="ind.dat")
          do i = 1,n
             write(99,*) ind(i)
          end do
          close (99)
    c$omp end single

    sort ind.dat > ind.sort
    sort -u ind.dat > ind.usort
    diff ind.sort ind.usort

The diff output reports two deleted lines (the surviving fragments are "98d97 <" and "d1617 <"; the values themselves were lost in the transcription).

Result: 2 values out of 2000 occurred twice!
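The same check can be done inside the program instead of through a file and `sort`/`diff`. The C sketch below is an illustration, not part of the slides; `count_repeats` is a hypothetical helper. It sorts a copy of the index array and counts how many entries repeat an earlier value, which corresponds to the number of lines that `sort -u` removes.

```c
#include <stdlib.h>

/* Comparison function for qsort on int values. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Returns the number of entries of ind[0..n-1] that repeat an earlier
 * value, or -1 if the working copy could not be allocated. */
long count_repeats(const int *ind, size_t n)
{
    if (n == 0)
        return 0;
    int *copy = malloc(n * sizeof *copy);
    if (!copy)
        return -1;
    for (size_t i = 0; i < n; ++i)
        copy[i] = ind[i];
    qsort(copy, n, sizeof *copy, cmp_int);

    long repeats = 0;
    for (size_t i = 1; i < n; ++i)
        if (copy[i] == copy[i - 1])
            ++repeats;          /* equal neighbours after sorting */
    free(copy);
    return repeats;
}
```

If the result is non-zero, two iterations of the loop update the same elements X(L) and Y(L), so the loop carries a data race no matter how certain the user is about the index array.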

Assure Example: Thermoflow (2)

    C$omp parallel
    ...
          DO iter = 1,maxiter
    c$omp do
             DO I = 3,n-2
                y(i) = (x(i-1) + x(i) + x(i+1)) / 3.0d0
             END DO
    c$omp end do
    c$omp do
             DO I = 3,n-2
                x(i) = y(i)
             END DO
    c$omp end do
             x(2) = y(3)
             x(n-1) = y(n-2)
          END DO
    ...
    C$omp end parallel

Assure complains! What is wrong?

Assure Example: Thermoflow (3)

    C$omp parallel
    ...
          DO iter = 1,maxiter
    c$omp do
             DO I = 3,n-2
                y(i) = (x(i-1) + x(i) + x(i+1)) / 3.0d0
             END DO
    c$omp end do
    c$omp do
             DO I = 3,n-2
                x(i) = y(i)
             END DO
    c$omp end do nowait
    c$omp single
             x(2) = y(3)
             x(n-1) = y(n-2)
    c$omp end single
          END DO
    ...
    C$omp end parallel

- The barrier after the second loop can be omitted (nowait).
- The barrier at the end of the single construct was missing.
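For readers who work in C, the corrected loop structure can be sketched as follows. This is a translation made for illustration, not code from the slides: the function name `smooth` is hypothetical, and the arrays keep the Fortran index range 1..n, so they need n+1 elements with element 0 unused. Without OpenMP the pragmas are ignored and the code runs serially.

```c
/* x and y hold elements 1..n (index 0 is unused); assumes n >= 5. */
void smooth(double *x, double *y, int n, int maxiter)
{
    #pragma omp parallel
    {
        for (int iter = 1; iter <= maxiter; ++iter) {
            #pragma omp for
            for (int i = 3; i <= n - 2; ++i)
                y[i] = (x[i - 1] + x[i] + x[i + 1]) / 3.0;
            /* implicit barrier: y is complete before x is overwritten */

            #pragma omp for nowait
            for (int i = 3; i <= n - 2; ++i)
                x[i] = y[i];
            /* no barrier needed here: the single block below touches
             * only x[2] and x[n-1], which this loop never writes */

            #pragma omp single
            {
                x[2]     = y[3];
                x[n - 1] = y[n - 2];
            }
            /* implicit barrier: the next iteration reads x[2] and
             * x[n-1] and overwrites y[3] and y[n-2] */
        }
    }
}
```

The barrier at the end of `single` is the one that matters. Without it, a fast thread could start the next iteration and overwrite y[3] or read x[2] while another thread is still executing the boundary updates.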

Assure: My Advice

Never put an OpenMP code into production without using Assure ...

Intel Thread Checker

... or the Intel Thread Checker, which is the successor of Assure since Intel bought KAI. Currently the Thread Checker only runs on the MS Windows platform.

GuideView Usage

Compile with the Guide compiler:

    guidec|guidec++|guidef77|guidef90 \
        -c -fast ... sourcefiles

Link with the Guide compiler driver and add the -WGstats option:

    guidec|guidec++|guidef77|guidef90 -WGstats \
        -fast ... objectfiles -o a.out

Execute the program; at the end a statistics file is written:

    OMP_NUM_THREADS=4 a.out

Visualize the statistics file with the GuideView GUI:

    guideview

GuideView Example: Jacobi (1)

    !$omp parallel private(resid,k_local)
    k_local = 1
    do while (k_local.le.maxit .and. error.gt.tol)
    !$omp do
       do j=1,m; do i=1,n; uold(i,j) = u(i,j); enddo; enddo
    !$omp single
       error = 0.0
    !$omp end single
    !$omp do reduction(+:error)
       do j..; do i..; resid=..; u(i,j)=..; error=..; enddo; enddo
    !$omp single
       error = sqrt(error)/dble(n*m)
    !$omp end single
       k_local = k_local + 1
    enddo
    !$omp master
    k = k_local
    !$omp end master
    !$omp end parallel

The slide labels four barriers in this code (Barrier 1 to Barrier 4). Inside the while loop there are four implicit barriers: one at the end of each omp do loop and one at the end of each single construct.

GuideView Example: Jacobi (2)

[Two slides repeating the Jacobi code with the labels Barrier 1 to Barrier 4, partly covered by images that are not included in the transcription.]

GuideView Example: Jacobi (3)

Labels shown on the slide:
- Wait at a barrier
- Wait at the end of a parallel region
- Overhead when entering a parallel region
- Parallel time
- Waiting at a critical region
- Waiting for a lock

GuideView Example: TFS (1)

[Two slides consisting of images only; the images are not included in the transcription.]

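Loop Scheduling Example: Matrix Transpose (1)

    export OMP_NUM_THREADS=8
    ulimit -s
    export STACKSIZE=
    guidef90 -WGstats -fast transpose.f90
    export KMP_STATSFILE=static8.gvs
    export OMP_SCHEDULE=static,8
    a.out
    guideview

(The values for ulimit -s and STACKSIZE were lost in the transcription.)

    !$omp parallel do schedule(runtime) private(h)
    do i = 1, n-1
       do j = i+1, n
          h = a(j,i)
          a(j,i) = a(i,j)
          a(i,j) = h
       end do
    end do

(The slide shows a third "end do" after this loop nest; it presumably closes an enclosing loop whose opening line is not part of the transcription.)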
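A C counterpart of the transpose loop is sketched below for illustration; the function name `transpose_in_place` and the row-major storage are assumptions, not taken from the slides. The inner loop gets shorter as `i` grows, so equal-sized blocks of outer iterations carry very different amounts of work. That is why the choice of schedule matters here, and why `schedule(runtime)` is convenient: the schedule is then read from the `OMP_SCHEDULE` environment variable at run time, as in the commands above.

```c
#include <stddef.h>

/* In-place transpose of an n-by-n matrix stored row-major in a[0..n*n-1]. */
void transpose_in_place(double *a, int n)
{
    /* Row i swaps n-1-i element pairs: a triangular, unbalanced workload. */
    #pragma omp parallel for schedule(runtime)
    for (int i = 0; i < n - 1; ++i) {
        for (int j = i + 1; j < n; ++j) {
            double h = a[(size_t)j * n + i];   /* h is private: declared in the loop */
            a[(size_t)j * n + i] = a[(size_t)i * n + j];
            a[(size_t)i * n + j] = h;
        }
    }
}
```

Each pair (i, j) with i < j is touched by exactly one outer iteration, so the outer loop is free of data races under any schedule.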

Loop Scheduling Example: Matrix Transpose (2)

[Chart residue: run times in seconds for the static, dynamic, and guided schedules. Apart from "static 6.30 sec", the values, the second matrix dimension, and the number of repetitions were lost in the transcription.]

Loop Scheduling Example: Matrix Transpose (3)

[Chart residue: average time (sec) for matrix size 5000x5000 with the schedules static; static,1; static,8; dyn.,1; dyn.,8; guided,1; guided,8. Two series: guidef90 and f90 -openmp. The best version is marked for the Sun compiler and for the Guide compiler; the values are not included in the transcription.]

Summary

Debugging of OpenMP codes:
- Parallelize carefully!
- Watch out for compiler messages (-XlistMP).
- Use Assure (or the Thread Checker).
- Most likely, using a debugger on OpenMP codes is not necessary. If it is, you can use TotalView in combination with Guide.

Runtime analysis of OpenMP codes:
- Sun's Analyzer is an excellent and very powerful tool.
- On the OpenMP directive level, GuideView statistics are sometimes easier to understand.

Debugging OpenMP Programs

Debugging OpenMP Programs Debugging OpenMP Programs Dieter an Mey Center for Computing and Communication Aachen University anmey@rz.rwth-aachen.de aachen.de 1 Debugging OpenMP Programs General Hints dbx Sun IDE Debugger TotalView

More information

Debugging with TotalView

Debugging with TotalView Debugging with TotalView Dieter an Mey Center for Computing and Communication Aachen University of Technology anmey@rz.rwth-aachen.de 1 TotalView, Dieter an Mey, SunHPC 2006 Debugging on Sun dbx line mode

More information

Introduction: OpenMP is clear and easy..

Introduction: OpenMP is clear and easy.. Gesellschaft für Parallele Anwendungen und Systeme mbh OpenMP Tools Hans-Joachim Plum, Pallas GmbH edited by Matthias Müller, HLRS Pallas GmbH Hermülheimer Straße 10 D-50321 Brühl, Germany info@pallas.de

More information

OpenMP Case Studies. Dieter an Mey. Center for Computing and Communication Aachen University

OpenMP Case Studies. Dieter an Mey. Center for Computing and Communication Aachen University OpenMP Case Studies Dieter an Mey Center for Computing and Communication Aachen University anmey@rz rz.rwth-aachen.de 1 OpenMP Case Studies, Dieter an Mey, SunHPC 2004 OpenMP Case Studies Parallelization

More information

Two OpenMP Programming Patterns

Two OpenMP Programming Patterns Two OpenMP Programming Patterns Dieter an Mey Center for Computing and Communication, Aachen University anmey@rz.rwth-aachen.de 1 Introduction A collection of frequently occurring OpenMP programming patterns

More information

Parallel Programming in OpenMP Introduction

Parallel Programming in OpenMP Introduction Parallel Programming in OpenMP Introduction Dieter an Mey Center for Computing and Communication Aachen University of Technology anmey@rz rz.rwth-aachen.de 1 OpenMP - Introduction, Dieter an Mey, 18 January

More information

!OMP #pragma opm _OPENMP

!OMP #pragma opm _OPENMP Advanced OpenMP Lecture 12: Tips, tricks and gotchas Directives Mistyping the sentinel (e.g.!omp or #pragma opm ) typically raises no error message. Be careful! The macro _OPENMP is defined if code is

More information

Performance Tuning and OpenMP

Performance Tuning and OpenMP Performance Tuning and OpenMP mueller@hlrs.de University of Stuttgart High-Performance Computing-Center Stuttgart (HLRS) www.hlrs.de Höchstleistungsrechenzentrum Stuttgart Outline Motivation Performance

More information

AMath 483/583 Lecture 14

AMath 483/583 Lecture 14 AMath 483/583 Lecture 14 Outline: OpenMP: Parallel blocks, critical sections, private and shared variables Parallel do loops, reductions Reading: class notes: OpenMP section of Bibliography $UWHPSC/codes/openmp

More information

Introduction to OpenMP

Introduction to OpenMP Introduction to OpenMP p. 1/?? Introduction to OpenMP Synchronisation Nick Maclaren Computing Service nmm1@cam.ac.uk, ext. 34761 June 2011 Introduction to OpenMP p. 2/?? Summary Facilities here are relevant

More information

Introduction to OpenMP. OpenMP basics OpenMP directives, clauses, and library routines

Introduction to OpenMP. OpenMP basics OpenMP directives, clauses, and library routines Introduction to OpenMP Introduction OpenMP basics OpenMP directives, clauses, and library routines What is OpenMP? What does OpenMP stands for? What does OpenMP stands for? Open specifications for Multi

More information

Parallel Software Engineering with OpenMP

Parallel Software Engineering with OpenMP Parallel Software Engineering with OpenMP NDL#NDLFRP KWWSZZZNDLFRP Outline Introduction What is Parallel Software Engineering Parallel Software Engineering Issues OpenMP KAP/Pro for OpenMP Conclusions

More information

Amdahl s Law. AMath 483/583 Lecture 13 April 25, Amdahl s Law. Amdahl s Law. Today: Amdahl s law Speed up, strong and weak scaling OpenMP

Amdahl s Law. AMath 483/583 Lecture 13 April 25, Amdahl s Law. Amdahl s Law. Today: Amdahl s law Speed up, strong and weak scaling OpenMP AMath 483/583 Lecture 13 April 25, 2011 Amdahl s Law Today: Amdahl s law Speed up, strong and weak scaling OpenMP Typically only part of a computation can be parallelized. Suppose 50% of the computation

More information

OPENMP TIPS, TRICKS AND GOTCHAS

OPENMP TIPS, TRICKS AND GOTCHAS OPENMP TIPS, TRICKS AND GOTCHAS OpenMPCon 2015 2 Directives Mistyping the sentinel (e.g.!omp or #pragma opm ) typically raises no error message. Be careful! Extra nasty if it is e.g. #pragma opm atomic

More information

Ruud van der Pas Nawal Copty Eric Duncan Oleg Mazurov

Ruud van der Pas Nawal Copty Eric Duncan Oleg Mazurov 1 Ruud van der Pas Nawal Copty Eric Duncan Oleg Mazurov Systems Group Sun Microsystems Menlo Park, CA, USA IWOMP, Sun Studio Compilers and Tools 2 Fortran (f95), C (cc) and C++ (CC) compilers Support sequential

More information

From a Vector Computer to an SMP-Cluster Hybrid Parallelization of the CFD Code PANTA

From a Vector Computer to an SMP-Cluster Hybrid Parallelization of the CFD Code PANTA From a Vector Computer to an SMP-Cluster Hybrid Parallelization of the CFD Code PANTA Dieter an Mey Computing Center Aachen University of Technology anmey@rz.rwth-aachen.de http://www.rz.rwth-aachen.de

More information

<Insert Picture Here> OpenMP on Solaris

<Insert Picture Here> OpenMP on Solaris 1 OpenMP on Solaris Wenlong Zhang Senior Sales Consultant Agenda What s OpenMP Why OpenMP OpenMP on Solaris 3 What s OpenMP Why OpenMP OpenMP on Solaris

More information

Practical in Numerical Astronomy, SS 2012 LECTURE 12

Practical in Numerical Astronomy, SS 2012 LECTURE 12 Practical in Numerical Astronomy, SS 2012 LECTURE 12 Parallelization II. Open Multiprocessing (OpenMP) Lecturer Eduard Vorobyov. Email: eduard.vorobiev@univie.ac.at, raum 006.6 1 OpenMP is a shared memory

More information

OPENMP TIPS, TRICKS AND GOTCHAS

OPENMP TIPS, TRICKS AND GOTCHAS OPENMP TIPS, TRICKS AND GOTCHAS Mark Bull EPCC, University of Edinburgh (and OpenMP ARB) markb@epcc.ed.ac.uk OpenMPCon 2015 OpenMPCon 2015 2 A bit of background I ve been teaching OpenMP for over 15 years

More information

OpenMP on Ranger and Stampede (with Labs)

OpenMP on Ranger and Stampede (with Labs) OpenMP on Ranger and Stampede (with Labs) Steve Lantz Senior Research Associate Cornell CAC Parallel Computing at TACC: Ranger to Stampede Transition November 6, 2012 Based on materials developed by Kent

More information

Compiling and running OpenMP programs. C/C++: cc fopenmp o prog prog.c -lomp CC fopenmp o prog prog.c -lomp. Programming with OpenMP*

Compiling and running OpenMP programs. C/C++: cc fopenmp o prog prog.c -lomp CC fopenmp o prog prog.c -lomp. Programming with OpenMP* Advanced OpenMP Compiling and running OpenMP programs C/C++: cc fopenmp o prog prog.c -lomp CC fopenmp o prog prog.c -lomp 2 1 Running Standard environment variable determines the number of threads: tcsh

More information

OpenMP. Application Program Interface. CINECA, 14 May 2012 OpenMP Marco Comparato

OpenMP. Application Program Interface. CINECA, 14 May 2012 OpenMP Marco Comparato OpenMP Application Program Interface Introduction Shared-memory parallelism in C, C++ and Fortran compiler directives library routines environment variables Directives single program multiple data (SPMD)

More information

Introduction to OpenMP

Introduction to OpenMP Introduction to OpenMP p. 1/?? Introduction to OpenMP Tasks Nick Maclaren nmm1@cam.ac.uk September 2017 Introduction to OpenMP p. 2/?? OpenMP Tasks In OpenMP 3.0 with a slightly different model A form

More information

Overview: The OpenMP Programming Model

Overview: The OpenMP Programming Model Overview: The OpenMP Programming Model motivation and overview the parallel directive: clauses, equivalent pthread code, examples the for directive and scheduling of loop iterations Pi example in OpenMP

More information

COMP4510 Introduction to Parallel Computation. Shared Memory and OpenMP. Outline (cont d) Shared Memory and OpenMP

COMP4510 Introduction to Parallel Computation. Shared Memory and OpenMP. Outline (cont d) Shared Memory and OpenMP COMP4510 Introduction to Parallel Computation Shared Memory and OpenMP Thanks to Jon Aronsson (UofM HPC consultant) for some of the material in these notes. Outline (cont d) Shared Memory and OpenMP Including

More information

Parallelising serial applications. Darryl Gove Compiler Performance Engineering

Parallelising serial applications. Darryl Gove Compiler Performance Engineering Parallelising serial applications Darryl Gove Compiler Performance Engineering Topics Process Tools Expectations 2 Profile Compile with debug info > -g [C/Fortran] > -g0 [C++] > Enables mapping of disassembly

More information

Introduction to OpenMP

Introduction to OpenMP 1 Introduction to OpenMP NTNU-IT HPC Section John Floan Notur: NTNU HPC http://www.notur.no/ www.hpc.ntnu.no/ Name, title of the presentation 2 Plan for the day Introduction to OpenMP and parallel programming

More information

Introduction to OpenMP

Introduction to OpenMP Introduction to OpenMP Lecture 9: Performance tuning Sources of overhead There are 6 main causes of poor performance in shared memory parallel programs: sequential code communication load imbalance synchronisation

More information

OpenMP: Open Multiprocessing

OpenMP: Open Multiprocessing OpenMP: Open Multiprocessing Erik Schnetter June 7, 2012, IHPC 2012, Iowa City Outline 1. Basic concepts, hardware architectures 2. OpenMP Programming 3. How to parallelise an existing code 4. Advanced

More information

Parallelising Scientific Codes Using OpenMP. Wadud Miah Research Computing Group

Parallelising Scientific Codes Using OpenMP. Wadud Miah Research Computing Group Parallelising Scientific Codes Using OpenMP Wadud Miah Research Computing Group Software Performance Lifecycle Scientific Programming Early scientific codes were mainly sequential and were executed on

More information

Advanced C Programming Winter Term 2008/09. Guest Lecture by Markus Thiele

Advanced C Programming Winter Term 2008/09. Guest Lecture by Markus Thiele Advanced C Programming Winter Term 2008/09 Guest Lecture by Markus Thiele Lecture 14: Parallel Programming with OpenMP Motivation: Why parallelize? The free lunch is over. Herb

More information

CS691/SC791: Parallel & Distributed Computing

CS691/SC791: Parallel & Distributed Computing CS691/SC791: Parallel & Distributed Computing Introduction to OpenMP Part 2 1 OPENMP: SORTING 1 Bubble Sort Serial Odd-Even Transposition Sort 2 Serial Odd-Even Transposition Sort First OpenMP Odd-Even

More information

Exploiting Object-Oriented Abstractions to parallelize Sparse Linear Algebra Codes

Exploiting Object-Oriented Abstractions to parallelize Sparse Linear Algebra Codes Exploiting Object-Oriented Abstractions to parallelize Sparse Linear Algebra Codes Christian Terboven, Dieter an Mey, Paul Kapinos, Christopher Schleiden, Igor Merkulow {terboven, anmey, kapinos, schleiden,

More information

Shared Memory Programming Model

Shared Memory Programming Model Shared Memory Programming Model Ahmed El-Mahdy and Waleed Lotfy What is a shared memory system? Activity! Consider the board as a shared memory Consider a sheet of paper in front of you as a local cache

More information

Module 10: Open Multi-Processing Lecture 19: What is Parallelization? The Lecture Contains: What is Parallelization? Perfectly Load-Balanced Program

Module 10: Open Multi-Processing Lecture 19: What is Parallelization? The Lecture Contains: What is Parallelization? Perfectly Load-Balanced Program The Lecture Contains: What is Parallelization? Perfectly Load-Balanced Program Amdahl's Law About Data What is Data Race? Overview to OpenMP Components of OpenMP OpenMP Programming Model OpenMP Directives

More information

OpenMP: Open Multiprocessing

OpenMP: Open Multiprocessing OpenMP: Open Multiprocessing Erik Schnetter May 20-22, 2013, IHPC 2013, Iowa City 2,500 BC: Military Invents Parallelism Outline 1. Basic concepts, hardware architectures 2. OpenMP Programming 3. How to

More information

OpenMP programming. Thomas Hauser Director Research Computing Research CU-Boulder

OpenMP programming. Thomas Hauser Director Research Computing Research CU-Boulder OpenMP programming Thomas Hauser Director Research Computing thomas.hauser@colorado.edu CU meetup 1 Outline OpenMP Shared-memory model Parallel for loops Declaring private variables Critical sections Reductions

More information

OpenMP Shared Memory Programming

OpenMP Shared Memory Programming OpenMP Shared Memory Programming John Burkardt, Information Technology Department, Virginia Tech.... Mathematics Department, Ajou University, Suwon, Korea, 13 May 2009.... http://people.sc.fsu.edu/ jburkardt/presentations/

More information

Parallel Numerical Algorithms

Parallel Numerical Algorithms Parallel Numerical Algorithms http://sudalab.is.s.u-tokyo.ac.jp/~reiji/pna16/ [ 9 ] Shared Memory Performance Parallel Numerical Algorithms / IST / UTokyo 1 PNA16 Lecture Plan General Topics 1. Architecture

More information

Programming Shared Memory Systems with OpenMP Part I. Book

Programming Shared Memory Systems with OpenMP Part I. Book Programming Shared Memory Systems with OpenMP Part I Instructor Dr. Taufer Book Parallel Programming in OpenMP by Rohit Chandra, Leo Dagum, Dave Kohr, Dror Maydan, Jeff McDonald, Ramesh Menon 2 1 Machine

More information

UvA-SARA High Performance Computing Course June Clemens Grelck, University of Amsterdam. Parallel Programming with Compiler Directives: OpenMP

UvA-SARA High Performance Computing Course June Clemens Grelck, University of Amsterdam. Parallel Programming with Compiler Directives: OpenMP Parallel Programming with Compiler Directives OpenMP Clemens Grelck University of Amsterdam UvA-SARA High Performance Computing Course June 2013 OpenMP at a Glance Loop Parallelization Scheduling Parallel

More information

OpenMP Introduction. CS 590: High Performance Computing. OpenMP. A standard for shared-memory parallel programming. MP = multiprocessing

OpenMP Introduction. CS 590: High Performance Computing. OpenMP. A standard for shared-memory parallel programming. MP = multiprocessing CS 590: High Performance Computing OpenMP Introduction Fengguang Song Department of Computer Science IUPUI OpenMP A standard for shared-memory parallel programming. MP = multiprocessing Designed for systems

More information

OpenMP 2. CSCI 4850/5850 High-Performance Computing Spring 2018

OpenMP 2. CSCI 4850/5850 High-Performance Computing Spring 2018 OpenMP 2 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning Objectives

More information

The OMPlab on Sun Systems

The OMPlab on Sun Systems 1 The OMPlab on Sun Systems Ruud van der Pas Senior Staff Engineer Sun Microsystems Menlo Park, CA, USA IWOMP 2007 Tsinghua University IWOMP 2007 -, 2 Sun Studio Compilers and Tools Fortran (f95), C (cc)

More information

A Source-to-Source OpenMP Compiler

A Source-to-Source OpenMP Compiler A Source-to-Source OpenMP Compiler Mario Soukup and Tarek S. Abdelrahman The Edward S. Rogers Sr. Department of Electrical and Computer Engineering University of Toronto Toronto, Ontario, Canada M5S 3G4

More information

OpenMP programming Part II. Shaohao Chen High performance Louisiana State University

OpenMP programming Part II. Shaohao Chen High performance Louisiana State University OpenMP programming Part II Shaohao Chen High performance computing @ Louisiana State University Part II Optimization for performance Trouble shooting and debug Common Misunderstandings and Frequent Errors

More information

Allows program to be incrementally parallelized

Allows program to be incrementally parallelized Basic OpenMP What is OpenMP An open standard for shared memory programming in C/C+ + and Fortran supported by Intel, Gnu, Microsoft, Apple, IBM, HP and others Compiler directives and library support OpenMP

More information

Parallel Programming

Parallel Programming Parallel Programming OpenMP Dr. Hyrum D. Carroll November 22, 2016 Parallel Programming in a Nutshell Load balancing vs Communication This is the eternal problem in parallel computing. The basic approaches

More information

41391 High performance computing: Miscellaneous parallel programmes in Fortran

41391 High performance computing: Miscellaneous parallel programmes in Fortran 1391 High performance computing: Miscellaneous parallel programmes in Fortran Nilas Mandrup Hansen, Ask Hjorth Larsen January 19, 0 1 Introduction This document concerns the implementation of a Fortran

More information

Introduction to OpenMP

Introduction to OpenMP Introduction to OpenMP Lecture 2: OpenMP fundamentals Overview Basic Concepts in OpenMP History of OpenMP Compiling and running OpenMP programs 2 1 What is OpenMP? OpenMP is an API designed for programming

More information

OpenMP - III. Diego Fabregat-Traver and Prof. Paolo Bientinesi WS15/16. HPAC, RWTH Aachen

OpenMP - III. Diego Fabregat-Traver and Prof. Paolo Bientinesi WS15/16. HPAC, RWTH Aachen OpenMP - III Diego Fabregat-Traver and Prof. Paolo Bientinesi HPAC, RWTH Aachen fabregat@aices.rwth-aachen.de WS15/16 OpenMP References Using OpenMP: Portable Shared Memory Parallel Programming. The MIT

More information

Introduction to Standard OpenMP 3.1

Introduction to Standard OpenMP 3.1 Introduction to Standard OpenMP 3.1 Massimiliano Culpo - m.culpo@cineca.it Gian Franco Marras - g.marras@cineca.it CINECA - SuperComputing Applications and Innovation Department 1 / 59 Outline 1 Introduction

More information

OpenMP. Dr. William McDoniel and Prof. Paolo Bientinesi WS17/18. HPAC, RWTH Aachen

OpenMP. Dr. William McDoniel and Prof. Paolo Bientinesi WS17/18. HPAC, RWTH Aachen OpenMP Dr. William McDoniel and Prof. Paolo Bientinesi HPAC, RWTH Aachen mcdoniel@aices.rwth-aachen.de WS17/18 Loop construct - Clauses #pragma omp for [clause [, clause]...] The following clauses apply:

More information

Practical stuff! ü OpenMP. Ways of actually get stuff done in HPC:

Practical stuff! ü OpenMP. Ways of actually get stuff done in HPC: Ways of actually get stuff done in HPC: Practical stuff! Ø Message Passing (send, receive, broadcast,...) Ø Shared memory (load, store, lock, unlock) ü MPI Ø Transparent (compiler works magic) Ø Directive-based

More information

Parallel Programming with OpenMP. CS240A, T. Yang

Parallel Programming with OpenMP. CS240A, T. Yang Parallel Programming with OpenMP CS240A, T. Yang 1 A Programmer s View of OpenMP What is OpenMP? Open specification for Multi-Processing Standard API for defining multi-threaded shared-memory programs

More information

OpenMP Tutorial. Dirk Schmidl. IT Center, RWTH Aachen University. Member of the HPC Group Christian Terboven

OpenMP Tutorial. Dirk Schmidl. IT Center, RWTH Aachen University. Member of the HPC Group Christian Terboven OpenMP Tutorial Dirk Schmidl IT Center, RWTH Aachen University Member of the HPC Group schmidl@itc.rwth-aachen.de IT Center, RWTH Aachen University Head of the HPC Group terboven@itc.rwth-aachen.de 1 IWOMP

More information

OpenMP - II. Diego Fabregat-Traver and Prof. Paolo Bientinesi WS15/16. HPAC, RWTH Aachen

OpenMP - II. Diego Fabregat-Traver and Prof. Paolo Bientinesi WS15/16. HPAC, RWTH Aachen OpenMP - II Diego Fabregat-Traver and Prof. Paolo Bientinesi HPAC, RWTH Aachen fabregat@aices.rwth-aachen.de WS15/16 OpenMP References Using OpenMP: Portable Shared Memory Parallel Programming. The MIT
