Debugging with TotalView

Dieter an Mey
Center for Computing and Communication, Aachen University of Technology
anmey@rz.rwth-aachen.de
TotalView, Dieter an Mey, SunHPC 2006

Debugging on Sun
- dbx: line-mode debugger, serial and multi-threaded (not covered here)
- Sun Forte IDE debugger: based on dbx (not covered here)
- TotalView (Etnus): the prime serial + parallel debugger (MPI + multi-threaded) on many platforms; new memory debugging features (not covered here)
- DDT (Allinea): serial and MPI; a newcomer, available on Linux and Solaris (not covered here)

Help
- TotalView is a commercial debugger of Etnus Inc.: www.etnus.com
- TotalView User's Guide
- TotalView Online Help
- Appendix in the RWTH Primer: http://www.rz.rwth-aachen.de/hpc/primer
- Online tutorial from LLNL: http://www.llnl.gov/computing/tutorials/totalview/

Start of TotalView on the Sun Fire
- Current version is 8.0.
- Sun compilers are supported from Studio 7 on.
- Compile with -g and without optimization.
- Sun HPC ClusterTools V5/V6 (MPI) are supported.
- Start the executable with:
    totalview a.out
    totalview a.out -a args                   # pass arguments to the program
    totalview mprun -a -np 8... a.out         # MPI
    totalview mprun -a -np 8... a.out args    # MPI
    totalview a.out core                      # investigate a core file
    totalview                                 # then attach to a running program

TotalView Windows - Overview
Root Window, Process Window, Stack Trace, Stack Frame, Action Points, Variable Window, Expression List, Terminal Session

Root Window

Process Window
- Status + ID
- Stack Trace
- Stack + Registers
- Threads
- Action Points

Variable Window (1)
- Right mouse button (RMB): Dive in New Window
- Type casting

Variable Window (2)
- Slicing
- Filtering

Surface View Window
Tools > Visualize

Watchpoint
Expression attached to the watchpoint: whenever it triggers and k is a multiple of 50, visualize the array u and stop:
    if ( k%50==0 ) { $visualize (u); $stop; }

OpenMP - Debugging

Debugging OpenMP Programs?
If you want to debug your OpenMP code, try to avoid using TotalView.
Well, try to avoid using any debugger with OpenMP:
- Debug your serial program first.
- Then look out for data races, which may not show up during the debugging process.
If you still want to debug your OpenMP code, you may want to use TotalView.

OpenMP Debugging Recipe (1 of 2)
Prepare the serial code:
- Carefully select a reasonable test case!
- Is the serial program delivering the right results (with optimization turned on)?
- How about compiler warnings?
- Fortran: put all local variables on the stack first.
Now try the OpenMP version:
- Need to increase the stack size limits? export STACKSIZE=... (not yet standardized); ulimit -s ...
- Respect compiler messages (your compiler may have switches to turn on extensive checking).
- Fortran: USE omp_lib
- Try the OpenMP dummy library, which your compiler may provide (Sun Studio: link with -lompstubs).
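A minimal C sketch of why a dummy library helps: if every OpenMP runtime call is guarded by the standard _OPENMP macro, the same source also builds as a plain serial program when OpenMP is switched off, which gives the serial baseline the recipe asks for. The wrapper names my_thread_num and my_num_threads and the function count_region_threads are hypothetical; a compiler-provided stub library achieves the same effect without source changes.

```c
#include <stdio.h>

#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical wrappers: real OpenMP runtime calls when compiled with
   OpenMP enabled, serial fallbacks (thread id 0, one thread) otherwise. */
static int my_thread_num(void)
{
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

static int my_num_threads(void)
{
#ifdef _OPENMP
    return omp_get_num_threads();
#else
    return 1;
#endif
}

/* Counts how many threads execute the parallel region. Without OpenMP the
   pragmas are ignored and the region body runs exactly once. */
int count_region_threads(void)
{
    int n = 0;
#pragma omp parallel
    {
#pragma omp atomic
        n++;
        if (my_thread_num() == 0)
            printf("region runs with %d thread(s)\n", my_num_threads());
    }
    return n;
}
```

Compiled without OpenMP flags, count_region_threads returns 1; compiled with them, it returns the size of the thread team.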

OpenMP Debugging Recipe (2 of 2)
- Is the OpenMP program running well with a single thread?
- Is the OpenMP program running correctly only sometimes with more than one thread?
  Race conditions? Thread safety?
  Use of static or global variables within a parallel region? (Fortran 90: SAVE, DATA, initializations, ...; C: static, extern)
- Use a data race detection tool (Intel Thread Checker, Sun Studio 12).
- Turn single parallel regions on and off!
- Serialize parts of long parallel regions (single directive).
- Introduce additional barriers for testing.
- Do different rounding errors matter? Turn off certain compiler optimizations; don't parallelize reductions.
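The "static or global variables" question from the list above, sketched in C: a function that returns a pointer to a static buffer works in serial code but races when several threads call it at once, because all threads share the one buffer. All function names here are illustrative and not taken from the slides.

```c
#include <stdio.h>
#include <string.h>

/* UNSAFE inside a parallel region: 'buf' is static, so all threads share
   a single copy and can overwrite each other's result (a data race). */
const char *format_id_unsafe(int id)
{
    static char buf[32];
    snprintf(buf, sizeof buf, "item-%d", id);
    return buf;
}

/* Thread-safe variant: the caller supplies the storage, so each thread
   works on its own buffer. */
const char *format_id_safe(int id, char *buf, size_t len)
{
    snprintf(buf, len, "item-%d", id);
    return buf;
}

/* Sums the lengths of n formatted ids in parallel. 'local' is declared
   inside the loop body and is therefore private to each thread. */
long total_length(int n)
{
    long total = 0;
#pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; i++) {
        char local[32];
        total += (long)strlen(format_id_safe(i, local, sizeof local));
    }
    return total;
}
```

Replacing format_id_safe with format_id_unsafe in the loop would compile without complaint, which is why a race detection tool is recommended rather than relying on the compiler.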

Data Races
The typical OpenMP programming error: data races
- One thread modifies a memory location which another thread reads or writes in the same region (between two synchronization points).
- Take care: the order in which parallel loop iterations are executed is nondeterministic and may change from run to run.
- Necessary condition for parallelizing a loop: the serial code should give the same answers when the loop runs backwards.
- Data race detection tools trace memory references and detect possible data races, which may never occur while you step through your code with a debugger.
- In many cases, private clauses, barriers, or critical regions are missing.
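The "run the loop backwards" test can be tried directly on the serial code. In this C sketch (array size and function names are made up for illustration), the loop with independent iterations gives identical results in both directions, while the loop in which each iteration reads the value written by the previous one does not, so it must not be parallelized as written. Passing the test is only a necessary condition, not a proof of correctness.

```c
#include <string.h>

#define N 8

/* Independent iterations: a[i] depends only on b[i]. */
static void scale(double *a, const double *b, int backwards)
{
    if (!backwards)
        for (int i = 0; i < N; i++) a[i] = 2.0 * b[i];
    else
        for (int i = N - 1; i >= 0; i--) a[i] = 2.0 * b[i];
}

/* Loop-carried dependence: a[i] reads a[i-1], written by the previous iteration. */
static void running_count(double *a, int backwards)
{
    if (!backwards)
        for (int i = 1; i < N; i++) a[i] = a[i - 1] + 1.0;
    else
        for (int i = N - 1; i >= 1; i--) a[i] = a[i - 1] + 1.0;
}

/* Each check returns 1 if forward and backward runs give identical results. */
int scale_passes_reverse_test(void)
{
    double b[N], fwd[N], bwd[N];
    for (int i = 0; i < N; i++) b[i] = (double)i;
    scale(fwd, b, 0);
    scale(bwd, b, 1);
    return memcmp(fwd, bwd, sizeof fwd) == 0;
}

int running_count_passes_reverse_test(void)
{
    double fwd[N] = {0}, bwd[N] = {0};
    running_count(fwd, 0);   /* 0 1 2 3 4 5 6 7 */
    running_count(bwd, 1);   /* 0 1 1 1 1 1 1 1 */
    return memcmp(fwd, bwd, sizeof fwd) == 0;
}
```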

Using TotalView with OpenMP
See the TotalView User's Guide:
- Each parallel region is outlined into a separate routine.
- Each parallel loop is outlined into a separate routine.
- The names of these outlined routines are based on the name of the original calling routine and the line number of the parallel directive.
- Shared variables are declared in the calling routine and passed to the outlined routine.
- Private variables are declared in the outlined routine.
- The slave threads are created on entry to the parallel region.
- You must not step into a parallel region; instead, run into a previously set breakpoint.
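To make the outlining description concrete, here is a hand-written C approximation of what a compiler might generate for a parallel for loop with a sum reduction. The routine name main_omp_fn_12, the struct shared_args layout, and the serial loop that simulates the threads are all assumptions for illustration; real names and calling conventions are compiler specific. The sketch shows why shared variables are inspected in the calling routine (they live there and are passed by address) and private variables in the outlined routine (they are its local variables).

```c
/* Hypothetical sketch of what a compiler might produce for
 *
 *     #pragma omp parallel for reduction(+:sum)      (line 12 of main)
 *     for (i = 0; i < n; i++) sum += a[i];
 *
 * The name main_omp_fn_12 (calling routine + directive line) is made up. */

/* Shared variables stay in the calling routine; the outlined routine
   only receives their addresses. */
struct shared_args {
    const double *a;
    int n;
    double *sum;
};

/* Outlined routine, executed once per thread. The private variables
   (i, partial, lo, hi) are its locals, so each thread has its own copy. */
static void main_omp_fn_12(struct shared_args *s, int thread_id, int nthreads)
{
    double partial = 0.0;
    int chunk = (s->n + nthreads - 1) / nthreads;
    int lo = thread_id * chunk;
    int hi = lo + chunk < s->n ? lo + chunk : s->n;
    for (int i = lo; i < hi; i++)
        partial += s->a[i];
    *s->sum += partial;   /* a real runtime combines partial sums safely */
}

/* Calling routine: the "threads" are simulated one after another here. */
double outlined_sum(const double *a, int n, int nthreads)
{
    double sum = 0.0;
    struct shared_args s = { a, n, &sum };
    for (int t = 0; t < nthreads; t++)
        main_omp_fn_12(&s, t, nthreads);
    return sum;
}
```

In a debugger, a breakpoint on the loop body would land in main_omp_fn_12, while sum would be watched in the frame of outlined_sum.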

Example OpenMP Program
      x = 43.0
      h = 1.0d0 / n
      sum = 0.0d0
!$omp parallel do private(i,x) reduction(+:sum) shared(n,h)
      do i = 1, n
         x = h * ( i - 0.5d0 )
         sum = sum + 4.0d0 / ( 1.0d0 + x * x )
      end do
!$omp end parallel do
      pi = h * sum
x = ? (What value does x have after the loop?)
In a parallel region (loop, ...):
- when watching a shared variable, look at the variable in the original routine;
- when watching a private variable, look at that variable in the outlined routine.
In OpenMP V2.5 the private copy of the master thread may share the memory location of the original variable. In OpenMP V3.0 this may no longer be allowed.
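For readers working in C, a sketch of the same pi program (midpoint rule for the integral of 4/(1+x^2) over [0,1]). The function name compute_pi is hypothetical; the clauses mirror the Fortran directive, and the loop variable i is private automatically because it is declared in the for statement.

```c
/* C version of the slide's pi program. */
double compute_pi(int n)
{
    double x = 43.0;            /* value before the loop, as on the slide */
    double h = 1.0 / n;
    double sum = 0.0;

#pragma omp parallel for private(x) reduction(+:sum)
    for (int i = 1; i <= n; i++) {
        x = h * (i - 0.5);      /* each thread works on its own private x */
        sum += 4.0 / (1.0 + x * x);
    }
    /* The slide's "x = ?": do not rely on x after the loop. Use
       lastprivate(x) if the value from the last iteration is needed. */
    (void)x;
    return h * sum;
}
```

With n = 1000 the midpoint rule already agrees with pi to better than 1e-6, so a wrong result at that size points to a parallelization error rather than a discretization error.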

MPI - Debugging

Start
Process Window, Root Window

Root Window
Open another process window by right-clicking and selecting Dive in New Window.
Columns: Process ID, Name, State
State:
  B  Breakpoint (stopped)
  E  Error (stopped)
  H  Hold
  I  Idle
  K  In kernel
  M  Mixed
  R  Running
  S  Sleeping
  T  Stopped
  W  Watchpoint (stopped)
  Z  Zombie

Process Window
- Switch between processes by clicking P- or P+.
- Switch between threads by clicking T- or T+.

Process Window
Action points: Evaluation Point, Breakpoint, Barrier, Conditional Watchpoint, Unconditional Watchpoint

Laminate Variables
The variable my_rank contains the ID of each process; laminating it shows its value in all processes side by side.

Summary
Debugging of parallel codes:
- Parallelize carefully! Parallelization adds one dimension to the error space.
- Let the compiler statically analyze your code and watch the messages.
- Check the interfaces.
- In the case of MPI: export MPI_SHOW_ERRORS=1; export MPI_CHECK_ARGS=1. There are deadlock detection tools out there.
- In the case of OpenMP: most likely, using a debugger on OpenMP codes is not necessary. If it is, TotalView might help you.
- Never put an OpenMP code into production without checking for data races beforehand!

MPO Demo on Opteron-based Systems
    $ cd /home/hpc/kurse/sunhpc2007/mpo/f    (or ../c)
    $ gmake -n build
    $ gmake build
    $ gmake -n go
    $ gmake go
    -----------------------------------------------------------
    export OMP_NUM_THREADS=4; export SUNW_MP_PROCBIND="0 2 4 6";
    # Running Jacobi without memory placement optimizations (MPO)
    echo '2000,2000\n0.8\n1.0\n1e-12\n100\nF\nF\n' | jacobi2.x
    # Running Jacobi with memory placement optimizations by initializing data in parallel (first touch)
    echo '2000,2000\n0.8\n1.0\n1e-12\n100\nT\nF\n' | jacobi2.x
    # Running Jacobi with memory placement optimizations by using the madvise API call (next touch)
    echo '2000,2000\n0.8\n1.0\n1e-12\n100\nF\nT\n' | jacobi2.x
    # Running Jacobi with MPO by using the madv.so library to distribute data
    export LD_PRELOAD=madv.so.1; export MADV=access_many
    echo '2000,2000\n0.8\n1.0\n1e-12\n100\nF\nF\n' | jacobi2.x

Performance (Fortran / C):
    without MPO                     1190 / 1290 Mflop/s
    first touch (parallel init)     2960 / 3240 Mflop/s
    next touch (madvise)            2810 / 3060 Mflop/s
    madv.so library                 2130 / 2160 Mflop/s
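The first-touch variant of the demo relies on memory pages being placed near the processor whose thread writes them first. Below is a hedged C sketch of the pattern, assuming an allocation large enough that the operating system has not yet touched its pages: the data is initialized in parallel with the same schedule(static) distribution the compute loop uses, so that with the same number of threads each thread typically works on the pages it touched first. The function names are illustrative and are not taken from the jacobi2.x sources.

```c
#include <stdlib.h>

/* Allocate n doubles and initialize them in parallel. For large
   allocations the pages are typically not touched by malloc itself, so
   this loop decides where they are placed (first touch). */
double *alloc_first_touch(size_t n, double value)
{
    double *a = malloc(n * sizeof *a);
    if (a == NULL)
        return NULL;
#pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; i++)
        a[i] = value;                   /* first touch happens here */
    return a;
}

/* Compute loop using the same schedule(static) distribution as the
   initialization, so each thread mostly reads memory it touched first. */
double scaled_sum(const double *a, size_t n, double factor)
{
    double sum = 0.0;
#pragma omp parallel for schedule(static) reduction(+:sum)
    for (long i = 0; i < (long)n; i++)
        sum += factor * a[i];
    return sum;
}
```

Initializing the same array in a serial loop would place all pages near the master thread, which corresponds to the "without MPO" case above.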