Debugging with TotalView
Dieter an Mey
Center for Computing and Communication, Aachen University of Technology
anmey@rz.rwth-aachen.de
1 TotalView, Dieter an Mey, SunHPC 2006
Debugging on Sun
- dbx: line-mode debugger, serial and multi-threaded (not covered here)
- Sun Forte IDE debugger: based on dbx (not covered here)
- TotalView (Etnus): the prime serial and parallel debugger (MPI and multi-threaded) on many platforms; new memory debugging features (not covered here)
- DDT (Allinea): serial and MPI; a newcomer, available on Linux and Solaris (not covered here)
Help
TotalView is a commercial debugger from Etnus Inc.: www.etnus.com
- TotalView User's Guide
- TotalView Online Help
- Appendix in the RWTH Primer: http://www.rz.rwth-aachen.de/hpc/primer
- LLNL Online Tutorial: http://www.llnl.gov/computing/tutorials/totalview/
Start of TotalView on the Sun Fire
- Current version is 8.0
- Sun compilers are supported from Studio 7 on
- Compile with -g and without optimization
- Sun HPC ClusterTools V5/V6 (MPI) are supported
Start TotalView with:
  totalview a.out
  totalview a.out -a args
  totalview mprun -a -np 8... a.out           # MPI
  totalview mprun -a -np 8... a.out args      # MPI
  totalview a.out core                        # investigate a core file
  totalview                                   # then attach to a running program
TotalView Windows - Overview
[Screenshot labels: Root Window, Variable Window, Stack Trace, Stack Frame, Process Window, Expression List, Terminal Session, Action Points]
Root Window
Process Window
[Screenshot labels: Status + ID, Stack Trace, Stack + Register, Threads, Action Points]
Variable Window (1)
- RMB (right mouse button): Dive in New Window
- Type casting
Variable Window (2)
- Slicing
- Filtering
Surface View Window
Menu: Tools > Visualize
Surface View Window
Watchpoint
  if ( k%50==0 ) { $visualize(u); $stop; }
Whenever k is a multiple of 50, this expression displays the array u in the Visualizer and stops the process.
OpenMP - Debugging
Debugging OpenMP Programs?
If you want to debug your OpenMP code, try to avoid using TotalView. Well, try to avoid using any debugger with OpenMP:
- Debug your serial program first.
- Then look out for data races, which may not show up during the debugging process.
If you still want to debug your OpenMP code, you may want to use TotalView.
OpenMP Debugging Recipe (1 of 2)
Prepare the serial code:
- Carefully select a reasonable test case!
- Is the serial program delivering the right results (with optimization turned on)?
- How about compiler warnings?
- Fortran: put all local variables on the stack first.
Now try the OpenMP version:
- Do the stack size limits need to be increased?
    export STACKSIZE=...    # not yet standardized
    ulimit -s ...
- Respect compiler messages (your compiler may have switches to turn on extensive checking).
- Fortran: USE omp_lib
- Try the OpenMP dummy library your compiler may provide (Sun Studio: link with -lompstubs).
OpenMP Debugging Recipe (2 of 2)
- Is the OpenMP program running well with a single thread?
- Does the OpenMP program run correctly only sometimes with more than one thread?
  - Race conditions? Thread safety?
  - Use of static or global variables within a parallel region? (Fortran 90: SAVE, DATA, initializations, ...; C: static, extern)
  - Use a data race detection tool (Intel Thread Checker, Sun Studio 12).
  - Turn single parallel regions on and off!
  - Serialize parts of long parallel regions (single directive).
  - Introduce additional barriers for testing.
- Do different rounding errors matter?
  - Turn off certain compiler optimizations.
  - Don't parallelize reductions.
Data Races
The typical OpenMP programming error: data races.
- One thread modifies a memory location that another thread reads or writes in the same region (between two synchronization points).
- Take care: the order in which parallel loop iterations execute is non-deterministic and may change from run to run.
- A necessary condition for parallelizing a loop: the serial code should give the same answers when the loop runs backwards.
- Data race detection tools trace memory references and detect possible data races, which may never occur while you step through your code with a debugger.
- In many cases, private clauses, barriers, or critical regions are missing.
Using TotalView with OpenMP
See the TotalView User's Guide:
- Each parallel region is outlined into a separate routine.
- Each parallel loop is outlined into a separate routine.
- The names of these outlined routines are based on the name of the original calling routine and the line number of the parallel directive.
- Shared variables are declared in the calling routine and passed to the outlined routine.
- Private variables are declared in the outlined routine.
- The slave threads are created on entry to the parallel region.
- Do not step into a parallel region; run into a previously set breakpoint instead.
Example OpenMP Program
    x = 43.0
    h = 1.0d0 / n
    sum = 0.0d0
    !$omp parallel do private(i,x) reduction(+:sum) shared(n,h)
    do i = 1,n
       x = h * ( i - 0.5 )
       sum = sum + 4.0d0 / ( 1.0d0 + x * x )
    end do
    !$omp end parallel do
    pi = h * sum
x = ?
In a parallel region (loop, ...):
- When watching a shared variable, look at the variable in the original routine.
- When watching a private variable, look at that variable in the outlined routine.
In OpenMP V2.5, the master thread's private copy may share the memory location of the original variable. In OpenMP V3.0, this may no longer be allowed.
MPI - Debugging
Start
[Screenshots: Process Window, Root Window]
Root Window
Open another process window by right-clicking and selecting Dive in New Window.
[Screenshot columns: Process ID, State, Name]
Process states:
  B  Breakpoint (stopped)
  E  Error (stopped)
  H  Hold
  I  Idle
  K  In kernel
  M  Mixed
  R  Running
  S  Sleeping
  T  Stopped
  W  Watchpoint (stopped)
  Z  Zombie
Process Window
- Switch between processes by clicking on P- or P+
- Switch between threads by clicking on T- or T+
Process Window
Action points: Evaluation Point, Breakpoint, Barrier, Conditional Watchpoint, Unconditional Watchpoint
Laminate Variables
The variable my_rank contains the ID (rank) of each process; laminating it shows its value in all processes in one window.
Summary
Debugging of parallel codes:
- Parallelize carefully! Parallelization adds one dimension to the error space.
- Let the compiler statically analyze your code, and watch the messages.
- Check the interfaces.
- In the case of MPI:
    export MPI_SHOW_ERRORS=1; export MPI_CHECK_ARGS=1
  There are deadlock detection tools out there.
- In the case of OpenMP: most likely, using a debugger on OpenMP codes is not necessary. If it is, TotalView might help you.
- Never put an OpenMP code into production without checking for data races beforehand!
MPO Demo on Opteron-based Systems
  $ cd /home/hpc/kurse/sunhpc2007/mpo/f    (or ../c)
  $ gmake -n build
  $ gmake build
  $ gmake -n go
  $ gmake go
  -----------------------------------------------------------
  export OMP_NUM_THREADS=4; export SUNW_MP_PROCBIND="0 2 4 6"
  # Running Jacobi without memory placement optimizations (MPO)
  echo '2000,2000\n0.8\n1.0\n1e-12\n100\nF\nF\n' | jacobi2.x
  # Running Jacobi with MPO by initializing data in parallel (first touch)
  echo '2000,2000\n0.8\n1.0\n1e-12\n100\nT\nF\n' | jacobi2.x
  # Running Jacobi with MPO by using the madvise API call (next touch)
  echo '2000,2000\n0.8\n1.0\n1e-12\n100\nF\nT\n' | jacobi2.x
  # Running Jacobi with MPO by using the madv.so library to distribute data
  export LD_PRELOAD=madv.so.1; export MADV=access_many
  echo '2000,2000\n0.8\n1.0\n1e-12\n100\nF\nF\n' | jacobi2.x

Performance (Fortran / C):
  without MPO                        1190 / 1290 Mflop/s
  first touch (parallel init)        2960 / 3240 Mflop/s
  madvise (next touch)               2810 / 3060 Mflop/s
  madv.so library                    2130 / 2160 Mflop/s