Le Yan Louisiana Optical Network Initiative. 8/3/2009 Scaling to Petascale Virtual Summer School

Similar documents
Debugging with TotalView

TotalView. Debugging Tool Presentation. Josip Jakić

Introduction to debugging. Martin Čuma Center for High Performance Computing University of Utah

Parallel Debugging with TotalView BSC-CNS

Debugging with TotalView

Debugging for the hybrid-multicore age (A HPC Perspective) David Lecomber CTO, Allinea Software

Debugging with GDB and DDT

Debugging HPC Applications. David Lecomber CTO, Allinea Software

Debugging Applications Using Totalview

Addressing the Increasing Challenges of Debugging on Accelerated HPC Systems. Ed Hinkel Senior Sales Engineer

Debugging with GDB and DDT

HPCC - Hrothgar. Getting Started User Guide TotalView. High Performance Computing Center Texas Tech University

Hands-on Workshop on How To Debug Codes at the Institute

DDT Debugging Techniques

GPU Debugging Made Easy. David Lecomber CTO, Allinea Software

Supplement: Visual C++ Debugging

Debugging at Scale Lindon Locks

Allinea DDT Debugger. Dan Mazur, McGill HPC March 5,

SHARCNET Workshop on Parallel Computing. Hugh Merz Laurentian University May 2008

IBM PSSC Montpellier Customer Center. Content

Improving the Productivity of Scalable Application Development with TotalView May 18th, 2010

Parallel Debugging. ª Objective. ª Contents. ª Learn the basics of debugging parallel programs

ECMWF Workshop on High Performance Computing in Meteorology. 3 rd November Dean Stewart

Debugging with Totalview. Martin Čuma Center for High Performance Computing University of Utah

Scalable Debugging with TotalView on Blue Gene. John DelSignore, CTO TotalView Technologies

Scalable Interaction with Parallel Applications

Allinea Unified Environment

SGI Altix Getting Correct Code Reiner Vogelsang SGI GmbH

CS2141 Software Development using C/C++ Debugging

DDT: A visual, parallel debugger on Ra

Short Introduction to Debugging Tools on the Cray XC40

Optimization of MPI Applications Rolf Rabenseifner

Welcome. HRSK Practical on Debugging, Zellescher Weg 12 Willers-Bau A106 Tel

Eliminate Threading Errors to Improve Program Stability

Debugging CUDA Applications with Allinea DDT. Ian Lumb Sr. Systems Engineer, Allinea Software Inc.

TotalView Users Guide. version 8.8

Intro to MS Visual C++ Debugging

7 The Integrated Debugger

3 TUTORIAL. In This Chapter. Figure 1-0. Table 1-0. Listing 1-0.

Eliminate Threading Errors to Improve Program Stability

Introduction to Parallel Performance Engineering

Tutorial: Analyzing MPI Applications. Intel Trace Analyzer and Collector Intel VTune Amplifier XE

Debugging. John Lockman Texas Advanced Computing Center

NetBeans Tutorial. For Introduction to Java Programming By Y. Daniel Liang. This tutorial applies to NetBeans 6, 7, or a higher version.

OPENMP TIPS, TRICKS AND GOTCHAS

Performance Analysis with Periscope

Welcomes PRACE/LinkSCEEM 2011 Winter School Jacques Philouze Vice President Sales & Marketing

GPU Technology Conference Three Ways to Debug Parallel CUDA Applications: Interactive, Batch, and Corefile

Message Passing Programming. Designing MPI Applications

MIT OpenCourseWare Multicore Programming Primer, January (IAP) Please use the following citation format:

Memory & Thread Debugger

Debugging OpenMP Programs

Introduction to OpenMP. OpenMP basics OpenMP directives, clauses, and library routines

Debugging on Blue Waters

Laboratory Assignment #4 Debugging in Eclipse CDT 1

OPENMP TIPS, TRICKS AND GOTCHAS

Program Debugging. David Newman (from Tom Logan Slides from Ed Kornkven)

A Tutorial for ECE 175

This guide will show you how to use Intel Inspector XE to identify and fix resource leak errors in your programs before they start causing problems.

Jackson Marusarz Software Technical Consulting Engineer

Announcement. Exercise #2 will be out today. Due date is next Monday

Understanding Dynamic Parallelism

COMP4510 Introduction to Parallel Computation. Shared Memory and OpenMP. Outline (cont d) Shared Memory and OpenMP

Eliminate Memory Errors and Improve Program Stability

Using Intel VTune Amplifier XE and Inspector XE in.net environment

TotalView. Users Guide. August 2001 Version 5.0

Virtual Memory COMPSCI 386

Lab 0 Introduction to the MSP430F5529 Launchpad-based Lab Board and Code Composer Studio

Eliminate Memory Errors to Improve Program Stability

GDB Tutorial. A Walkthrough with Examples. CMSC Spring Last modified March 22, GDB Tutorial

Tools and Methodology for Ensuring HPC Programs Correctness and Performance. Beau Paisley

COSC 6374 Parallel Computation. Debugging MPI applications. Edgar Gabriel. Spring 2008

Typical Bugs in parallel Programs

Debugging Your CUDA Applications With CUDA-GDB

Short Introduction to Tools on the Cray XC systems

TotalView. User Guide. June 2004 Version 6.5

IBM VisualAge for Java,Version3.5. Distributed Debugger for Workstations

Acknowledgments. Amdahl s Law. Contents. Programming with MPI Parallel programming. 1 speedup = (1 P )+ P N. Type to enter text

PTP - PLDT Parallel Language Development Tools Overview, Status & Plans

Using Java for Scientific Computing. Mark Bul EPCC, University of Edinburgh

Scaling Tuple-Space Communication in the Distributive Interoperable Executive Library. Jason Coan, Zaire Ali, David White and Kwai Wong

TotalView Training. Developing parallel, data-intensive applications is hard. We make it easier. Copyright 2012 Rogue Wave Software, Inc.

Computer Science II Lab 3 Testing and Debugging

Guillimin HPC Users Meeting July 14, 2016

Performance analysis with Periscope

Short Introduction to tools on the Cray XC system. Making it easier to port and optimise apps on the Cray XC30

Debugging. P.Dagna, M.Cremonesi. May 2015

Designing Parallel Programs. This review was developed from Introduction to Parallel Computing

Allows program to be incrementally parallelized

Parallel Performance and Optimization

The Concurrency Viewpoint

Visual Debugging of MPI Applications

Using Eclipse and the

CERN IT Technical Forum

6.828: OS/Language Co-design. Adam Belay

Flow of Control: Loops

CS691/SC791: Parallel & Distributed Computing

Shared memory programming model OpenMP TMA4280 Introduction to Supercomputing

Introduction to C/C++ Programming

NightStar. NightView Source Level Debugger. Real-Time Linux Debugging and Analysis Tools BROCHURE

Transcription:

Parallel Debugging Techniques Le Yan Louisiana Optical Network Initiative 8/3/2009 Scaling to Petascale Virtual Summer School

Outline Overview of parallel debugging Challenges Tools Strategies Gtf Get familiar with TtlVi TotalView/DDT through h hands on exercises 8/3/2009 Scaling to Petascale Virtual Summer School 1

Bugs in Parallel Programming Parallel programs are prone to the usual bugs found in sequential programs Improper p pointer usage Stepping over array bounds Infinite loops Plus 8/3/2009 Scaling to Petascale Virtual Summer School 2

Common Types of Bugs in Parallel Programming Erroneous use of language features Mismatched parameters, missing mandatory calls etc. Defective space decomposition Incorrect/improper p synchronization Hidden serialization http://www.hpcbugbase.org/index.php/main_page 8/3/2009 Scaling to Petascale Virtual Summer School 3

Debugging Essentials Reproducibility Find the scenario where the error is reproducible Reduction Reduce the problem to its essence Deduction Form hypotheses on what the problem mightbe Experimentation Filterout invalid hypotheses Terrence Parr, Learn The Essentials of Debugging http://www.ibm.com/developerworks/web/library/wa debug.html?ca=dgr lnxw03dbug 8/3/2009 Scaling to Petascale Virtual Summer School 4

Challenges in Parallel Debugging Reproducibility Many problems cannot be easily reproduced Reduction Smallest estscale might tstill betoo large geand dcomplex to handle Deduction Need to consider concurrent and interdependent program instances Experimentation Cyclic debugging might be very expensive 8/3/2009 Scaling to Petascale Virtual Summer School 5

A Nasty Little Bug integer*4 :: ii i,ista,iendi integer*4 :: chunksize=1024*1024 call MPI_Comm_Rank(MPI_COMM_WORLD, & myrank,error) ista=myrank*chunksize+1 iend=(myrank+1)*chunksize do i = it ista,iendi enddo What is the potential problem? 8/3/2009 Scaling to Petascale Virtual Summer School 6

A Nasty Little Bug g integer*4 :: ii i,ista,iendi integer*4 :: chunksize=1024*1024 call MPI_Comm_Rank(MPI_COMM_WORLD, & g myrank,error) ista=myrank*chunksize+1 iend=(myrank+1)*chunksize do i = it ista,iendi enddo Integer overflow if myrank 4096! A bug that shows up only when running with more than 4096 cores 8/3/2009 Scaling to Petascale Virtual Summer School 7

printf/write Debugging Extremely easy to use, therefore dangerously attractive, but Need to edit, recompile and rerun when additional information is desired May change program behavior Only capable of displaying a subset of the program s state Output size grows rapidly with increasing core count and harder to comprehend Not scalable, not recommended 8/3/2009 Scaling to Petascale Virtual Summer School 8

Compilers Can Help Most compilers can (at runtime) Check array bounds Trap floating operation errors Provide traceback information Relatively scalable, but Overhead added Limited capability Non interactive ti 8/3/2009 Scaling to Petascale Virtual Summer School 9

Parallel Debuggers Capable of what serials s debuggers can do Control program execution Set action points View/change values of variables More importantly Control program execution at various levels Group/process/thread View MPImessage queues 8/3/2009 Scaling to Petascale Virtual Summer School 10

An Ideal Parallel Debugger Should allow easy process/thread control and navigation Should support multiplehighperformance performance computing platforms Should not limit the number of processes being debugged and should allow it to vary at runtime 8/3/2009 Scaling to Petascale Virtual Summer School 11

How Parallel Debuggers Work Frontend GUI Debugger engine Debugger Agents User processes Agent Agent Agent Control application processes Send data back Db i to the debugger GUI engine to analyze Compute nodes Debugger engine Interactive node 8/3/2009 Scaling to Petascale Virtual Summer School 12

At Very Large Scale The debugger itself becomes a large parallel application Bottlenecks Debugger framework startup cost Communication i between frontend and agents Access to shared resources, e.g. file system 8/3/2009 Scaling to Petascale Virtual Summer School 13

Validation Is Crucial Have a solid validation procedure to check the correctness Test smaller components before putting them together 8/3/2009 Scaling to Petascale Virtual Summer School 14

General Parallel Debugging Strategy Incremental ce e tadebuggingg Downscale if possible Participating processes, problem size and/or number of iterationsti Example: run with one single thread to detect scope errors in OpenMP programs Add more instances to reveal other issues Example: run MPI programs on more than one node to detect problems introduced by network delays 8/3/2009 Scaling to Petascale Virtual Summer School 15

Strategy at Large Scale Again, downscale if possible Reduce the number of processes to which the debugger is attached Reduces overhead Reduces the required number of license seats as well Focus on one or a small number of processes/threads Analyze call path and message queues to find problematic processes Control the execution of as few processes/threads as possible while keeping others running Provides the context where the error occurs 8/3/2009 Scaling to Petascale Virtual Summer School 16

Trends in Debugging Technology Lightweight trace analysis tools Help to identify processes/threads that have similar behavior and reduce the search space Complementary to full feature debuggers Example: Stack Trace Analysis Tool (STAT) Replay/Reverse execution ReplayEngine now available from TotalView Checkpointing supported in DDT 2.4 Post mortem statistical analysis Detect anomaliesby analyzingprofile dissimilarity of multiple runs 8/3/2009 Scaling to Petascale Virtual Summer School 17

Hands on Exercise Debug MPI and dopenmp programs poga sthat atsolve a simple problem to get familiar with Basic functionalities of parallel debuggers TotalView: Pople, Bluefire and Athena DDT: Ranger Some common types of bugs in parallel programming Programs and instructions can be found at http://www.cct.lsu.edu/~lyan1/summerschool09 p// / / 8/3/2009 Scaling to Petascale Virtual Summer School 18

Problem 0 1 2 3 4 5 6 7 8 4 5 A 1 D periodic array with N elements Initial value C: cell(x)=x%10 Fortran: cell(x)=mod(x 1,10) In each iteration, all elements are updated with the value of two adjacent elements: cell(x)i+1=[cell(x 1)i+cell(x+1)i]%10 Execute Niter iterations The final outputs are the global maximum and average http://www.hpcbugbase.org/index.php/main_page 8/3/2009 Scaling to Petascale Virtual Summer School 19

Sequential Program Use an integer array ayto hold odcurrent values aues Use another integer array to hold the calculated values Swap the pointers at the end of each iteration The result is used to check the correctness of the parallel programs Chances are that we will not have such a luxury for large jobs 8/3/2009 Scaling to Petascale Virtual Summer School 20

MPI Program 0 1 2 3 4 5 6 7 8 4 5 5 0 1 2 5 6 5 6 7 8 2 3 7 8 9 0 5 0 Process 1 Process 2 Process n Divide id the array among n processes Each process works on its local array Exchange boundary data with neighbor processes at the end of each iteration Ring topology 8/3/2009 Scaling to Petascale Virtual Summer School 21

OpenMP Program 0 1 2 3 4 5 6 7 8 4 5 Thread 0 Thread 1 Thread n Each thread works on its own part of the global array All threads have access to the entire array, so no data exchange is necessary 8/3/2009 Scaling to Petascale Virtual Summer School 22

Three Ways to Start TotalView/DDT Start with core dumps Startby attaching to one or more running processes Start the executable within TotalView/DDT 8/3/2009 Scaling to Petascale Virtual Summer School 23

TotalView Root Window Host name Status Status Code Blank B E H K M R T W Description Exited At breakpoint Error Held In kernel Mixed Running Stopped At watchpoint TotalView ID MPI Rank 8/3/2009 Scaling to Petascale Virtual Summer School 24

TotalView Process Window Stack trace pane Call stack of routines Stack frame pane Local variables, ibl registers and function parameters Source pane Source code Action points, processes, threads pane Manage action points, processes and threads 8/3/2009 Scaling to Petascale Virtual Summer School 25

DDT Main Window Project window Process group window Variable window Source code window Parallel stack view and output window Evaluation window 8/3/2009 Scaling to Petascale Virtual Summer School 26

Controlling Execution The process window (TotalView) or main window (DDT) always focuses on one process/thread Switch between processes/threads TotalView: p+/p, t+/t, double click in root window, process/thread tab DDT: click on process rank in process window Need to set the appropriate scope when Giving control commands Setting action points 8/3/2009 Scaling to Petascale Virtual Summer School 27

Control Commands TotalView DDT Description Go Play/Continue Start/resume execution Halt Pause Stop execution Kill Restart Terminate the job Restarts a running program Next Step over Run to the next source line without stepping into another function Step Step into Run to next source line Out Step out Run to the completion of current function Run to Run to line Run to the indicated location 8/3/2009 Scaling to Petascale Virtual Summer School 28

Process/Thread Groups TotalView Scope of commands and action points Group(control) All processes and threads Group(workers) All threads that are executing user code Rank X Current process and its threads Process(workers) User threads in the current process Thread X.Y Current tthread User defined group Group > Custom Groups, or Create in call graph 8/3/2009 Scaling to Petascale Virtual Summer School 29

Process/Thread Groups DDT Create custom groups Ctrl+click on all desired processes Right click on the process window then create group 8/3/2009 Scaling to Petascale Virtual Summer School 30

Action Points Breakpoints stop the execution of the processes and threads that reach it Unconditional Conditional: stop only if the condition is satisfied Evaluation: stop and execute a code fragment when reached Useful when testing small patches Process barrier points synchronize a set of processes or threads TotalView only Watchpoints monitor a location in memory and stop execution when its value changes 8/3/2009 Scaling to Petascale Virtual Summer School 31

Setting Action Points TotalView Breakpoints Right click on a source line > Set breakpoint Click on the line number Watch points Right click on a variable > Create watchpoint Barrier points Right click on a source line > Set barrier Edit action point property Right click on a action point in the Action Points tab > Properties 8/3/2009 Scaling to Petascale Virtual Summer School 32

Setting Action Points DDT Breakpoints Double click on a source code line Right click in the Breakpoints tab > > Add breakpoint Watch points Right click on a variable > Add to Watches Right click in the Watches tab > > Add Watch 8/3/2009 Scaling to Petascale Virtual Summer School 33

Viewing/Editing Data View values and types of variables At one process/thread Across all processes/threads Edit variable value and type Array Data Slicing Filtering Visualization Statistics 8/3/2009 Scaling to Petascale Virtual Summer School 34

Viewing/Editing Data TotalView Viewing gdata in Stack frame Expression list Variable window (dive on a variable by double clicking on its name) Editing data by clicking on the value in Stack frame Variable window 8/3/2009 Scaling to Petascale Virtual Summer School 35

Viewing/Editing Data DDT Viewing data in Variable window (in the main window) Evaluation window Editing data Right click on the variable name in the evaluation window Then choose Edit value or Edit type 8/3/2009 Scaling to Petascale Virtual Summer School 36

Viewing Dynamic Arrays in C/C++ TotalView Edit type in the variable window Tell TotalView how to interpret the memory from a starting location Example To view an array of 100 integers Int * > int[100]* DDT Drag a pointer variable into the evaluation window Right click on the variable > > View as vector 8/3/2009 Scaling to Petascale Virtual Summer School 37

MPI Message Queues Detect Deadlocks Load balancing issues TotalView Tools > Message Queue Graph More options available DDT View > Message Queues 8/3/2009 Scaling to Petascale Virtual Summer School 38

TotalView Call Graph Tools > > Call graph Quick view of program state Nodes: functions Edges: calls Look for outliers 8/3/2009 Scaling to Petascale Virtual Summer School 39

DDT Parallel Stack View Allow users to see the position of each process/thread in the source code in same window Hover over any function to see a list of processes that arecurrently at that location 8/3/2009 Scaling to Petascale Virtual Summer School 40

References TotalView user manual http://www.totalviewtech.com/support/documentation/totalview/ind ex.html DDT user manual http://www.allinea.com/downloads/userguide.pdf pdf LLNL TotalView tutorial https://computing.llnl.gov/tutorials/totalview NCSA Cyberinfrastructure Tutor Debugging Serial and Parallel Codes course HPCBugBase http://hpcbugbase.org/index.php/main_page php/main 8/3/2009 Scaling to Petascale Virtual Summer School 41