Scalable Debugging with TotalView on Blue Gene
John DelSignore, CTO
TotalView Technologies
Agenda
  - TotalView on Blue Gene
      - A little history
      - Current status
  - Recent TotalView improvements
      - ReplayEngine (reverse debugging)
      - Remote Display
      - TotalView Script (batch debugging)
  - Future work
      - BG/*
      - Heterogeneous systems
      - Many-core, transactional memory, speculative execution
      - Petascale debugging
Supported Blue Gene Architectures and Compilers
  - Blue Gene/L and Blue Gene/P
  - Languages / Compilers
      - C/C++, Fortran, Assembly
      - GNU Compilers
      - IBM Compilers
      - IBM OpenMP (on BG/P)
  - Parallel Environments
      - IBM MPI
      - IBM OpenMP (on BG/P)
      - Pthreads (BG/P)
  - Runtime linking/loading (BG/P)
      - Shared libraries
      - Dynamically loaded shared libraries
Blue Gene Architecture
  - The TotalView client (GUI/CLI) runs on the Front End node
  - The client communicates via a socket with the TotalView debugger servers running on the I/O nodes
  - The debugger servers communicate with the CIOD to control processes and threads running on the Compute nodes
  - Fan-out ratios (CNs/server)
      - BG/L: 32-64, 2 cores/CN, 128 threads/server
      - BG/P: 128-256, 4 cores/CN, 1024 threads/server
      - Ratio increasing (8K thr/svr?)
      - Parallelize server operation
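The threads-per-server figures on the architecture slide are consistent with multiplying the largest fan-out ratio by the cores per compute node. The sketch below just reproduces that arithmetic; the function name is hypothetical, and it assumes one debugged thread per core, which the slide does not state explicitly.

```python
# Hypothetical helper, not part of TotalView: worst-case number of threads
# a single debugger server has to control, assuming one thread per core.
def threads_per_server(compute_nodes_per_server: int, cores_per_node: int) -> int:
    return compute_nodes_per_server * cores_per_node


# Largest fan-out ratios quoted on the slide.
bgl_threads = threads_per_server(compute_nodes_per_server=64, cores_per_node=2)
bgp_threads = threads_per_server(compute_nodes_per_server=256, cores_per_node=4)

print("BG/L threads per server:", bgl_threads)
print("BG/P threads per server:", bgp_threads)
```

Under these assumptions the load on each server grows with both the fan-out ratio and the core count, which is why the slide lists parallelizing server operation as a response to the increasing ratio.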
TotalView Blue Gene/L Support
  - TotalView involvement since 2003
  - Support for Blue Gene/L since 2005
  - Debugging interfaces developed via close collaboration with IBM
  - Used on DOE/NNSA/LLNL's Blue Gene/L system containing 212K cores
  - Heap memory debugging support added
  - Blue Gene/L scaling and performance tuning project
  - TotalView has debugged jobs as large as 8,192 processes (LLNL)
  - Work on Blue Gene/L facilitated Blue Gene/P support
TotalView Blue Gene/P Support
  - Blue Gene/P supported since Q4 2007
  - Continued close collaboration with IBM to develop multi-threaded debugging interfaces
  - Support for shared libraries and dynamically loaded libraries
  - Scalability improvements
  - TotalView has debugged jobs as large as 32K (Jülich)
TotalView Blue Gene/P Sites
  - Currently running at over 30 sites in Germany, France, UK, and US, including
      - Argonne
      - Boston University
      - Daresbury
      - IDRIS
      - Jülich
      - LLNL
      - Max Planck
      - ORNL
      - Princeton University
      - Rensselaer Polytechnic Institute
  - Jülich workshop, March 08
  - Argonne workshop, May 08
Recent TotalView Improvements on Blue Gene and Linux
  - Remote Display
      - Run the TotalView GUI on a remote system and display it locally, with fast, interactive performance
      - Easy, fast, secure
  - tvscript
      - Simplifies debugging batch jobs
      - Event/action paradigm
      - Configurable
  - ReplayEngine
      - Step execution back in time
      - Uses reverse debugging technology
      - Currently Linux x86 and x86-64 only
Remote Display
  - Presents a window on your machine that displays TotalView executing on a remote system
  - Two components:
      - Client, which runs on the local system and is available for
          - Linux x86, x86-64
          - Windows XP, Vista
      - Server, which runs on any system supported by TotalView, invisibly managing the connections between the host and client
  - The Client also provides for submission of jobs to batch queuing systems
      - PBS Pro and LoadLeveler
Batch Scripting
  - Designed for debugging in a batch environment
  - tvscript lets you define the events to act on and the actions to take when an event occurs
  - Typical events
      - Action point (e.g., breakpoint)
      - Memory error (e.g., malloc returns 0, guard block corruption)
      - Errors (e.g., SEGV, FPE)
  - Typical actions
      - Display a backtrace
      - List memory leaks
      - Print variables and arrays
  - Configurable
      - Supports external script files
      - Allows generation of even more complex actions and events
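The event/action paradigm on the batch scripting slide can be illustrated with a small, generic dispatcher. This is only a conceptual sketch: the class, the event names, and the action functions are hypothetical stand-ins and are not tvscript's actual syntax or API.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# An action receives a context dict describing the event and returns a report line.
Action = Callable[[dict], str]


class EventActionTable:
    """Hypothetical table mapping event names to the actions run when they occur."""

    def __init__(self) -> None:
        self._actions: Dict[str, List[Action]] = defaultdict(list)

    def on(self, event: str, action: Action) -> None:
        # Register an action for an event; several actions per event are allowed.
        self._actions[event].append(action)

    def fire(self, event: str, context: dict) -> List[str]:
        # Run every registered action in registration order and collect the output.
        return [action(context) for action in self._actions.get(event, [])]


def display_backtrace(context: dict) -> str:
    return "backtrace: " + " <- ".join(context["frames"])


def print_variable(context: dict) -> str:
    return f"{context['var']} = {context['value']}"


table = EventActionTable()
table.on("error", display_backtrace)        # e.g. SEGV or FPE
table.on("actionpoint", print_variable)     # e.g. a breakpoint was hit
table.on("actionpoint", display_backtrace)

report = table.fire("error", {"frames": ["solve", "main"]})
print(report)
```

The point of the design is that the batch job runs unattended: the user declares up front which events matter and what to record, and the collected output is read after the job finishes.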
ReplayEngine
  - Intuitive user interface, integrated with TotalView
      - Step forward over functions
      - Step backward over functions
      - Step forward into functions
      - Step backward into functions
      - Advance forward out of current function, after the call
      - Advance backward out of current function, to before the call
      - Advance forward to selected line
      - Advance backward to selected line
      - Advance forward to live session
Possible Future Blue Gene Work
  - BG/* support
      - Support future generations of Blue Gene
  - Fast conditional breakpoints/watchpoints
      - Expressions compiled/patched into the target, execute in parallel, about 10 µs/expression
  - Asynchronous thread control
      - Thread barrier breakpoint, thread single stepping
  - User-programmable visual data
      - Lets the user define complex data access functions
  - Debugging optimized code
  - Post-mortem debugging
  - Fast DLL debugging interface
  - LLNL collaboration for scalable subset attach
      - Integrates with lightweight tools such as STAT
Possible Other Future Work
  - Scalability/performance
      - Continue scalability and performance improvements
      - Tree-based infrastructure for logarithmic scaling
  - Petascale debugging
      - Hundreds of thousands of threads
  - Heterogeneous systems
      - IBM Roadrunner (x86-64/Cell)
      - GPUs
  - Emerging technologies
      - Many-core
      - Transactional memory
      - Speculative execution
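To make the "logarithmic scaling" claim for a tree-based infrastructure concrete, the sketch below computes how many levels a balanced tree needs to reach a given number of leaf processes. The function name and the fan-out of 32 are illustrative assumptions and do not describe TotalView's actual design.

```python
# Hypothetical illustration: depth of a balanced tree that reaches `leaves`
# endpoints when every node talks to at most `fanout` children.
def tree_depth(leaves: int, fanout: int) -> int:
    if leaves < 1 or fanout < 2:
        raise ValueError("need leaves >= 1 and fanout >= 2")
    depth = 0
    reach = 1  # number of endpoints reachable at the current depth
    while reach < leaves:
        reach *= fanout
        depth += 1
    return depth


for leaves in (1024, 8192, 262144):
    print(leaves, "leaves ->", tree_depth(leaves, fanout=32), "levels")
```

Integer multiplication is used instead of a floating-point logarithm so the result is exact at powers of the fan-out. Growing the job from 1,024 to 262,144 leaves adds only two levels in this model, whereas a flat client-to-server layout would add connections in proportion to the job size.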
Questions?
More Information
  - Blue Gene Technical Development Interest Group
      - Contact chris.gottbrath@totalviewtech.com
  - Technical support
      - support@totalviewtech.com
  - BG LLNL case study
      - /pdf/case_study_scientific_computing.pdf
  - Customer training or webinars
      - contacttraininggroup@totalviewtech.com
  - Web site