Scalable Debugging with TotalView on Blue Gene. John DelSignore, CTO TotalView Technologies

Agenda
- TotalView on Blue Gene
  - A little history
  - Current status
- Recent TotalView improvements
  - ReplayEngine (reverse debugging)
  - Remote Display
  - TotalView Script (batch debugging)
- Future work
  - BG/*
  - Heterogeneous systems
  - Many-core, transactional memory, speculative execution
  - Petascale debugging

Supported Blue Gene Architectures and Compilers
- Blue Gene/L and Blue Gene/P
- Languages / Compilers
  - C/C++, Fortran, Assembly
  - GNU Compilers
  - IBM Compilers
  - IBM OpenMP (on BG/P)
- Parallel Environments
  - IBM MPI
  - IBM OpenMP (on BG/P)
  - Pthreads (BG/P)
- Runtime linking/loading (BG/P)
  - Shared libraries
  - Dynamically loaded shared libraries

Blue Gene Architecture
- The TotalView client (GUI/CLI) runs on the Front End node
- The client communicates via a socket with the TotalView debugger servers running on the I/O nodes
- The debugger servers communicate with the CIOD to control processes and threads running on the Compute nodes (CNs)
- Fan-out ratios (CNs/server)
  - BG/L: 32 to 64, 2 cores/CN, 128 threads/server
  - BG/P: 128 to 256, 4 cores/CN, 1024 threads/server
  - Ratio increasing (8K threads/server?)
  - Parallelize server operation
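The threads-per-server figures on this slide follow from the fan-out ratio multiplied by the cores per compute node. A minimal sketch of that arithmetic, assuming one thread per core and using the upper end of each fan-out range; the dictionary layout and the helper `servers_needed` are illustrative names, not part of TotalView:

```python
# Fan-out figures taken from the slide; the data layout is illustrative only.
FAN_OUT = {
    "BG/L": {"max_cns_per_server": 64, "cores_per_cn": 2},
    "BG/P": {"max_cns_per_server": 256, "cores_per_cn": 4},
}


def max_threads_per_server(system):
    """Upper bound on threads one debugger server controls (assumes 1 thread/core)."""
    cfg = FAN_OUT[system]
    return cfg["max_cns_per_server"] * cfg["cores_per_cn"]


def servers_needed(total_cns, cns_per_server):
    """Hypothetical helper: debugger servers required for a job (ceiling division)."""
    return -(-total_cns // cns_per_server)


for name in FAN_OUT:
    print(name, max_threads_per_server(name), "threads/server")
```

The products (64 x 2 and 256 x 4) reproduce the 128 and 1024 threads/server quoted above.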

TotalView Blue Gene/L Support
- TotalView involvement since 2003
- Support for Blue Gene/L since 2005
- Debugging interfaces developed via close collaboration with IBM
- Used on DOE/NNSA/LLNL's Blue Gene/L system containing 212K cores
- Heap memory debugging support added
- Blue Gene/L scaling and performance tuning project
- TotalView has debugged jobs as large as 8,192 processes (LLNL)
- Work on Blue Gene/L facilitated Blue Gene/P support

TotalView Blue Gene/P Support
- Blue Gene/P supported since Q4 2007
- Continued close collaboration with IBM to develop multi-threaded debugging interfaces
- Support for shared libraries and dynamically loaded libraries
- Scalability improvements
- TotalView has debugged jobs as large as 32K (Jülich)

TotalView Blue Gene/P Sites
- Currently running at over 30 sites in Germany, France, UK, and US, including
  - Argonne
  - Boston University
  - Daresbury
  - IDRIS
  - Jülich
  - LLNL
  - Max Planck
  - ORNL
  - Princeton University
  - Rensselaer Polytechnic Institute
- Jülich workshop, March 08
- Argonne workshop, May 08

Recent TotalView Improvements on Blue Gene and Linux
- Remote Display
  - Run the TotalView GUI on a remote system and display it locally, with fast, interactive performance
  - Easy, fast, secure
- tvscript
  - Simplifies debugging batch jobs
  - Event/action paradigm
  - Configurable
- ReplayEngine
  - Step execution back in time
  - Uses reverse debugging technology
  - Currently Linux x86 and x86-64 only

Remote Display
- Presents a window on your machine that displays TotalView executing on a remote system
- Two components:
  - Client, which runs on the local system; available for Linux x86 and x86-64, Windows XP and Vista
  - Server, which runs on any system supported by TotalView, invisibly managing the connections between the host and client
- The Client also provides for submission of jobs to batch queuing systems
  - PBS Pro and LoadLeveler

Batch Scripting
- Designed for debugging in a batch environment
- tvscript lets you define the events to act on and the actions to take when an event occurs
- Typical events
  - Action point (e.g., breakpoint)
  - Memory error (e.g., malloc returns 0, guard block corruption)
  - Errors (e.g., SEGV, FPE)
- Typical actions
  - Display a backtrace
  - List memory leaks
  - Print variables and arrays
- Configurable
  - Supports external script files
  - Allows generation of even more complex actions and events
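The event/action paradigm described on this slide can be illustrated with a small dispatch table. This is a generic sketch of the idea only, not tvscript's syntax or API; the class `EventActionTable`, the event names, and the action functions are all hypothetical:

```python
from collections import defaultdict


class EventActionTable:
    """Hypothetical registry mapping event names to lists of actions."""

    def __init__(self):
        self._actions = defaultdict(list)

    def on(self, event, action):
        """Register an action (a callable taking a context dict) for an event."""
        self._actions[event].append(action)

    def fire(self, event, context):
        """Run every action registered for the event; return their report lines."""
        return [action(context) for action in self._actions.get(event, [])]


def display_backtrace(ctx):
    # Illustrative action: format the call stack stored in the context.
    return "backtrace: " + " <- ".join(ctx["stack"])


def print_variable(name):
    # Illustrative action factory: report one named variable.
    def action(ctx):
        return f"{name} = {ctx['vars'].get(name)!r}"
    return action


table = EventActionTable()
table.on("error", display_backtrace)
table.on("actionpoint", print_variable("i"))

ctx = {"stack": ["solve", "main"], "vars": {"i": 7}}
report = table.fire("error", ctx) + table.fire("actionpoint", ctx)
print("\n".join(report))
```

Keeping events and actions in a table rather than in code paths is what lets a batch run be configured up front and left unattended.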

Replay Engine
- Intuitive user interface, integrated with TotalView
  - Step forward over functions
  - Step backward over functions
  - Step forward into functions
  - Step backward into functions
  - Advance forward out of current function, after the call
  - Advance backward out of current function, to before the call
  - Advance forward to selected line
  - Advance backward to selected line
  - Advance forward to live session
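The forward, backward, and "advance to live session" controls listed above can be pictured with a toy recorder that keeps a history of program states. This is a deliberately simplified sketch, assuming a full snapshot per step; it does not describe how ReplayEngine is implemented, and `ToyReplay` and `increment` are hypothetical names:

```python
import copy


class ToyReplay:
    """Toy model of reverse debugging: record each state, move a cursor over history."""

    def __init__(self, initial_state):
        self._history = [copy.deepcopy(initial_state)]
        self._pos = 0

    @property
    def state(self):
        return self._history[self._pos]

    def step_forward(self, transition):
        """Replay the next recorded state, or execute live and record it."""
        if self._pos < len(self._history) - 1:
            self._pos += 1  # still in the past: replay, do not re-execute
        else:
            self._history.append(transition(copy.deepcopy(self.state)))
            self._pos += 1
        return self.state

    def step_backward(self):
        """Move one recorded step back in time (stops at the initial state)."""
        if self._pos > 0:
            self._pos -= 1
        return self.state

    def go_live(self):
        """Jump to the most recent recorded state (the live session)."""
        self._pos = len(self._history) - 1
        return self.state


def increment(state):
    # Illustrative "program step": bump a counter.
    state["x"] += 1
    return state


session = ToyReplay({"x": 0})
for _ in range(3):
    session.step_forward(increment)
session.step_backward()
session.step_backward()
print(session.state, session.go_live())
```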

Possible Future Blue Gene Work
- BG/* support
  - Support future generations of Blue Gene
- Fast conditional breakpoints/watchpoints
  - Expressions compiled/patched into the target, execute in parallel, about 10 µs/expression
- Asynchronous thread control
  - Thread barrier breakpoint, thread single stepping
- User-programmable visual data
  - Allows the user to define complex data access functions
- Debugging optimized code
- Post-mortem debugging
- Fast DLL debugging interface
- LLNL collaboration for scalable subset attach
  - Integrates with lightweight tools such as STAT

Possible Other Future Work
- Scalability/performance
  - Continue scalability and performance improvements
  - Tree-based infrastructure for logarithmic scaling
- Petascale debugging
  - Hundreds of thousands of threads
- Heterogeneous systems
  - IBM Roadrunner (x86-64/Cell)
  - GPUs
- Emerging technologies
  - Many-core
  - Transactional memory
  - Speculative execution
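The phrase "tree-based infrastructure for logarithmic scaling" refers to the fact that a tree with a fixed fan-out reaches N endpoints in roughly log(N) levels. A minimal sketch of that relationship; the function name `tree_depth` and the fan-out value of 32 used in the example are assumptions chosen for illustration, not TotalView parameters:

```python
def tree_depth(leaves, fan_out):
    """Smallest number of tree levels needed for one root to reach `leaves` endpoints."""
    if leaves < 1 or fan_out < 2:
        raise ValueError("need leaves >= 1 and fan_out >= 2")
    depth, reach = 0, 1
    while reach < leaves:
        reach *= fan_out  # each extra level multiplies reach by the fan-out
        depth += 1
    return depth


# Illustrative endpoint counts; 32 is an assumed fan-out.
for n in (1_024, 100_000, 1_000_000):
    print(n, "endpoints ->", tree_depth(n, 32), "levels")
```

With a fan-out of 32, going from about a thousand endpoints to a million adds only two levels, which is the appeal of a tree over a flat client-to-server fan-out.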

Questions?

More Information
- Blue Gene Technical Development Interest Group
  - Contact chris.gottbrath@totalviewtech.com
- Technical support
  - support@totalviewtech.com
- BG LLNL case study
  - /pdf/case_study_scientific_computing.pdf
- Customer training or webinars
  - contacttraininggroup@totalviewtech.com
- Web site