Debugging / Profiling

Similar documents
Valgrind. Philip Blakely. Laboratory for Scientific Computing, University of Cambridge. Philip Blakely (LSC) Valgrind 1 / 21

Debugging with gdb and valgrind

Debugging and Profiling

Compiling environment

Performance Analysis of Parallel Scientific Applications In Eclipse

DEBUGGING: DYNAMIC PROGRAM ANALYSIS

valgrind overview: runtime memory checker and a bit more What can we do with it?

CSCI-1200 Data Structures Spring 2016 Lecture 6 Pointers & Dynamic Memory

Use Dynamic Analysis Tools on Linux

Programming Tips for CS758/858

CS2141 Software Development using C/C++ Debugging

Debugging. Marcelo Ponce SciNet HPC Consortium University of Toronto. July 15, /41 Ontario HPC Summerschool 2016 Central Edition: Toronto

SGI Altix Getting Correct Code Reiner Vogelsang SGI GmbH

Bruce Merry. IOI Training Dec 2013

Using Intel VTune Amplifier XE and Inspector XE in.net environment

Parallel Performance and Optimization

Cache Profiling with Callgrind

Scientific Programming in C IX. Debugging

DAY 4. CS3600, Northeastern University. Alan Mislove

The Valgrind Memory Checker. (i.e., Your best friend.)

MUST. MPI Runtime Error Detection Tool

Development in code_aster Debugging a code_aster execution

Introduction to debugging. Martin Čuma Center for High Performance Computing University of Utah

CSE 160 Discussion Section. Winter 2017 Week 3

The Valgrind Memory Checker. (i.e., Your best friend.)

Core dumped - on debuggers and other tools

Compiling environment

MUST. MPI Runtime Error Detection Tool

IBM PSSC Montpellier Customer Center. Content

Both parts center on the concept of a "mesa", and make use of the following data type:

Welcome. HRSK Practical on Debugging, Zellescher Weg 12 Willers-Bau A106 Tel

Debugging with TotalView

Guillimin HPC Users Meeting July 14, 2016

Massif-Visualizer. Memory Profiling UI. Milian Wolff Desktop Summit

A Novel Approach to Explain the Detection of Memory Errors and Execution on Different Application Using Dr Memory.

A configurable binary instrumenter

Data and File Structures Laboratory

TI2725-C, C programming lab, course

Building a Runnable Program and Code Improvement. Dario Marasco, Greg Klepic, Tess DiStefano

Recent Performance Analysis with Memphis. Collin McCurdy Future Technologies Group

Language Translation. Compilation vs. interpretation. Compilation diagram. Step 1: compile. Step 2: run. compiler. Compiled program. program.

Debugging, Profiling and Optimising Scientific Codes. Wadud Miah Research Computing Group

TAU by example - Mpich

Process Concepts. CSC400 - Operating Systems. 3. Process Concepts. J. Sumey

Parallel Performance and Optimization

The assignment requires solving a matrix access problem using only pointers to access the array elements, and introduces the use of struct data types.

Debugging with GDB and DDT

Profiling and debugging. Carlos Rosales September 18 th 2009 Texas Advanced Computing Center The University of Texas at Austin

Debugging. ICS312 Machine-Level and Systems Programming. Henri Casanova

Particles. The Center for Astrophysical Thermonuclear Flashes. Chris Daley

Debugging with GDB and DDT

Parallel and Distributed Computing Programming Assignment 1

CSCI-1200 Data Structures Fall 2015 Lecture 6 Pointers & Dynamic Memory

Dynamic code analysis tools

ICHEC. Using Valgrind. Using Valgrind :: detecting memory errors. Introduction. Program Compilation TECHNICAL REPORT

PLASMA TAU Guide. Parallel Linear Algebra Software for Multicore Architectures Version 2.0

System Assertions. Andreas Zeller

Performance Tools. Tulin Kaman. Department of Applied Mathematics and Statistics

Enhanced Memory debugging of MPI-parallel Applications in Open MPI

Single Processor Optimization III

Debugging (Part 2) 1

Short Introduction to tools on the Cray XC system. Making it easier to port and optimise apps on the Cray XC30

ECMWF Workshop on High Performance Computing in Meteorology. 3 rd November Dean Stewart

The Cray Programming Environment. An Introduction

Memory & Thread Debugger

Performance Measurement

Computer Systems and Networks

CptS 360 (System Programming) Unit 4: Debugging

TAU 2.19 Quick Reference

CS168: Debugging. Introduc)on to GDB, Wireshark and Valgrind. CS168 - Debugging Helpsession

Eliminate Threading Errors to Improve Program Stability

CS102 Software Engineering Principles

Embedded Software TI2726 B. 3. C tools. Koen Langendoen. Embedded Software Group

Memory Checking and Single Processor Optimization with Valgrind [05b]

Parallel Debugging with TotalView BSC-CNS

Profiling with TAU. Le Yan. 6/6/2012 LONI Parallel Programming Workshop

Enhanced Debugging with Traces

Performance Analysis and Debugging Tools

Improving Error Checking and Unsafe Optimizations using Software Speculation. Kirk Kelsey and Chen Ding University of Rochester

Profiling with TAU. Le Yan. User Services LSU 2/15/2012

CSE 4/521 Introduction to Operating Systems

Accelerate HPC Development with Allinea Performance Tools

Eliminate Threading Errors to Improve Program Stability

CSE 374 Final Exam 3/15/17. Name UW ID#

21. This is a screenshot of the Android Studio Debugger. It shows the current thread and the object tree for a certain variable.

Source level debugging. October 18, 2016

Addressing the Increasing Challenges of Debugging on Accelerated HPC Systems. Ed Hinkel Senior Sales Engineer

GPU Debugging Made Easy. David Lecomber CTO, Allinea Software

Memory Analysis tools

Memory Management. Kevin Webb Swarthmore College February 27, 2018

Identifying Memory Corruption Bugs with Compiler Instrumentations. 이병영 ( 조지아공과대학교

OPENMP TIPS, TRICKS AND GOTCHAS

Implementation of Parallelization

DEBUGGING ON FERMI PREPARING A DEBUGGABLE APPLICATION GDB. GDB on front-end nodes

Message Passing Programming. Designing MPI Applications

Debugging. Erwan Demairy Dream

Computer Systems A Programmer s Perspective 1 (Beta Draft)

WhatÕs New in the Message-Passing Toolkit

PANOPTES: A BINARY TRANSLATION FRAMEWORK FOR CUDA. Chris Kennelly D. E. Shaw Research

Debugging Your CUDA Applications With CUDA-GDB

Transcription:

The Center for Astrophysical Thermonuclear Flashes Debugging / Profiling Chris Daley 23 rd June An Advanced Simulation & Computing (ASC) Academic Strategic Alliances Program (ASAP) Center at

Motivation Bugs are inevitable (even in FLASH): Compiler options can help locate bugs. Good practise to customize [C F L]FLAGS_DEBUG in Makefile.h to use the full range of debugging options for your compiler. The sites directory in the FLASH source tree contains sample Makefile.h files with good starting points. The infamous: Segmentation fault (core dumped) can be extremely hard to resolve unless you use compiler options and/or a memory debugger. Note: Debugging becomes much easier when binaries are compiled with debugging information (-g option).

Segmentation fault (core dumped) Always worth investigating the stack backtrace using gdb (or some other debugger, e.g. totalview): gdb flash3 core (gdb) bt #0 0x0000000000490ca4 in gr_expanddomain (mype=0, numprocs=1, particlesinitialized=.false.) at gr_expanddomain.f90:157 #1 0x0000000000432d88 in grid_initdomain (mype=0, numprocs=1, restart=.false., particlesinitialized=.false.) at Grid_initDomain.F90:94 #2 0x0000000000420f02 in driver_initflash () at Driver_initFlash.F90:152 #3 0x0000000000427fc9 in flash () at Flash.F90:38 Frame #0 shows the line containing the error. Note: This error may itself be a symptom of an earlier memory error.

Compiler options The original cause of most segfaults is an array that is accessed out of bounds or accessed before being allocated: Add bounds checking: -fbounds-check. Other important options: Add a default initial value: -finit-real=nan Add a check for floating point exceptions: -ffpe-trap=invalid,zero,overflow Add a stacktrace print out: -fbacktrace Compiler options help catch the original memory violation, e.g. the seg-fault shown on previous slide was caused by: At line 100 of file Simulation_initBlock.F90 Fortran runtime error: Array reference out of bounds for array 'blklimits', upper bound of dimension 1 exceeded (890 > 2)

Deadlocks A debugger can be attached to a running process. This is very useful especially when the program deadlocks. > pgrep flash3 2553 2554 > gdb flash3 2553 (gdb) bt... No need to start the program within the debugger!

Valgrind A collection of programming tools including a memory debugger, cache simulator and heap memory profiler (http://valgrind.org/). No special compilation or linking required: Raw binary runs in the valgrind CPU simulator. Can detect errors in libraries (no need to have source). Most popular tool is the memory debugger named memcheck: Detects usage of uninitialized memory. Detects reads or writes beyond array bounds. But only for heap allocated arrays. Detects memory leaks.

Usage and example output: Valgrind's memcheck tool mpirun -np N valgrind --tool=memcheck --track-origins=yes --log-file=valgrind.log.%p./flash3 ==22257== Conditional jump or move depends on uninitialised value(s) ==22257== at 0x42BF36: grid_getcellcoords_ (Grid_getCellCoords.F90:144) ==22257== by 0x448B93: simulation_initblock_ (Simulation_initBlock.F90:136) ==22257== by 0x490871: gr_expanddomain_ (gr_expanddomain.f90:161) ==22257== by 0x43255B: grid_initdomain_ (Grid_initDomain.F90:94) ==22257== by 0x42076D: driver_initflash_ (Driver_initFlash.F90:152) ==22257== by 0x42776C: MAIN (Flash.F90:38) ==22257== by 0x56E359: main (in /home/chris/flash/flash3_trunk/sedov_info/flash3) ==22257== Uninitialised value was created by a stack allocation ==22257== at 0x44843D: simulation_initblock_ (Simulation_initBlock.F90:45) A way to locate hard to find errors. But: Output can be extremely verbose. False-positives can happen. Program usually runs an order of magnitude slower.

Profiling Can use FLASH timers to meaure time spent: Timers_start() and Timers_stop(). For more complete (and automated) measurement use a specialist tool, e.g. TAU (http://www.cs.uoregon.edu/research/tau/home.php): Very simple to use TAU to measure time spent in the application at a finer level of granularity (e.g. subroutine and loop level) and with information about MPI calls. TAU provides tools to analyse scaling performance. Setup FLASH with -tau argument containing the name of a TAU Makefile, e.g../setup Sedov -auto -tau=/opt/tau-2.18.1/x86_64/lib/makefile.tau-callpath-mpi-pdt

TAU Callpath profiling captures the time spent in a routine when it is called from each parent routine: Can detect load balance issues by clicking on the routine name:

Custom profiling TAU selective instrumentation file in FLASH source tree at tools /tau/select.tau. We may wish to exclude certain files from the list to reduce measurement overhead. In select.tau edit exclude list: BEGIN_EXCLUDE_LIST INTERP MONOT AMR_1BLK_CC_CP_REMOTE END_EXCLUDE_LIST