Welcomes PRACE/LinkSCEEM 2011 Winter School Jacques Philouze Vice President Sales & Marketing
1 Welcomes PRACE/LinkSCEEM 2011 Winter School Jacques Philouze Vice President Sales & Marketing
2 Content: Company Background. Products in more depth: Allinea OPT (Optimization and Profiling Tool), Allinea DDT (Distributed Debugging Tool). Presentation. Let's practice on some examples.
3 New Market Drivers: Software has become the #1 roadblock. "Many applications will need a major redesign" (IDC HPC Update, June 2010). Most ISV codes do not scale. High programming costs are delaying GPU usage. Development tools are a vital part of the solution.
4 Allinea Software: HPC tools company since 2001. Allinea DDT: the scalable parallel debugger. Allinea OPT: the optimization tool for MPI and non-MPI codes. Allinea DDTLite: the parallel debugging plug-in for Microsoft Visual Studio. Large European and US customer base: 12 of the top 20 systems in EMEA run Allinea DDT. World's only Petascale debugger! Most scalable and cost-effective debugger for CUDA. Users debug regularly at all scales, at 1 or 100,000 cores, but it is also easy to use on small clusters!
5 Clients and Partners: Academic: over 200 universities. Major research centres: ANL, EPCC, CEA, Genci, Juelich, NERSC, ORNL... Aviation and Defence: Airbus, AWE, BAE, Dassault, DLR, EADS... Energy: CGG Veritas, IFP, Total... EDA: Cadence, Intel, Synopsys. Climate and Weather: UK Met Office, Meteo France, NOAA.
6
7 Optimization & Profiling Tool: A powerful Optimization and Profiling Tool. Supports MPI profiling. Supports scalar profiling. Supports internal sampler, GProf and PAPI. Cross-platform support: most flavours of Linux; GNU, IBM, Intel, PGI and PathScale compilers; x86, x86-64, ia64, PowerPC, Blue Gene/P and Cell architectures. Most major MPIs: OpenMPI, MPICH, MPICH2, Intel, HP, Bull, LAM, SGI and SCALI. Support for major scheduling systems.
8 OPT Preparation: Allinea OPT services need to run on a physical server: PostgreSQL database, logging server, GUI server (client connections). Allinea OPT is a link-time library: support for major MPIs or scalar codes; the OPT MPI library sits between the program and MPI; no need to carry out a full recompile, just re-link.
9 OPT Launching Jobs: Supports both GUI and CLI job launch. GUI can be configured per job: launch method (local, scheduler), enable sampler, sample interval, snapshot interval, #procs, stdin/out/err, MPI functions. CLI is business as usual: use the scheduler as normal; use your own launch script; uses opt.conf for configuration (user or system).
10 OPT Session Manager: Web services interface. Allows connectivity to remote GUI server. Authentication via database or PAM. Local server requires no additional authentication.
12 OPT Session Manager: Web services interface. Allows connectivity to remote GUI server. Authentication via database or PAM. Local server requires no additional authentication. Collect and analyse data from multiple sources: different clusters within the environment; different compilers and their options; different MPIs and their options; different numbers of MPI processes. Offers a TRUE Cloud-enabled tool.
13 OPT Standard Features: MPI Timeline view
14 OPT Standard Features: Metric Time View
15 OPT Standard Features: Metric Statistics
16 OPT Standard Features: Histogram
17 OPT Standard Features: Job Comparison
18 OPT Bigger Picture: FOCUS is the key... Too much visual information can be confusing. Good parallel tools should make things simple. Target the areas which consume most time: the functions in which you spend most run-time! Allinea OPT embraces this focused approach: see the big picture first, then drill down successively for more information. Don't drown users in too much data. Move to a mixture of sampling and selective tracing.
19 OPT Call Graph: Highlights problem functions. Directly linked with the time line. Works with all of the following: MPI function calls; MPI overview (bytes in/out, computation and communication time); profiling (GProf, sampler); snapshot details (hw counters).
20 OPT Call Graph Example: Select Communication and Computation time. Flat Profile: select the function with the greatest Self time. Select Average, Self Seconds and Depth=3. The red cells are the areas that are most significant. Most of our time is spent in MPI_Alltoall. Select Cumulative to see who is calling MPI_Alltoall. Change depth to 4 to increase visibility. Double-click block_mtranspose. Zoom in.
21 OPT Call Graph Example: Select Communication and Computation time. Flat Profile: select the function with the greatest Self time. Select Average, Self Seconds and Depth=3. The red cells are the areas that are most significant. Most of our time is spent in MPI_Alltoall. Select Cumulative to see who is calling MPI_Alltoall. Change depth to 4 to increase visibility.
22 OPT Call Graph Example: Select Communication and Computation time. Flat Profile: select the function with the greatest Self time. Select Average, Self Seconds and Depth=3. The red cells are the areas that are most significant. Most of our time is spent in MPI_Alltoall. Select Cumulative to see who is calling MPI_Alltoall. Change depth to 4 to increase visibility. Double-click block_mtranspose. We now have the bigger picture: far more focused.
23 OPT Call Graph Example: Select Communication and Computation time. Flat Profile: select the function with the greatest Self time. Select Average, Self Seconds and Depth=3. The red cells are the areas that are most significant. Most of our time is spent in MPI_Alltoall. Select Cumulative to see who is calling MPI_Alltoall. Change depth to 4 to increase visibility. Double-click block_mtranspose. We now have the bigger picture: far more focused. Could we have worked this out from an MPI timeline?
24 OPT Selective Logging: Collecting far too much data!! Allinea OPT selective logging: logging is on by default (on at MPI_Init, off by MPI_Finalize).
25 OPT Selective Logging: Collecting far too much data!! Allinea OPT selective logging: logging is on by default (on at MPI_Init, off by MPI_Finalize). Requirements: C/C++ codes need to replace mpi.h with optmpi.h. Turn ON and OFF at will: in C/C++, OPT_Start_Logging(); and OPT_Stop_Logging(); in Fortran, call OPT_Start_Logging and call OPT_Stop_Logging.
26 OPT User Defined Regions: Need to highlight a specific area of code!! Allinea OPT has user defined regions: off by default.
27 OPT User Defined Regions: Need to highlight a specific area of code!! Allinea OPT has user defined regions: off by default. Requirements: C/C++ codes need to replace mpi.h with optmpi.h. Needs to be turned ON and OFF: in C/C++, OPT_Start_Region("My Region"); and OPT_Stop_Region(); in Fortran, call OPT_Start_Region('My Region') and call OPT_Stop_Region.
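A minimal C sketch of how the selective logging and user defined region calls named on slides 25 and 27 might be combined. Only the four OPT_* names and the optmpi.h header come from the slides; their exact prototypes are not shown there, so the stand-in definitions, the HAVE_ALLINEA_OPT macro and the run_job function are assumptions made so the sketch builds without OPT. In a real MPI program these calls would sit between MPI_Init and MPI_Finalize.

```c
#include <assert.h>

#ifdef HAVE_ALLINEA_OPT           /* hypothetical build flag */
#include "optmpi.h"               /* per the slides: used in place of mpi.h */
#else
/* Stand-ins (assumptions) so the sketch compiles without OPT installed.
   They only track state; the real prototypes may differ. */
static int opt_logging = 1;       /* slides: logging is on by default */
static int opt_region_open = 0;   /* slides: regions are off by default */
static void OPT_Start_Logging(void) { opt_logging = 1; }
static void OPT_Stop_Logging(void)  { opt_logging = 0; }
static void OPT_Start_Region(const char *name)
{
    assert(name != 0);
    opt_region_open = 1;
}
static void OPT_Stop_Region(void)
{
    assert(opt_region_open);      /* a stop must follow a start */
    opt_region_open = 0;
}
#endif

/* Skip logging during setup, then log and mark only the main loop. */
long run_job(int steps)
{
    long work = 0;

    OPT_Stop_Logging();                  /* setup phase: not of interest */
    for (int i = 0; i < 1000; i++)
        work += i;

    OPT_Start_Logging();                 /* phase we want in the profile */
    OPT_Start_Region("My Region");
    for (int s = 0; s < steps; s++)
        work += s;
    OPT_Stop_Region();
    OPT_Stop_Logging();

    return work;
}
```

The design choice is simply to bracket the phase of interest, so that the data collected stays small and the marked region is easy to find in the views.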
28 Try it! Allinea OPT available for download at
29 Debugging at Target-Scale: Hybrid code. Lindon Locks, Pre-sales engineer
30 Background: Debugging is a necessary part of development: reproducing and fixing software problems; complexity of scaling and GPU architecture will introduce bugs. Debuggers interactively examine processes and data: the fastest way to debug, with less chance of introducing more bugs. Bugs at Target-scale need a debugger at Target-scale: until recently, debuggers were limited to ~4,000-8,000 cores. Bugs on GPUs need a debugger for GPUs: until recently, GPU software couldn't be debugged. Allinea DDT is the first graphical debugger to do both.
31 Problems at Target-Scale: Increasing job sizes lead to unanticipated errors. Regular bugs: data issues from larger data sets (e.g. garbage in..., overflow); logic issues and control flow. Increasing probability of independent random error: memory errors/exhaustion (random bugs!); system problems (MPI and operating system). Pushing coded boundaries: algorithmic (performance); hard-wired limits ("magic numbers"). Unknown unknowns.
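As a hedged illustration of the "overflow" and hard-wired limit items on slide 31, the C sketch below shows a global element count that fits in an int at small scale and stops fitting at larger scale. The function names and the figures are illustrative assumptions, and the comments assume a platform where int is 32 bits.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Total number of elements across all processes, computed in 64 bits.
   Writing this as `int total = nprocs * local_n;` is fine for small runs
   but overflows once the product exceeds INT_MAX. */
int64_t global_elements(int nprocs, int local_n)
{
    assert(nprocs >= 0 && local_n >= 0);
    return (int64_t)nprocs * (int64_t)local_n;
}

/* Returns 1 if the global count still fits in a plain int, 0 otherwise. */
int fits_in_int(int nprocs, int local_n)
{
    return global_elements(nprocs, local_n) <= (int64_t)INT_MAX;
}
```

With 10,000 elements per process, 1,000 processes give 10,000,000 elements, which fits; 250,000 processes give 2,500,000,000, which does not fit in a 32-bit int.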
32 Bug Fixing Strategies, Part I: Improved coding standards. Unit tests, assertions and consistency checks: good practice, but they tend to be single-process checks; parallel checks are also valid and good practice. They only check for things you predicted at development time: coverage is rarely perfect; unexpected problems, particularly random/system issues, are often missed. Debugger still required: these checks combine well with debuggers, which find why a failure occurs, not just a pass/fail.
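A small C sketch of the kind of assertion and consistency check slide 32 refers to: checking that a block decomposition of n elements over nprocs processes neither loses nor duplicates elements. The decomposition rule and the function names are assumptions chosen for illustration, not taken from the slides.

```c
#include <assert.h>

/* Number of elements owned by `rank` when n elements are split as evenly
   as possible over nprocs processes: the first n % nprocs ranks get one
   extra element. */
long block_size(long n, int nprocs, int rank)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs && n >= 0);
    return n / nprocs + (rank < n % nprocs ? 1 : 0);
}

/* Consistency check: the per-rank sizes must add up to n.
   Returns 1 when the decomposition is consistent, 0 otherwise. */
int decomposition_is_consistent(long n, int nprocs)
{
    long total = 0;
    for (int r = 0; r < nprocs; r++)
        total += block_size(n, nprocs, r);
    return total == n;
}
```

A check like this runs on a single process and catches only the failure it was written for, which is the limitation the slide points out.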
33 Bug Fixing Strategies, Part II: Logging with printf and write: the oldest debugger still in active use. Tried and tested, as easy as hello world, if you have good intuition into the problem. Edit code, insert print, recompile and re-run: slow and iterative. Use it to log exceptions, progress or state. Post-mortem analysis only. Hard to establish the real causal order of output from multiple processes. Output can be lost by process termination. Rapid growth in log output size: unscalable.
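A hedged C sketch of the printf-style logging slide 33 describes. Tagging each line with a rank and a step makes output from several processes easier to sort afterwards, and flushing after each line reduces (but does not remove) the chance of losing output when a process terminates. The function names and the line format are assumptions; in an MPI code the rank would typically come from MPI_Comm_rank.

```c
#include <stdio.h>

/* Format one log line tagged with rank and step. Returns the value
   returned by snprintf (the length the full line needs). */
int format_log_line(char *buf, size_t cap, int rank, long step, const char *msg)
{
    return snprintf(buf, cap, "[rank %d step %ld] %s\n", rank, step, msg);
}

/* Write the line and flush immediately, so it is less likely to be lost
   if the process terminates abnormally. */
void log_event(FILE *out, int rank, long step, const char *msg)
{
    char line[256];
    format_log_line(line, sizeof line, rank, step, msg);
    fputs(line, out);
    fflush(out);
}
```

Even with tags, the order in which lines from different processes reach a shared file or terminal is not the order in which the events happened, which is the causal-order problem the slide mentions.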
34 Bug Fixing Strategies, Part III: Reproduce at a smaller scale: attempt to make the problem happen on fewer nodes. This often requires a reduced data set, as the large one may not fit. Didn't you already try the code at small scale? A smaller data set may not trigger the problem. Does the bug even exist on smaller problems? Is it a system issue, e.g. an MPI problem? Is probability stacking up against you? Example: a 1 in 10,000 independent probability of error is unlikely to be spotted on smaller runs without many, many runs, but you are near guaranteed to see it on a 10,000 core run. What can a parallel debugger do to help? Debug at the scale of the problem, now.
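The probability example on slide 34 can be made concrete with a short C sketch. It assumes, since the slide does not say, that "1 in 10,000" is the chance that any one process hits the error during a run and that processes fail independently; the chance that a run of n processes sees the error at least once is then 1 - (1 - p)^n. The function name is hypothetical.

```c
#include <assert.h>

/* Probability that at least one of n independent processes hits an error
   that each process hits with probability p during a run:
   1 - (1 - p)^n. A plain loop keeps the sketch free of the math library. */
double prob_at_least_one(double p, long n)
{
    assert(p >= 0.0 && p <= 1.0 && n >= 0);
    double none = 1.0;               /* probability that no process fails */
    for (long i = 0; i < n; i++)
        none *= 1.0 - p;
    return 1.0 - none;
}
```

Under that reading, a 100-process run shows the error in about 1% of runs, a 10,000-process run in about 63% of runs, and a 100,000-process run in more than 99.99% of runs, so a handful of runs at target scale is very likely to reproduce what hundreds of small runs would miss.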
35 Use a Parallel Debugger: Many benefits to graphical parallel debuggers: large feature sets for common bugs; richness of user interface and real control of processes. Historically, all parallel debuggers hit scale problems: bottleneck at the frontend (direct GUI-to-nodes architectures); performance linear in the number of processes; human-factor limits (mouse fatigue and brain overload). Are tools ready for the task? Allinea DDT has changed the game.
36 Distributed Debugging Tool: Allinea DDT is a mature and highly intuitive software tool. Traditional focus has been the HPC market. Additional client-server capabilities. Developing new features (multicore, GP-GPU). Cross-platform support: Linux, AIX, Blue Gene/P, Super-UX, CLE; GNU, Absoft, IBM, Intel, PGI, PathScale and Sun compilers; x86, x86-64, ia64, PowerPC, UltraSparc, Cray XT4/5/6, NEC and NVIDIA CUDA architectures. Across all MPIs, OpenMP and threads. All major scheduling systems (PBS, LSF, SGE).
37 DDT Basic Principles: Sophisticated GUI helps the user to control parallel execution and helps to find and focus on potential problems. User controls actions by groups: set breakpoints, step in/out, align stacks, etc. Focus in on individual threads/processes when necessary. Create groups both manually (select processes via drag and drop) and automatically (by process stack, values, etc.).
38 Start Debugging: Ability to run in numerous modes: advanced interactive launch options, or simple, clear interactive launch options.
39 Options Select Change options
40 DDT Scheduler Launch: Enhanced queuing submission. Simple mechanism to customise options. Submit non-MPI jobs to a queue. Set pre-defined variables in the submission script (the system administrator configures these). Alter the pre-defined variables at run-time (user-selectable values).
41 Features: Scalar features: advanced C++ support including STL, namespaces, virtual functions and templates; advanced Fortran 90, 95 and 2003 support including modules, allocatable data, pointers and derived types.
42 Features: Scalar features: advanced C++ support including STL, namespaces, virtual functions and templates; advanced Fortran 90, 95 and 2003 support including modules, allocatable data, pointers and derived types. Multithreading & OpenMP features: perform actions individually or collectively.
43 Features: Scalar features: advanced C++ support including STL, namespaces, virtual functions and templates; advanced Fortran 90, 95 and 2003 support including modules, allocatable data, pointers and derived types. Multithreading & OpenMP features: perform actions individually or collectively. MPI features: control processes individually or by groups; visualize message queues.
44 Breakpoints / Watchpoints: Quickly and easily control your debug session. Breakpoints: double-click to add them; for all processes, groups or a single process; enable/disable from the window panel; add a condition; default settings (enable/disable).
45 Breakpoints / Watchpoints: Quickly and easily control your debug session. Breakpoints: double-click to add them; for all processes, groups or a single process; enable/disable from the window panel; add a condition; default settings (enable/disable). Watchpoints: select from evaluate, locals or current line; set for a variable or expression; program halts if a change occurs; single process only.
46 Data Comparison: Cross Process/Thread Comparison (CPC): run from evaluate, locals or current line; variable or expression; create groups from results. Visualize: 3D OpenGL array viewer.
47 Data Comparison: Cross Process/Thread Comparison (CPC): run from evaluate, locals or current line; variable or expression; create groups from results. Visualize.
48 Data Comparison: Cross Process/Thread Comparison (CPC): run from evaluate, locals or current line; variable or expression; create groups from results. Visualize: Multi Dimensional Array viewer; 3D OpenGL array viewer; export csv file for post analysis.
49 Memory Debugging: Find memory leaks, or stop on read/write beyond the end of an array.
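For readers unfamiliar with the two bug classes named on slide 49, here is a hedged C sketch of where they typically arise. The code shown is the corrected form, with comments marking the one-character and one-line slips that produce a write beyond the end of an array and a leak. The function names are illustrative assumptions and are not part of DDT.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate n doubles and fill element i with i*i.
   The classic bug is `i <= n`, which writes one element past the end;
   a run may appear to work until a memory debugger stops on that write. */
double *make_squares(size_t n)
{
    double *a = malloc(n * sizeof *a);
    if (a == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)   /* correct bound: i < n, not i <= n */
        a[i] = (double)i * (double)i;
    return a;
}

/* Sum the array and release it. Forgetting the free() here is the kind
   of leak that leak detection reports. */
double sum_and_free(double *a, size_t n)
{
    assert(a != NULL);
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    free(a);
    return s;
}
```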
50 Scalable Process Control: Parallel Stack View: find rogue processes quickly; identify classes of process behaviour; rapid grouping of processes. Control processes by groups: set breakpoints, step, play, stop. Scalable groups view: compact group display.
51 Large Array Support: Browse arrays in 1, 2 or 3 dimensions. Table view. Filtering: look for an outlying value. Export: save to a spreadsheet. View arrays from multiple processes. Search terabytes for rogue data, in parallel with v3.0.
52 Checkpointing: GDB: single session, no option to save/restore; MPI support; no thread support; still in the experimental stage.
53 Checkpointing: GDB: single session, no option to save/restore; MPI support; no thread support; still in the experimental stage. OpenMPI/BLCR: prepared and ready for official OpenMPI release; ability to save/restore session; also in the experimental stage.
55 Petascale Debugging: DDT is delivering Petascale debugging today. Collaborations with ORNL on Jaguar Cray XT and CEA. Logarithmic performance. Many operations are now faster at 220,000 cores than previously at 1,000 cores. ~1/10th of a second to step and gather all stacks at 220,000 cores. [Figure: DDT Performance, time (s) against MPI processes; series: All Step, All Breakpoint]
56 Debugging GPU Applications
57 CUDA Debugging Options: Old world printf: NVIDIA SDK 3.2 allows this (new) but has limitations. Fake it: run the kernel on the host x86_64 processor; languages often support targeting the host CPU instead of the GPU. Different numeric precision, different answer? Different scheduling, different answer? A reasonable option for some bugs. Or run on the GPU with Allinea DDT...
58 Debugging Kernels: Debugging CPU and GPU concurrently: browse source, examine variables, control processes and threads. Set breakpoints, or automatically stop on kernel launch. Stop at a line of CUDA code: kernels stop when the breakpoint is reached. Hover the mouse for more information. Step a warp.
59 Examine Thread Data: At-a-glance display of variables: expressions, local variables and current line. Also possible to edit values. Displays the memory types: shared, parameter, constant, register.
60 Kernel Scale Made Easy: View all threads in the parallel stack view: at a glance, see all GPU and CPU threads together. Links with thread selection: pick a tree node to select one of the CUDA threads at that location. Full MPI support: see GPU and CPU threads from multiple nodes.
61 Current Limitations: Mix of SDK and driver limitations: only one warp can be stepped per GPU at any time. Rapid progress as SDKs are released: strong partnership with NVIDIA and others is helping to extend capabilities. Other languages and devices: HMPP, PGI Accelerators, OpenCL; these can only be supported when the compilers/drivers make it possible...
62 DDT CUDA News: SDK 3.2 with DDT, released January 2011. Multi-device support. Fermi and Tesla support. CUDA Memcheck support for memory errors. MPI and CUDA support for GPU clusters. Breakpoints, thread control, and data evaluation. Stop on kernel launch.
63 Hands on session
64 Training at Allinea: This workshop is part of Allinea's regular training process: Webinar, Presentation, Workshop, Lecture, Hands-on, On-site Training.
65 Expected walkthroughs: Getting Started; Straightforward Crashes; Memory Errors; Memory Leaks; Incorrect Results; GPU Debugging.
OpenACC Course Class #1 Q&A Contents OpenACC/CUDA/OpenMP... 1 Languages and Libraries... 3 Multi-GPU support... 4 How OpenACC Works... 4 OpenACC/CUDA/OpenMP Q: Is OpenACC an NVIDIA standard or is it accepted
More informationCUDA Development Using NVIDIA Nsight, Eclipse Edition. David Goodwin
CUDA Development Using NVIDIA Nsight, Eclipse Edition David Goodwin NVIDIA Nsight Eclipse Edition CUDA Integrated Development Environment Project Management Edit Build Debug Profile SC'12 2 Powered By
More informationDebugging Programs Accelerated with Intel Xeon Phi Coprocessors
Debugging Programs Accelerated with Intel Xeon Phi Coprocessors A White Paper by Rogue Wave Software. Rogue Wave Software 5500 Flatiron Parkway, Suite 200 Boulder, CO 80301, USA www.roguewave.com Debugging
More informationAutomatic Tuning of HPC Applications with Periscope. Michael Gerndt, Michael Firbach, Isaias Compres Technische Universität München
Automatic Tuning of HPC Applications with Periscope Michael Gerndt, Michael Firbach, Isaias Compres Technische Universität München Agenda 15:00 15:30 Introduction to the Periscope Tuning Framework (PTF)
More informationThe Distributed Debugging Tool
The Distributed Debugging Tool Userguide - PAGE 1 - - PAGE 2 - Contents Contents... 3 1. Introduction... 7 2. Installation... 8 Licence Files... 8 Floating Licences... 9 Getting Help... 10 3. Starting
More informationAnalyzing the Performance of IWAVE on a Cluster using HPCToolkit
Analyzing the Performance of IWAVE on a Cluster using HPCToolkit John Mellor-Crummey and Laksono Adhianto Department of Computer Science Rice University {johnmc,laksono}@rice.edu TRIP Meeting March 30,
More informationDebugging on Blue Waters
Debugging on Blue Waters Debugging tools and techniques for Blue Waters are described here with example sessions, output, and pointers to small test codes. For tutorial purposes, this material will work
More informationTool for Analysing and Checking MPI Applications
Tool for Analysing and Checking MPI Applications April 30, 2010 1 CONTENTS CONTENTS Contents 1 Introduction 3 1.1 What is Marmot?........................... 3 1.2 Design of Marmot..........................
More informationME964 High Performance Computing for Engineering Applications
ME964 High Performance Computing for Engineering Applications Debugging & Profiling CUDA Programs February 28, 2012 Dan Negrut, 2012 ME964 UW-Madison Everyone knows that debugging is twice as hard as writing
More informationIntel Parallel Studio XE 2015
2015 Create faster code faster with this comprehensive parallel software development suite. Faster code: Boost applications performance that scales on today s and next-gen processors Create code faster:
More informationUsing Intel VTune Amplifier XE and Inspector XE in.net environment
Using Intel VTune Amplifier XE and Inspector XE in.net environment Levent Akyil Technical Computing, Analyzers and Runtime Software and Services group 1 Refresher - Intel VTune Amplifier XE Intel Inspector
More informationCray RS Programming Environment
Cray RS Programming Environment Gail Alverson Cray Inc. Cray Proprietary Red Storm Red Storm is a supercomputer system leveraging over 10,000 AMD Opteron processors connected by an innovative high speed,
More informationBright Cluster Manager Advanced HPC cluster management made easy. Martijn de Vries CTO Bright Computing
Bright Cluster Manager Advanced HPC cluster management made easy Martijn de Vries CTO Bright Computing About Bright Computing Bright Computing 1. Develops and supports Bright Cluster Manager for HPC systems
More informationHPC Middle East. KFUPM HPC Workshop April Mohamed Mekias HPC Solutions Consultant. Agenda
KFUPM HPC Workshop April 29-30 2015 Mohamed Mekias HPC Solutions Consultant Agenda 1 Agenda-Day 1 HPC Overview What is a cluster? Shared v.s. Distributed Parallel v.s. Massively Parallel Interconnects
More informationThe GPU-Cluster. Sandra Wienke Rechen- und Kommunikationszentrum (RZ) Fotos: Christian Iwainsky
The GPU-Cluster Sandra Wienke wienke@rz.rwth-aachen.de Fotos: Christian Iwainsky Rechen- und Kommunikationszentrum (RZ) The GPU-Cluster GPU-Cluster: 57 Nvidia Quadro 6000 (29 nodes) innovative computer
More informationSystems software design. Software build configurations; Debugging, profiling & Quality Assurance tools
Systems software design Software build configurations; Debugging, profiling & Quality Assurance tools Who are we? Krzysztof Kąkol Software Developer Jarosław Świniarski Software Developer Presentation
More informationEvolving HPCToolkit John Mellor-Crummey Department of Computer Science Rice University Scalable Tools Workshop 7 August 2017
Evolving HPCToolkit John Mellor-Crummey Department of Computer Science Rice University http://hpctoolkit.org Scalable Tools Workshop 7 August 2017 HPCToolkit 1 HPCToolkit Workflow source code compile &
More informationThe Titan Tools Experience
The Titan Tools Experience Michael J. Brim, Ph.D. Computer Science Research, CSMD/NCCS Petascale Tools Workshop 213 Madison, WI July 15, 213 Overview of Titan Cray XK7 18,688+ compute nodes 16-core AMD
More informationHPC on Windows. Visual Studio 2010 and ISV Software
HPC on Windows Visual Studio 2010 and ISV Software Christian Terboven 19.03.2012 / Aachen, Germany Stand: 16.03.2012 Version 2.3 Rechen- und Kommunikationszentrum (RZ) Agenda
More informationCUDA Tools for Debugging and Profiling
CUDA Tools for Debugging and Profiling CUDA Course, Jülich Supercomputing Centre Andreas Herten, Forschungszentrum Jülich, 3 August 2016 Overview What you will learn in this session Use cuda-memcheck to
More informationParallel Programming. Libraries and implementations
Parallel Programming Libraries and implementations Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us
More informationEnabling the Next Generation of Computational Graphics with NVIDIA Nsight Visual Studio Edition. Jeff Kiel Director, Graphics Developer Tools
Enabling the Next Generation of Computational Graphics with NVIDIA Nsight Visual Studio Edition Jeff Kiel Director, Graphics Developer Tools Computational Graphics Enabled Problem: Complexity of Computation
More informationPerformance Analysis of MPI Programs with Vampir and Vampirtrace Bernd Mohr
Performance Analysis of MPI Programs with Vampir and Vampirtrace Bernd Mohr Research Centre Juelich (FZJ) John von Neumann Institute of Computing (NIC) Central Institute for Applied Mathematics (ZAM) 52425
More informationCS2141 Software Development using C/C++ Debugging
CS2141 Software Development using C/C++ Debugging Debugging Tips Examine the most recent change Error likely in, or exposed by, code most recently added Developing code incrementally and testing along
More informationBorland Optimizeit Enterprise Suite 6
Borland Optimizeit Enterprise Suite 6 Feature Matrix The table below shows which Optimizeit product components are available in Borland Optimizeit Enterprise Suite and which are available in Borland Optimizeit
More informationBatch Systems & Parallel Application Launchers Running your jobs on an HPC machine
Batch Systems & Parallel Application Launchers Running your jobs on an HPC machine Partners Funding Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike
More informationQualcomm Snapdragon Profiler
Qualcomm Technologies, Inc. Qualcomm Snapdragon Profiler User Guide September 21, 2018 Qualcomm Snapdragon is a product of Qualcomm Technologies, Inc. Other Qualcomm products referenced herein are products
More informationHPC and IT Issues Session Agenda. Deployment of Simulation (Trends and Issues Impacting IT) Mapping HPC to Performance (Scaling, Technology Advances)
HPC and IT Issues Session Agenda Deployment of Simulation (Trends and Issues Impacting IT) Discussion Mapping HPC to Performance (Scaling, Technology Advances) Discussion Optimizing IT for Remote Access
More informationAccelerator programming with OpenACC
..... Accelerator programming with OpenACC Colaboratorio Nacional de Computación Avanzada Jorge Castro jcastro@cenat.ac.cr 2018. Agenda 1 Introduction 2 OpenACC life cycle 3 Hands on session Profiling
More informationOracle Developer Studio 12.6
Oracle Developer Studio 12.6 Oracle Developer Studio is the #1 development environment for building C, C++, Fortran and Java applications for Oracle Solaris and Linux operating systems running on premises
More informationDebugging on Intel Platforms
White Paper Robert Mueller-Albrecht Developer Products Division Intel Corporation Debugging on Intel Platforms Introduction...3 Overview...3 Servers and Workstations...4 Support for Linux*, Mac OS X*,
More informationThe PAPI Cross-Platform Interface to Hardware Performance Counters
The PAPI Cross-Platform Interface to Hardware Performance Counters Kevin London, Shirley Moore, Philip Mucci, and Keith Seymour University of Tennessee-Knoxville {london, shirley, mucci, seymour}@cs.utk.edu
More informationIntroduction to Performance Tuning & Optimization Tools
Introduction to Performance Tuning & Optimization Tools a[i] a[i+1] + a[i+2] a[i+3] b[i] b[i+1] b[i+2] b[i+3] = a[i]+b[i] a[i+1]+b[i+1] a[i+2]+b[i+2] a[i+3]+b[i+3] Ian A. Cosden, Ph.D. Manager, HPC Software
More informationCoding Tools. (Lectures on High-performance Computing for Economists VI) Jesús Fernández-Villaverde 1 and Pablo Guerrón 2 March 25, 2018
Coding Tools (Lectures on High-performance Computing for Economists VI) Jesús Fernández-Villaverde 1 and Pablo Guerrón 2 March 25, 2018 1 University of Pennsylvania 2 Boston College Compilers Compilers
More informationTotalView 2018 Release Notes
These release notes contain a summary of new features and enhancements, late-breaking product issues, migration from earlier releases, and bug fixes. PLEASE NOTE: The version of this document in the product
More informationIntel Parallel Studio 2011
THE ULTIMATE ALL-IN-ONE PERFORMANCE TOOLKIT Studio 2011 Product Brief Studio 2011 Accelerate Development of Reliable, High-Performance Serial and Threaded Applications for Multicore Studio 2011 is a comprehensive
More informationBeyond Hardware IP An overview of Arm development solutions
Beyond Hardware IP An overview of Arm development solutions 2018 Arm Limited Arm Technical Symposia 2018 Advanced first design cost (US$ million) IC design complexity and cost aren t slowing down 542.2
More informationDebugging. John Lockman Texas Advanced Computing Center
Debugging John Lockman Texas Advanced Computing Center Debugging Outline GDB Basic use Attaching to a running job DDT Identify MPI problems using Message Queues Catch memory errors PTP For the extremely
More informationAMD CodeXL 1.3 GA Release Notes
AMD CodeXL 1.3 GA Release Notes Thank you for using CodeXL. We appreciate any feedback you have! Please use the CodeXL Forum to provide your feedback. You can also check out the Getting Started guide on
More informationCRAY XK6 REDEFINING SUPERCOMPUTING. - Sanjana Rakhecha - Nishad Nerurkar
CRAY XK6 REDEFINING SUPERCOMPUTING - Sanjana Rakhecha - Nishad Nerurkar CONTENTS Introduction History Specifications Cray XK6 Architecture Performance Industry acceptance and applications Summary INTRODUCTION
More informationTotalView. Users Guide. August 2001 Version 5.0
TotalView Users Guide August 2001 Version 5.0 Copyright 1999 2001 by Etnus LLC. All rights reserved. Copyright 1998 1999 by Etnus Inc. All rights reserved. Copyright 1996 1998 by Dolphin Interconnect Solutions,
More information