Understanding Dynamic Parallelism
Know your code and know yourself
Presenter: Mark O'Connor, VP Product Management

Agenda
- Introduction and Background
- Fixing a Dynamic Parallelism Bug
- Understanding Dynamic Parallelism
- Questions & Answers
Allinea: The Company
- Parallel development tools company since 2002
- Leading in HPC software tools market worldwide
- Global customer base
- Making parallel programming accessible to the widest range of scientists and programmers
  - Design an unrivaled, productive and easy-to-use development environment
  - To help you reach the highest level of performance and scalability
  - Define a new standard of customer support
Allinea Unified environment
- A modern integrated environment for HPC developers
- Supporting the lifecycle of application development and improvement
  - Allinea DDT: Productively debug code
  - Allinea MAP: Enhance application performance
- Designed for productivity
  - Consistent, easy-to-use tools
  - Enables effective HPC development
- Improve system usage
  - Fewer failed jobs
  - Higher application performance
Unified building blocks in production since 2010
- Shared Graphical Interface
- Shared Configuration Files
- Shared Scalable Architecture
- Shared Intelligence and Data Consolidation
Allinea MAP: Increase application performance
- Parallel profiler designed for:
  - C/C++, Fortran
  - Multiprocess code
    - Interdependent or independent processes
  - Multithreaded code
    - Monitor the main threads for each process
  - Accelerated codes
    - GPUs, Intel Xeon Phi
- Improve productivity:
  - Helps you detect performance issues quickly and easily
  - Tells you immediately where your time is spent in your source code
  - Helps you to optimize your application efficiently
Allinea MAP: Find performance issues quickly
- Look at the entire application on real data sets
  - Visualize the entire run at full scale, not just reduced sets
  - Zoom in to explore iterations, functions and loops
- Understand the nature of bottlenecks
  - Source code viewer pinpoints bottleneck locations
  - CPU, MPI and memory access metrics identify the cause
Allinea DDT: Fix software problems - fast
- Graphical debugger designed for:
  - C/C++, Fortran, UPC, CUDA
  - Multithreaded code
    - Single address space
  - Multiprocess code
    - Interdependent or independent processes
  - Accelerated codes
    - GPUs, Intel Xeon Phi
  - Any mix of the above
- Slash your time to debug:
  - Reproduces and triggers your bugs instantly
  - Helps you quickly understand where issues come from
  - Helps you to fix them as swiftly as possible
Allinea DDT: Scalable debugging by design
- Where did it happen?
  - Allinea DDT leaps to source automatically
  - Merges stacks from processes and threads
- How did it happen?
  - Some faults evident instantly from source
- Why did it happen?
  - Real-time data comparison and consolidation
  - Unique Smart Highlighting coloring differences and changes
  - Sparklines comparing data across processes
New in Allinea DDT 4.1: Debug problems even quicker
- Debugging logbook
  - Records debugging activity
  - Compare runs side-by-side
  - Extends offline debugging capabilities
- Benefit: Compare sane runs to buggy runs to quickly narrow down your problem.
New in Allinea DDT 4.1: Debug problems even quicker
- Version control integration
  - Highlights where source code has been changed
  - Source code annotated with a change heatmap
  - Support for Mercurial, CVS, SVN, Git
- Benefit: Quickly identify the cause of regressions by seeing at a glance what has changed
New in Allinea DDT 4.1: Tighten the link with VisIt
- Visualization enhancements
  - Pick cells and interact with them in the debugger, e.g. set a watchpoint
  - Display of multiple datasets
  - Wizard to guide data layout
- Benefit: Link visualization to precise memory areas to shorten the debugging process
Leading the way to Innovation
- Support for accelerated environments
  - CUDA 5.0 and Kepler K20
  - Intel Xeon Phi Coprocessor
  - GPU directives (both OpenACC and non-OpenACC)
- Support for complex architectures
  - Debug and profile MPI, OpenMP and CUDA combinations
  - Supports low power CPU architectures (Moonshot program)
  - Support for all major compilers, MPI and OpenMP implementations
- Murex : NVIDIA Carma Dev Kit
- Quick resolution of our customer issues
  - 90% of support tickets are resolved within 7 days
- University of Gent
Today: Debugging Dynamic Parallelism on K20
Debugging Dynamic Parallelism
Debugging Dynamic Parallelism: wait, what?
Debugging is About Understanding
- What actually happens here?
- Which values are put into data and when?
- What's the relationship between n and data?
- How many kernels are launched?
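
The code these questions refer to did not survive in this text, so the following is only a hypothetical sketch of the kind of kernel they could be asked about. It assumes a kernel (here called `fill`, an invented name) that writes its recursion depth into `data` and then launches two child kernels on the two halves of its range. The names `fill`, `launch_count` and `expected_launches`, and the choice of `n = 8`, are all assumptions made for illustration.

```cuda
// Hypothetical example, NOT the code shown in the talk.
// Build with relocatable device code and the device runtime, for example:
//   nvcc -arch=sm_XX -rdc=true example.cu -lcudadevrt   (XX >= 35)
#include <cstdio>
#include <cuda_runtime.h>

// Counts every grid that runs, parent and children alike.
__device__ unsigned int launch_count = 0;

// Each launch covers n elements of data with one block of n threads.
__global__ void fill(int *data, int n, int depth)
{
    if (threadIdx.x == 0)
        atomicAdd(&launch_count, 1u);   // one increment per launch

    if (threadIdx.x < n)
        data[threadIdx.x] = depth;      // value written at this level

    // Make sure the whole block has written before any child starts.
    __syncthreads();

    // Thread 0 splits the range and launches one child per half.
    if (threadIdx.x == 0 && n > 1) {
        int half = n / 2;
        fill<<<1, half>>>(data, half, depth + 1);
        fill<<<1, n - half>>>(data + half, n - half, depth + 1);
    }
}

// Host-side model of the same recursion: how many launches for a given n?
static int expected_launches(int n)
{
    return n <= 1 ? 1
                  : 1 + expected_launches(n / 2) + expected_launches(n - n / 2);
}

int main()
{
    const int n = 8;
    int *data = nullptr;
    if (cudaMalloc(&data, n * sizeof(int)) != cudaSuccess)
        return 1;

    fill<<<1, n>>>(data, n, 0);

    // The parent grid only completes once all of its children have completed.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    unsigned int launches = 0;
    cudaMemcpyFromSymbol(&launches, launch_count, sizeof(launches));

    int host[n];
    cudaMemcpy(host, data, n * sizeof(int), cudaMemcpyDeviceToHost);

    printf("launches: %u (model predicts %d)\n", launches, expected_launches(n));
    for (int i = 0; i < n; ++i)
        printf("data[%d] = %d\n", i, host[i]);

    cudaFree(data);
    return 0;
}
```

Even in a sketch this small, the answers are not obvious from reading the source: every element is overwritten once per recursion level, and the number of launches grows with `n` rather than being fixed at the call site. Those are the things a debugger can show directly.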
Allinea DDT + MAP
- An integrated, ready-to-run development suite
- Get correct results fast using the industry-leading parallel debugger
- Full support for NVIDIA CUDA 5
- See which loops can be offloaded to the GPU most effectively with Allinea MAP
Questions and Answers
- Mark O'Connor, VP Product Management
- Robert Rick, VP Sales, Director of Operations, Americas
Upcoming GTC Express Webinars
- July 10: Introduction to the CUDA Toolkit as an Application Build Tool (Adam DeConinck, HPC Systems Engineer, NVIDIA)
- July 11: Uncovering the Elusive HIV Capsid with Kepler GPUs (Juan R. Perilla, Postdoctoral Fellow, University of Illinois at Urbana-Champaign)
Register at www.gputechconf.com/gtcexpress