TotalView on IBM PowerLE and CORAL Sierra/Summit

1 TotalView on IBM PowerLE and CORAL Sierra/Summit Martin Bakal ScicomP 5/25/2016

2 Agenda
- Corporate Overview
- CORAL Milestones
- TotalView New architecture
- Demo
- Questions

3 Company snapshot
We are the largest independent provider of cross-platform software development tools and embedded components.
- Founded: 1989
- Headquarters: Louisville, CO
- Employees:
- Offices Worldwide:
Our capabilities cover different languages, code bases, and platforms. We meet development where and how it happens.

4 Meeting customer needs with capabilities

5 Our products and services
Tools
- Klocwork: On-the-fly static code analysis for app security
- CodeDynamics: Commercial dynamic analysis
- OpenLogic Support: Enterprise-grade SLA support
- OpenLogic Audits: Detailed open source license and security risk guidance
- TotalView for HPC: Scalable debugging
- Zend Server: Enterprise PHP app server
- Zend Studio: PHP IDE
- Zend Guard: PHP encoding and obfuscation
Libraries
- SourcePro: OS, database, network, and analysis abstraction for C++
- Visualization: Real-time data visualization at scale
- PV-WAVE: Visual data analysis
- IMSL Numerical Libraries: Scalable math and statistics algorithms
- HydraExpress: SOA/C++ modernization framework
- HostAccess: Terminal emulation for Windows
- Stingray: MFC GUI components

6 TotalView for HPC
- Comprehensive multi-core and multi-threaded analysis and debug environment
  - Thread-specific breakpoints
  - Control individual thread execution
  - View thread-specific stack and data
  - View complex data types easily
- Integrated reverse debugging
- Track memory leaks in running applications
- Supports C/C++ on Linux
- Allowing the business to have
  - Predictable development schedules
  - Less time spent debugging
- Platform coverage
  - Linux, BG/Q, CUDA GPUs, Xeon Phi, Linux-PowerLE with GPUs, etc.

7 LLNL/Sierra Focus Areas
- Collaborative work: Rogue Wave, LLNL, IBM, NVIDIA, RWTH Aachen
- Focuses on three areas
  - OpenMP 4 + GPUs debugging
  - MPI+GPU debugger performance and scalability
  - EVAL (conditional breakpoints) performance and scalability
DO NOT COPY OR REDISTRIBUTE WITHOUT WRITTEN PERMISSION

8 OpenMP 4 + GPUs Debugging
- OpenMP 4 debugging support (CPUs and GPUs) for Sierra
- Collaborate on OpenMP Debug API (OMPD) design
- Three phases
  - Phase 1: TotalView/OMPD: OMP3.1/CPU, x86_64
  - Phase 2: TotalView/OMPD: OMP4/CPU/GPU, x86_64
  - Phase 3: TotalView/OMPD: OMP4/CPU/GPU, PowerLE
- Phase 1 progress to date
  - Draft of OMPD for OpenMP 3.1 completed
  - RWTH Aachen implemented OMPD DLL for Intel OpenMP RTL
  - TotalView/OMPD feature development progressing

9 OMP Control Vars & Meta Info
- Intel OMPD DLL currently returns no control variable information
- Meta information shows version number, ID, and DLL path

10 OMP Parallel Regions
- Parallel region hierarchy at thread, process, and group widths
- Aggregated process/thread list: #p:#t[dpid-range.dtid-range, ]

11 OMP Task Regions
- Task region display is similar to parallel region display, but shows the task relationships

12 OMP Threads
- Thread-centric views of information available from OMPD

13 OMP Stack Filtering
- Raw, unfiltered stack displays the OMP RTL stack frames
- OMP RTL frames are typically uninteresting to users

14 OMP Stack Filtering
- OMPD allows the debugger to portably find and filter out OMP RTL frames

15 OMP Master/Slave Stack Linking
- Stack hyperlink connects a slave's thread frame to its master's thread frame
- Selecting the frame jumps to the parent thread and stack frame that invoked the parallel region

16 OMP Master/Slave Stack Linking
- Clicking again climbs the parallel region tree, focusing on its parent

17 OMP Master/Slave Stack Linking
- Now at the root of the OMP parallel region tree

18 OMP Mangled Outlined-Functions
- OMP outlined-function name mangling is not standard
- DWARF could connect an outlined function to its containing function
  - E.g., DW_AT_omp_outlined <containing-die>
  - Instead of L_func_42 par_region0_1_2, the debugger could reliably show something like func (parallel region 1 at file.c#42)
- Needed
  - A DWARF OpenMP proposal
  - A compiler developer to produce the DWARF

19 OMP Variable Information
- Users have asked for OpenMP variable information
  - E.g., private, shared, firstprivate, copyin, reduction, etc.
  - Compile-time attributes of the variable that the compiler knows
- DWARF could represent these attributes

20 OMP4 + GPUs
- OMP4 compilers currently produce no DWARF for GPU code
  - IBM is working on a solution
- OMPD currently supports only OpenMP 3.1 (no GPUs)
  - Specification must be extended for OpenMP 4 + GPU
  - Seeking OMP4+GPU OMPD implementation
- TotalView modifications
  - DEVICE and TARGET region support
  - CUDA/GPU support
  - Depends on the OMP RTL execution model

21 OpenMP 4 Loose Ends
- Help push toward OMPD standardization
- OMPD for IBM/LOMP
  - When IBM implements the DLL, TotalView should be able to just use it
- OMP aggregated logical call tree
  - Reassemble the structure of an executing OMP4 program into a logical call tree

22 MPI+GPU Debugging at Scale
- Performance, scalability, and functionality on MPI+GPU targets
- Two phases of Application Driven Tuning (ADT) with GPUs
  - Phase 1: Linux-x86_64 (LULESH/RAJA, HYPRE, LAMMPS)
  - Phase 2: Linux-PowerLE (other benchmarks)
- NVIDIA CILP allows MPI processes to share GPUs on a node
  - CILP (hardware pre-emption)
- CUDA Debug API limitations
  - Requires creating a debug agent process per target process
  - TotalView creates a bushier MRNet tree
- Future work
  - Fix the API to support true multi-process debugging
  - Add support for MPS debugging

23 EVAL Point Performance and Scalability
- Support evaluating conditional breakpoints in the debugger servers
  - Allows the interpreter to run in parallel in the servers
- Client contains the heavyweight components: symbol data, compilers, IL generator
- Server remains lightweight, adding only a small IL interpreter
[Diagram: the TV Client (lex, parse, compile, generate IL; IL interpreter) broadcasts IL to a tree of TV Servers, each with its own IL interpreter]
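
The client/server split on this slide can be illustrated with a toy model. This is only a sketch under stated assumptions, not TotalView's IL or wire protocol: the opcodes and the names `compile_condition`, `run_il`, and `broadcast_and_evaluate` are hypothetical, Python's `ast` module stands in for the client's lexer and parser, and a thread pool stands in for the servers. The client compiles the breakpoint condition once; each simulated server evaluates the resulting instruction list against its own local variables.

```python
import ast
import operator
from concurrent.futures import ThreadPoolExecutor

# Hypothetical opcode tables for a toy stack-based IL (not TotalView's IL).
BINOPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}
CMPOPS = {ast.Gt: "GT", ast.Lt: "LT", ast.Eq: "EQ"}
APPLY = {
    "ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul,
    "GT": operator.gt, "LT": operator.lt, "EQ": operator.eq,
}


def compile_condition(source):
    """Client side: parse once and emit a flat list of (opcode, arg) pairs."""
    il = []

    def emit(node):
        if isinstance(node, ast.Constant):
            il.append(("PUSH", node.value))
        elif isinstance(node, ast.Name):
            il.append(("LOAD", node.id))
        elif isinstance(node, ast.BinOp) and type(node.op) in BINOPS:
            emit(node.left)
            emit(node.right)
            il.append((BINOPS[type(node.op)], None))
        elif (isinstance(node, ast.Compare) and len(node.ops) == 1
              and type(node.ops[0]) in CMPOPS):
            emit(node.left)
            emit(node.comparators[0])
            il.append((CMPOPS[type(node.ops[0])], None))
        else:
            raise ValueError("unsupported expression")

    emit(ast.parse(source, mode="eval").body)
    return il


def run_il(il, local_vars):
    """Server side: a small interpreter that needs no symbol data or compiler."""
    stack = []
    for op, arg in il:
        if op == "PUSH":
            stack.append(arg)
        elif op == "LOAD":
            stack.append(local_vars[arg])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(APPLY[op](left, right))
    return stack.pop()


def broadcast_and_evaluate(il, servers):
    """Send the same IL to every simulated server and evaluate concurrently."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda variables: run_il(il, variables), servers))


# Four simulated servers, each holding the variables of one target process.
servers = [{"rank": r, "i": r * 10} for r in range(4)]
il = compile_condition("i + rank > 20")
hits = broadcast_and_evaluate(il, servers)
print(hits)
```

The point of the design is that the expensive step (parsing and compiling against symbol data) happens once on the client, while the per-process work is a short loop over a compact instruction list that can run everywhere at the same time.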

24 Aggregation
- TotalView has been enhanced to add new types of aggregation
- Aggregated process and thread status
  - New root window
  - CLI dstatus
- Aggregated stack backtrace
  - Graphical call tree window
  - CLI dwhere
- Aggregated data
  - CLI dprint

25 Focus on Data Aggregation
- TotalView dprint command
  - Makes it easy to collect variable and array data from all threads
  - Added support for aggregated data collection
  - On CUDA, still prints for each thread
- Example
    dprint -g agg_1
    Variable Focus: 64:32000[ ]
    0x (0) : 64:16000[0-63.1, , , , ,...]
    0x (1) : 64:16000[0-63.2, , , , ,...]
  - One line for each unique value of the variable
  - Each line shows the data for a specific value, followed by # of MPI ranks:# of threads[MPI rank range.threads]
- Portable across platforms and will be supported on Linux-PowerLE CORAL Sierra/Summit
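
The idea of printing one line per unique value can be sketched as follows. This is an illustration only, not TotalView's implementation: the function names `compress` and `aggregate`, the sample data, and the exact line layout (loosely modeled on the `ranks:threads[rank-range.thread-range]` annotation on the slide) are all assumptions.

```python
from collections import defaultdict


def compress(ids):
    """Turn a collection of integers into a compact range string, e.g. 0-2,5."""
    ids = sorted(set(ids))
    parts = []
    start = prev = ids[0]
    for value in ids[1:]:
        if value == prev + 1:
            prev = value
            continue
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = value
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


def aggregate(samples):
    """Group (rank, thread, value) samples so each unique value yields one line."""
    by_value = defaultdict(list)
    for rank, thread, value in samples:
        by_value[value].append((rank, thread))

    lines = []
    for value in sorted(by_value):
        members = by_value[value]
        ranks = {rank for rank, _ in members}
        threads = {thread for _, thread in members}
        lines.append(
            f"{value:#x} : {len(ranks)}:{len(members)}"
            f"[{compress(ranks)}.{compress(threads)}]"
        )
    return lines


# Hypothetical data: 4 ranks x 2 threads; the variable equals the thread ID.
samples = [(rank, thread, thread) for rank in range(4) for thread in (1, 2)]
report = aggregate(samples)
for line in report:
    print(line)
```

In this sketch the bracketed summary collapses ranks and threads independently, so it is exact only when every listed rank holds every listed thread; a real aggregated list would need one entry per distinct rank/thread grouping.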

26 Licensing
- Licensing is an issue
- Flexera doesn't support Power with their FlexNet Publisher product
- This means we have to time-bomb the product after a year

27 New architecture

28 New Architecture in building UI
- Used to be one large application
[Diagram: Front-End Process (Qt 4 Based Front-End, TotalView Debug Interface (TVDI), Transport Mechanism); Communication Channel (TotalView Debug Wire Protocol (TVDWP)); Back-End Process (Transport Mechanism, TotalView Debug Engine Interface (TVDEI), TotalView Debugger Engine (TVDE)); TotalView Debug Server; target Processes]

29 Why does the architecture matter?
- This isn't short-term thinking
- No need for X Windows on target platforms
- Performance
- Scalability
- Platforms
- More effective debugger on all platforms
- Easier 3rd party integrations


31 Multi-threading made easier
- How do you debug a problem in a 50-thread application that occurs in 1 thread?
- Without TotalView
  - Set a breakpoint in code
  - Run and hope you hit the right thread
- With TotalView
  - Set thread-specific breakpoints
- Better multithreaded debugger
  - Understand the state of all of your threads
  - Focus on specific threads
  - View stack and data
- Built to scale to HPC, and leveraging that in mainstream commercial environments

32 Viewing complex data types
- How do you inspect complex data types for changes?
- Without TotalView
  - Look through pointer at memory
  - Map to the data structures
  - Recreate the data type by hand
- With TotalView
  - View the data structure directly; users get to focus on debugging
- Complex data types support includes
  - STL collections
  - Large multi-dimensional arrays
  - Boost collection classes
  - C++11 specific types

33 Reverse debugging
- How do you isolate an intermittent failure?
- Without TotalView
  - Set a breakpoint in code
  - Realize you ran past the problem
  - Re-load
  - Set breakpoint earlier
  - Hope it fails
  - Keep repeating
- With TotalView
  - Start recording
  - Set a breakpoint
  - See failure
  - Run backwards/forwards in context of failing execution
- Reverse debugging
  - Re-creates the context when going backwards
  - Focus down to a specific problem area easily
  - Saves days in recreating a failure

34 Memory Analysis
- How do you identify buffer overflows?
- Runtime Memory Analysis: Eliminate Memory Errors
  - Detects memory leaks before they are a problem
  - Explore heap memory usage
- Features
  - Detects
    - Malloc API misuse
    - Memory leaks
    - Buffer overflows
  - Low runtime overhead
  - Easy to use
  - Works with vendor libraries
  - No recompilation
  - No instrumentation

35 Regression Testing
- How do you make sure a bug you fixed never returns?
  - Build a regression test
  - The issue is that it is typically time-consuming
- What is the method to build a regression test?
  - Use the tools that helped you find it
- How do you run a regression test?
  - Invoke it during your build process
- Enter TotalView scripts
  - Command-line driven
  - Access to application internals
  - Same commands as in the debugger
- Example output
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    ! Print
    !
    ! Process:
    !   ./server (Debugger Process ID: 1, System ID: 12110)
    ! Thread:
    !   Debugger ID: 1.1, System ID:
    ! Time Stamp:
    !   :04:09
    ! Triggered from event:
    !   actionpoint
    ! Results:
    !   foreign_addr = {
    !     sin_family = 0x0002 (2)
    !     sin_port = 0x1fb6 (8118)
    !     sin_addr = {
    !       s_addr = 0x6658a8c0 ( )
    !     }
    !     sin_zero = ""
    !   }
    !
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

36 Scales to meet your need
- Support debugging on thousands of cores
  - MRNet is built to multicast
  - Aggregates data to/from cores
- Remote Display Client
  - Debug on a remote machine
  - Easy to configure
  - Focus on your debugging

Improving the Productivity of Scalable Application Development with TotalView May 18th, 2010

Improving the Productivity of Scalable Application Development with TotalView May 18th, 2010 Improving the Productivity of Scalable Application Development with TotalView May 18th, 2010 Chris Gottbrath Principal Product Manager Rogue Wave Major Product Offerings 2 TotalView Technologies Family

More information

Addressing the Increasing Challenges of Debugging on Accelerated HPC Systems. Ed Hinkel Senior Sales Engineer

Addressing the Increasing Challenges of Debugging on Accelerated HPC Systems. Ed Hinkel Senior Sales Engineer Addressing the Increasing Challenges of Debugging on Accelerated HPC Systems Ed Hinkel Senior Sales Engineer Agenda Overview - Rogue Wave & TotalView GPU Debugging with TotalView Nvdia CUDA Intel Phi 2

More information

ECMWF Workshop on High Performance Computing in Meteorology. 3 rd November Dean Stewart

ECMWF Workshop on High Performance Computing in Meteorology. 3 rd November Dean Stewart ECMWF Workshop on High Performance Computing in Meteorology 3 rd November 2010 Dean Stewart Agenda Company Overview Rogue Wave Product Overview IMSL Fortran TotalView Debugger Acumem ThreadSpotter 1 Copyright

More information

Facing the challenges of. New Approaches To Debugging Complex Codes! Ed Hinkel, Sales Engineer Rogue Wave Software

Facing the challenges of. New Approaches To Debugging Complex Codes! Ed Hinkel, Sales Engineer Rogue Wave Software Facing the challenges of or New Approaches To Debugging Complex Codes! Ed Hinkel, Sales Engineer Rogue Wave Software Agenda Introduction Rogue Wave! TotalView! Approaching the Debugging Challenge! 1 TVScript

More information

GPU Technology Conference Three Ways to Debug Parallel CUDA Applications: Interactive, Batch, and Corefile

GPU Technology Conference Three Ways to Debug Parallel CUDA Applications: Interactive, Batch, and Corefile GPU Technology Conference 2015 Three Ways to Debug Parallel CUDA Applications: Interactive, Batch, and Corefile Three Ways to Debug Parallel CUDA Applications: Interactive, Batch, and Corefile What do

More information

Debugging scalable hybrid and accelerated applications on the Cray XC30 and CS300 with TotalView

Debugging scalable hybrid and accelerated applications on the Cray XC30 and CS300 with TotalView Debugging scalable hybrid and accelerated applications on the Cray XC30 and CS300 with TotalView Agenda Introduction Rogue Wave Update OpenLogic Klocwork Totalview Overview NVIDIA and Xeon Phi Memory Debugging

More information

Debugging Programs Accelerated with Intel Xeon Phi Coprocessors

Debugging Programs Accelerated with Intel Xeon Phi Coprocessors Debugging Programs Accelerated with Intel Xeon Phi Coprocessors A White Paper by Rogue Wave Software. Rogue Wave Software 5500 Flatiron Parkway, Suite 200 Boulder, CO 80301, USA www.roguewave.com Debugging

More information

Scalable Debugging with TotalView on Blue Gene. John DelSignore, CTO TotalView Technologies

Scalable Debugging with TotalView on Blue Gene. John DelSignore, CTO TotalView Technologies Scalable Debugging with TotalView on Blue Gene John DelSignore, CTO TotalView Technologies Agenda TotalView on Blue Gene A little history Current status Recent TotalView improvements ReplayEngine (reverse

More information

CodeDynamics 2018 Release Notes

CodeDynamics 2018 Release Notes These release notes contain a summary of new features and enhancements, late-breaking product issues, migration from earlier releases, and bug fixes. PLEASE NOTE: The version of this document in the product

More information

Jackson Marusarz Software Technical Consulting Engineer

Jackson Marusarz Software Technical Consulting Engineer Jackson Marusarz Software Technical Consulting Engineer What Will Be Covered Overview Memory/Thread analysis New Features Deep dive into debugger integrations Demo Call to action 2 Analysis Tools for Diagnosis

More information

TotalView Release Notes

TotalView Release Notes These release notes contain a summary of new features and enhancements, late-breaking product issues, migration from earlier releases, and bug fixes. PLEASE NOTE: The version of this document in the product

More information

Intel Parallel Studio XE 2015

Intel Parallel Studio XE 2015 2015 Create faster code faster with this comprehensive parallel software development suite. Faster code: Boost applications performance that scales on today s and next-gen processors Create code faster:

More information

Debugging and Optimizing Programs Accelerated with Intel Xeon Phi Coprocessors

Debugging and Optimizing Programs Accelerated with Intel Xeon Phi Coprocessors Debugging and Optimizing Programs Accelerated with Intel Xeon Phi Coprocessors Chris Gottbrath Rogue Wave Software Boulder, CO Chris.Gottbrath@roguewave.com Abstract Intel Xeon Phi coprocessors present

More information

Memory & Thread Debugger

Memory & Thread Debugger Memory & Thread Debugger Here is What Will Be Covered Overview Memory/Thread analysis New Features Deep dive into debugger integrations Demo Call to action Intel Confidential 2 Analysis Tools for Diagnosis

More information

TotalView 2018 Release Notes

TotalView 2018 Release Notes These release notes contain a summary of new features and enhancements, late-breaking product issues, migration from earlier releases, and bug fixes. PLEASE NOTE: The version of this document in the product

More information

TotalView. Debugging Tool Presentation. Josip Jakić

TotalView. Debugging Tool Presentation. Josip Jakić TotalView Debugging Tool Presentation Josip Jakić josipjakic@ipb.ac.rs Agenda Introduction Getting started with TotalView Primary windows Basic functions Further functions Debugging parallel programs Topics

More information

Allinea Unified Environment

Allinea Unified Environment Allinea Unified Environment Allinea s unified tools for debugging and profiling HPC Codes Beau Paisley Allinea Software bpaisley@allinea.com 720.583.0380 Today s Challenge Q: What is the impact of current

More information

OpenMP Tool Interfaces in OpenMP 5.0. Joachim Protze

OpenMP Tool Interfaces in OpenMP 5.0. Joachim Protze (protze@itc.rwth-aachen.de) Motivation Motivation Tool support essential for program development Users expect tool support, especially in today s complicated systems Question of productivity Users want

More information

Debugging CUDA Applications with Allinea DDT. Ian Lumb Sr. Systems Engineer, Allinea Software Inc.

Debugging CUDA Applications with Allinea DDT. Ian Lumb Sr. Systems Engineer, Allinea Software Inc. Debugging CUDA Applications with Allinea DDT Ian Lumb Sr. Systems Engineer, Allinea Software Inc. ilumb@allinea.com GTC 2013, San Jose, March 20, 2013 Embracing GPUs GPUs a rival to traditional processors

More information

Tools and Methodology for Ensuring HPC Programs Correctness and Performance. Beau Paisley

Tools and Methodology for Ensuring HPC Programs Correctness and Performance. Beau Paisley Tools and Methodology for Ensuring HPC Programs Correctness and Performance Beau Paisley bpaisley@allinea.com About Allinea Over 15 years of business focused on parallel programming development tools Strong

More information

Large Scale Debugging

Large Scale Debugging Large Scale Debugging Project Meeting Report - December 2015 Didier Nadeau Under the supervision of Michel Dagenais Distributed Open Reliable Systems Analysis Lab École Polytechnique de Montréal Table

More information

Eliminate Threading Errors to Improve Program Stability

Eliminate Threading Errors to Improve Program Stability Eliminate Threading Errors to Improve Program Stability This guide will illustrate how the thread checking capabilities in Parallel Studio can be used to find crucial threading defects early in the development

More information

TotalView Release Notes

TotalView Release Notes These release notes contain a summary of new features and enhancements, late-breaking product issues, migration from earlier releases, and bug fixes. PLEASE NOTE: The version of this document in the product

More information

Debugging Intel Xeon Phi KNC Tutorial

Debugging Intel Xeon Phi KNC Tutorial Debugging Intel Xeon Phi KNC Tutorial Last revised on: 10/7/16 07:37 Overview: The Intel Xeon Phi Coprocessor 2 Debug Library Requirements 2 Debugging Host-Side Applications that Use the Intel Offload

More information

Oracle Developer Studio Code Analyzer

Oracle Developer Studio Code Analyzer Oracle Developer Studio Code Analyzer The Oracle Developer Studio Code Analyzer ensures application reliability and security by detecting application vulnerabilities, including memory leaks and memory

More information

Eliminate Threading Errors to Improve Program Stability

Eliminate Threading Errors to Improve Program Stability Introduction This guide will illustrate how the thread checking capabilities in Intel Parallel Studio XE can be used to find crucial threading defects early in the development cycle. It provides detailed

More information

Debugging with TotalView

Debugging with TotalView Debugging with TotalView Dieter an Mey Center for Computing and Communication Aachen University of Technology anmey@rz.rwth-aachen.de 1 TotalView, Dieter an Mey, SunHPC 2006 Debugging on Sun dbx line mode

More information

AutoTune Workshop. Michael Gerndt Technische Universität München

AutoTune Workshop. Michael Gerndt Technische Universität München AutoTune Workshop Michael Gerndt Technische Universität München AutoTune Project Automatic Online Tuning of HPC Applications High PERFORMANCE Computing HPC application developers Compute centers: Energy

More information

Introduction to debugging. Martin Čuma Center for High Performance Computing University of Utah

Introduction to debugging. Martin Čuma Center for High Performance Computing University of Utah Introduction to debugging Martin Čuma Center for High Performance Computing University of Utah m.cuma@utah.edu Overview Program errors Simple debugging Graphical debugging DDT and Totalview Intel tools

More information

Using Intel VTune Amplifier XE and Inspector XE in.net environment

Using Intel VTune Amplifier XE and Inspector XE in.net environment Using Intel VTune Amplifier XE and Inspector XE in.net environment Levent Akyil Technical Computing, Analyzers and Runtime Software and Services group 1 Refresher - Intel VTune Amplifier XE Intel Inspector

More information

Eliminate Memory Errors to Improve Program Stability

Eliminate Memory Errors to Improve Program Stability Eliminate Memory Errors to Improve Program Stability This guide will illustrate how Parallel Studio memory checking capabilities can find crucial memory defects early in the development cycle. It provides

More information

Copyright Khronos Group, Page Graphic Remedy. All Rights Reserved

Copyright Khronos Group, Page Graphic Remedy. All Rights Reserved Avi Shapira Graphic Remedy Copyright Khronos Group, 2009 - Page 1 2004 2009 Graphic Remedy. All Rights Reserved Debugging and profiling 3D applications are both hard and time consuming tasks Companies

More information

Evolving HPCToolkit John Mellor-Crummey Department of Computer Science Rice University Scalable Tools Workshop 7 August 2017

Evolving HPCToolkit John Mellor-Crummey Department of Computer Science Rice University   Scalable Tools Workshop 7 August 2017 Evolving HPCToolkit John Mellor-Crummey Department of Computer Science Rice University http://hpctoolkit.org Scalable Tools Workshop 7 August 2017 HPCToolkit 1 HPCToolkit Workflow source code compile &

More information

6.S096 Lecture 10 Course Recap, Interviews, Advanced Topics

6.S096 Lecture 10 Course Recap, Interviews, Advanced Topics 6.S096 Lecture 10 Course Recap, Interviews, Advanced Topics Grab Bag & Perspective January 31, 2014 6.S096 Lecture 10 Course Recap, Interviews, Advanced Topics January 31, 2014 1 / 19 Outline 1 Perspective

More information

Oracle Developer Studio 12.6

Oracle Developer Studio 12.6 Oracle Developer Studio 12.6 Oracle Developer Studio is the #1 development environment for building C, C++, Fortran and Java applications for Oracle Solaris and Linux operating systems running on premises

More information

Parallel Programming. Libraries and Implementations

Parallel Programming. Libraries and Implementations Parallel Programming Libraries and Implementations Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us

More information

Intel VTune Amplifier XE. Dr. Michael Klemm Software and Services Group Developer Relations Division

Intel VTune Amplifier XE. Dr. Michael Klemm Software and Services Group Developer Relations Division Intel VTune Amplifier XE Dr. Michael Klemm Software and Services Group Developer Relations Division Legal Disclaimer & Optimization Notice INFORMATION IN THIS DOCUMENT IS PROVIDED AS IS. NO LICENSE, EXPRESS

More information

Debugging at Scale Lindon Locks

Debugging at Scale Lindon Locks Debugging at Scale Lindon Locks llocks@allinea.com Debugging at Scale At scale debugging - from 100 cores to 250,000 Problems faced by developers on real systems Alternative approaches to debugging and

More information

TotalView Release Notes

TotalView Release Notes Platform Changes The following new platforms are now supported by TotalView: NVIDIA CUDA 5.0 and 5.5 Mac OS X Mavericks (10.9) Ubuntu 12.04, 12.10 and 13.04 Fedora 19 The following platforms are no longer

More information

Improve Web Application Performance with Zend Platform

Improve Web Application Performance with Zend Platform Improve Web Application Performance with Zend Platform Shahar Evron Zend Sr. PHP Specialist Copyright 2007, Zend Technologies Inc. Agenda Benchmark Setup Comprehensive Performance Multilayered Caching

More information

Parallel Debugging with TotalView BSC-CNS

Parallel Debugging with TotalView BSC-CNS Parallel Debugging with TotalView BSC-CNS AGENDA What debugging means? Debugging Tools in the RES Allinea DDT as alternative (RogueWave Software) What is TotalView Compiling Your Program Starting totalview

More information

Eliminate Memory Errors to Improve Program Stability

Eliminate Memory Errors to Improve Program Stability Introduction INTEL PARALLEL STUDIO XE EVALUATION GUIDE This guide will illustrate how Intel Parallel Studio XE memory checking capabilities can find crucial memory defects early in the development cycle.

More information

TotalView Release Notes

TotalView Release Notes These release notes contain a summary of new features and enhancements, late-breaking product issues, migration from earlier releases, and bug fixes. PLEASE NOTE: The version of this document in the product

More information

Understanding Dynamic Parallelism

Understanding Dynamic Parallelism Understanding Dynamic Parallelism Know your code and know yourself Presenter: Mark O Connor, VP Product Management Agenda Introduction and Background Fixing a Dynamic Parallelism Bug Understanding Dynamic

More information

Overcoming Distributed Debugging Challenges in the MPI+OpenMP Programming Model

Overcoming Distributed Debugging Challenges in the MPI+OpenMP Programming Model Overcoming Distributed Debugging Challenges in the MPI+OpenMP Programming Model Lai Wei, Ignacio Laguna, Dong H. Ahn Matthew P. LeGendre, Gregory L. Lee This work was performed under the auspices of the

More information

CS 553: Algorithmic Language Compilers (PLDI) Graduate Students and Super Undergraduates... Logistics. Plan for Today

CS 553: Algorithmic Language Compilers (PLDI) Graduate Students and Super Undergraduates... Logistics. Plan for Today Graduate Students and Super Undergraduates... CS 553: Algorithmic Language Compilers (PLDI) look for other sources of information make decisions, because all research problems are under-specified evaluate

More information

Debugging with TotalView

Debugging with TotalView Debugging with TotalView Le Yan HPC Consultant User Services Goals Learn how to start TotalView on Linux clusters Get familiar with TotalView graphic user interface Learn basic debugging functions of TotalView

More information

Debugging, benchmarking, tuning i.e. software development tools. Martin Čuma Center for High Performance Computing University of Utah

Debugging, benchmarking, tuning i.e. software development tools. Martin Čuma Center for High Performance Computing University of Utah Debugging, benchmarking, tuning i.e. software development tools Martin Čuma Center for High Performance Computing University of Utah m.cuma@utah.edu SW development tools Development environments Compilers

More information

An Introduction to the SPEC High Performance Group and their Benchmark Suites

An Introduction to the SPEC High Performance Group and their Benchmark Suites An Introduction to the SPEC High Performance Group and their Benchmark Suites Robert Henschel Manager, Scientific Applications and Performance Tuning Secretary, SPEC High Performance Group Research Technologies

More information

Malware

Malware reloaded Malware Research Team @ @xabiugarte Motivation Design principles / architecture Features Use cases Future work Dynamic Binary Instrumentation Techniques to trace the execution of a binary (or

More information

Introduction to Parallel Performance Engineering

Introduction to Parallel Performance Engineering Introduction to Parallel Performance Engineering Markus Geimer, Brian Wylie Jülich Supercomputing Centre (with content used with permission from tutorials by Bernd Mohr/JSC and Luiz DeRose/Cray) Performance:

More information

Performance analysis basics

Performance analysis basics Performance analysis basics Christian Iwainsky Iwainsky@rz.rwth-aachen.de 25.3.2010 1 Overview 1. Motivation 2. Performance analysis basics 3. Measurement Techniques 2 Why bother with performance analysis

More information

Open SpeedShop Capabilities and Internal Structure: Current to Petascale. CScADS Workshop, July 16-20, 2007 Jim Galarowicz, Krell Institute

Open SpeedShop Capabilities and Internal Structure: Current to Petascale. CScADS Workshop, July 16-20, 2007 Jim Galarowicz, Krell Institute 1 Open Source Performance Analysis for Large Scale Systems Open SpeedShop Capabilities and Internal Structure: Current to Petascale CScADS Workshop, July 16-20, 2007 Jim Galarowicz, Krell Institute 2 Trademark

More information

Agenda. Optimization Notice Copyright 2017, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.

Agenda. Optimization Notice Copyright 2017, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Agenda VTune Amplifier XE OpenMP* Analysis: answering on customers questions about performance in the same language a program was written in Concepts, metrics and technology inside VTune Amplifier XE OpenMP

More information

Developing, Debugging, and Optimizing GPU Codes for High Performance Computing with Allinea Forge

Developing, Debugging, and Optimizing GPU Codes for High Performance Computing with Allinea Forge Developing, Debugging, and Optimizing GPU Codes for High Performance Computing with Allinea Forge Ryan Hulguin Applications Engineer ryan.hulguin@arm.com Agenda Introduction Overview of Allinea Products

More information

Chapter 2: Operating-System Structures

Chapter 2: Operating-System Structures Chapter 2: Operating-System Structures Chapter 2: Operating-System Structures Operating System Services User Operating System Interface System Calls Types of System Calls System Programs Operating System

More information

Intel Parallel Studio 2011

Intel Parallel Studio 2011 THE ULTIMATE ALL-IN-ONE PERFORMANCE TOOLKIT Studio 2011 Product Brief Studio 2011 Accelerate Development of Reliable, High-Performance Serial and Threaded Applications for Multicore Studio 2011 is a comprehensive

More information

TotalView Training. Developing parallel, data-intensive applications is hard. We make it easier. Copyright 2012 Rogue Wave Software, Inc.

TotalView Training. Developing parallel, data-intensive applications is hard. We make it easier. Copyright 2012 Rogue Wave Software, Inc. TotalView Training Developing parallel, data-intensive applications is hard. We make it easier. 1 Agenda Introduction Startup Remote Display Debugging UI Navigation and Process Control Action Points Data

More information
