Performance Analysis and Optimization of Scientific Applications on Extreme-Scale Computer Systems

1 Performance Analysis and Optimization of Scientific Applications on Extreme-Scale Computer Systems
Bernd Mohr
Member of the Helmholtz Association
1st Intl. Workshop on Strategic Development of High Performance Computers, Tsukuba, March 18-19, 2013

2 Parallel Architectures: State of the Art
[Figure: system architecture diagram. Labels: Router, Network or Switch, SMP, NUMA, Memory, Interconnect, nodes N 0 ... N k, processors P 0 ... P n, accelerators A 0 ... A m, cores Core 0 ... Core r, caches L1, L2, L3]
IWSDHPC, Tsukuba, 2013 JSC

3 Exascale Performance Challenges
Exascale systems will consist of
- Complex configurations
- With a huge number of components
- Very likely heterogeneous
- With not enough memory
- Dynamically changing configuration due to fault recovery and power saving
Deep software hierarchies of large, complex software components will be required to make use of such systems
Sophisticated integrated performance measurement, analysis, and optimization capabilities will be required to efficiently operate an Exascale system

4 Cross-Cutting Considerations
- Performance-aware design, development, and deployment of hardware and software necessary
- Integration with OS, compilers, middleware, and runtime systems required
- Support for performance observability in HW and SW (runtime) needed
- Enable performance measurement and optimization in case of HW and SW changes due to faults or power adaptation

5 Technical Challenges
- Heterogeneity
- Extreme concurrency
- Perturbation and data volume
- Drawing insight from measurements
- Quality information sources
This requires tools to be
- Portable
- Insightful
- Scalable
- Integrated

6 Not many Tools match these Requirements
- TAU: University of Oregon, US
- HPCToolkit: Rice University, US
- Extrae/Paraver: Barcelona Supercomputing Centre, Spain
- Vampir toolset: Technical University of Dresden, Germany
- Scalasca: Jülich Supercomputing Centre, Germany

7 PORTABILITY: Run everywhere

8 Scalasca: Supported Platforms Instrumentation and measurement only (visual analysis on front-end or workstation) Cray XT3/XT4/XT5, XE6, XK6 IBM BlueGene/L, BlueGene/P, BlueGene/Q NEC SX8 and SX9 K Machine, Fujitsu FX10 Intel MIC Full support (instrumentation, measurement, and automatic analysis) Linux IA32, IA64, x86_64, and PPC based clusters IBM AIX Power3/4/5/6/7 based clusters SGI Linux IA64 and x86_64 based clusters SUN/Oracle Solaris Sparc and x86/x86_64 based clusters

9 Now also on the K Computer
Thanks to Tomotake Nakamura and colleagues at RIKEN!

10 Known Installations of Scalasca
Companies
- Bull (France)
- Dassault Aviation (France)
- EDF (France)
- GNS (Germany)
- MAGMA (Germany)
- RECOM (Germany)
- Shell (Netherlands)
- Sun Microsystems (USA)
- Qontix (UK)
Research / HPC Centres
- ANL (USA)
- BSC (Spain)
- CEA (France)
- CERFACS (France)
- CINECA (Italy)
- CSC (Finland)
- CSCS (Switzerland)
- DLR (Germany)
- DKRZ (Germany)
- EPCC (UK)
- HLRN (Germany)
- HLRS (Germany)
- ICHEC (Ireland)
- IDRIS (France)
- JSCC (Russia)
- LLNL (USA)
- LRZ (Germany)
- MSU (Russia)
- NCAR (USA)
- NCSA (USA)
- NSCC (China)
- ORNL (USA)
- PSC (USA)
- RZG (Germany)
- RIKEN (Japan)
- SARA (Netherlands)
- SAITC (Bulgaria)
- TACC (USA)
Universities
- RPI (USA)
- RWTH (Germany)
- TUD (Germany)
- UOregon (USA)
- UTK (USA)
DoD Computing Centers (USA)
- AFRL DSRC
- ARL DSRC
- ARSC DSRC
- ERDC DSRC
- Navy DSRC
- MHPCC DSRC
- SSC-Pacific

11 INSIGHTFULNESS: More than numbers and diagrams

12 A picture is worth 1000 words
[Figures: MPI ring program; real-world example]

13 What about 1000s of pictures? (with 100s of menu options)

14 Example Automatic Analysis: Late Sender

15 Scalasca: Example MPI Patterns
[Figure: four time-line diagrams (process vs. time): (a) Late Sender, (b) Late Receiver, (c) Late Sender / Wrong Order, (d) Wait at N x N. Event legend: ENTER, EXIT, SEND, RECV, COLLEXIT]
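The wait-state patterns named on this slide can be expressed as simple timestamp comparisons. The following is a minimal sketch, not Scalasca's implementation: the `Message` record and all function names are hypothetical, and it assumes the commonly used definitions in which a Late Sender costs the receiver the time between entering its blocking receive and the sender entering the send, a Late Receiver is the mirror case for a synchronous send, and Wait at N x N charges each process the time until the last process enters the collective. The Late Sender / Wrong Order variant is not covered.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical record of one point-to-point message."""
    send_enter: float  # time the sender enters the send call
    recv_enter: float  # time the receiver enters the receive call


def late_sender_wait(msg: Message) -> float:
    """Time a blocking receive waits because the send started later."""
    return max(0.0, msg.send_enter - msg.recv_enter)


def late_receiver_wait(msg: Message) -> float:
    """Time a synchronous send waits because the receive started later."""
    return max(0.0, msg.recv_enter - msg.send_enter)


def wait_at_nxn(enter_times: list[float]) -> list[float]:
    """Per-process wait in an N x N collective: time until the last entry."""
    last = max(enter_times)
    return [last - t for t in enter_times]
```

For example, a receive entered at time 3.0 whose matching send is entered at 5.0 yields a Late Sender waiting time of 2.0 and a Late Receiver waiting time of 0.0.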

16 Scalasca Example: CESM Sea Ice Module Late Sender Analysis

17 Scalasca Example: CESM Sea Ice Module Late Sender Analysis + Application Topology

18 Scalasca Root Cause Analysis
Root-cause analysis
- Wait states are typically caused by load or communication imbalances earlier in the program
- Waiting time can also propagate (e.g., indirect waiting time)
- Goal: Enhance performance analysis to find the root cause of wait states
Approach
- Distinguish between direct and indirect waiting time
- Identify call path/process combinations delaying other processes and causing first-order waiting time
- Identify original delay
[Figure: time lines of processes A, B, and C, each executing foo and bar. Labels: DELAY on A, Send and Recv between the processes, Direct wait, Indirect wait, cause; horizontal axis: time]
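The distinction between direct and indirect waiting time can be illustrated with a toy model. This is a simplified sketch under stated assumptions, not the delay analysis implemented in Scalasca: it assumes a hypothetical pipeline in which rank i computes for `work_times[i]`, then (for i > 0) receives from rank i-1, and immediately forwards to rank i+1. Waiting time that merely passes on the sender's own waiting time is counted as indirect; the remainder is counted as direct. The function name and the attribution rule are illustrative assumptions.

```python
def classify_waits(work_times):
    """Split each rank's waiting time into direct and indirect parts.

    Hypothetical pipeline model: rank i computes for work_times[i], then
    receives from rank i-1 (if any) and immediately sends to rank i+1.
    """
    results = []
    prev_send_time = 0.0  # when the previous rank issues its send
    prev_wait = 0.0       # how long the previous rank itself waited
    for rank, work in enumerate(work_times):
        ready = work  # time at which this rank enters its receive
        wait = max(0.0, prev_send_time - ready) if rank > 0 else 0.0
        # Waiting inherited from the sender's own wait is indirect;
        # the rest is caused by the sender's excess computation.
        indirect = min(wait, prev_wait)
        direct = wait - indirect
        results.append({"rank": rank, "direct": direct, "indirect": indirect})
        prev_send_time = ready + wait  # message is forwarded right away
        prev_wait = wait
    return results
```

With `work_times = [5.0, 3.0, 3.0]`, where rank 0 carries a delay of 2 time units, rank 1 incurs 2 units of direct wait and rank 2 incurs 2 units of indirect wait, matching the A, B, C situation on the slide.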

19 Scalasca Example: CESM Sea Ice Module Direct Wait Time Analysis

20 Scalasca Example: CESM Sea Ice Module Indirect Wait Time Analysis

21 Scalasca Example: CESM Sea Ice Module Delay Costs Analysis

22 EXTREME CONCURRENCY: To infinity and beyond

23 Scaling already important TODAY!
Number of Cores share for TOP 500, November 2012
NCore Count NCore Share Rmax Share
% 122 TF 0.1% 1,
% 155 TF 0.1% 7,
% 8,579 TF 5.3% 551,
% 24,543 TF 15.1% 2,617,986
> % 128,574 TF 79.4% 11,707,806
Total % 161,973 TF 100% 14,885,800
Average system size: 29,772 cores
Median system size: 15,360 cores
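The surviving figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only the Rmax values, Rmax shares, and core total that appear on the slide; the count of 500 systems is taken from the TOP 500 list name, and the variable names are illustrative. Recomputed shares may differ from the slide in the last digit because the slide's rounding method is not known.

```python
# Values copied from the slide; variable names are illustrative.
rmax_tf = [122, 155, 8579, 24543, 128574]    # Rmax per core-count class (TF)
slide_share = [0.1, 0.1, 5.3, 15.1, 79.4]    # Rmax share per class (%)
total_rmax_tf = 161973
total_cores = 14885800
systems = 500                                # assumed from "TOP 500"

# The per-class Rmax values add up to the stated total.
rmax_sum = sum(rmax_tf)

# Recompute each class's share of the total Rmax.
shares = [100.0 * r / total_rmax_tf for r in rmax_tf]

# Average system size is the total core count divided by the system count.
average_cores = total_cores / systems

print(rmax_sum == total_rmax_tf)   # True
print(round(average_cores))        # 29772
```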

24 Personal Motivation (I)
07/2012
- Jugene: 72-rack IBM BlueGene/P
- 294,912 cores
- Most parallel system in the world from 06/2009 to 06/2011

25 Personal Motivation (II)
- Juqueen: 28-rack IBM BlueGene/Q
- 458,752 cores
- 1,835,008 HW threads

26 Roads to Scalability
Scalable data collection and reduction
- Automatic detection of most important execution phases (Paraver)
- Parallel collection and reduction based on MPI and parallel I/O (all tools)
Scalable parallel data analysis
- Parallel client/server processing and visualization (Vampir)
- Parallel pattern search, delay and critical-path analysis (Scalasca)
- Parallel analyzer and visualizer (Paraver)
Scalable visualizations
- 3D charts and topology displays (TAU, Scalasca)
- Hierarchical browsers (Scalasca)

27 TAU ParaProf: 3D Profile, Miranda, 16K PEs

28 TAU 3D Topology view / distribution histogram

29 VampirServer BETA: Trace Visualization
- OTF trace of 4.5 TB
- VampirServer running with 20,000 analysis processes

30 Paraver Data Reduction Features
- Accumulation of values using software counters
- Powerful filtering expression over time, processors, states, communications, events
- Automatic structure / phase detection
  - Based on signal processing
  - Using wavelets (Casas: ParCo 2007)
  - Using autocorrelation functions (Casas: Euro-Par 2007)
  - Also used for cleanup: preemptions, clogged systems / instrumentation overhead, flushing
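The first bullet, accumulation with software counters, can be sketched as follows. This is an illustrative assumption about the general idea (replace individual events by a count per time interval), not Paraver's or Extrae's actual implementation; the function name and the fixed-width interval scheme are hypothetical.

```python
from collections import Counter


def software_counters(event_times, interval):
    """Reduce a list of event timestamps to one count per time interval.

    Returns a dict mapping the interval index (timestamp // interval)
    to the number of events that fell into that interval.
    """
    counts = Counter(int(t // interval) for t in event_times)
    return dict(sorted(counts.items()))
```

Six events at times 0.1, 0.4, 1.2, 2.5, 2.6, and 2.9 with an interval of 1.0 reduce to three counters: two events in interval 0, one in interval 1, and three in interval 2. The trace then grows with the number of intervals rather than with the number of events.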

31 Scalasca trace analysis BGP
- 10 min sweep3d runtime
- 11 sec replay
- 4 min trace data write/read (576 files)
- 7.6 TB buffered trace data
- 510 billion events
B. J. N. Wylie, M. Geimer, B. Mohr, D. Böhme, Z. Szebenyi, F. Wolf: Large-scale performance analysis of Sweep3D with the Scalasca toolset. Parallel Processing Letters, 20(4).

32 Scalasca trace analysis BGQ

33 Scalasca trace analysis BGQ

34 INTEGRATION: Together we are strong

35 Integration
- Need integrated tool (environment) for all levels of parallelization
  - Inter-node (MPI)
  - Intra-node (OpenMP, task-based programming)
  - Accelerators (CUDA, OpenCL)
- Integration with performance modeling and prediction
- No tool fits all requirements
  - Interoperability of tools
  - Integration via open interfaces

36 Status End 2011
[Figure: interoperability diagram for Scalasca, TAU, Vampir, Paraver, and Extrae. Labels: PRV trace, Paraver, VampirTrace (VT), TAU trace, OTF / VTF3 trace, Vampir, EPILOG trace, Scalasca Trace Analyzer, CUBE3 profile, CUBE3 Presenter, TAU profile, gprof/mpiP profile, PerfDMF, ParaProf]

37 Status Begin 2013
[Figure: interoperability diagram for Scalasca, TAU, Vampir, Paraver, and Extrae. Labels: PRV trace, Paraver, Vampir, Score-P, OTF2 trace, Scalasca Trace Analyzer, CUBE4 profile, CUBE4 Presenter, TAU, gprof/mpiP profile, PerfDMF, ParaProf]

38 Tool Integration: Score-P
Objectives
- Mainly funded by SILC, LMAC (BMBF) + PRIMA (DOE) projects
- Make common part of Periscope, Scalasca, TAU, and Vampir a community effort
- Score-P measurement system
Functional requirements
- Performance data: profiles (CUBE4), traces (OTF2)
- Initially direct instrumentation, later also sampling
- Offline and online access
- Metrics: time, communication metrics, and hardware counters
- Initially MPI 2 and OpenMP 3, later also CUDA and OpenCL
Current release: V1.1.1 of Feb

39 Score-P Architecture
[Figure: layered architecture diagram, top to bottom]
- Tools: Vampir, Scalasca, TAU, Periscope
- Interfaces: event traces (OTF2), call-path profiles (CUBE4), online interface
- Score-P measurement infrastructure: hardware counter (PAPI), memory management, etc.
- Instrumentation: compiler, TAU instrumentor, COBI (binary), MAQAO instrumentor, OPARI 2 (OpenMP), MPI wrappers
- Application (MPI, OpenMP, hybrid)

40 Score-P Partners
- Forschungszentrum Jülich, Germany
- German Research School for Simulation Sciences, Aachen, Germany
- Gesellschaft für numerische Simulation mbH, Braunschweig, Germany
- RWTH Aachen, Germany
- Technische Universität Dresden, Germany
- Technische Universität München, Germany
- University of Oregon, Eugene, USA

41 Funded Integration Projects
- SILC (01/2009 to 12/2011): Unified measurement system (Score-P) for Vampir, Scalasca, Periscope
- PRIMA (08/2009 to 0?/2013): Integration of TAU and Scalasca
- LMAC (08/2011 to 07/2013): Evolution of Score-P; analysis of performance dynamics
- H4H (10/2010 to 09/2013): Hybrid programming for heterogeneous platforms
- HOPSA (02/2011 to 01/2013): Integration of system and application monitoring

42 Integration of Score-P based Tools
[Figure: HOPSA workflow diagram. Labels: Application linked to Score-P; Profile (CUBE-4); Scalasca wait-state analysis; Cube; Visual exploration; Profile generation; Worst-instance visualization; Trace (OTF-2); Vampir; OTF2 to PRV conversion; Trace (PRV); Paraver; Dimemas; What-if scenarios; Application measured with Extrae; Application measured with ThreadSpotter; Memory profile; ThreadSpotter; Link; Explore memory behavior; LAPTA; System metrics. Legend: done, to do]

43 Scalasca Vampir/Paraver integration

44 Scalasca Vampir/Paraver integration

45 Scalasca Vampir/Paraver integration

46 OPEN ISSUES: Future Work

47 Biggest Open Issues
How to handle asynchronous non-deterministic executions?
- Currently favored programming model at node level
- Breaks measure-analyze-optimize cycle
Potential solution
- Use traditional tools only at inter-node level
- Use auto-tuning smart runtime systems inside node
Further factors contributing to non-determinism
- Failing components and recovery actions
- Components operating at varying speeds to save energy

48 Acknowledgements
Scalasca team (JSC) (GRS): Markus Geimer, Jie Jiang, Michael Knobloch, Daniel Lorenz, Bernd Mohr, Peter Philippen, Christian Rössel, David Böhme, Marc-André Hermanns, Alexandru Calotoiu, Marc Schlütter, Pavel Saviankou, Alexandre Strube, Brian Wylie, Anke Visser, Ilja Zhukov, Monika Lücke, Aamer Shah, Felix Wolf
Sponsors

Scalasca: A Scalable Portable Integrated Performance Measurement and Analysis Toolset. CEA Tools 2012 Bernd Mohr

Scalasca: A Scalable Portable Integrated Performance Measurement and Analysis Toolset. CEA Tools 2012 Bernd Mohr Scalasca: A Scalable Portable Integrated Performance Measurement and Analysis Toolset CEA Tools 2012 Bernd Mohr Exascale Performance Challenges Exascale systems will consist of Complex configurations With

More information

[Scalasca] Tool Integrations

[Scalasca] Tool Integrations Mitglied der Helmholtz-Gemeinschaft [Scalasca] Tool Integrations Aug 2011 Bernd Mohr CScADS Performance Tools Workshop Lake Tahoe Contents Current integration of various direct measurement tools Paraver

More information

Recent Developments in Score-P and Scalasca V2

Recent Developments in Score-P and Scalasca V2 Mitglied der Helmholtz-Gemeinschaft Recent Developments in Score-P and Scalasca V2 Aug 2015 Bernd Mohr 9 th Scalable Tools Workshop Lake Tahoe YOU KNOW YOU MADE IT IF LARGE COMPANIES STEAL YOUR STUFF August

More information

Performance-oriented development

Performance-oriented development Performance-oriented development Performance often regarded as a prost-process that is applied after an initial version has been created Instead, performance must be of concern right from the beginning

More information

Score-P A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir

Score-P A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir Score-P A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir Andreas Knüpfer, Christian Rössel andreas.knuepfer@tu-dresden.de, c.roessel@fz-juelich.de 2011-09-26

More information

Score-P. SC 14: Hands-on Practical Hybrid Parallel Application Performance Engineering 1

Score-P. SC 14: Hands-on Practical Hybrid Parallel Application Performance Engineering 1 Score-P SC 14: Hands-on Practical Hybrid Parallel Application Performance Engineering 1 Score-P Functionality Score-P is a joint instrumentation and measurement system for a number of PA tools. Provide

More information

Large-scale performance analysis of PFLOTRAN with Scalasca

Large-scale performance analysis of PFLOTRAN with Scalasca Mitglied der Helmholtz-Gemeinschaft Large-scale performance analysis of PFLOTRAN with Scalasca 2011-05-26 Brian J. N. Wylie & Markus Geimer Jülich Supercomputing Centre b.wylie@fz-juelich.de Overview Dagstuhl

More information

Large-scale performance analysis of PFLOTRAN with Scalasca

Large-scale performance analysis of PFLOTRAN with Scalasca Mitglied der Helmholtz-Gemeinschaft Large-scale performance analysis of PFLOTRAN with Scalasca 2011-05-26 Brian J. N. Wylie & Markus Geimer Jülich Supercomputing Centre b.wylie@fz-juelich.de Overview Dagstuhl

More information

Performance analysis of Sweep3D on Blue Gene/P with Scalasca

Performance analysis of Sweep3D on Blue Gene/P with Scalasca Mitglied der Helmholtz-Gemeinschaft Performance analysis of Sweep3D on Blue Gene/P with Scalasca 2010-04-23 Brian J. N. Wylie, David Böhme, Bernd Mohr, Zoltán Szebenyi & Felix Wolf Jülich Supercomputing

More information

AutoTune Workshop. Michael Gerndt Technische Universität München

AutoTune Workshop. Michael Gerndt Technische Universität München AutoTune Workshop Michael Gerndt Technische Universität München AutoTune Project Automatic Online Tuning of HPC Applications High PERFORMANCE Computing HPC application developers Compute centers: Energy

More information

Automatic trace analysis with the Scalasca Trace Tools

Automatic trace analysis with the Scalasca Trace Tools Automatic trace analysis with the Scalasca Trace Tools Ilya Zhukov Jülich Supercomputing Centre Property Automatic trace analysis Idea Automatic search for patterns of inefficient behaviour Classification

More information

Introduction to VI-HPS

Introduction to VI-HPS Introduction to VI-HPS José Gracia HLRS Virtual Institute High Productivity Supercomputing Goal: Improve the quality and accelerate the development process of complex simulation codes running on highly-parallel

More information

Scalability Improvements in the TAU Performance System for Extreme Scale

Scalability Improvements in the TAU Performance System for Extreme Scale Scalability Improvements in the TAU Performance System for Extreme Scale Sameer Shende Director, Performance Research Laboratory, University of Oregon TGCC, CEA / DAM Île de France Bruyères- le- Châtel,

More information

Scalasca support for Intel Xeon Phi. Brian Wylie & Wolfgang Frings Jülich Supercomputing Centre Forschungszentrum Jülich, Germany

Scalasca support for Intel Xeon Phi. Brian Wylie & Wolfgang Frings Jülich Supercomputing Centre Forschungszentrum Jülich, Germany Scalasca support for Intel Xeon Phi Brian Wylie & Wolfgang Frings Jülich Supercomputing Centre Forschungszentrum Jülich, Germany Overview Scalasca performance analysis toolset support for MPI & OpenMP

More information

Profile analysis with CUBE. David Böhme, Markus Geimer German Research School for Simulation Sciences Jülich Supercomputing Centre

Profile analysis with CUBE. David Böhme, Markus Geimer German Research School for Simulation Sciences Jülich Supercomputing Centre Profile analysis with CUBE David Böhme, Markus Geimer German Research School for Simulation Sciences Jülich Supercomputing Centre CUBE Parallel program analysis report exploration tools Libraries for XML

More information

Energy Efficiency Tuning: READEX. Madhura Kumaraswamy Technische Universität München

Energy Efficiency Tuning: READEX. Madhura Kumaraswamy Technische Universität München Energy Efficiency Tuning: READEX Madhura Kumaraswamy Technische Universität München Project Overview READEX Starting date: 1. September 2015 Duration: 3 years Runtime Exploitation of Application Dynamism

More information

Introduction to Parallel Performance Engineering

Introduction to Parallel Performance Engineering Introduction to Parallel Performance Engineering Markus Geimer, Brian Wylie Jülich Supercomputing Centre (with content used with permission from tutorials by Bernd Mohr/JSC and Luiz DeRose/Cray) Performance:

More information

Scalable Critical Path Analysis for Hybrid MPI-CUDA Applications

Scalable Critical Path Analysis for Hybrid MPI-CUDA Applications Center for Information Services and High Performance Computing (ZIH) Scalable Critical Path Analysis for Hybrid MPI-CUDA Applications The Fourth International Workshop on Accelerators and Hybrid Exascale

More information

VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW

VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW 8th VI-HPS Tuning Workshop at RWTH Aachen September, 2011 Tobias Hilbrich and Joachim Protze Slides by: Andreas Knüpfer, Jens Doleschal, ZIH, Technische Universität

More information

Výpočetní zdroje IT4Innovations a PRACE pro využití ve vědě a výzkumu

Výpočetní zdroje IT4Innovations a PRACE pro využití ve vědě a výzkumu Výpočetní zdroje IT4Innovations a PRACE pro využití ve vědě a výzkumu Filip Staněk Seminář gridového počítání 2011, MetaCentrum, Brno, 7. 11. 2011 Introduction I Project objectives: to establish a centre

More information

SCORE-P USER MANUAL. 4.0 (revision 13505) Wed May :20:42

SCORE-P USER MANUAL. 4.0 (revision 13505) Wed May :20:42 SCORE-P USER MANUAL 4.0 (revision 13505) Wed May 2 2018 10:20:42 SCORE-P LICENSE AGREEMENT COPYRIGHT 2009-2014, RWTH Aachen University, Germany COPYRIGHT 2009-2013, Gesellschaft für numerische Simulation

More information

Performance analysis on Blue Gene/Q with

Performance analysis on Blue Gene/Q with Performance analysis on Blue Gene/Q with + other tools and debugging Michael Knobloch Jülich Supercomputing Centre scalasca@fz-juelich.de July 2012 Based on slides by Brian Wylie and Markus Geimer Performance

More information

Lawrence Livermore National Laboratory Tools for High Productivity Supercomputing: Extreme-scale Case Studies

Lawrence Livermore National Laboratory Tools for High Productivity Supercomputing: Extreme-scale Case Studies Lawrence Livermore National Laboratory Tools for High Productivity Supercomputing: Extreme-scale Case Studies EuroPar 2013 Martin Schulz Lawrence Livermore National Laboratory Brian Wylie Jülich Supercomputing

More information

SCALASCA parallel performance analyses of SPEC MPI2007 applications

SCALASCA parallel performance analyses of SPEC MPI2007 applications Mitglied der Helmholtz-Gemeinschaft SCALASCA parallel performance analyses of SPEC MPI2007 applications 2008-05-22 Zoltán Szebenyi Jülich Supercomputing Centre, Forschungszentrum Jülich Aachen Institute

More information

Introduction to Performance Engineering

Introduction to Performance Engineering Introduction to Performance Engineering Markus Geimer Jülich Supercomputing Centre (with content used with permission from tutorials by Bernd Mohr/JSC and Luiz DeRose/Cray) Performance: an old problem

More information

READEX Runtime Exploitation of Application Dynamism for Energyefficient

READEX Runtime Exploitation of Application Dynamism for Energyefficient READEX Runtime Exploitation of Application Dynamism for Energyefficient exascale computing EnA-HPC @ ISC 17 Robert Schöne TUD Project Motivation Applications exhibit dynamic behaviour Changing resource

More information

Parallel I/O on JUQUEEN

Parallel I/O on JUQUEEN Parallel I/O on JUQUEEN 4. Februar 2014, JUQUEEN Porting and Tuning Workshop Mitglied der Helmholtz-Gemeinschaft Wolfgang Frings w.frings@fz-juelich.de Jülich Supercomputing Centre Overview Parallel I/O

More information

Leveraging Parallelware in MAESTRO and EPEEC

Leveraging Parallelware in MAESTRO and EPEEC Leveraging Parallelware in MAESTRO and EPEEC and Enhancements to Parallelware Manuel Arenaz manuel.arenaz@appentra.com PRACE booth #2033 Thursday, 15 November 2018 Dallas, US http://www.prace-ri.eu/praceatsc18/

More information

UPC Performance Analysis Tool: Status and Plans

UPC Performance Analysis Tool: Status and Plans UPC Performance Analysis Tool: Status and Plans Professor Alan D. George, Principal Investigator Mr. Hung-Hsun Su, Sr. Research Assistant Mr. Adam Leko, Sr. Research Assistant Mr. Bryan Golden, Research

More information

Scalable, Automated Parallel Performance Analysis with TAU, PerfDMF and PerfExplorer

Scalable, Automated Parallel Performance Analysis with TAU, PerfDMF and PerfExplorer Scalable, Automated Parallel Performance Analysis with TAU, PerfDMF and PerfExplorer Kevin A. Huck, Allen D. Malony, Sameer Shende, Alan Morris khuck, malony, sameer, amorris@cs.uoregon.edu http://www.cs.uoregon.edu/research/tau

More information

Improving Applica/on Performance Using the TAU Performance System

Improving Applica/on Performance Using the TAU Performance System Improving Applica/on Performance Using the TAU Performance System Sameer Shende, John C. Linford {sameer, jlinford}@paratools.com ParaTools, Inc and University of Oregon. April 4-5, 2013, CG1, NCAR, UCAR

More information

Welcome to the. Jülich Supercomputing Centre. D. Rohe and N. Attig Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich

Welcome to the. Jülich Supercomputing Centre. D. Rohe and N. Attig Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich Mitglied der Helmholtz-Gemeinschaft Welcome to the Jülich Supercomputing Centre D. Rohe and N. Attig Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich Schedule: Monday, May 18 13:00-13:30 Welcome

More information

Usage of the SCALASCA toolset for scalable performance analysis of large-scale parallel applications

Usage of the SCALASCA toolset for scalable performance analysis of large-scale parallel applications Usage of the SCALASCA toolset for scalable performance analysis of large-scale parallel applications Felix Wolf 1,2, Erika Ábrahám 1, Daniel Becker 1,2, Wolfgang Frings 1, Karl Fürlinger 3, Markus Geimer

More information

Welcome to the. Jülich Supercomputing Centre. D. Rohe and N. Attig Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich

Welcome to the. Jülich Supercomputing Centre. D. Rohe and N. Attig Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich Mitglied der Helmholtz-Gemeinschaft Welcome to the Jülich Supercomputing Centre D. Rohe and N. Attig Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich Schedule: Thursday, Nov 26 13:00-13:30

More information

Code Auto-Tuning with the Periscope Tuning Framework

Code Auto-Tuning with the Periscope Tuning Framework Code Auto-Tuning with the Periscope Tuning Framework Renato Miceli, SENAI CIMATEC renato.miceli@fieb.org.br Isaías A. Comprés, TUM compresu@in.tum.de Project Participants Michael Gerndt, TUM Coordinator

More information

D5.3 Basic Score-P OpenCL support Version 1.0. Document Information

D5.3 Basic Score-P OpenCL support Version 1.0. Document Information D5.3 Basic Score-P OpenCL support Document Information Contract Number 610402 Project Website www.montblanc-project.eu Contractual Deadline M12 Dissemination Level PU Nature O Authors Peter Philippen (JSC)

More information

Automatic Tuning of HPC Applications with Periscope. Michael Gerndt, Michael Firbach, Isaias Compres Technische Universität München

Automatic Tuning of HPC Applications with Periscope. Michael Gerndt, Michael Firbach, Isaias Compres Technische Universität München Automatic Tuning of HPC Applications with Periscope Michael Gerndt, Michael Firbach, Isaias Compres Technische Universität München Agenda 15:00 15:30 Introduction to the Periscope Tuning Framework (PTF)

More information

ELP. Effektive Laufzeitunterstützung für zukünftige Programmierstandards. Speaker: Tim Cramer, RWTH Aachen University

ELP. Effektive Laufzeitunterstützung für zukünftige Programmierstandards. Speaker: Tim Cramer, RWTH Aachen University ELP Effektive Laufzeitunterstützung für zukünftige Programmierstandards Agenda ELP Project Goals ELP Achievements Remaining Steps ELP Project Goals Goals of ELP: Improve programmer productivity By influencing

More information

Virtual Institute High Productivity Supercomputing Code Tuning Tutorial

Virtual Institute High Productivity Supercomputing Code Tuning Tutorial Virtual Institute High Productivity Supercomputing Code Tuning Tutorial 18 May 2012 Brian Wylie Jülich Supercomputing Centre b.wylie@fz-juelich.de Outline Friday 18 May 09:00 Start Welcome & introduction

More information

Characterizing Imbalance in Large-Scale Parallel Programs. David Bo hme September 26, 2013

Characterizing Imbalance in Large-Scale Parallel Programs. David Bo hme September 26, 2013 Characterizing Imbalance in Large-Scale Parallel Programs David o hme September 26, 2013 Need for Performance nalysis Tools mount of parallelism in Supercomputers keeps growing Efficient resource usage

More information

Cube v4 : From performance report explorer to performance analysis tool

Cube v4 : From performance report explorer to performance analysis tool Procedia Computer Science Volume 51, 2015, Pages 1343 1352 ICCS 2015 International Conference On Computational Science Cube v4 : From performance report explorer to performance analysis tool Pavel Saviankou

More information

SCORE-P. USER MANUAL 1.3 (revision 7349) Fri Aug :42:08

SCORE-P. USER MANUAL 1.3 (revision 7349) Fri Aug :42:08 SCORE-P USER MANUAL 1.3 (revision 7349) Fri Aug 29 2014 14:42:08 COPYRIGHT 2009-2012, RWTH Aachen University, Germany Gesellschaft fuer numerische Simulation mbh, Germany Technische Universitaet Dresden,

More information

Hardware Counter Performance Analysis of Parallel Programs

Hardware Counter Performance Analysis of Parallel Programs Holistic Hardware Counter Performance Analysis of Parallel Programs Brian J. N. Wylie & Bernd Mohr John von Neumann Institute for Computing Forschungszentrum Jülich B.Wylie@fz-juelich.de Outline Hardware

More information

Performance Analysis with Vampir

Performance Analysis with Vampir Performance Analysis with Vampir Johannes Ziegenbalg Technische Universität Dresden Outline Part I: Welcome to the Vampir Tool Suite Event Trace Visualization The Vampir Displays Vampir & VampirServer

More information

Using Automated Performance Modeling to Find Scalability Bugs in Complex Codes

Using Automated Performance Modeling to Find Scalability Bugs in Complex Codes Using Automated Performance Modeling to Find Scalability Bugs in Complex Codes A. Calotoiu 1, T. Hoefler 2, M. Poke 1, F. Wolf 1 1) German Research School for Simulation Sciences 2) ETH Zurich September

More information

HPC IN EUROPE. Organisation of public HPC resources

HPC IN EUROPE. Organisation of public HPC resources HPC IN EUROPE Organisation of public HPC resources Context Focus on publicly-funded HPC resources provided primarily to enable scientific research and development at European universities and other publicly-funded

More information

Operational Robustness of Accelerator Aware MPI

Operational Robustness of Accelerator Aware MPI Operational Robustness of Accelerator Aware MPI Sadaf Alam Swiss National Supercomputing Centre (CSSC) Switzerland 2nd Annual MVAPICH User Group (MUG) Meeting, 2014 Computing Systems @ CSCS http://www.cscs.ch/computers

More information

Debugging CUDA Applications with Allinea DDT. Ian Lumb Sr. Systems Engineer, Allinea Software Inc.

Debugging CUDA Applications with Allinea DDT. Ian Lumb Sr. Systems Engineer, Allinea Software Inc. Debugging CUDA Applications with Allinea DDT Ian Lumb Sr. Systems Engineer, Allinea Software Inc. ilumb@allinea.com GTC 2013, San Jose, March 20, 2013 Embracing GPUs GPUs a rival to traditional processors

More information

Change Log Version Description of Change

Change Log Version Description of Change Document Information Contract Number 610402 Project Website Contractual Deadline Dissemination Level Nature Author Contributors Reviewer Keywords www.montblanc-project.eu PM24 PU O Marc Schlütter (JUELICH)

More information
