Performance-oriented development

Performance-oriented development
  Performance is often regarded as a post-process that is applied after an initial version has been created
  Instead, performance must be of concern right from the beginning
  Elements of performance-oriented development
    Application divided into compute-intensive kernels
    Miniapplications that resemble their behavior (analogy: lab mice)
    Models that describe their behavior analytically
    Performance studies that describe their behavior experimentally
  Exascale: software/hardware co-design
  Performance tools will play a key role
  Need to embrace the idea of systematic performance-oriented development

Scalable performance-analysis toolset for parallel codes
  Focus on communication & synchronization
  Integrated performance analysis process
    Performance overview via call-path profiling
    In-depth study of application behavior via event tracing
  Programming models
    MPI, OpenMP
    Future: support for PGAS and accelerators
  www.scalasca.org
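The relationship between event tracing and call-path profiling can be illustrated with a small sketch. It is not Scalasca code: the event format `(timestamp, kind, region)`, the function name `callpath_profile`, and the example trace are all hypothetical, chosen only to show how a stream of enter/exit events collapses into inclusive time per call path.

```python
from collections import defaultdict


def callpath_profile(events):
    """Aggregate enter/exit events into inclusive time per call path.

    events: iterable of (timestamp, kind, region) tuples, where kind is
    "enter" or "exit". This event format is an assumption made for the
    sketch, not a real trace format.
    """
    stack = []                      # open regions as (region, enter_time)
    inclusive = defaultdict(float)  # call path -> accumulated inclusive time
    for timestamp, kind, region in events:
        if kind == "enter":
            stack.append((region, timestamp))
        elif kind == "exit":
            if not stack or stack[-1][0] != region:
                raise ValueError(f"unmatched exit for region {region!r}")
            # The call path is the chain of currently open regions.
            path = "/".join(name for name, _ in stack)
            _, entered = stack.pop()
            inclusive[path] += timestamp - entered
        else:
            raise ValueError(f"unknown event kind {kind!r}")
    return dict(inclusive)


# Hypothetical trace of one process: main calls solve twice,
# and the first call blocks in MPI_Recv.
trace = [
    (0.0, "enter", "main"),
    (1.0, "enter", "solve"),
    (1.5, "enter", "MPI_Recv"),
    (3.5, "exit", "MPI_Recv"),
    (4.0, "exit", "solve"),
    (4.0, "enter", "solve"),
    (5.0, "exit", "solve"),
    (6.0, "exit", "main"),
]

profile = callpath_profile(trace)
for path in sorted(profile):
    print(path, profile[path])
```

The profile keeps only one number per call path, which is why it scales as an overview; the trace keeps every event with its timestamp, which is what an in-depth study of behavior over time needs.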

Scalasca team
  David Böhme, Alexandru Calotoiu, Dominic Eschweiler, Wolfgang Frings, Markus Geimer, Max Görtz, Youssef Hatem, Marc-André Hermanns, Monika Lücke, Michael Knobloch, Daniel Lorenz, Bernd Mohr, Peter Philippen, Christian Rössel, Pavel Saviankou, Christopher Schleiden, Marc Schlütter, Aamer Shah, Christian Siebert, Alexandre Strube, Zoltán Szebenyi, Felix Wolf, Ilya Zhukov

Installations and users
  Companies
    Bull (France)
    Dassault Aviation (France)
    Efield Solutions (Sweden)
    GNS (Germany)
    INTES (Germany)
    MAGMA (Germany)
    RECOM (Germany)
    SciLab (France)
    Shell (Netherlands)
    SiCortex (USA)
    Sun Microsystems (USA, Singapore, India)
    Qontix (UK)
  Research/supercomputing centers
    Argonne National Laboratory (USA)
    Barcelona Supercomputing Center (Spain)
    Bulgarian Supercomputing Centre (Bulgaria)
    CERFACS (France)
    CINECA (Italy)
    Centre Informatique National de l'Enseignement Supérieur (France)
    Commissariat à l'énergie atomique (France)
    CaSToRC (Cyprus)
    CASPUR (Italy)
    Deutsches Klimarechenzentrum (DKRZ)
    Deutsches Zentrum für Luft- und Raumfahrt (Germany)
    Edinburgh Parallel Computing Centre (UK)
    Federal Office of Meteorology and Climatology (Switzerland)
    Forschungszentrum Jülich (Germany)
    IT Center for Science (Finland)
    High Performance Computing Center Stuttgart (Germany)
    Irish Centre for High-End Computing (Ireland)
    IDRIS (France)
    Karlsruher Institut für Technologie (Germany)
    Lawrence Livermore National Laboratory (LLNL)
    Leibniz-Rechenzentrum (Germany)
    National Authority For Remote Sensing & Space Science (Egypt)
    National Center for Atmospheric Research (USA)
    National Center for Supercomputing Applications (USA)
    HLRN (Germany)
    Oak Ridge National Laboratory (USA)
    PDC Center for High Performance Computing (Sweden)
    Pittsburgh Supercomputing Center (USA)
    Potsdam-Institut für Klimafolgenforschung (Germany)
    Rechenzentrum Garching (Germany)
    SARA Computing and Networking Services (Netherlands)
    Shanghai Supercomputing Center (China)
    Swiss National Supercomputing Center (Switzerland)
    Texas Advanced Computing Center (USA)
    Very Large Computing Centre (France)
  Universities
    King Abdullah University of Science and Technology (Saudi Arabia)
    Lund University (Sweden)
    Lomonosov Moscow State University (Russia)
    Rensselaer Polytechnic Institute (USA)
    Rheinisch-Westfälische Technische Hochschule Aachen (Germany)
    Technische Universität Dresden (Germany)
    Universität Basel (Switzerland)
    University of Oregon (USA)
    University of Tennessee (USA)
    University of Tsukuba (Japan)
  + 9 defense computing centers

Scalasca architecture
  [Figure: Scalasca workflow diagram. Labeled elements: source modules; instrumenter (compiler / linker); instrumented executable; instrumented target application with measurement library and hardware counters (HWC); optimized measurement configuration; summary report; local event traces; parallel wait-state search; wait-state report; report manipulation. The reports answer three questions: Which problem? Where in the program? Which process?]

Performance optimizations
  XNS fluid-dynamics code (RWTH Aachen)
    Redundant messages detected
    4-5x faster
  MAGMAfill fluid-dynamics code (MAGMASOFT GmbH)
    Communication bottleneck identified
    25% faster
  INDEED FEM code (GNS mbH)
    Serialization bottleneck identified
    30-40% faster
  Illumination particle simulation (Queen's University)
    Communication bottleneck uncovered
    2x faster

Scalability
  Application study of the ASCI Sweep3D benchmark
  Identified MPI waiting time correlating with computational imbalance
  Measurements & analyses demonstrated on
    Jaguar with up to 192k cores
    Jugene with up to 288k cores
  [Figure: Jaguar, MK = 10 (default). Time [s] (logarithmic axis, 1 to 1000) versus number of processes (1,024 to 262,144). Curves: measured execution, computation, MPI processing, MPI waiting.]
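Why waiting time tracks computational imbalance can be seen in a deliberately simplified model. The sketch below assumes all processes synchronize at a single barrier-like point, so each process waits for the slowest one. Sweep3D itself uses pipelined point-to-point communication, so this is an illustration of the principle and not a model of that benchmark. The function names and the per-process times are hypothetical.

```python
def barrier_wait_times(compute_times):
    """Waiting time per process when all of them synchronize at one point.

    Simplifying assumption: a process that finishes computing early idles
    until the slowest process arrives.
    """
    slowest = max(compute_times)
    return [slowest - t for t in compute_times]


def imbalance_percent(compute_times):
    """Imbalance as the excess of the maximum over the mean, in percent."""
    mean = sum(compute_times) / len(compute_times)
    return (max(compute_times) / mean - 1.0) * 100.0


# Hypothetical computation times (seconds) of four processes.
compute = [8.0, 10.0, 6.0, 8.0]

waits = barrier_wait_times(compute)
print("wait per process:", waits)
print("total wait:", sum(waits))
print("imbalance [%]:", imbalance_percent(compute))
```

In this model the total waiting time is zero only when every process computes for the same duration, which is the sense in which waiting time and imbalance correlate.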

Performance-oriented development
  Application decomposition
    Kernel identification (static & dynamic analysis)
    Kernel extraction (miniapps)
    Kernel optimization
  Modeling support
    Model parameter identification
    Code validation against model
  Management of performance design documents
    Model representation
    Cross-experiment analysis
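Model parameter identification and code validation against a model can be sketched as follows. The example assumes a power-law model t(p) = c * p^alpha for a kernel's run time as a function of the process count p; the model form, the function names, the tolerance, and the data are all assumptions made for illustration and are not taken from the slides. Parameters are identified by least squares on the logarithms, and a new measurement is then checked against the model's prediction.

```python
import math


def fit_power_law(processes, times):
    """Fit t(p) = c * p**alpha by linear least squares in log-log space."""
    xs = [math.log(p) for p in processes]
    ys = [math.log(t) for t in times]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx                           # slope of the log-log line
    c = math.exp(mean_y - alpha * mean_x)       # intercept, back-transformed
    return c, alpha


def deviates_from_model(c, alpha, p, measured, tolerance=0.10):
    """True if a measurement differs from the prediction by more than tolerance."""
    predicted = c * p ** alpha
    return abs(measured - predicted) / predicted > tolerance


# Synthetic measurements generated from t(p) = 2 * sqrt(p); not real data.
processes = [1024, 4096, 16384, 65536]
times = [2.0 * math.sqrt(p) for p in processes]

c, alpha = fit_power_law(processes, times)
print("c =", c, "alpha =", alpha)

# Validate two hypothetical measurements at a larger scale.
print(deviates_from_model(c, alpha, 262144, 1030.0))
print(deviates_from_model(c, alpha, 262144, 1500.0))
```

A measurement that deviates from the model flags either a scalability problem in the code or a model that no longer describes it, and both outcomes are worth recording in the performance design documents.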

Article on Scalasca (to be published in October)
Parallel Programming, August 12, 2011

Virtual Institute High Productivity Supercomputing
  The virtual institute: a partnership to develop advanced programming tools for complex simulation codes
  Goals
    Improve code quality
    Speed up development
  Activities
    Tool development and integration
    Training & support
    PROPER workshop series (in conjunction with Euro-Par)
  www.vi-hps.org

Conclusion
  Performance must become a first-class citizen in the development process
  Combination of experimental performance analysis and modeling
  Requires managing performance design documents
  Paying staff to do performance optimization is worth the money
  Performance tools will further improve their productivity
  Qualified staff are hard to find
  We need more software engineering in computational science curricula

Thank you!