EZTrace upcoming features
1 EZTrace upcoming features
François Trahay
francois.trahay@telecom-sudparis.eu

2 Context
- Hardware is more and more complex
  - NUMA, hierarchical caches, GPU, ...
- Software is more and more complex
  - Hybrid MPI+OpenMP, MPI+CUDA, ...
- Achieving good performance is hard
- Understanding the performance of an application is difficult
- Need for performance analysis tools

3 EZTrace

4 EZTrace
- Framework for performance analysis
- Generates execution traces
  - Standard OTF or Pajé trace files
- Provides trace analysis facilities
- Developed at TSP + INRIA Bordeaux
- CeCILL-B licence (~BSD)
- C / Fortran / C++ programs
- Mostly tested on x86_64 / ARM

5 EZTrace plugins
- A set of pre-defined plugins
  - Major programming models (MPI, OpenMP, CUDA)
    - Possibility to combine plugins (e.g. MPI+OpenMP)
  - General-purpose plugins (memory, pthread)
  - Performance counters (PAPI)
- User-defined plugins
  - Created for the user application/library
    $ eztrace_plugin_generator ./application
    $ eztrace_create_plugin example.tpl
- Third-party plugins
  - Shipped with external libraries (e.g. PLASMA)

6 EZTrace 2-step analysis
- Run the application
  $ mpiexec -np 2 eztrace -t "mpi cuda" ./my_application
  - Automatic instrumentation of the program
  - Record events at key points
  - Generate a (compact) raw trace file per process
- Post-mortem analysis
  $ eztrace_convert /tmp/trahay_eztrace_log_rank_*
  - Generate a trace file for visualisation (Pajé / OTF)
    - Using the GTG library
  - Extract statistics

7 Generating traces
Performance analysis tools

8 Instrumentation with LD_PRELOAD
- Install a wrapper for key functions
- Only works for shared libraries
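
The wrapper idea on this slide can be illustrated by analogy. The sketch below is not LD_PRELOAD itself (EZTrace interposes on C symbols in shared libraries); it only shows the same principle in Python: a wrapper records an event on entry and on exit, then forwards the call to the original function. The names `traced`, `EVENTS` and `send` are hypothetical and do not come from EZTrace.

```python
import functools
import time

# Recorded events as (timestamp in ns, "enter" or "exit", function name).
EVENTS = []


def traced(func):
    """Wrap func so that an event is recorded before and after each call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        EVENTS.append((time.perf_counter_ns(), "enter", func.__name__))
        try:
            # Forward the call to the original function unchanged.
            return func(*args, **kwargs)
        finally:
            EVENTS.append((time.perf_counter_ns(), "exit", func.__name__))
    return wrapper


@traced
def send(dest, payload):
    """Hypothetical stand-in for a key function such as MPI_Send."""
    return len(payload)


result = send(1, b"x" * 16)
```

The application code calls `send` as usual; only the wrapper knows about tracing, which is what makes this style of instrumentation transparent to the program.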

9 Binary instrumentation
- Modify the entry of key functions in the binary
- Lightweight instrumentation
- Little architecture-specific code (~100s of lines of code)
  - For now: x86_64 + ARMv7

10 LiTL: Lightweight Trace Library
- Events recorded in a binary format
  (timestamp, event_code, arg1, arg2, ...)
- One buffer per thread
- Flush buffers at the end of the application
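
The three points of this slide (binary records, one buffer per thread, flush at the end) can be sketched as follows. This is an illustration only: the record layout chosen here (64-bit timestamp, 32-bit event code, 8-bit argument count, 64-bit arguments) and all names are assumptions, not the actual LiTL format or API.

```python
import io
import struct
import threading
import time

# Assumed record header: timestamp (u64), event code (u32), number of args (u8).
HEADER = struct.Struct("<QIB")


class ThreadBuffer:
    """In-memory event buffer owned by a single thread."""

    def __init__(self):
        self.data = bytearray()

    def record(self, code, *args):
        self.data += HEADER.pack(time.perf_counter_ns(), code, len(args))
        self.data += struct.pack(f"<{len(args)}Q", *args)


_local = threading.local()
_buffers = []
_buffers_lock = threading.Lock()


def record(code, *args):
    """Append an event to the calling thread's buffer (no lock on the hot path)."""
    buf = getattr(_local, "buf", None)
    if buf is None:
        buf = _local.buf = ThreadBuffer()
        with _buffers_lock:  # only taken once per thread, at registration
            _buffers.append(buf)
    buf.record(code, *args)


def flush(stream):
    """Write every thread's buffer to the stream, e.g. at application exit."""
    for buf in _buffers:
        stream.write(buf.data)


def decode(raw):
    """Read the records back as (timestamp, code, args) tuples."""
    events, offset = [], 0
    while offset < len(raw):
        ts, code, nargs = HEADER.unpack_from(raw, offset)
        offset += HEADER.size
        args = struct.unpack_from(f"<{nargs}Q", raw, offset)
        offset += 8 * nargs
        events.append((ts, code, args))
    return events


record(1, 0, 1, 16)  # hypothetical code 1, args: src, dest, len
record(2)            # hypothetical code 2, no arguments
out = io.BytesIO()
flush(out)
events = decode(out.getvalue())
```

Keeping one buffer per thread means that recording an event needs no synchronisation, which is one plausible reason the per-event overhead can stay small.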

11 Performance
- Cluster Edel
  - 2 x 4 cores per node
  - InfiniBand
- NAS Parallel Benchmarks
  - Class=B, NProcs=64
- Overhead: ~100 ns per event

12 Trace analysis

13 Trace analysis
- Post-mortem analysis
  - Read the LiTL trace files
  - Interpret events
- Several possible outputs
  - Generate a viewable trace file (Pajé / OTF)
  - Extract statistics
  - In-browser trace analysis*
* under development

14 Generating viewable trace files
- eztrace_convert generates a trace file
  - Pajé / OTF file formats
  - Viewable with tools like Vampir or ViTE

15 Extracting statistics
- eztrace_stats prints various statistics
  - Time spent on locks
  - Memory consumption
  - MPI messages
  - Duration of OpenMP parallel regions
  - ...

16 In-browser analysis*
- Web-based trace analysis
  - Communication matrix
  - Gantt chart
* under development
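
As an illustration of what a communication matrix contains, the sketch below accumulates the bytes sent from each rank to each other rank. The input format, a list of (src, dest, len) tuples, is an assumption made for this example; it is not the format used by the EZTrace tools.

```python
def communication_matrix(messages, nprocs):
    """Return matrix[src][dest] = total bytes sent from src to dest."""
    matrix = [[0] * nprocs for _ in range(nprocs)]
    for src, dest, length in messages:
        matrix[src][dest] += length
    return matrix


# Hypothetical point-to-point messages between two ranks.
messages = [(0, 1, 16), (1, 0, 16), (0, 1, 32)]
matrix = communication_matrix(messages, nprocs=2)
```

Rendering such a matrix as a heat map shows at a glance which pairs of processes exchange the most data.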

17 Conclusion
- EZTrace 1.0 is available
  - Open source
  - Contributions are welcome!
  - Extensible
- Future work / upcoming features
  - In-browser analysis
  - Trace analysis in parallel
  - Detection of patterns / anomalies

18 Questions?
François Trahay

19 Bonus: Pattern detection in EZTrace
- Visualizing a large trace is difficult
  - Millions of events
- Detect patterns in a trace
  - Application phases that repeat
    100 x { MPI_SEND MPI_RECV } MPI_Barrier
  - Event parameters shown: (src=0 dest=1 len=16 tag=0) and (src=1 dest=0 len=16 tag=0)
- [Figure: NPB CG, class A, 16 MPI processes]
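
The notation `100 x { MPI_SEND MPI_RECV } MPI_Barrier` can be produced by a simple compression pass over the event sequence. The sketch below is a greedy, single-level approach written for illustration; it is not the algorithm used by EZTrace, and it does not find nested patterns.

```python
def compress(events, max_len=8):
    """Collapse consecutive repetitions into (count, pattern) pairs.

    At each position, try every pattern length up to max_len and keep the
    one whose repetitions cover the most events. Ties go to the shorter
    pattern, since lengths are tried in increasing order.
    """
    out, i, n = [], 0, len(events)
    while i < n:
        best_len, best_count = 1, 1
        for length in range(1, min(max_len, (n - i) // 2) + 1):
            pattern = events[i:i + length]
            count = 1
            # Count how many times the pattern repeats back to back.
            while events[i + count * length:i + (count + 1) * length] == pattern:
                count += 1
            if count > 1 and count * length > best_len * best_count:
                best_len, best_count = length, count
        out.append((best_count, tuple(events[i:i + best_len])))
        i += best_len * best_count
    return out


trace = ["MPI_SEND", "MPI_RECV"] * 100 + ["MPI_Barrier"]
patterns = compress(trace)
# patterns == [(100, ("MPI_SEND", "MPI_RECV")), (1, ("MPI_Barrier",))]
```

A trace of 201 events is reduced to two entries, which is the kind of reduction that makes a large trace easier to read.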

20 Bonus: Detecting anomalies
- Select representative occurrences
  - Instead of examining 1000 occurrences
  - Select 1 occurrence per class
- [Figure: selected occurrences #327, #549, #...]

HP2 seminar, August 2014
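
The slides do not say how occurrences of a pattern are grouped into classes. As one possible reading of slide 20, the sketch below assumes that occurrences are classified by duration, with a relative tolerance, and returns one representative per class. The function name, the tolerance and the sample durations are all hypothetical.

```python
def representatives(durations, tolerance=0.10):
    """Group occurrences by similar duration and pick one per class.

    Returns a list of (occurrence index, class size). An occurrence joins
    the current class if its duration is within `tolerance` of the
    shortest duration in that class.
    """
    order = sorted(range(len(durations)), key=lambda i: durations[i])
    classes = []  # each class is a list of occurrence indices
    for i in order:
        if classes and durations[i] <= durations[classes[-1][0]] * (1 + tolerance):
            classes[-1].append(i)
        else:
            classes.append([i])
    return [(members[0], len(members)) for members in classes]


# Hypothetical durations (ms) of six occurrences of the same pattern.
durations = [10.0, 10.2, 9.9, 25.0, 10.1, 24.0]
selected = representatives(durations)
# selected == [(2, 4), (5, 2)]
```

Here the analyst would inspect occurrences 2 and 5 instead of all six; the small class of slow occurrences is the candidate anomaly.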