TAU by example - Mpich
TAU (Tuning and Analysis Utilities) is a toolkit for profiling and tracing parallel programs written in C, C++, Fortran and other languages. It supports dynamic (library-based), compiler-based and source-level instrumentation. Unlike MPE, TAU is not limited to profiling MPI code; it is geared towards parallel programming in general, including CUDA, OpenMP and regular pthreads.

[Figure: ParaProf with QMCPACK]

As TAU is already extensively documented, this page will only provide a short introduction to some common features, along with some basic example code.

Contents
1 Installation
 1.1 Desktop linux (ubuntu)
2 Wave2D
 2.1 Description
 2.2 Profiling
  2.2.1 Dynamic instrumentation
  2.2.2 Source instrumentation
  2.2.3 Compiler-based instrumentation
  2.2.4 Selective instrumentation
 2.3 Visualization
  2.3.1 pprof
  2.3.2 ParaProf
 2.4 Tracing
3 Ring
4 NWChem
5 Hardware counters
6 Notes
7 External links

Installation

Desktop linux (ubuntu)

After downloading both TAU and PDT from the downloads page (/Research/tau/downloads.php), unpack them wherever convenient and run the following commands to configure, compile and install TAU. If MPICH2/PDT were installed under another prefix, the part within brackets must be set up appropriately: set $MPI_PATH to the directory where MPICH2 was installed, and adjust $PDT_DIR accordingly. Otherwise, it can be left out. (If MPICH2 was configured with --enable-shared, it is not necessary to pass the -mpilibrary argument below.)

% ./configure -mpilibrary='-lmpich -lmpl -lopa' [-mpilib=$MPI_PATH/lib -mpiinc=$MPI_PATH/include -pdt=$PDT_DIR]
% make -j clean install

For more information, please refer to its installation manual (/bk04ch01.html#installing.tau).

Wave2D

Description

The wave equation is a partial differential equation used to describe the behavior of waves as they occur in physics:

    d²u/dt² = c² (d²u/dx² + d²u/dy²)

Here, the variable u represents a physical measure such as pressure or water depth. This equation can be discretized over a 2D grid using a finite differencing scheme, leading to the following update rule:

    u(i,j,t+1) = 2 u(i,j,t) - u(i,j,t-1) + C [u(i+1,j,t) + u(i-1,j,t) + u(i,j+1,t) + u(i,j-1,t) - 4 u(i,j,t)]

[Figure: Solution to the wave equation]

where u is defined over a rectangular 2D grid in dimensions i and j, and time step t. The constant C defines the wave's propagation speed. We hence obtain the iterative five-point stencil code wave2d.

Profiling

A list of the different instrumentation methods used for e.g. profiling (and which features they support) can be found in the user's guide (/tau/docs/usersguide/ch01.html). We will cover three different methods:

- Dynamic: statistical sampling; [1]
- Source: parser-aided automatic code instrumentation;
- Selective: uses a separate file to "manually control which parts of the application are profiled and how they are profiled"; this is technically part of the source instrumentation above (and hence also requires PDT).

Dynamic instrumentation

The most straightforward way of getting started with TAU is through tau_exec, which does dynamic instrumentation. This method uses statistical sampling to estimate the percentage of the execution time taken by each function (as well as the absolute time spent). We first compile as usual:

% mpicxx wave2d.cpp -o wave2d

The difference occurs when executing the program, which is done as follows: [2]

% mpirun -np 12 tau_exec ./wave2d

Unfortunately, this method does not support profiling user-defined routines, only those from MPI. For this reason, we do not recommend this method for real-world applications (which very likely spend less time on MPI calls than on the computation itself, which would go unaccounted for).

Source instrumentation

If PDT is installed, it is possible to use the compiler wrapper scripts tau_cc.sh/tau_cxx.sh to automatically instrument our code. In this case, one call to TAU_PROFILE is inserted in every user-defined function, with the modified function's header. [3]

First, we select which features we want TAU to use (e.g. MPI support, tracing, CUDA, hardware counters, etc.); this is done by setting an environment variable called TAU_MAKEFILE to point to one of the (informatively named) default Makefiles located in <TAU_HOME>/lib/. For now, we only want to profile MPI code, so we write:

% export TAU_MAKEFILE=$TAU_HOME/lib/Makefile.tau-mpi-pdt

Next, to build the instrumented wave2d program, we replace the regular mpicc or mpicxx commands by tau_cc.sh or tau_cxx.sh:

% tau_cxx.sh wave2d.cpp -o wave2d

If all goes right, we can then execute the code as usual:

% mpirun -np 12 ./wave2d

Compiler-based instrumentation

Additionally, TAU has a compiler instrumentation method, which sits between dynamic and source. Unlike dynamic, it requires recompilation, but it also inherits some features from source instrumentation, such as being able to profile user-defined functions. However, it cannot provide information about finer constructs such as loops. To use this method, the argument -tau_options=-optcompinst should be added to tau_cc.sh/tau_cxx.sh when compiling (or, equivalently, to the TAU_OPTIONS environment variable); visualization and program execution remain exactly the same. In practice, we recommend installing PDT and using the source mode.

Selective instrumentation

A large program might have dozens of auxiliary functions that do not constitute a significant chunk of the execution time, and hence visually pollute the profile. Also, a function may have two or more time-consuming loops, which will not be individually represented in the profile.
For these and other reasons we may want to selectively exclude functions, annotate (outer) loops, etc., using TAU's support for selectively profiling applications. Consider the following example, which we will name select.tau:

BEGIN_EXCLUDE_LIST
void foo(int *, double)
void bartoo_#(int *)
END_EXCLUDE_LIST

BEGIN_INSTRUMENT_SECTION
loops file="random.cpp" routine="int FooClass::fooToo(double, double)"
END_INSTRUMENT_SECTION

Here, the symbol # in the function names acts as a wildcard. To make use of this file, we define the TAU_OPTIONS environment variable:

% export TAU_OPTIONS="-optTauSelectFile=select.tau"

Warning: this does not work as expected with regular C code. In that case, a C should be added after every function in the exclude list; otherwise, a # should be added after every function name. This is a consequence of the design used by TAU, and is effectively arbitrary.

For some more information, check the official manual (/Research/tau/docs/newguide/bk01ch01s03.html).

Visualization

Regardless of the instrumentation method, we will then obtain a number of profile.r.* files (r being a rank). These can now be visualized in a number of ways.

pprof

Text-based; can be invoked by a simple pprof:

% pprof
Reading Profile files in profile.*

NODE 0;CONTEXT 0;THREAD 0:
%Time  Exclusive  Inclusive  #Call  #Subrs  Inclusive  Name
            msec  total msec                usec/call
  ...        ...        ...    ...     ...        ...  int main(int, char **)
  ...        ...        ...    ...     ...        ...  void Grid::doIterations(int)
  ...        ...        ...    ...     ...        ...  void Grid::doOneIteration()
  ...        ...        ...    ...     ...        ...  void Grid::exchangeEdges()
  ...        ...        ...    ...     ...        ...  MPI_Waitall()
  ...        ...        ...    ...     ...        ...  void Grid::Grid(int, int, int, int,
  ...        ...        ...    ...     ...        ...  void Grid::initGrid(int)
  ...        ...        ...    ...     ...        ...  MPI_Finalize()
  ...        ...        ...    ...     ...        ...  MPI_Init()
  ...        ...        ...    ...     ...        ...  MPI_Send()
  ...        ...        ...    ...     ...        ...  void Grid::~Grid()
  ...        ...        ...    ...     ...        ...  MPI_Irecv()
  ...        ...        ...    ...     ...        ...  MPI_Bcast()
  ...        ...        ...    ...     ...        ...  MPI_Comm_rank()
  ...        ...        ...    ...     ...        ...  MPI_Comm_size()

USER EVENTS Profile :NODE 0, CONTEXT 0, THREAD 0
NumSamples  MaxValue  MinValue  MeanValue  Std. Dev.  Event Name
       ...       ...       ...        ...        ...  Message size for broadcast

... the same tables for every rank ...

FUNCTION SUMMARY (total):
(same layout as above, with the columns summed over all ranks)

... the same table, but now with mean values ...

More information about how to sort the data differently can be found by running pprof -h.

ParaProf

A much richer, graphical interface for visualization and analysis, ParaProf can also be launched simply by running paraprof.
The vast majority of the functionality can be found by navigating the menus. For instance, a graph of functions ordered by time taken on average can be obtained by right-clicking Mean and selecting Show Mean Bar Chart.

[Figure: ParaProf with inclusive metric]

Analogously, it is possible to see the execution time for a given function on all ranks. One way of doing this is right-clicking on the corresponding function bar and going to Show Function Bar Chart. Alternatively, the menu Windows->Function->Bar Chart displays a list of all profiled functions.

[Figure: Inclusive execution time per routine on one rank]

Tracing

In addition to profiling, TAU can automatically instrument the code to do tracing without user intervention (unlike MPE, which requires manual insertion of tracing code). This depends on PDT (in particular, on its C/C++ parser), which is why we used a makefile ending in pdt in the previous section. However, tracing is disabled by default, so we enable it first:

% export TAU_TRACE=1

[Figure: Jumpshot main window]

(Note that this disables profiling, i.e. no profile.* files will be generated.)

Now the program can be executed normally (with mpirun in this case), which will generate many .edf and .trc files. These must then be merged as follows:

% tau_treemerge.pl
% tau2slog2 tau.trc tau.edf -o tau.slog2
All that is left is to visualize the tracing data (tau.slog2) with Jumpshot (or another such tool), as TAU does not have a tracing visualizer of its own. Notice on the legend window that all functions have been automatically instrumented.

[Figure: Jumpshot legend window]

Ring

[Figure: ParaProf plot for time spent per function/per rank]

This toy example implements a ring whose elements (ranks) asynchronously send their successor a buffer whose size depends on their rank (as does the amount of work needed to prepare it). The image to the left is a three-dimensional variation of the original 2D profiler plot (at the top of the page); it can be accessed by navigating to Window->3D Visualization. The menu on the right has a few more plot configurations; of note is the scatter plot option.

This dependence of the buffer size on the originating rank can also be seen in the communication matrix, which may be enabled (before run-time) by setting the environment variable TAU_COMM_MATRIX:

% export TAU_COMM_MATRIX=1

[Figure: 3D communication matrix]

From this graph it should be clear that the amount of communication between a node and its successor increases approximately in proportion to the square of the originating rank.

NWChem

Here we focus on ParaProf's scatter plot option. The idea of this viewer is to help developers identify groups of functions whose running times are tightly related. In the case of NWChem (/Main_Page), this can be used to help develop a time model for the (new) static load balancing mechanism.

The horizontal axis of this 3D scatter plot shows the exclusive execution time of the two dominant kernels in NWChem coupled cluster simulations (DGEMM and TCE_SORT). The vertical axis shows the time spent on dynamic load balancing. We see that smaller tasks, which take less time in the kernels, take more time in dynamic load balancing. We also see that the reason this occurs is possibly the size of the input data for the kernels: the red points correspond to large operations that take more time in the GA_Accumulate() operation compared to the green points. [4]

[Figure: Clustering view for NWChem]

Hardware counters

Like HPCToolkit, TAU can be built with PAPI support, which adds support for profiling branching and cache access patterns, time stalled waiting for resources (such as in memory reads), etc. The only change required in the configuration phase above is the addition of the argument -papi=, followed by the directory where PAPI was installed, e.g.:

% ./configure -mpilibrary='-lmpich -lmpl -lopa' -papi=/opt/papi

(The same comment above about MPICH2 and --enable-shared applies here.)

This will produce a new TAU makefile, Makefile.tau-papi-mpi-pdt, which should be used in the export TAU_MAKEFILE= commands above. Up to 25 counters/events can then be recorded by exporting the environment variables COUNTER1 through COUNTER25, as follows:

% export COUNTER1=PAPI_TOT_CYC
% export COUNTER2=PAPI_FML_INS
% export COUNTER3=PAPI_FMA_INS

Compilation and execution then proceed exactly as usual. Instead of producing a set of profile.* files, TAU will generate one folder for each counter:

% ls
MULTI__PAPI_TOT_CYC  MULTI__PAPI_FML_INS  MULTI__PAPI_FMA_INS ...

ParaProf can then be used to visualize the recorded metrics, as on the left, under the Windows/Thread submenu. For a simple use case, see the corresponding section on HPCToolkit.

Notes
[Figure: Counter statistics for HPCToolkit's matrix multiply example]

^ Referred to as "binary rewriting" in TAU's manual.
^ The first two numbers are the dimensions of every rank's rectangle; the next two are the number of rectangles in the x and y dimensions (their product should be the number of ranks); the last is the number of initial perturbations in the wave (i.e. the circles in the image above).
^ To see the modifications made by the TAU script, you can add -optKeepFiles to the TAU_OPTIONS environment variable. The instrumented code will be written to PROGNAME.inst.c if PROGNAME is the executable's name, and similarly for C++.
^ Result from David Ozog, a PhD student at MCS/UOregon.

External links

Main TAU website
TAU Wiki
Original wave2d example (/charm++/wave2d/)

This page was last modified on 2 November 2012, at 09:39.
Overview Assignment 5 Using Paraguin to Create Parallel Programs C. Ferner andb. Wilkinson October 15, 2014 The goal of this assignment is to use the Paraguin compiler to create parallel solutions using
More informationInstrumentation. BSC Performance Tools
Instrumentation BSC Performance Tools Index The instrumentation process A typical MN process Paraver trace format Configuration XML Environment variables Adding references to the source API CEPBA-Tools
More informationPPCES 2016: MPI Lab March 2016 Hristo Iliev, Portions thanks to: Christian Iwainsky, Sandra Wienke
PPCES 2016: MPI Lab 16 17 March 2016 Hristo Iliev, iliev@itc.rwth-aachen.de Portions thanks to: Christian Iwainsky, Sandra Wienke Synopsis The purpose of this hands-on lab is to make you familiar with
More informationTutorial on MPI: part I
Workshop on High Performance Computing (HPC08) School of Physics, IPM February 16-21, 2008 Tutorial on MPI: part I Stefano Cozzini CNR/INFM Democritos and SISSA/eLab Agenda first part WRAP UP of the yesterday's
More informationMPI Tutorial. Shao-Ching Huang. High Performance Computing Group UCLA Institute for Digital Research and Education
MPI Tutorial Shao-Ching Huang High Performance Computing Group UCLA Institute for Digital Research and Education Center for Vision, Cognition, Learning and Art, UCLA July 15 22, 2013 A few words before
More informationTo connect to the cluster, simply use a SSH or SFTP client to connect to:
RIT Computer Engineering Cluster The RIT Computer Engineering cluster contains 12 computers for parallel programming using MPI. One computer, phoenix.ce.rit.edu, serves as the master controller or head
More informationCOSC 4397 Parallel Computation. Debugging and Performance Analysis of Parallel MPI Applications
COSC 4397 Parallel Computation Debugging and Performance Analysis of Parallel MPI Applications Edgar Gabriel Spring 2006 Edgar Gabriel Debugging sequential applications Several ways how to debug a sequential
More informationIntroduction to parallel computing concepts and technics
Introduction to parallel computing concepts and technics Paschalis Korosoglou (support@grid.auth.gr) User and Application Support Unit Scientific Computing Center @ AUTH Overview of Parallel computing
More informationIntroduction to MPI. Branislav Jansík
Introduction to MPI Branislav Jansík Resources https://computing.llnl.gov/tutorials/mpi/ http://www.mpi-forum.org/ https://www.open-mpi.org/doc/ Serial What is parallel computing Parallel What is MPI?
More informationMPI: Parallel Programming for Extreme Machines. Si Hammond, High Performance Systems Group
MPI: Parallel Programming for Extreme Machines Si Hammond, High Performance Systems Group Quick Introduction Si Hammond, (sdh@dcs.warwick.ac.uk) WPRF/PhD Research student, High Performance Systems Group,
More informationCS 179: GPU Programming. Lecture 14: Inter-process Communication
CS 179: GPU Programming Lecture 14: Inter-process Communication The Problem What if we want to use GPUs across a distributed system? GPU cluster, CSIRO Distributed System A collection of computers Each
More informationEvaluating Performance Via Profiling
Performance Engineering of Software Systems September 21, 2010 Massachusetts Institute of Technology 6.172 Professors Saman Amarasinghe and Charles E. Leiserson Handout 6 Profiling Project 2-1 Evaluating
More informationParallel Short Course. Distributed memory machines
Parallel Short Course Message Passing Interface (MPI ) I Introduction and Point-to-point operations Spring 2007 Distributed memory machines local disks Memory Network card 1 Compute node message passing
More informationPractical Introduction to Message-Passing Interface (MPI)
1 Practical Introduction to Message-Passing Interface (MPI) October 1st, 2015 By: Pier-Luc St-Onge Partners and Sponsors 2 Setup for the workshop 1. Get a user ID and password paper (provided in class):
More informationHPM Hardware Performance Monitor for Bluegene/Q
HPM Hardware Performance Monitor for Bluegene/Q PRASHOBH BALASUNDARAM I-HSIN CHUNG KRIS DAVIS JOHN H MAGERLEIN The Hardware performance monitor (HPM) is a component of IBM high performance computing toolkit.
More informationCS 470 Spring Mike Lam, Professor. Distributed Programming & MPI
CS 470 Spring 2018 Mike Lam, Professor Distributed Programming & MPI MPI paradigm Single program, multiple data (SPMD) One program, multiple processes (ranks) Processes communicate via messages An MPI
More informationCS4961 Parallel Programming. Lecture 18: Introduction to Message Passing 11/3/10. Final Project Purpose: Mary Hall November 2, 2010.
Parallel Programming Lecture 18: Introduction to Message Passing Mary Hall November 2, 2010 Final Project Purpose: - A chance to dig in deeper into a parallel programming model and explore concepts. -
More informationIntroduction to Parallel Programming
Introduction to Parallel Programming Overview Parallel programming allows the user to use multiple cpus concurrently Reasons for parallel execution: shorten execution time by spreading the computational
More informationPerformance analysis basics
Performance analysis basics Christian Iwainsky Iwainsky@rz.rwth-aachen.de 25.3.2010 1 Overview 1. Motivation 2. Performance analysis basics 3. Measurement Techniques 2 Why bother with performance analysis
More informationSzámítogépes modellezés labor (MSc)
Számítogépes modellezés labor (MSc) Running Simulations on Supercomputers Gábor Rácz Physics of Complex Systems Department Eötvös Loránd University, Budapest September 19, 2018, Budapest, Hungary Outline
More informationHPC Fall 2007 Project 3 2D Steady-State Heat Distribution Problem with MPI
HPC Fall 2007 Project 3 2D Steady-State Heat Distribution Problem with MPI Robert van Engelen Due date: December 14, 2007 1 Introduction 1.1 Account and Login Information For this assignment you need an
More informationChip Multiprocessors COMP Lecture 9 - OpenMP & MPI
Chip Multiprocessors COMP35112 Lecture 9 - OpenMP & MPI Graham Riley 14 February 2018 1 Today s Lecture Dividing work to be done in parallel between threads in Java (as you are doing in the labs) is rather
More informationMPI Mechanic. December Provided by ClusterWorld for Jeff Squyres cw.squyres.com.
December 2003 Provided by ClusterWorld for Jeff Squyres cw.squyres.com www.clusterworld.com Copyright 2004 ClusterWorld, All Rights Reserved For individual private use only. Not to be reproduced or distributed
More informationIntegrated Tool Capabilities for Performance Instrumentation and Measurement
Integrated Tool Capabilities for Performance Instrumentation and Measurement Sameer Shende, Allen Malony Department of Computer and Information Science University of Oregon sameer@cs.uoregon.edu, malony@cs.uoregon.edu
More informationIntel Parallel Studio XE Cluster Edition - Intel MPI - Intel Traceanalyzer & Collector
Intel Parallel Studio XE Cluster Edition - Intel MPI - Intel Traceanalyzer & Collector A brief Introduction to MPI 2 What is MPI? Message Passing Interface Explicit parallel model All parallelism is explicit:
More informationThe MPI Message-passing Standard Lab Time Hands-on. SPD Course Massimo Coppola
The MPI Message-passing Standard Lab Time Hands-on SPD Course 2016-2017 Massimo Coppola Remember! Simplest programs do not need much beyond Send and Recv, still... Each process lives in a separate memory
More informationMPI MPI. Linux. Linux. Message Passing Interface. Message Passing Interface. August 14, August 14, 2007 MPICH. MPI MPI Send Recv MPI
Linux MPI Linux MPI Message Passing Interface Linux MPI Linux MPI Message Passing Interface MPI MPICH MPI Department of Science and Engineering Computing School of Mathematics School Peking University
More informationIntroduction to Parallel Performance Engineering
Introduction to Parallel Performance Engineering Markus Geimer, Brian Wylie Jülich Supercomputing Centre (with content used with permission from tutorials by Bernd Mohr/JSC and Luiz DeRose/Cray) Performance:
More informationCS 470 Spring Mike Lam, Professor. Distributed Programming & MPI
CS 470 Spring 2017 Mike Lam, Professor Distributed Programming & MPI MPI paradigm Single program, multiple data (SPMD) One program, multiple processes (ranks) Processes communicate via messages An MPI
More information