Intel Parallel Studio XE Cluster Edition - Intel MPI - Intel Traceanalyzer & Collector
- Betty Johnson
- 6 years ago
1 Intel Parallel Studio XE Cluster Edition - Intel MPI - Intel Traceanalyzer & Collector
2 A brief Introduction to MPI 2
3 What is MPI? Message Passing Interface
- Explicit parallel model: all parallelism is explicit; the programmer is responsible for correctly identifying parallelism and implementing parallel algorithms using MPI constructs
- For parallel computers, clusters, heterogeneous networks and accelerators like the Intel MIC architecture
- Designed as a standard to provide access to advanced parallel hardware for end users, library writers and tool developers
- Communication is done between MPI ranks, typically implemented as operating system processes
4 MPI Standard
- Standard maintained by an open forum; Intel is one of the founders (1992) and is still very actively engaged
- Versions: 1.0 (1994), 2.0 (2000), 2.1 (2008), 2.2 (2009)
- Version 3, released in 2012, is the latest; not all implementations support it yet
- A message-passing library specification: an extended message-passing model, not a language or compiler specification, and not a specific implementation or product
5 Notes on C and Fortran
- C and Fortran bindings correspond closely
- In C: mpi.h must be #included; MPI functions return error codes or MPI_SUCCESS
- In Fortran: mpif.h must be included, or use the MPI module (MPI-2); all MPI calls are to subroutines, with a place for the return code in the last argument
- C++ bindings and Fortran-90 issues are part of MPI-2; MPI-3 introduces a Fortran 2008 interface
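As an illustrative sketch (not part of the original slides), the C error-code convention mentioned above can be checked explicitly; most programs rely on the default error handler instead, which aborts on failure:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    /* Every C binding returns an error code; MPI_SUCCESS on success */
    int err = MPI_Init(&argc, &argv);
    if (err != MPI_SUCCESS) {
        fprintf(stderr, "MPI_Init failed with code %d\n", err);
        MPI_Abort(MPI_COMM_WORLD, err);
    }
    MPI_Finalize();
    return 0;
}
```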
6 A first MPI program

#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
    int rank, size;
    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    MPI_Comm_size( MPI_COMM_WORLD, &size );
    printf( "I am %d of %d\n", rank, size );
    MPI_Finalize();
    return 0;
}

MPI_COMM_WORLD is the default communicator whose group initially contains all processes.
7 Point-To-Point Communication

MPI_SEND(start, count, datatype, dest, tag, comm)
MPI_RECV(start, count, datatype, source, tag, comm, status)

- Messages are sent with an accompanying user-defined integer tag to assist the receiving process in identifying the message
- Messages can be screened at the receiving end by specifying a specific tag, or not screened by specifying MPI_ANY_TAG as the tag in a receive
- MPI_SEND and MPI_RECV are blocking; there are non-blocking versions too (MPI_ISEND, MPI_IRECV)
- The six functions introduced so far (Init, Finalize, Comm_rank, Comm_size, Send, Recv) are all that many numerical programs need, but there is a lot more, like the MPI collective operations
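A minimal sketch (not from the original slides) of a blocking send/receive between rank 0 and rank 1 might look like this:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        /* Send one int to rank 1 with tag 0 */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Receive from rank 0 with tag 0; MPI_ANY_TAG would accept any tag */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
```

Run with at least two ranks, e.g. $ mpirun -n 2 ./a.out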
8 MPI Collective Routines
- Many routines: MPI_ALLGATHER, MPI_ALLGATHERV, MPI_ALLREDUCE, MPI_ALLTOALL, MPI_ALLTOALLV, MPI_BCAST, MPI_GATHER, MPI_GATHERV, MPI_REDUCE, MPI_REDUCE_SCATTER, MPI_SCAN, MPI_SCATTER, MPI_SCATTERV, ...
- Collective operations are called by all processes in a communicator
- The ALL versions deliver results to all participating processes
- V versions ("vector") allow the chunks to have different sizes
- MPI_ALLREDUCE, MPI_REDUCE, MPI_REDUCE_SCATTER, and MPI_SCAN take both built-in (like MPI_SUM, MPI_MAX) and user-defined combiner functions
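As an illustrative sketch (not part of the original slides), a broadcast followed by a sum reduction shows the typical collective pattern: every rank in the communicator makes the same call:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, value, sum;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Rank 0 chooses a value; MPI_Bcast copies it to every rank */
    value = (rank == 0) ? 10 : 0;
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* Sum the (now identical) values; the result lands on rank 0 */
    MPI_Reduce(&value, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum over %d ranks = %d\n", size, sum);

    MPI_Finalize();
    return 0;
}
```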
9 Extending MPI: MPI-2
- Dynamic process management: dynamic process startup, dynamic establishment of connections
- One-sided communication: put/get, other operations
- Parallel I/O
- Other MPI-2 features: generalized requests; bindings for C++/Fortran-90; inter-language topics
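To make the one-sided put/get model concrete, here is a hedged sketch (not from the original slides) using the MPI-2 window and fence calls: rank 0 writes into rank 1's memory without rank 1 posting a receive:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, target_buf = 0;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Every rank exposes one int as a window for one-sided access */
    MPI_Win_create(&target_buf, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    if (rank == 0) {
        int value = 99;
        /* Put one int into rank 1's window; no matching call on rank 1 */
        MPI_Put(&value, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
    }
    MPI_Win_fence(0, win);   /* second fence completes the epoch */

    if (rank == 1)
        printf("rank 1's window now holds %d\n", target_buf);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```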
10 MPI-3 Planned and Added Features

Topic                  | Motivation                     | Main Result
-----------------------|--------------------------------|---------------------------------------------
Collective Operations  | Collective performance         | Non-blocking & sparse collectives
Remote Memory Access   | Cache coherence, PGAS support  | Fast RMA
Backward Compatibility | Buffers > 2GB                  | Large buffer support, const buffers
Language Bindings      | ABI for latest C++ and Fortran | Fortran 2008 binding, removed C++ binding
Tools Support          | PMPI limitations               | MPIT interface (a very little bit was added)
Hybrid Programming     | Core count growth              | MPI_Mprobe, shared memory windows
Fault Tolerance        | Node count growth              | None. Next time?

(The slide also has an "Intel MPI 5.0 Support?" column; only the entries "Supported" and "N/A" for Fault Tolerance survive the transcription.)
Slide courtesy of A. Supalov
11 Overview Intel MPI Library 11
12 Intel MPI Library
- Intel MPI Library is derived from MPICH2; the latest version is 5.0
- Reasons to use Intel MPI:
- Many additional features make the Intel MPI Library more user friendly compared to other implementations: correctness checking, statistics, Trace Analyzer support, ...
- Intel MPI Library provides top performance, e.g. extensive performance tuning of key algorithms such as collective operations
- MPITUNE tool for automatic selection of the best algorithms and settings
- Scalability up to 150K ranks
- Available for Linux* and Windows*
- Professional support
13 Setting the environment
Use this handy script to define all necessary paths:
$ source /shared/intel/impi_5.0.1/bin64/mpivars.sh
or
$ module load Intel_MPI
No additional paths to binaries and libs have to be specified.
Recommended: if Intel MPI is the only MPI you will use, just include the above in your .bashrc
14 Intro to Intel MPI Library Compilation
A simple test program is part of the Intel MPI Library distribution:
$ cp $I_MPI_ROOT/test/test.c .
$ mpiicc -o test.x test.c
- mpiicc is the wrapper script for Intel icc (C compiler)
- mpicc is the wrapper script for GNU gcc
Also available are:
- mpiifort (Intel Fortran Compiler)
- mpiicpc (Intel C++ Compiler)
- mpicxx (GNU g++)
15 Intro to Intel MPI Library Execution
Intel MPI provides an easy-to-use run script:
$ mpirun -n <nprocs> ./test.x
The above works automatically on a single node and on clusters with job schedulers present.
For more nodes we usually need to define a host file with a single node name per line:
$ mpirun -f <host file> -n <nprocs> ./test.x
Unlike years before, there is no need anymore to start a daemon like the legacy mpd, since the MPICH Hydra process management is used.
16 Intro to Intel MPI Library Execution The test program prints out rank and hostname for each MPI process More debug information available by setting: $ export I_MPI_DEBUG=5 Will be propagated to all ranks automatically Prints basic settings of the Intel MPI Library 16
17 Output of test Program 17
18 Simple process placement using the Intel MPI Library
- Default pinning scheme: cores, sockets and nodes
- The easiest way to override the default behavior is to use the processes-per-node flag:
$ mpirun -ppn <nprocs-per-node> -n <nprocs> ./test.x
- If <nprocs-per-node> == 1, round robin placement (next process on next node) is used
19 Intel MPI Library: ppn = 1 19
20 Overview Intel TraceAnalyzer and Collector 20
21 Intel Trace Analyzer and Collector (ITAC) A tool for understanding MPI program behavior, finding bottlenecks, performance analysis and MPI-correctness checking More than a profiler: Visualizes temporal behavior of MPI routines Shows dependencies and load imbalances Includes a correctness checking library Easy to use. Invoke via: Setting an extra flag to mpirun/mpiexec Setting an environment variable without changing your application or your run scripts 21
22 Intel Trace Analyzer and Collector
ITAC may be applied without touching the program or environment. One way to get a first trace is:
$ mpirun -trace -n <nprocs> ./test.x
Alternatively, just set the preload library and run without the -trace flag:
$ export LD_PRELOAD=libVT.so
$ mpirun -f <hostfile> -n <nprocs> ./test.x
This is actually what the flag does internally. This methodology may be applied to situations with complex run scripts where it is not known where mpirun is actually executed.
Note: this does not work for statically linked Intel MPI (not recommended).
23 Viewing the trace file ITAC will generate several files inside the directory where you started mpirun. Just start traceanalyzer in this directory: $ traceanalyzer test.x.stf Alternatively there is a Windows version of traceanalyzer contained in the Linux ICS package. 23
24 ITAC Function Profile After starting ITAC a window showing a basic timing profile for MPI and Application will be displayed. Right click on the red MPI bar to show the profiling for each used MPI routine: 24
25 ITAC Event Timeline Most important view of ITAC is the Event Timeline. This shows the temporal development of MPI routines and messages: 25
26 ITAC MPI Correctness Checker
The Correctness Checker validates MPI correctness. It uses another library but may be started like ordinary ITAC:
$ mpirun -check_mpi -n <nprocs> ./test.x
or
$ export LD_PRELOAD=libVTmc.so
$ mpirun -n <nprocs> ./test.x
27 Intel VTune Amplifier XE for MPI
Intel VTune Amplifier XE provides detailed information about timings and core events. It can also provide insight into the behavior of threaded applications:
$ source /opt/intel/vtune_amplifier_xe/amplxe-vars.sh
$ mpirun -n <N> amplxe-cl --result-dir <result dir> --collect <mode> \
  -- <MPI executable>
Examples (hotspots and concurrency are predefined analysis types; concurrency only makes sense with additional threading):
$ mpirun -n 2 amplxe-cl --result-dir axe_ho --collect hotspots -- ./poisson.x
$ mpirun -n 2 amplxe-cl --result-dir axe_co -c concurrency -- ./poisson.x
28 Results with Intel VTune Amplifier XE
After running the MPI program, result directories should appear with the previously defined base name, indexed by MPI rank.
Results may be viewed as ASCII output:
$ amplxe-cl --report hotspots -r axe_ho.0
or by using the Intel VTune Amplifier GUI:
$ amplxe-gui axe_ho.0
Results may also be transferred to a Windows* laptop and viewed with the Windows* version of Intel VTune Amplifier XE.
29 Intel Inspector XE for MPI Applications
Intel Inspector XE offers memory checking and correctness checking for threaded applications. For MPI applications we may use it in the following way:
$ source /opt/intel/inspector_xe/inspxe-vars.sh intel64
$ mpirun -n <N> inspxe-cl --result-dir <result dir> --collect <mode> \
  -- <MPI executable>
Examples:
$ mpirun -n 4 inspxe-cl --result-dir insp_mi3 --collect mi3 -- ./poisson.x
$ mpirun -n 4 inspxe-cl --result-dir insp_ti3 --collect ti3 -- ./poisson.x
mi3 and ti3 are the most demanding memory and threading modes.
30 Results with Intel Inspector XE
After running the MPI program, result directories should appear with the previously defined base name, indexed by MPI rank.
Results may be viewed as ASCII output:
$ inspxe-cl --report problems -r insp_mi3.0
or by using the Intel Inspector XE GUI:
$ inspxe-gui insp_mi3.0
Results may also be transferred to a Windows* computer and viewed with the Windows* version of Intel Inspector XE.
31 Advanced Topics: Cluster Exploration Tools 31
32 Cluster Exploration Tools
- cpuinfo: included in the Intel MPI Library package
- Debug level: raising the debug level of the Intel MPI Library will provide extra information
- ifconfig etc.: Linux tools for showing available network devices
- Intel MPI Benchmarks (IMB): collection of timed MPI tests for generic MPI performance evaluation
- MPITUNE: tuning script for automatic determination of optimal settings; results can be stored and used on demand. This lecture covers the generic mode using IMB as the program to be tuned
33 Cluster Node Exploration: cpuinfo Shows important features of a node: number of sockets, cores per socket including hyper-threads and caches Part of the Intel MPI Library distribution Reads its data from /proc/cpuinfo and prints it in a more appropriate format 33
35 Using Environment Variables
Environment variables may be exported inside your shell and are automatically propagated to each rank.
Or, they can be specified on the command line for a single run:
$ mpirun -genv I_MPI_DEBUG 4 <program.x>
-genv stands for "global environment", propagated to all nodes.
It is also possible to define local environments for different nodes; -env defines environment variables locally:
$ mpirun -env OMP_NUM_THREADS 4 -n 2 <program1.x> : \
  -env OMP_NUM_THREADS 2 -n 4 <program2.x>
36 Cluster Node Exploration: Debug Info
- Setting the I_MPI_DEBUG environment variable increases the information printed to stdout, depending on the non-negative integer value specified
- For example, I_MPI_DEBUG=4 prints information about process pinning, the network interfaces used, and the Intel MPI Library environment variables set by the user
- Process pinning is the mapping of MPI ranks to hardware resources like cores, sockets, caches etc.
- The default pinning strategy of the Intel MPI Library may depend on the version!
- To increase performance you should control the pinning, especially for hybrid programs (pinning domains)
38 Cluster Node Exploration: Pinning
Pin the ranks to explicit processors using the environment variable as shown below:
$ export I_MPI_PIN_PROCESSOR_LIST=p1,p2,p3,
Rank #n is mapped to logical processor pn. Besides explicit mapping of ranks to logical processors as shown, you can also use the predefined settings.
39 I_MPI_PIN_PROCESSOR_LIST=1-8 First rank on socket #0 and core #0 Second rank on socket #1 and core #1 39
40 Cluster Structure
(Diagram: inter-node communication via IB router and ETH router, inter-socket via QPI, intra-socket; head node for compile, edit and job management, connected to the Internet)
41 Three Levels of Communication Speed
Communication speed is not homogeneous:
- Inter-node (InfiniBand*, Ethernet, etc.)
- Intra-node, inter-socket (Quick Path Interconnect, QPI)
- Intra-socket
Two additional levels when using an Intel Xeon Phi coprocessor:
- Host to Intel Xeon Phi coprocessor communication
- Inter Intel Xeon Phi coprocessor communication
42 Measuring Comm Speed with IMB
The simplest benchmark in IMB is called PingPong: data packages of different sizes are sent from rank 0 to rank 1 and back:
$ mpirun -n 2 IMB-MPI1 pingpong
43 Placing MPI Ranks on a Cluster
- Process placement on a single node was already discussed
- The default strategy for mapping MPI ranks on a cluster tries to balance resources (same number of processes on each socket) and to minimize the distance between adjacent ranks
- A mapping with 2 MPI ranks on different nodes may be enforced by using the flag -ppn 1
- PPN stands for Processes Per Node; the value 1 will place the first rank on the first node and the second rank on the next node (alternative env. var.: I_MPI_PERHOST=1)
44 Measuring 3 Levels of Comm Speed
Inter-node communication (e.g. InfiniBand*):
$ mpirun -ppn 1 -n 2 IMB-MPI1 pingpong
Intra-node, inter-socket (QPI):
$ export I_MPI_PIN_PROCESSOR_LIST=allsocks
$ mpirun -n 2 IMB-MPI1 pingpong
Intra-node, intra-socket (between cores on a processor):
$ export I_MPI_PIN_PROCESSOR_LIST=allcores:grain=1
$ mpirun -n 2 IMB-MPI1 pingpong
45 Multiple PingPongs
The default IMB pingpong will just use the first 2 ranks for the pingpong and put all other ranks into a barrier.
It is possible to do simultaneous pingpongs, e.g. 4 pairs:
$ mpirun -n 8 IMB-MPI1 -multi <x> pingpong
with x=0 for average results and x=1 for all results.
A stretch goal for the labs is to show all the different communication speeds in a single IMB run.
46 Three Different Comm Levels 46
47 Automatic Tuning with MPITUNE
- Provides generic tuning of optimal settings for environment variables
- Uses the IMB benchmark
- Provides results in scripts that can be read by using mpirun with -tune
- The resulting settings may be just copied or used as a hint for further optimization
- The resulting settings are only taken if the time is reduced by more than 3%
- The 3% limit can be configured to another value
48 How To Run MPITUNE
MPITUNE is an executable script. The easiest way is to simply run:
$ mpitune
We may restrict MPITUNE to full nodes and the default fabric:
$ mpitune -pr 8:8 -fl shm:dapl
Hosts should be taken from the provided hostfile or the batch system.
49 MPITUNE output 49
50 MPITUNE result file File: mpiexec_shm:dapl_nn_1_ppn_8.conf 50
51 IMB and Cache Effects
- IMB may deliver too optimistic results because send and receive buffers stay in cache
- Real applications will normally use data from main memory for sending
- Results may be more realistic if we make sure that cache lines are not reused
- The flag -off_cache <last level cache size [MB]> may help in avoiding cache reuse
52 Summary
- Tuning can only be effective when hardware parameters like node structure and communication speeds are well known
- cpuinfo and I_MPI_DEBUG=4 provide useful information about node structure, process mapping and the selected network fabric
- IMB provides information about communication speeds
- Many environment variables are available for fine tuning; we may automatically set some of them by using MPITUNE
- Labs show practical usage of IMB and MPITUNE
53 Performance Caveats and Notes Performance varies with each application, regardless of the technology and methods used. Certain types of HPC applications are amenable to acceleration and it is important to understand their characteristics. Once an application is identified to take advantage of acceleration, the high level and low level techniques are expected to work equally well. 53
Chip Multiprocessors COMP35112 Lecture 9 - OpenMP & MPI Graham Riley 14 February 2018 1 Today s Lecture Dividing work to be done in parallel between threads in Java (as you are doing in the labs) is rather
More informationParallel Programming in C with MPI and OpenMP
Parallel Programming in C with MPI and OpenMP Michael J. Quinn Chapter 4 Message-Passing Programming Learning Objectives n Understanding how MPI programs execute n Familiarity with fundamental MPI functions
More informationMPI Runtime Error Detection with MUST
MPI Runtime Error Detection with MUST At the 27th VI-HPS Tuning Workshop Joachim Protze IT Center RWTH Aachen University April 2018 How many issues can you spot in this tiny example? #include #include
More informationAdvanced MPI. Andrew Emerson
Advanced MPI Andrew Emerson (a.emerson@cineca.it) Agenda 1. One sided Communications (MPI-2) 2. Dynamic processes (MPI-2) 3. Profiling MPI and tracing 4. MPI-I/O 5. MPI-3 11/12/2015 Advanced MPI 2 One
More informationOur new HPC-Cluster An overview
Our new HPC-Cluster An overview Christian Hagen Universität Regensburg Regensburg, 15.05.2009 Outline 1 Layout 2 Hardware 3 Software 4 Getting an account 5 Compiling 6 Queueing system 7 Parallelization
More informationECE 574 Cluster Computing Lecture 13
ECE 574 Cluster Computing Lecture 13 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 15 October 2015 Announcements Homework #3 and #4 Grades out soon Homework #5 will be posted
More informationScalasca performance properties The metrics tour
Scalasca performance properties The metrics tour Markus Geimer m.geimer@fz-juelich.de Scalasca analysis result Generic metrics Generic metrics Time Total CPU allocation time Execution Overhead Visits Hardware
More informationMPI Mechanic. December Provided by ClusterWorld for Jeff Squyres cw.squyres.com.
December 2003 Provided by ClusterWorld for Jeff Squyres cw.squyres.com www.clusterworld.com Copyright 2004 ClusterWorld, All Rights Reserved For individual private use only. Not to be reproduced or distributed
More informationPractical Course Scientific Computing and Visualization
July 5, 2006 Page 1 of 21 1. Parallelization Architecture our target architecture: MIMD distributed address space machines program1 data1 program2 data2 program program3 data data3.. program(data) program1(data1)
More informationA few words about MPI (Message Passing Interface) T. Edwald 10 June 2008
A few words about MPI (Message Passing Interface) T. Edwald 10 June 2008 1 Overview Introduction and very short historical review MPI - as simple as it comes Communications Process Topologies (I have no
More informationRecap of Parallelism & MPI
Recap of Parallelism & MPI Chris Brady Heather Ratcliffe The Angry Penguin, used under creative commons licence from Swantje Hess and Jannis Pohlmann. Warwick RSE 13/12/2017 Parallel programming Break
More informationIntroduction to parallel computing concepts and technics
Introduction to parallel computing concepts and technics Paschalis Korosoglou (support@grid.auth.gr) User and Application Support Unit Scientific Computing Center @ AUTH Overview of Parallel computing
More informationSymmetric Computing. SC 14 Jerome VIENNE
Symmetric Computing SC 14 Jerome VIENNE viennej@tacc.utexas.edu Symmetric Computing Run MPI tasks on both MIC and host Also called heterogeneous computing Two executables are required: CPU MIC Currently
More informationAdvanced MPI. Andrew Emerson
Advanced MPI Andrew Emerson (a.emerson@cineca.it) Agenda 1. One sided Communications (MPI-2) 2. Dynamic processes (MPI-2) 3. Profiling MPI and tracing 4. MPI-I/O 5. MPI-3 22/02/2017 Advanced MPI 2 One
More informationIntroduction to MPI. Jerome Vienne Texas Advanced Computing Center January 10 th,
Introduction to MPI Jerome Vienne Texas Advanced Computing Center January 10 th, 2013 Email: viennej@tacc.utexas.edu 1 Course Objectives & Assumptions Objectives Teach basics of MPI-Programming Share information
More informationIntroduction to Intel Xeon Phi programming techniques. Fabio Affinito Vittorio Ruggiero
Introduction to Intel Xeon Phi programming techniques Fabio Affinito Vittorio Ruggiero Outline High level overview of the Intel Xeon Phi hardware and software stack Intel Xeon Phi programming paradigms:
More informationIntroduction to MPI. Ritu Arora Texas Advanced Computing Center June 17,
Introduction to MPI Ritu Arora Texas Advanced Computing Center June 17, 2014 Email: rauta@tacc.utexas.edu 1 Course Objectives & Assumptions Objectives Teach basics of MPI-Programming Share information
More informationMessage-Passing Computing
Chapter 2 Slide 41þþ Message-Passing Computing Slide 42þþ Basics of Message-Passing Programming using userlevel message passing libraries Two primary mechanisms needed: 1. A method of creating separate
More informationIntel VTune Amplifier XE
Intel VTune Amplifier XE Vladimir Tsymbal Performance, Analysis and Threading Lab 1 Agenda Intel VTune Amplifier XE Overview Features Data collectors Analysis types Key Concepts Collecting performance
More information15-440: Recitation 8
15-440: Recitation 8 School of Computer Science Carnegie Mellon University, Qatar Fall 2013 Date: Oct 31, 2013 I- Intended Learning Outcome (ILO): The ILO of this recitation is: Apply parallel programs
More informationExperiencing Cluster Computing Message Passing Interface
Experiencing Cluster Computing Message Passing Interface Class 6 Message Passing Paradigm The Underlying Principle A parallel program consists of p processes with different address spaces. Communication
More informationIntroduction to MPI. SHARCNET MPI Lecture Series: Part I of II. Paul Preney, OCT, M.Sc., B.Ed., B.Sc.
Introduction to MPI SHARCNET MPI Lecture Series: Part I of II Paul Preney, OCT, M.Sc., B.Ed., B.Sc. preney@sharcnet.ca School of Computer Science University of Windsor Windsor, Ontario, Canada Copyright
More informationL14 Supercomputing - Part 2
Geophysical Computing L14-1 L14 Supercomputing - Part 2 1. MPI Code Structure Writing parallel code can be done in either C or Fortran. The Message Passing Interface (MPI) is just a set of subroutines
More informationCS 426. Building and Running a Parallel Application
CS 426 Building and Running a Parallel Application 1 Task/Channel Model Design Efficient Parallel Programs (or Algorithms) Mainly for distributed memory systems (e.g. Clusters) Break Parallel Computations
More informationMPI MESSAGE PASSING INTERFACE
MPI MESSAGE PASSING INTERFACE David COLIGNON, ULiège CÉCI - Consortium des Équipements de Calcul Intensif http://www.ceci-hpc.be Outline Introduction From serial source code to parallel execution MPI functions
More informationExercises: April 11. Hermann Härtig, TU Dresden, Distributed OS, Load Balancing
Exercises: April 11 1 PARTITIONING IN MPI COMMUNICATION AND NOISE AS HPC BOTTLENECK LOAD BALANCING DISTRIBUTED OPERATING SYSTEMS, SCALABILITY, SS 2017 Hermann Härtig THIS LECTURE Partitioning: bulk synchronous
More informationParallel Applications on Distributed Memory Systems. Le Yan HPC User LSU
Parallel Applications on Distributed Memory Systems Le Yan HPC User Services @ LSU Outline Distributed memory systems Message Passing Interface (MPI) Parallel applications 6/3/2015 LONI Parallel Programming
More informationThe Message Passing Interface (MPI): Parallelism on Multiple (Possibly Heterogeneous) CPUs
1 The Message Passing Interface (MPI): Parallelism on Multiple (Possibly Heterogeneous) s http://mpi-forum.org https://www.open-mpi.org/ Mike Bailey mjb@cs.oregonstate.edu Oregon State University mpi.pptx
More informationLOAD BALANCING DISTRIBUTED OPERATING SYSTEMS, SCALABILITY, SS Hermann Härtig
LOAD BALANCING DISTRIBUTED OPERATING SYSTEMS, SCALABILITY, SS 2016 Hermann Härtig LECTURE OBJECTIVES starting points independent Unix processes and block synchronous execution which component (point in
More informationPaul Burton April 2015 An Introduction to MPI Programming
Paul Burton April 2015 Topics Introduction Initialising MPI & basic concepts Compiling and running a parallel program on the Cray Practical : Hello World MPI program Synchronisation Practical Data types
More informationCornell Theory Center. Discussion: MPI Collective Communication I. Table of Contents. 1. Introduction
1 of 18 11/1/2006 3:59 PM Cornell Theory Center Discussion: MPI Collective Communication I This is the in-depth discussion layer of a two-part module. For an explanation of the layers and how to navigate
More informationMPI MESSAGE PASSING INTERFACE
MPI MESSAGE PASSING INTERFACE David COLIGNON, ULiège CÉCI - Consortium des Équipements de Calcul Intensif http://www.ceci-hpc.be Outline Introduction From serial source code to parallel execution MPI functions
More informationIntroduction to MPI HPC Workshop: Parallel Programming. Alexander B. Pacheco
Introduction to MPI 2018 HPC Workshop: Parallel Programming Alexander B. Pacheco Research Computing July 17-18, 2018 Distributed Memory Model Each process has its own address space Data is local to each
More informationParallel Programming Using MPI
Parallel Programming Using MPI Prof. Hank Dietz KAOS Seminar, February 8, 2012 University of Kentucky Electrical & Computer Engineering Parallel Processing Process N pieces simultaneously, get up to a
More informationCSE. Parallel Algorithms on a cluster of PCs. Ian Bush. Daresbury Laboratory (With thanks to Lorna Smith and Mark Bull at EPCC)
Parallel Algorithms on a cluster of PCs Ian Bush Daresbury Laboratory I.J.Bush@dl.ac.uk (With thanks to Lorna Smith and Mark Bull at EPCC) Overview This lecture will cover General Message passing concepts
More informationIPM Workshop on High Performance Computing (HPC08) IPM School of Physics Workshop on High Perfomance Computing/HPC08
IPM School of Physics Workshop on High Perfomance Computing/HPC08 16-21 February 2008 MPI tutorial Luca Heltai Stefano Cozzini Democritos/INFM + SISSA 1 When
More informationTool for Analysing and Checking MPI Applications
Tool for Analysing and Checking MPI Applications April 30, 2010 1 CONTENTS CONTENTS Contents 1 Introduction 3 1.1 What is Marmot?........................... 3 1.2 Design of Marmot..........................
More informationPart One: The Files. C MPI Slurm Tutorial - Hello World. Introduction. Hello World! hello.tar. The files, summary. Output Files, summary
C MPI Slurm Tutorial - Hello World Introduction The example shown here demonstrates the use of the Slurm Scheduler for the purpose of running a C/MPI program. Knowledge of C is assumed. Having read the
More informationCluster Clonetroop: HowTo 2014
2014/02/25 16:53 1/13 Cluster Clonetroop: HowTo 2014 Cluster Clonetroop: HowTo 2014 This section contains information about how to access, compile and execute jobs on Clonetroop, Laboratori de Càlcul Numeric's
More informationProgramming Scalable Systems with MPI. Clemens Grelck, University of Amsterdam
Clemens Grelck University of Amsterdam UvA / SurfSARA High Performance Computing and Big Data Course June 2014 Parallel Programming with Compiler Directives: OpenMP Message Passing Gentle Introduction
More informationDesigning Optimized MPI Broadcast and Allreduce for Many Integrated Core (MIC) InfiniBand Clusters
Designing Optimized MPI Broadcast and Allreduce for Many Integrated Core (MIC) InfiniBand Clusters K. Kandalla, A. Venkatesh, K. Hamidouche, S. Potluri, D. Bureddy and D. K. Panda Presented by Dr. Xiaoyi
More informationHands-on. MPI basic exercises
WIFI XSF-UPC: Username: xsf.convidat Password: 1nt3r3st3l4r WIFI EDUROAM: Username: roam06@bsc.es Password: Bsccns.4 MareNostrum III User Guide http://www.bsc.es/support/marenostrum3-ug.pdf Remember to
More information