Intel Parallel Studio XE Cluster Edition - Intel MPI - Intel Trace Analyzer & Collector

1 Intel Parallel Studio XE Cluster Edition - Intel MPI - Intel Trace Analyzer & Collector

2 A brief Introduction to MPI 2

3 What is MPI? MPI stands for Message Passing Interface. It is an explicit parallel model: all parallelism is explicit, and the programmer is responsible for correctly identifying parallelism and implementing parallel algorithms using MPI constructs. MPI targets parallel computers, clusters, heterogeneous networks and accelerators like the Intel MIC architecture. It was designed as a standard to provide access to advanced parallel hardware for end users, library writers and tool developers. Communication is done between MPI ranks, typically implemented as operating system processes. 3

4 MPI Standard The standard is maintained by an open forum; Intel is one of the founders in 1992 and is still very actively engaged. Versions: 1.0 (1994), 2 (2000), 2.1 (2008), 2.2 (2009); Version 3, released in 2012, is the latest. Not all implementations support it yet. MPI is a message-passing library specification, an extended message-passing model; it is not a language or compiler specification, nor a specific implementation or product. 4

5 Notes on C and Fortran C and Fortran bindings correspond closely. In C: mpi.h must be #included; MPI functions return error codes or MPI_SUCCESS. In Fortran: mpif.h must be included, or use the MPI module (MPI-2); all MPI calls are to subroutines, with a place for the return code in the last argument. C++ bindings and Fortran 90 issues are part of MPI-2; MPI-3 introduces a Fortran 2008 interface. 5
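
As a small illustration of the C binding (a hedged sketch, not part of the original slides): by default MPI errors are fatal, so to actually inspect the error codes mentioned above the error handler must first be switched to MPI_ERRORS_RETURN.

#include <mpi.h>
#include <stdio.h>

int main( int argc, char *argv[] )
{
    int rc, rank;

    MPI_Init( &argc, &argv );

    /* Errors are fatal by default; return error codes instead so the check below is reachable. */
    MPI_Comm_set_errhandler( MPI_COMM_WORLD, MPI_ERRORS_RETURN );

    rc = MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    if ( rc != MPI_SUCCESS ) {
        char msg[MPI_MAX_ERROR_STRING];
        int len;
        MPI_Error_string( rc, msg, &len );
        fprintf( stderr, "MPI_Comm_rank failed: %s\n", msg );
        MPI_Abort( MPI_COMM_WORLD, rc );
    }

    MPI_Finalize();
    return 0;
}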

6 A first MPI program

#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
    int rank, size;
    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    MPI_Comm_size( MPI_COMM_WORLD, &size );
    printf( "I am %d of %d\n", rank, size );
    MPI_Finalize();
    return 0;
}

MPI_COMM_WORLD is the default communicator whose group contains all processes initially. 6

7 Point-to-Point Communication MPI_SEND(start, count, datatype, dest, tag, comm) MPI_RECV(start, count, datatype, source, tag, comm, status) Messages are sent with an accompanying user-defined integer tag, to assist the receiving process in identifying the message. Messages can be screened at the receiving end by specifying a specific tag, or not screened by specifying MPI_ANY_TAG as the tag in a receive. MPI_SEND and MPI_RECV are blocking; there are non-blocking versions too (MPI_ISEND, MPI_IRECV). The six functions introduced up to now (Init, Finalize, Comm_rank, Comm_size, Send, Recv) are all that is needed for many numerical programs, but there is a lot more, like the MPI collective operations. 7
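
To make the send/receive signatures above concrete, here is a minimal hedged sketch (not from the original slides; run it with at least two ranks): rank 0 sends one integer to rank 1 with a blocking send, and rank 1 receives it using tag 99.

#include <mpi.h>
#include <stdio.h>

int main( int argc, char *argv[] )
{
    int rank, size, value = 0;
    MPI_Status status;

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    MPI_Comm_size( MPI_COMM_WORLD, &size );

    if ( size >= 2 ) {                /* needs at least two ranks */
        if ( rank == 0 ) {
            value = 42;               /* data to transfer */
            MPI_Send( &value, 1, MPI_INT, 1, 99, MPI_COMM_WORLD );
        } else if ( rank == 1 ) {
            MPI_Recv( &value, 1, MPI_INT, 0, 99, MPI_COMM_WORLD, &status );
            printf( "rank 1 received %d from rank %d\n", value, status.MPI_SOURCE );
        }
    }

    MPI_Finalize();
    return 0;
}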

8 MPI Collective Routines Many routines: MPI_ALLGATHER, MPI_ALLGATHERV, MPI_ALLREDUCE, MPI_ALLTOALL, MPI_ALLTOALLV, MPI_BCAST, MPI_GATHER, MPI_GATHERV, MPI_REDUCE, MPI_REDUCE_SCATTER, MPI_SCAN, MPI_SCATTER, MPI_SCATTERV, ... Collective operations are called by all processes in a communicator. The ALL versions deliver results to all participating processes. V versions (V stands for vector) allow the chunks to have different sizes. MPI_ALLREDUCE, MPI_REDUCE, MPI_REDUCE_SCATTER, and MPI_SCAN take both built-in (like MPI_SUM, MPI_MAX) and user-defined combiner functions. 8
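
As a hedged illustration of collective calls (a sketch, not part of the original slides): the code below broadcasts a parameter from rank 0 to everyone and then sums a per-rank contribution onto rank 0 with the built-in MPI_SUM combiner.

#include <mpi.h>
#include <stdio.h>

int main( int argc, char *argv[] )
{
    int rank, size, chunk = 0, local, total = 0;

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    MPI_Comm_size( MPI_COMM_WORLD, &size );

    if ( rank == 0 ) chunk = 1000;    /* parameter known only on rank 0 */
    MPI_Bcast( &chunk, 1, MPI_INT, 0, MPI_COMM_WORLD );   /* now known everywhere */

    local = chunk * rank;             /* some per-rank contribution */
    MPI_Reduce( &local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD );

    if ( rank == 0 ) printf( "sum over %d ranks = %d\n", size, total );

    MPI_Finalize();
    return 0;
}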

9 Extending MPI: MPI-2 Dynamic process management: dynamic process startup, dynamic establishment of connections. One-sided communication: put/get and other operations. Parallel I/O. Other MPI-2 features: generalized requests; bindings for C++/Fortran 90; inter-language topics. 9
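
A minimal hedged sketch of MPI-2 one-sided communication (not from the original slides; run it with at least two ranks): each rank exposes one integer in a window, and rank 0 puts a value into rank 1's window inside a fence epoch.

#include <mpi.h>
#include <stdio.h>

int main( int argc, char *argv[] )
{
    int rank, size, buf = -1, value = 7;
    MPI_Win win;

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    MPI_Comm_size( MPI_COMM_WORLD, &size );

    /* Every rank exposes one int in the window. */
    MPI_Win_create( &buf, sizeof(int), sizeof(int), MPI_INFO_NULL,
                    MPI_COMM_WORLD, &win );

    MPI_Win_fence( 0, win );          /* open an access epoch */
    if ( rank == 0 && size > 1 )
        MPI_Put( &value, 1, MPI_INT, 1, 0, 1, MPI_INT, win );  /* write into rank 1 */
    MPI_Win_fence( 0, win );          /* close the epoch; the data is now visible */

    if ( rank == 1 ) printf( "rank 1 window now holds %d\n", buf );

    MPI_Win_free( &win );
    MPI_Finalize();
    return 0;
}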

10 MPI-3 Planned and Added Features (Topic | Motivation | Main Result | Intel MPI 5.0 Support?)
Collective Operations | Collective performance | Non-blocking & sparse collectives
Remote Memory Access | Cache coherence, PGAS support | Fast RMA
Backward Compatibility | Buffers > 2GB | Large buffer support, const buffers
Language Bindings | Supported ABI for latest C++ and Fortran | Fortran 2008 binding, removed C++ binding
Tools Support | PMPI limitations | MPIT interface (a very little bit was added)
Hybrid Programming | Core count growth | MPI_Mprobe, shared memory windows
Fault Tolerance | Node count growth | None. Next time? | N/A
Slide courtesy of A. Supalov 10

11 Overview Intel MPI Library 11

12 Intel MPI Library The Intel MPI Library is derived from MPICH2; the latest version is 5.0. Reasons to use Intel MPI: many additional features make the Intel MPI Library more user friendly compared to other implementations (correctness checking, statistics, Trace Analyzer support, ...); the Intel MPI Library provides top performance, e.g. extensive performance tuning on key algorithms such as collective operations, and the MPITUNE tool for automatic selection of the best algorithms and settings; scalability up to 150K ranks; available for Linux* and Windows*; professional support. 12

13 Setting the environment Use this handy script to define all necessary paths: $ source /shared/intel/impi_5.0.1/bin64/mpivars.sh or $ module load Intel_MPI No additional paths to binaries and libs have to be specified. Recommended: if Intel MPI is the only MPI you will use, just include the above in your .bashrc 13

14 Intro to Intel MPI Library Compilation A simple test program is part of the Intel MPI Library distribution: $ cp $I_MPI_ROOT/test/test.c . $ mpiicc -o test.x test.c mpiicc is the wrapper script for Intel icc (C compiler); mpicc is the wrapper script for GNU gcc. Also available are mpiifort (Intel Fortran compiler), mpiicpc (Intel C++ compiler) and mpicxx (GNU g++). 14

15 Intro to Intel MPI Library Execution Intel MPI provides an easy-to-use run script: $ mpirun -n <nprocs> ./test.x The above works automatically on a single node and on clusters with a job scheduler present. For more nodes we usually need to define a host file with a single node name per line (see the sketch below) and pass it to mpirun: $ mpirun -f <host file> -n <nprocs> ./test.x Unlike years before, there is no need anymore to start a daemon like the legacy mpd, since the MPICH Hydra process manager is used. 15
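
As a sketch of such a host file (the node names are hypothetical, not taken from the original material), it simply lists one node per line:

$ cat hosts
node01
node02
$ mpirun -f hosts -n 8 ./test.x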

16 Intro to Intel MPI Library Execution The test program prints out rank and hostname for each MPI process. More debug information is available by setting: $ export I_MPI_DEBUG=5 This will be propagated to all ranks automatically and prints the basic settings of the Intel MPI Library. 16

17 Output of test Program 17

18 Simple process placement using the Intel MPI Library Default pinning scheme: cores, sockets and nodes. The easiest way to override the default behavior is to use the processes-per-node flag: $ mpirun -ppn <nprocs-per-node> -n <nprocs> ./test.x If <nprocs-per-node> == 1, round robin placement with the next process on the next node is used. 18

19 Intel MPI Library: ppn = 1 19

20 Overview Intel Trace Analyzer and Collector 20

21 Intel Trace Analyzer and Collector (ITAC) A tool for understanding MPI program behavior, finding bottlenecks, performance analysis and MPI correctness checking. More than a profiler: it visualizes the temporal behavior of MPI routines, shows dependencies and load imbalances, and includes a correctness checking library. Easy to use: invoke via an extra flag to mpirun/mpiexec or by setting an environment variable, without changing your application or your run scripts. 21

22 Intel Trace Analyzer and Collector ITAC may be applied without touching the program or environment. One way to get a first trace is: $ mpirun -trace -n <nprocs> ./test.x Alternatively, just set the preload library and run without the -trace flag: $ export LD_PRELOAD=libVT.so $ mpirun -f <hostfile> -n <nprocs> ./test.x This is actually what the flag does internally. This methodology may be applied to situations with complex run scripts where it is not known where mpirun is actually executed. Note: this does not work for statically linked Intel MPI (not recommended). 22

23 Viewing the trace file ITAC will generate several files inside the directory where you started mpirun. Just start traceanalyzer in this directory: $ traceanalyzer test.x.stf Alternatively there is a Windows version of traceanalyzer contained in the Linux ICS package. 23

24 ITAC Function Profile After starting ITAC a window showing a basic timing profile for MPI and Application will be displayed. Right click on the red MPI bar to show the profiling for each used MPI routine: 24

25 ITAC Event Timeline The most important view of ITAC is the Event Timeline. It shows the temporal development of MPI routines and messages: 25

26 ITAC MPI Correctness Checker The Correctness Checker validates MPI correctness. It uses another library but may be started like ordinary ITAC tracing: $ mpirun -check_mpi -n <nprocs> ./test.x or $ export LD_PRELOAD=libVTmc.so $ mpirun -n <nprocs> ./test.x 26

27 Intel VTune Amplifier XE for MPI Intel VTune Amplifier XE provides detailed information on timings and core events. It can also provide insight into the behavior of threaded applications: $ source /opt/intel/vtune_amplifier_xe/amplxe-vars.sh $ mpirun -n <N> amplxe-cl --result-dir <result dir> --collect <mode> -- <MPI executable> Examples (hotspots and concurrency are predefined analysis types; concurrency only makes sense with additional threading): $ mpirun -n 2 amplxe-cl --result-dir axe_ho -collect hotspots -- ./poisson.x $ mpirun -n 2 amplxe-cl --result-dir axe_co -c concurrency -- ./poisson.x 27

28 Results with Intel VTune Amplifier XE After running the MPI program, result directories should appear with the previously defined base name, indexed by MPI rank. Results may be viewed as ASCII output: $ amplxe-cl --report hotspots -r axe_ho.0 or by using the Intel VTune Amplifier GUI: $ amplxe-gui axe_ho.0 Results may also be transferred to a Windows* laptop and viewed with the Windows* version of Intel VTune Amplifier XE. 28

29 Intel Inspector XE for MPI Applications Intel Inspector XE offers memory checking and correctness checking for threaded applications. For MPI applications we may use it in the following way: $ source /opt/intel/inspector_xe/inspxe-vars.sh intel64 $ mpirun -n <N> inspxe-cl --result-dir <result dir> --collect <mode> -- <MPI executable> Examples: $ mpirun -n 4 inspxe-cl --result-dir insp_mi3 --collect mi3 -- ./poisson.x $ mpirun -n 4 inspxe-cl --result-dir insp_ti3 --collect ti3 -- ./poisson.x mi3 and ti3 are the most demanding memory and threading analysis modes. 29

30 Results with Intel Inspector XE After running the MPI program, result directories should appear with the previously defined base name, indexed by MPI rank. Results may be viewed as ASCII output: $ inspxe-cl --report problems -r insp_mi3.0 or by using the Intel Inspector XE GUI: $ inspxe-gui insp_mi3.0 Results may also be transferred to a Windows* computer and viewed with the Windows* version of Intel Inspector XE. 30

31 Advanced Topics: Cluster Exploration Tools 31

32 Cluster Exploration Tools cpuinfo: included in the Intel MPI Library package. Debug level: raising the debug level of the Intel MPI Library will provide extra information. ifconfig etc.: Linux tools for showing available network devices. Intel MPI Benchmarks (IMB): collection of timed MPI tests for generic MPI performance evaluation. MPITUNE: tuning script for automatic determination of optimal settings; results can be stored and used on demand. This lecture covers the generic mode using IMB as the program to be tuned. 32

33 Cluster Node Exploration: cpuinfo Shows important features of a node: number of sockets, cores per socket including hyper-threads and caches Part of the Intel MPI Library distribution Reads its data from /proc/cpuinfo and prints it in a more appropriate format 33

34 Cluster Node Exploration: cpuinfo Shows important features of a node: number of sockets, cores per socket including hyper-threads and caches Part of the Intel MPI Library distribution Reads its data from /proc/cpuinfo and prints it in a more appropriate format 34

35 Using Environment Variables Environment variables may be exported inside your shell and are automatically propagated to each rank. Or, they can be specified on the command line for a single run: $ mpirun -genv I_MPI_DEBUG 4 <program.x> -genv stands for global environment, propagated to all nodes. It is also possible to define local environments for different nodes; -env defines environment variables locally: $ mpirun -env OMP_NUM_THREADS 4 -n 2 <program1.x> : \ -env OMP_NUM_THREADS 2 -n 4 <program2.x> 35

36 Cluster Node Exploration: Debug Info Setting the I_MPI_DEBUG environment variable increases the information printed to stdout, depending on the non-negative integer value specified. For example, I_MPI_DEBUG=4 prints information about process pinning, the network interfaces used, and the Intel MPI Library environment variables set by the user. Process pinning is the mapping of MPI ranks to hardware resources like cores, sockets, caches etc. The default pinning strategy of the Intel MPI Library may depend on the version! To increase performance you should control the pinning, especially for hybrid programs (pinning domains). 36

37 Cluster Node Exploration: Debug Info Setting the I_MPI_DEBUG environment variable increases the information printed to stdout, depending on the non-negative integer value specified. For example, I_MPI_DEBUG=4 prints information about process pinning, the network interfaces used, and the Intel MPI Library environment variables set by the user. Process pinning is the mapping of MPI ranks to hardware resources like cores, sockets, caches etc. The default pinning strategy of the Intel MPI Library may depend on the version! To increase performance you should control the pinning, especially for hybrid programs (pinning domains). 37

38 Cluster Node Exploration: Pinning Pin the ranks to explicit processors using the environment variable as shown below: $ export I_MPI_PIN_PROCESSOR_LIST=p1,p2,p3,... Rank #n is mapped to logical processor pn. Besides the explicit mapping of ranks to logical processors shown here, you can also use the predefined settings. 38
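
For example (a hedged sketch: the logical processor numbering is hypothetical and depends on the node, so check it with cpuinfo first), on a dual-socket node where logical processors 0-7 sit on socket 0 and 8-15 on socket 1, the following pins rank 0 and rank 1 onto the first core of each socket:

$ export I_MPI_PIN_PROCESSOR_LIST=0,8
$ mpirun -n 2 ./test.x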

39 I_MPI_PIN_PROCESSOR_LIST=1-8 First rank on socket #0 and core #0 Second rank on socket #1 and core #1 39

40 Cluster Structure (diagram) Communication levels shown: inter-node via an IB (InfiniBand) router and an ETH (Ethernet) router, inter-socket (QPI), and intra-socket. The head node is used for compiling, editing and job management and is connected to the Internet. 40

41 Three Levels of Communication Speed Communication speed is not homogeneous: inter-node (InfiniBand*, Ethernet, etc.), intra-node inter-socket (Quick Path Interconnect, QPI), and intra-socket. Two additional levels appear when using the Intel Xeon Phi coprocessor: host to Intel Xeon Phi coprocessor communication and communication between Intel Xeon Phi coprocessors. 41

42 Measuring Comm Speed with IMB The simplest benchmark in IMB is called PingPong: data packets of different sizes are sent from rank 0 to rank 1 and back: $ mpirun -n 2 IMB-MPI1 pingpong 42

43 Placing MPI Ranks on a Cluster Process placement on a single node was already discussed. The default strategy for mapping MPI ranks on a cluster tries to balance resources (the same number of processes on each socket) and to minimize the distance between adjacent ranks. A mapping with 2 MPI ranks on different nodes may be enforced by using the flag -ppn 1. PPN stands for Processes Per Node; the parameter value 1 will place the first rank on the first node and the second rank on the next node (alternative environment variable: I_MPI_PERHOST=1). 43

44 Measuring 3 Levels of Comm Speed Inter-node communication (e.g. InfiniBand*): $ mpirun -ppn 1 -n 2 IMB-MPI1 pingpong Intra-node inter-socket (QPI): $ export I_MPI_PIN_PROCESSOR_LIST=allsocks $ mpirun -n 2 IMB-MPI1 pingpong Intra-node intra-socket (between cores on a processor): $ export I_MPI_PIN_PROCESSOR_LIST=allcores:grain=1 $ mpirun -n 2 IMB-MPI1 pingpong 44

45 Multiple PingPongs The default IMB pingpong will just use the first 2 ranks for the pingpong and put all other ranks into a barrier. It is possible to run simultaneous pingpongs, e.g. 4 pairs: $ mpirun -n 8 IMB-MPI1 -multi <x> pingpong with x=0 for averaged results and x=1 for all results. A stretch goal for the labs is to show all the different communication speeds in a single IMB run. 45

46 Three Different Comm Levels 46

47 Automatic Tuning with MPITUNE Provides generic tuning of optimal settings for environment variables. Uses the IMB benchmark. Provides results in files that can be read by using mpirun with -tune (see the sketch below). The resulting settings may be just copied or used as a hint for further optimization. A new setting is only taken if the time is reduced by more than 3%; the 3% limit can be configured to another value. 47
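
The stored settings are then activated at run time via the -tune flag of mpirun, roughly as sketched below (the rank count and executable name are placeholders):

$ mpirun -tune -n 16 ./test.x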

48 How To Run MPITUNE MPITUNE is an executable script. The easiest way is to simply run: $ mpitune We may restrict MPITUNE to full nodes and the default fabric: $ mpitune -pr 8:8 -fl shm:dapl Hosts should be taken from the provided hostfile or from the batch system. 48

49 MPITUNE output 49

50 MPITUNE result file File: mpiexec_shm:dapl_nn_1_ppn_8.conf 50

51 IMB and Cache Effects IMB may deliver overly optimistic results because send and receive buffers stay in cache. Real applications will normally send data from main memory. Results may be more realistic if we make sure that cache lines are not reused. The flag -off_cache <last level cache size [MB]> may help in avoiding cache reuse. 51
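
For example (a hedged sketch; the 30 MB value stands in for the last-level cache size of a hypothetical node), a cache-avoiding PingPong could be run as:

$ mpirun -n 2 IMB-MPI1 -off_cache 30 pingpong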

52 Summary Tuning can only be effective when hardware parameters like node structure and communication speeds are well known. cpuinfo and I_MPI_DEBUG=4 provide useful information about node structure, process mapping and the network fabric used. IMB provides information about communication speeds. Many environment variables are available for fine tuning; we may automatically set some of them by using MPITUNE. The labs show practical usage of IMB and MPITUNE. 52

53 Performance Caveats and Notes Performance varies with each application, regardless of the technology and methods used. Certain types of HPC applications are amenable to acceleration and it is important to understand their characteristics. Once an application is identified to take advantage of acceleration, the high level and low level techniques are expected to work equally well. 53
