Introduction to MPI May 20, 2013 Daniel J. Bodony Department of Aerospace Engineering University of Illinois at Urbana-Champaign

Top500.org Performance Development
[Figure: Top500.org PERFORMANCE DEVELOPMENT chart, 1993-2018 (with 1 Eflop/s projected), showing the SUM, N=1, and N=500 performance curves (1.17 Tflop/s, 59.7 Gflop/s, and 0.4 Gflop/s in 1993; 162 Pflop/s, 17.6 Pflop/s, and 76.5 Tflop/s at the latest list shown), alongside ARCHITECTURES share (single proc., SMP, SIMD, MPP, constellations, clusters) and CHIP TECHNOLOGY share (Alpha, MIPS, IBM, HP, Intel, SPARC, AMD, proprietary) over 1993-2012.]
D. J. Bodony (UIUC) Introduction to MPI May 20, 2013 2 / 35

Types of Parallelism

SISD - Single Instruction, Single Data: single-core PCs
SIMD - Single Instruction, Multiple Data: GPUs
MISD - Multiple Instruction, Single Data: esoteric
MIMD - Multiple Instruction, Multiple Data: multi-core PCs; most parallel computers

MIMD ⇒ Need to manage tasks and data ⇒ MPI

D. J. Bodony (UIUC) Introduction to MPI May 20, 2013 3 / 35

What is MPI?

MPI = Message Passing Interface
Developed in the early 1990s as "... a portable, efficient, and flexible standard for message passing that will be widely used for writing message passing programs" (Blaise Barney, https://computing.llnl.gov/tutorials/mpi)
Describes a means to pass information between processes

MPI implementations:
  Open source:
    MPICH (DOE's Argonne)  http://www.mpich.org
    MVAPICH (OSU)          http://mvapich.cse.ohio-state.edu/overview/mvapich2
    OpenMPI (community)    http://www.open-mpi.org
  Vendor-specific (IBM, SGI, ...)

D. J. Bodony (UIUC) Introduction to MPI May 20, 2013 4 / 35

Why Use MPI?

1. Available: on nearly every parallel computer (including single-CPU, multi-core systems)
2. Flexible: runs equally well on distributed memory, shared memory, and hybrid machines
3. Portable: the same code runs on different machines without modification (usually, unless you and/or the implementation has a bug)
4. Fast: vendors can implement (parts of it) in hardware
5. Scalable: demonstrated to work well on 1,000,000+ processes
6. Stable: future exascale machines are expected to have MPI
7. Multi-lingual: MPI is available in Fortran 77/90+, C, C++, Python, ...

D. J. Bodony (UIUC) Introduction to MPI May 20, 2013 5 / 35

MPI = Explicit Parallelism

With MPI you, the programmer, must explicitly include all parallelism, including:
  Task decomposition (how the data are to be distributed)
  Task granularity (problem size per MPI process)
  Message organization & handling (send, receive)
  Development of parallel algorithms (e.g., sparse mat-vec multiply)

Other programming models, like OpenMP, can handle the parallelism at the compile stage (but usually at a loss of performance or scalability).
Using MPI & OpenMP together has some advantages on emerging computers (e.g., 1,000 nodes with 100s of cores each).

D. J. Bodony (UIUC) Introduction to MPI May 20, 2013 6 / 35
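
The hybrid MPI & OpenMP approach mentioned above can be sketched in a few lines of C; this example is illustrative and not part of the slides (the MPI_THREAD_FUNNELED request level, the printed format, and the build command are choices made here):

/* Hypothetical MPI + OpenMP "hello" sketch (illustrative, not from the slides). */
#include <stdio.h>
#include <omp.h>
#include "mpi.h"

int main (int argc, char *argv[])
{
  int provided, rank;
  /* Request thread support; FUNNELED means only the master thread calls MPI. */
  MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  #pragma omp parallel
  {
    printf("rank %d, thread %d\n", rank, omp_get_thread_num());
  }
  MPI_Finalize();
  return 0;
}

Such a code would typically be compiled with an MPI wrapper plus the compiler's OpenMP flag (e.g., mpicc -fopenmp) and launched with mpirun or mpiexec.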

MPI Program Structure in Fortran

Program NULL
  Implicit None
  Include 'mpif.h'
  Integer :: rank, numprocs, ierr

  Call MPI_Init(ierr)
  Call MPI_Comm_rank( MPI_COMM_WORLD, rank, ierr )
  Call MPI_Comm_size( MPI_COMM_WORLD, numprocs, ierr )
  Write (*,'(2(A,I5),A)') 'Processor ', rank, ' of ', &
        numprocs, ' ready.'

  ! < do stuff >

  Call MPI_Finalize ( ierr )
End Program

D. J. Bodony (UIUC) Introduction to MPI May 20, 2013 7 / 35

MPI Program Structure in C

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main (int argc, char *argv[])
{
  int ierr, rank, numprocs;

  ierr = MPI_Init(&argc, &argv);
  ierr = MPI_Comm_rank ( MPI_COMM_WORLD, &rank );
  ierr = MPI_Comm_size ( MPI_COMM_WORLD, &numprocs );
  printf("Processor %d of %d ready.\n", rank, numprocs);

  /* < do stuff > */

  ierr = MPI_Finalize();
  return EXIT_SUCCESS;
}

D. J. Bodony (UIUC) Introduction to MPI May 20, 2013 8 / 35

Dissecting The Program Structure

Structure of an MPI program (in any language):
1. Initialize code in serial
2. Initialize MPI
3. Do good work in parallel
4. Finalize MPI
5. Finalize code in serial

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main (int argc, char *argv[])
{
  int ierr, rank, numprocs;

  ierr = MPI_Init(&argc, &argv);
  ierr = MPI_Comm_rank ( MPI_COMM_WORLD, &rank );
  ierr = MPI_Comm_size ( MPI_COMM_WORLD, &numprocs );
  printf("Processor %d of %d ready.\n", rank, numprocs);

  /* < do stuff > */

  ierr = MPI_Finalize();
  return EXIT_SUCCESS;
}

The MPI commands above define the Environment Management routines.

D. J. Bodony (UIUC) Introduction to MPI May 20, 2013 9 / 35

MPI_COMM_WORLD and Communicators

MPI communicators:
  Establish a group of MPI processes
  MPI_COMM_WORLD is the default and contains all processes
  Can be created, copied, split, destroyed
  Serve as an argument for most point-to-point and collective operations

Communicator datatypes:
  Fortran: Integer (e.g., Integer :: mycomm)
  C: MPI_Comm (e.g., MPI_Comm mycomm;)
  C++: (e.g., MPI::Intercomm mycomm;)

D. J. Bodony (UIUC) Introduction to MPI May 20, 2013 10 / 35

Communicator Examples :: MPI_COMM_WORLD
[Figure: a 6 x 6 grid of processes, ranks 0 through 35.]
MPI_COMM_WORLD contains all 36 processors, ranks 0 through 35.
D. J. Bodony (UIUC) Introduction to MPI May 20, 2013 11 / 35

Communicator Examples :: Other Possibilities
[Figure: the same 6 x 6 grid of ranks 0-35, partitioned into colored groups.]
Each color could be a different communicator.
D. J. Bodony (UIUC) Introduction to MPI May 20, 2013 12 / 35

Communicator Examples :: Other Possibilities
[Figure: the 6 x 6 grid of ranks 0-35 with a different partitioning, including uncolored (white) processes.]
Each color (including white) could be a different communicator.
D. J. Bodony (UIUC) Introduction to MPI May 20, 2013 13 / 35
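
A concrete sketch of the colored-communicator picture above (the modulo-3 coloring and the names, e.g. subcomm, are illustrative assumptions, not from the slides): MPI_Comm_split places every process that passes the same color into its own new communicator.

/* Illustrative sketch: split MPI_COMM_WORLD by "color" (here, rank modulo 3). */
#include <stdio.h>
#include "mpi.h"

int main (int argc, char *argv[])
{
  int rank, subrank;
  MPI_Comm subcomm;                 /* hypothetical name for the new communicator */

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  /* Processes passing the same color end up in the same sub-communicator;
     the third argument (key) orders the ranks within it. */
  MPI_Comm_split(MPI_COMM_WORLD, rank % 3, rank, &subcomm);
  MPI_Comm_rank(subcomm, &subrank);
  printf("world rank %d is rank %d in its sub-communicator\n", rank, subrank);
  MPI_Comm_free(&subcomm);
  MPI_Finalize();
  return 0;
}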

Interlude #1 - Parallel Hello, World!

Write an MPI program that prints "Hello, World!" as below:

  rank 0: Hello, World!
  rank 1: Hello, World!
  ...
  rank <numprocs-1>: Hello, World!

Do the processes write in ascending order?

You have 30 minutes for this task.

D. J. Bodony (UIUC) Introduction to MPI May 20, 2013 14 / 35
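
One possible C solution sketch for this interlude (not the instructor's solution; the print format follows the listing above). Note that, without additional coordination, the ranks generally do not print in ascending order:

/* Possible solution sketch for Interlude #1 (illustrative). */
#include <stdio.h>
#include "mpi.h"

int main (int argc, char *argv[])
{
  int rank, numprocs;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
  printf("rank %d: Hello, World!\n", rank);   /* output order is not deterministic */
  MPI_Finalize();
  return 0;
}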

Interprocess Communication in MPI

There are two primary types of communication:
  Point-to-point  [Figure: 6 x 6 grid of ranks 0-35]
  Collective      [Figure: 6 x 6 grid of ranks 0-35]

D. J. Bodony (UIUC) Introduction to MPI May 20, 2013 15 / 35

Point-to-Point Send/Recv Communication in MPI

Process 13 (Send):
  ...
  if (rank == 13) &
    Call MPI_Send (sbuf, count, type, dest, &
                   tag, comm, ierr)
  ...
  sbuf  - sending variable
  count - sbuf element size
  type  - sbuf variable type
  dest  - process id in comm
  tag   - message tag
  comm  - communicator

Process 27 (Recv):
  ...
  if (rank == 27) &
    Call MPI_Recv (rbuf, count, type, source, &
                   tag, comm, status, ierr)
  ...
  rbuf   - receiving variable
  count  - rbuf element size
  type   - rbuf variable type
  source - process id in comm
  tag    - message tag
  comm   - communicator
  status - message status

D. J. Bodony (UIUC) Introduction to MPI May 20, 2013 16 / 35

Point-to-Point Send/Recv Communication in MPI

Call MPI_Send (sbuf, count, type, dest, tag, comm, ierr)

Example:

  Integer :: N, dest, tag, ierr
  Real(KIND=8), Pointer :: sbuf(:)
  ...
  N = 100
  tag = 1
  dest = 27
  allocate(sbuf(N))
  Call MPI_Send (sbuf, N, MPI_REAL8, dest, tag, &
                 MPI_COMM_WORLD, ierr)
  ...

D. J. Bodony (UIUC) Introduction to MPI May 20, 2013 17 / 35

Point-to-Point Send/Recv Communication in MPI

Call MPI_Recv (rbuf, count, type, source, tag, comm, status, ierr)

Example:

  Integer :: N, source, tag, ierr
  Real(KIND=8), Pointer :: rbuf(:)
  Integer :: status(MPI_STATUS_SIZE)
  ...
  N = 100
  tag = 1
  source = 13
  allocate(rbuf(N))
  Call MPI_Recv (rbuf, N, MPI_REAL8, source, tag, &
                 MPI_COMM_WORLD, status, ierr)
  ...

Note: status contains the source id, tag, and count of the message received (useful for debugging when sending multiple messages).

D. J. Bodony (UIUC) Introduction to MPI May 20, 2013 18 / 35
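
For comparison, the same rank 13 to rank 27 exchange can be sketched in C (an illustration, not part of the slides; it assumes the job runs with at least 28 processes):

/* C sketch of the rank 13 -> rank 27 exchange shown above in Fortran. */
#include <stdlib.h>
#include "mpi.h"

int main (int argc, char *argv[])
{
  const int N = 100, tag = 1;
  int i, rank;
  double *buf = malloc(N * sizeof(double));
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if (rank == 13) {
    for (i = 0; i < N; i++) buf[i] = (double)i;    /* fill the send buffer */
    MPI_Send(buf, N, MPI_DOUBLE, 27, tag, MPI_COMM_WORLD);
  } else if (rank == 27) {
    MPI_Recv(buf, N, MPI_DOUBLE, 13, tag, MPI_COMM_WORLD, &status);
  }
  free(buf);
  MPI_Finalize();
  return 0;
}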

MPI Predefined Datatypes (there exist many more!)

  Fortran                C            Comment
  MPI_CHARACTER          MPI_CHAR     Character
  MPI_INTEGER            MPI_INT      Integer
  MPI_REAL               MPI_FLOAT    Single precision
  MPI_REAL8                           Real(KIND=8) in Fortran
  MPI_DOUBLE_PRECISION   MPI_DOUBLE   Double precision
  MPI_LOGICAL            MPI_C_BOOL   Logical
  ...

D. J. Bodony (UIUC) Introduction to MPI May 20, 2013 19 / 35

Interlude #2a - Send/Recv Example

Write an MPI program that passes one integer from process 0 to process numprocs-1 through each process (1, 2, ..., numprocs-2) and adds one to it after each MPI_Recv.

You have 30 minutes for this task.

D. J. Bodony (UIUC) Introduction to MPI May 20, 2013 20 / 35
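
One possible C sketch for this interlude (an illustration, not the instructor's code; the tag value and the final print are arbitrary choices):

/* Possible sketch for Interlude #2a: pass an integer 0 -> 1 -> ... -> numprocs-1,
   adding one after each receive. */
#include <stdio.h>
#include "mpi.h"

int main (int argc, char *argv[])
{
  int rank, numprocs, value = 0, tag = 0;
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
  if (rank > 0) {
    MPI_Recv(&value, 1, MPI_INT, rank - 1, tag, MPI_COMM_WORLD, &status);
    value += 1;                                    /* add one after each MPI_Recv */
  }
  if (rank < numprocs - 1)
    MPI_Send(&value, 1, MPI_INT, rank + 1, tag, MPI_COMM_WORLD);
  if (rank == numprocs - 1)
    printf("final value on rank %d: %d\n", rank, value);
  MPI_Finalize();
  return 0;
}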

MPI_Send and MPI_Recv Are Blocking

When MPI_Send or MPI_Recv is called, that process waits and is blocked from performing further communication and/or computation.

  Advantage: the buffers sbuf and rbuf can be reused once the call finishes.
  Disadvantage: the code waits for the matching Send/Recv pair to complete.

MPI provides the non-blocking communication routines MPI_Isend and MPI_Irecv:

  Call MPI_Isend (sbuf, count, datatype, dest, tag, comm, &
                  request, ierr)
  Call MPI_Irecv (rbuf, count, datatype, src, tag, comm, &
                  request, ierr)
  Call MPI_Wait (request, status, ierr)

Must watch buffer use closely.

D. J. Bodony (UIUC) Introduction to MPI May 20, 2013 21 / 35

Non-Blocking Point-to-Point Communication

Communication and computation can overlap: the key to getting good scalability!

High-Level Example:

  ...
  Call MPI_Isend (sbuf, count, datatype, dest, tag(1), comm, &
                  request(1), ierr)
  Call MPI_Irecv (rbuf, count, datatype, src, tag(2), comm, &
                  request(2), ierr)
  < do work while the Send and Recv complete >
  Call MPI_Waitall (2, request, status, ierr)
  ...

D. J. Bodony (UIUC) Introduction to MPI May 20, 2013 22 / 35
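
The same overlap pattern written as a complete C ring exchange (illustrative only; the ring neighbors, tag, and payload are assumptions, not from the slides): each rank posts a non-blocking send to its right neighbor and a non-blocking receive from its left neighbor, can do unrelated work, then waits on both requests.

/* Illustrative non-blocking ring exchange: send right, receive from the left. */
#include <stdio.h>
#include "mpi.h"

int main (int argc, char *argv[])
{
  int rank, np, dest, src;
  double sbuf, rbuf;
  MPI_Request request[2];
  MPI_Status  status[2];

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &np);
  dest = (rank + 1) % np;                 /* right neighbor on the ring */
  src  = (rank - 1 + np) % np;            /* left neighbor on the ring  */
  sbuf = (double)rank;
  MPI_Isend(&sbuf, 1, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD, &request[0]);
  MPI_Irecv(&rbuf, 1, MPI_DOUBLE, src,  0, MPI_COMM_WORLD, &request[1]);
  /* ... do work that touches neither sbuf nor rbuf while the messages are in flight ... */
  MPI_Waitall(2, request, status);
  printf("rank %d received %g from rank %d\n", rank, rbuf, src);
  MPI_Finalize();
  return 0;
}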

Interlude #2b - Blocking vs. Nonblocking Send/Recv

Compile and run Blaise Barney's mpi_bandwidth.[c,f] and mpi_bandwidth_nonblock.[c,f] to observe the differences between blocking and non-blocking communications.

  mpi_bandwidth.c           https://computing.llnl.gov/tutorials/mpi/samples/c/mpi_bandwidth.c
  mpi_bandwidth_nonblock.c  https://computing.llnl.gov/tutorials/mpi/samples/c/mpi_bandwidth_nonblock.c
  mpi_bandwidth.f           https://computing.llnl.gov/tutorials/mpi/samples/fortran/mpi_bandwidth.f
  mpi_bandwidth_nonblock.f  https://computing.llnl.gov/tutorials/mpi/samples/fortran/mpi_bandwidth_nonblock.f

You have 15 minutes for this task.

D. J. Bodony (UIUC) Introduction to MPI May 20, 2013 23 / 35

Collective Communications

In many circumstances, all processes in a communicator must exchange data ⇒ collective communication.

D. J. Bodony (UIUC) Introduction to MPI May 20, 2013 24 / 35

MPI_Bcast

Fortran:
  Call MPI_BCAST (buffer, count, datatype, &
                  root, comm, ierr)

C:
  MPI_Bcast (&buffer, count, datatype, root, comm);

D. J. Bodony (UIUC) Introduction to MPI May 20, 2013 25 / 35
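
A minimal usage sketch, assuming rank 0 is the root and a single integer is broadcast (the variable nsteps and its value are illustrative):

/* Illustrative MPI_Bcast usage: rank 0 owns a value, afterwards every rank has it. */
#include <stdio.h>
#include "mpi.h"

int main (int argc, char *argv[])
{
  int rank, nsteps = 0;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if (rank == 0) nsteps = 1000;           /* only the root knows the value initially */
  MPI_Bcast(&nsteps, 1, MPI_INT, 0, MPI_COMM_WORLD);
  printf("rank %d now has nsteps = %d\n", rank, nsteps);
  MPI_Finalize();
  return 0;
}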

MPI_Scatter

Fortran:
  Call MPI_SCATTER (sbuf, scount, stype, &
                    rbuf, rcount, rtype, &
                    root, comm, ierr)

C:
  MPI_Scatter (&sbuf, scount, stype, &rbuf, rcount, rtype, root, comm);

D. J. Bodony (UIUC) Introduction to MPI May 20, 2013 26 / 35

MPI_Gather

Fortran:
  Call MPI_GATHER (sbuf, scount, stype, &
                   rbuf, rcount, rtype, &
                   root, comm, ierr)

C:
  MPI_Gather (&sbuf, scount, stype, &rbuf, rcount, rtype, root, comm);

D. J. Bodony (UIUC) Introduction to MPI May 20, 2013 27 / 35

MPI_Reduce

Fortran:
  Call MPI_REDUCE (sbuf, rbuf, rcount, &
                   rtype, op, root, &
                   comm, ierr)

C:
  MPI_Reduce (&sbuf, &rbuf, rcount, rtype, op, root, comm);

  op        description
  MPI_MAX   maximum
  MPI_MIN   minimum
  MPI_SUM   sum
  MPI_PROD  product
  ...

D. J. Bodony (UIUC) Introduction to MPI May 20, 2013 28 / 35
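
A small sketch combining MPI_Scatter and MPI_Reduce (the array contents and the one-element-per-rank decomposition are illustrative choices, not from the slides): the root scatters an array, each rank works on its piece, and the pieces are summed back onto the root.

/* Illustrative sketch: scatter a root-owned array, sum the pieces with MPI_Reduce. */
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main (int argc, char *argv[])
{
  int rank, numprocs, i;
  double piece, mysum, total = 0.0, *full = NULL;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
  if (rank == 0) {                        /* only the root fills the full array */
    full = malloc(numprocs * sizeof(double));
    for (i = 0; i < numprocs; i++) full[i] = i + 1.0;
  }
  /* every rank receives one element of the root's array */
  MPI_Scatter(full, 1, MPI_DOUBLE, &piece, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
  mysum = piece;                          /* stand-in for local work on the piece */
  MPI_Reduce(&mysum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
  if (rank == 0) {
    printf("sum = %g (expect %g)\n", total, 0.5 * numprocs * (numprocs + 1.0));
    free(full);
  }
  MPI_Finalize();
  return 0;
}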

All Variants of MPI Collective Routines D. J. Bodony (UIUC) Introduction to MPI May 20, 2013 29 / 35

All Variants of MPI Collective Routines D. J. Bodony (UIUC) Introduction to MPI May 20, 2013 30 / 35

All Variants of MPI Collective Routines
Useful for matrix transpose and FFTs. (NOTE: Its future on exascale machines is not clear.)
D. J. Bodony (UIUC) Introduction to MPI May 20, 2013 31 / 35

Interlude #3 - Parallel Inner Product

Develop a parallel code to take the scalar product of two N x 1 vectors x and y, i.e., x^T y.

Choose N > 5 N_p, where N_p is the number of MPI processes.

Have the scalar answer stored on all processors.

You have 30 minutes for this task.

D. J. Bodony (UIUC) Introduction to MPI May 20, 2013 32 / 35
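
One possible sketch for this interlude using MPI_Allreduce, so that every rank ends up holding the scalar (the local length of 6 elements per process and the constant fill values are illustrative; they give N = 6 N_p > 5 N_p as required):

/* Possible sketch for Interlude #3: distributed dot product x^T y with MPI_Allreduce. */
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main (int argc, char *argv[])
{
  int rank, numprocs, i, nlocal = 6;      /* N = 6*Np > 5*Np */
  double *x, *y, local = 0.0, dot = 0.0;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
  x = malloc(nlocal * sizeof(double));
  y = malloc(nlocal * sizeof(double));
  for (i = 0; i < nlocal; i++) { x[i] = 1.0; y[i] = 2.0; }   /* illustrative fill */
  for (i = 0; i < nlocal; i++) local += x[i] * y[i];         /* local partial sum */
  /* sum the partial results and leave the answer on every rank */
  MPI_Allreduce(&local, &dot, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
  if (rank == 0) printf("x^T y = %g on %d processes\n", dot, numprocs);
  free(x); free(y);
  MPI_Finalize();
  return 0;
}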

Interlude #4 - Parallel Matrix Transpose

Use MPI_Alltoall to take the transpose of a 10 N_p x 10 N_p matrix, where N_p is the number of MPI processes used.

You have 30 minutes for this task.

D. J. Bodony (UIUC) Introduction to MPI May 20, 2013 33 / 35
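
One possible sketch for this interlude (not the instructor's solution): each rank owns 10 contiguous rows of the matrix, packs one 10 x 10 sub-block per destination rank, calls MPI_Alltoall, and then transposes each received sub-block locally. The buffer names, fill pattern, and verification print are illustrative.

/* Possible sketch for Interlude #4: transpose a (10*Np) x (10*Np) matrix with
   MPI_Alltoall; each rank holds 10 contiguous rows, stored row-major. */
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main (int argc, char *argv[])
{
  const int b = 10;                       /* rows per rank */
  int rank, np, N, p, i, j;
  double *A, *AT, *sendbuf, *recvbuf;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &np);
  N       = b * np;                       /* global matrix is N x N */
  A       = malloc(b * N * sizeof(double));   /* my b x N row block of A   */
  AT      = malloc(b * N * sizeof(double));   /* my b x N row block of A^T */
  sendbuf = malloc(b * N * sizeof(double));
  recvbuf = malloc(b * N * sizeof(double));
  for (i = 0; i < b; i++)                 /* fill A(iglobal, j) = iglobal*N + j */
    for (j = 0; j < N; j++)
      A[i * N + j] = (double)((rank * b + i) * N + j);
  for (p = 0; p < np; p++)                /* pack the b x b block destined for rank p */
    for (i = 0; i < b; i++)
      for (j = 0; j < b; j++)
        sendbuf[p * b * b + i * b + j] = A[i * N + p * b + j];
  MPI_Alltoall(sendbuf, b * b, MPI_DOUBLE, recvbuf, b * b, MPI_DOUBLE, MPI_COMM_WORLD);
  for (p = 0; p < np; p++)                /* transpose each received b x b block */
    for (i = 0; i < b; i++)
      for (j = 0; j < b; j++)
        AT[i * N + p * b + j] = recvbuf[p * b * b + j * b + i];
  if (rank == 0) printf("AT(0,1) = %g (expect %g)\n", AT[1], (double)N);
  free(A); free(AT); free(sendbuf); free(recvbuf);
  MPI_Finalize();
  return 0;
}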

Summary

What MPI commands did we learn?

  MPI_Init
  MPI_Comm_rank, MPI_Comm_size
  MPI_Finalize
  MPI_Send, MPI_Recv
  MPI_Isend, MPI_Irecv
  MPI_Waitall
  MPI_Scatter, MPI_Bcast
  MPI_Gather, MPI_Reduce
  MPI_Allreduce, MPI_Allgather
  MPI_Alltoall
  MPI_Barrier

These commands will get you very far in your parallel program development.

D. J. Bodony (UIUC) Introduction to MPI May 20, 2013 34 / 35

Additional Resources

1. http://www.mcs.anl.gov/research/projects/mpi/tutorial/index.html
2. https://computing.llnl.gov/tutorials/parallel_comp/
3. https://computing.llnl.gov/tutorials/mpi/
4. http://www.mpi-forum.org/

D. J. Bodony (UIUC) Introduction to MPI May 20, 2013 35 / 35