A Message Passing Standard for MPP and Workstations

A Message Passing Standard for MPP and Workstations. Communications of the ACM, July 1996. J.J. Dongarra, S.W. Otto, M. Snir, and D.W. Walker.

Message Passing Interface (MPI)
- A message-passing library that can be added to sequential languages (C, Fortran)
- Designed by a consortium from industry, academia, and government
- Goal: a message-passing standard

MPI Programming Model
- Multiple Program Multiple Data (MPMD): processors may execute different programs (unlike SPMD)
- Number of processes is fixed (one per processor)
- No support for multi-threading
- Point-to-point and collective communication

MPI Basics
  MPI_INIT        initialize MPI
  MPI_FINALIZE    terminate computation
  MPI_COMM_SIZE   number of processes
  MPI_COMM_RANK   my process identifier
  MPI_SEND        send a message
  MPI_RECV        receive a message
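
As an illustration (not part of the original slides), a minimal C sketch that uses exactly these six calls, written with the C binding names (MPI_Send) rather than the slides' language-neutral capitals; it assumes the program is launched with at least two processes:

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char *argv[])
  {
      int size, rank, x;
      MPI_Status status;

      MPI_Init(&argc, &argv);                  /* initialize MPI */
      MPI_Comm_size(MPI_COMM_WORLD, &size);    /* number of processes */
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* my process identifier */

      if (rank == 0) {
          printf("running with %d processes\n", size);
          x = 42;
          MPI_Send(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);            /* send a message */
      } else if (rank == 1) {
          MPI_Recv(&x, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);   /* receive a message */
          printf("process 1 received %d\n", x);
      }

      MPI_Finalize();                          /* terminate computation */
      return 0;
  }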

Language Bindings
- Describe, for a given base language: concrete syntax, error-handling conventions, parameter modes
- Popular base languages: C, Fortran

Point-to-point message passing
- Messages sent from one processor to another are FIFO ordered
- Messages sent by different processors arrive non-deterministically
- Receiver may specify the source:
  source = sender's identity => symmetric naming
  source = MPI_ANY_SOURCE => asymmetric naming
  (example: specify the sender of the next pivot row in ASP)
- Receiver may also specify a tag:
  distinguishes different kinds of messages
  similar to an operation name in SR

Examples (1/2)

  int x;
  MPI_Status status;
  float buf[10];

  MPI_SEND (buf, 10, MPI_FLOAT, 3, 0, MPI_COMM_WORLD);
  /* send 10 floats to process 3; MPI_COMM_WORLD = all processes */

  MPI_RECV (&x, 1, MPI_INT, 15, 0, MPI_COMM_WORLD, &status);
  /* receive 1 integer from process 15 */

  MPI_RECV (&x, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status);
  /* receive 1 integer from any process */

Examples (2/2)

  int x;
  MPI_Status status;
  #define NEW_MINIMUM 1

  MPI_SEND (&x, 1, MPI_INT, 3, NEW_MINIMUM, MPI_COMM_WORLD);
  /* send message with tag NEW_MINIMUM */

  MPI_RECV (&x, 1, MPI_INT, 15, NEW_MINIMUM, MPI_COMM_WORLD, &status);
  /* receive 1 integer with tag NEW_MINIMUM */

  MPI_RECV (&x, 1, MPI_INT, MPI_ANY_SOURCE, NEW_MINIMUM, MPI_COMM_WORLD, &status);
  /* receive tagged message from any source */

Forms of message passing
- MPI has a very wide and complex variety of send/receive primitives
- The programmer can decide whether to:
  minimize copying overhead -> synchronous and ready-mode sends
  minimize idle time, overlap communication & computation -> buffered and nonblocking sends
- Communication modes control the buffering
- (Non)blocking determines when a send completes

Communication modes
- Standard: the programmer cannot assume whether the message is buffered; this is up to the system
- Buffered: the programmer provides (bounded) buffer space; the send completes when the message is copied into the buffer (local); erroneous if buffer space is insufficient
- Synchronous: the send waits for a matching receive; no buffering, so it is easy to get deadlocks
- Ready: the programmer asserts that the receive has already been posted; erroneous if there is no matching receive yet
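
A small C sketch of the four send variants (illustrative only, not from the slides); it assumes MPI has been initialized and a matching receive is posted on process 1:

  #include <stdlib.h>
  #include <mpi.h>

  void send_modes_demo(int *data, int n)
  {
      /* Standard mode: buffering is up to the MPI system */
      MPI_Send(data, n, MPI_INT, 1, 0, MPI_COMM_WORLD);

      /* Buffered mode: we must attach buffer space first */
      int bufsize = n * sizeof(int) + MPI_BSEND_OVERHEAD;
      void *buf = malloc(bufsize);
      MPI_Buffer_attach(buf, bufsize);
      MPI_Bsend(data, n, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* completes once copied into buf */
      MPI_Buffer_detach(&buf, &bufsize);
      free(buf);

      /* Synchronous mode: completes only when a matching receive has started */
      MPI_Ssend(data, n, MPI_INT, 1, 0, MPI_COMM_WORLD);

      /* Ready mode: erroneous unless the receive is already posted */
      MPI_Rsend(data, n, MPI_INT, 1, 0, MPI_COMM_WORLD);
  }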

Unsafe programs
- MPI does not guarantee any system buffering
- Programs that assume it are unsafe and may deadlock
- Example of such a deadlock:

  Machine 0:
    MPI_SEND (&x1, 10, MPI_INT, 1, 0, MPI_COMM_WORLD);
    MPI_RECV (&x2, 10, MPI_INT, 1, 0, MPI_COMM_WORLD, &status);

  Machine 1:
    MPI_SEND (&y1, 10, MPI_INT, 0, 0, MPI_COMM_WORLD);
    MPI_RECV (&y2, 10, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
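
Two common ways to make such an exchange safe, sketched here (not from the slides) under the assumption that x1/y1 are the 10-int buffers to send, x2/y2 the buffers to receive into, and rank holds the caller's rank:

  #include <mpi.h>

  /* Option 1: break the send-send cycle by reordering one side */
  void safe_exchange(int rank, int *x1, int *x2, int *y1, int *y2)
  {
      MPI_Status status;

      if (rank == 0) {
          MPI_Send(x1, 10, MPI_INT, 1, 0, MPI_COMM_WORLD);
          MPI_Recv(x2, 10, MPI_INT, 1, 0, MPI_COMM_WORLD, &status);
      } else {                              /* rank 1 receives first, then sends */
          MPI_Recv(y2, 10, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
          MPI_Send(y1, 10, MPI_INT, 0, 0, MPI_COMM_WORLD);
      }
  }

  /* Option 2: let MPI pair both halves with MPI_Sendrecv (deadlock-free) */
  void safe_exchange2(int rank, int *sendbuf, int *recvbuf)
  {
      MPI_Status status;
      int partner = 1 - rank;               /* the other of ranks 0 and 1 */

      MPI_Sendrecv(sendbuf, 10, MPI_INT, partner, 0,
                   recvbuf, 10, MPI_INT, partner, 0,
                   MPI_COMM_WORLD, &status);
  }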

(Non)blocking sends
- A blocking send returns when it's safe to modify its arguments
- A non-blocking ISEND returns immediately (dangerous)

  int buf[10];
  MPI_Request request;
  MPI_ISEND (buf, 10, MPI_INT, 3, 0, MPI_COMM_WORLD, &request);
  compute();   /* these computations can be overlapped with the transmission */
  buf[2]++;    /* dangerous: may or may not affect the transmitted buf */
  MPI_WAIT (&request, &status);   /* waits until the ISEND completes */

- Don't confuse this with synchronous vs asynchronous!
  synchronous = wait for the (remote) receiver
  blocking = wait until the arguments have been saved (locally)

Non-blocking receive
  MPI_IPROBE      check for a pending message
  MPI_PROBE       wait for a pending message
  MPI_GET_COUNT   number of data elements in a message

  MPI_PROBE (source, tag, comm, &status) => status
  MPI_GET_COUNT (&status, datatype, &count) => message size
  status.MPI_SOURCE => identity of sender
  status.MPI_TAG => tag of message

Example: Check for Pending Messages

  int buf[1], flag, source, minimum;
  MPI_Status status;

  while (...) {
      MPI_IPROBE (MPI_ANY_SOURCE, NEW_MINIMUM, comm, &flag, &status);
      while (flag) {
          /* handle new minimum */
          source = status.MPI_SOURCE;
          MPI_RECV (buf, 1, MPI_INT, source, NEW_MINIMUM, comm, &status);
          minimum = buf[0];
          /* check for another update */
          MPI_IPROBE (MPI_ANY_SOURCE, NEW_MINIMUM, comm, &flag, &status);
      }
      ... /* compute */
  }

Example: Receiving a Message with Unknown Size

  int count, *buf, source;
  MPI_Status status;

  MPI_PROBE (MPI_ANY_SOURCE, 0, comm, &status);
  source = status.MPI_SOURCE;
  MPI_GET_COUNT (&status, MPI_INT, &count);
  buf = malloc (count * sizeof (int));
  MPI_RECV (buf, count, MPI_INT, source, 0, comm, &status);

Global Operations - Collective Communication
Coordinated communication involving all processes.

Functions:
  MPI_BARRIER     synchronize all processes
  MPI_BCAST       send data to all processes
  MPI_GATHER      gather data from all processes
  MPI_SCATTER     scatter data to all processes
  MPI_REDUCE      reduction operation
  MPI_ALLREDUCE   reduction, all processes get the result
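
An illustrative C sketch (not from the slides) combining several of these calls: process 0 scatters one integer to each process, every process does a little local work, and a reduction sums the results back on process 0. It assumes MPI_Init has already been called.

  #include <stdlib.h>
  #include <stdio.h>
  #include <mpi.h>

  void collective_demo(void)
  {
      int rank, size, mine, sum;
      int *sendbuf = NULL;

      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      if (rank == 0) {                       /* only the root needs the full array */
          sendbuf = malloc(size * sizeof(int));
          for (int i = 0; i < size; i++)
              sendbuf[i] = i;
      }

      MPI_Scatter(sendbuf, 1, MPI_INT, &mine, 1, MPI_INT, 0, MPI_COMM_WORLD);
      mine++;                                /* local computation */
      MPI_Reduce(&mine, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

      if (rank == 0) {
          printf("sum = %d\n", sum);         /* = size*(size-1)/2 + size */
          free(sendbuf);
      }
  }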

Barrier
  MPI_BARRIER (comm)
- Synchronizes a group of processes
- All processes block until all have reached the barrier
- Often invoked at the end of a loop in iterative algorithms
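
A tiny sketch of that pattern; do_local_work and max_iters are hypothetical placeholders, not names from the slides:

  #include <mpi.h>

  void do_local_work(int iter);            /* hypothetical per-process computation */

  void iterate(int max_iters)
  {
      for (int iter = 0; iter < max_iters; iter++) {
          do_local_work(iter);
          MPI_Barrier(MPI_COMM_WORLD);     /* no process starts iteration iter+1 early */
      }
  }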

Figure 8.3 from Foster's book

Reduction
- Combine values provided by different processes
- Result sent to one processor (MPI_REDUCE) or all processors (MPI_ALLREDUCE)
- Used with commutative and associative operators: MAX, MIN, +, x, AND, OR

Example 1: Global minimum operation

  MPI_REDUCE (inbuf, outbuf, 2, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD);

  outbuf[0] = minimum over the inbuf[0]'s
  outbuf[1] = minimum over the inbuf[1]'s
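
A self-contained C sketch of this reduction (illustrative; each process fills inbuf from its rank so the result is easy to check):

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char *argv[])
  {
      int rank, inbuf[2], outbuf[2];

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      inbuf[0] = rank;             /* minimum over these will be 0 */
      inbuf[1] = (rank + 1) * 10;  /* minimum over these will be 10 */

      /* element-wise minimum of the two-element arrays, result on process 0 */
      MPI_Reduce(inbuf, outbuf, 2, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD);

      if (rank == 0)
          printf("minima: %d %d\n", outbuf[0], outbuf[1]);

      MPI_Finalize();
      return 0;
  }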

Figure 8.4 from Foster's book

Example 2: SOR in MPI

SOR communication scheme
- Each CPU communicates with its left & right neighbor (if they exist)
- Also need to determine the convergence criterion

Expressing SOR in MPI
- Use a ring topology
- Each processor exchanges rows with its left/right neighbor
- Use MPI_ALLREDUCE to determine whether the grid has changed less than epsilon during the last iteration
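
A sketch of the per-iteration communication this implies (not the slides' code): exchange boundary rows with both neighbors, then combine the local maximum change with MPI_Allreduce. It assumes each process owns rows 1..n of an (n+2) x N strip of doubles, with ghost rows at rows[0] and rows[n+1].

  #include <mpi.h>

  int sor_step(double **rows, int n, int N, int rank, int size, double epsilon)
  {
      int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;  /* MPI_PROC_NULL: no neighbor */
      int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;
      MPI_Status status;

      /* exchange boundary rows with the left and right neighbors */
      MPI_Sendrecv(rows[1], N, MPI_DOUBLE, left, 0,
                   rows[n + 1], N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &status);
      MPI_Sendrecv(rows[n], N, MPI_DOUBLE, right, 0,
                   rows[0], N, MPI_DOUBLE, left, 0, MPI_COMM_WORLD, &status);

      double local_max_diff = 0.0, global_max_diff;
      /* ... update interior points here, recording the largest change in local_max_diff ... */

      /* every process learns the global maximum change, so all stop together */
      MPI_Allreduce(&local_max_diff, &global_max_diff, 1, MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD);
      return global_max_diff < epsilon;     /* nonzero means converged */
  }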

Figure 8.5 from Foster's book

Semantics of collective operations
- Blocking operations: it's safe to reuse buffers after they return
- Standard mode only: completion of a call does not guarantee that other processes have completed the operation
- A collective operation may or may not have the effect of synchronizing all processes

Modularity
- MPI programs use libraries
- Library routines may send messages
- These messages should not interfere with application messages
- Tags do not solve this problem

Communicators
- A communicator denotes a group of processes (a context)
- MPI_SEND and MPI_RECV specify a communicator
- MPI_RECV can only receive messages sent to the same communicator
- Library routines should use separate communicators, passed as a parameter
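
One way a library can keep its traffic separate, sketched with a hypothetical lib_init/lib_finalize pair (not from the slides): duplicate the caller's communicator once and use the copy for all internal messages.

  #include <mpi.h>

  /* Hypothetical library initialization: keep a private communicator so the
     library's messages can never match receives posted by the application. */
  static MPI_Comm lib_comm = MPI_COMM_NULL;

  void lib_init(MPI_Comm user_comm)
  {
      MPI_Comm_dup(user_comm, &lib_comm);   /* same group of processes, new context */
  }

  void lib_finalize(void)
  {
      MPI_Comm_free(&lib_comm);
  }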

Discussion
- Library-based: no language modifications, no compiler; syntax is awkward
- Message receipt is based on the identity of the sender and the operation tag, but not on the contents of the message
- Needs a separate mechanism for organizing the name space
- No type checking of messages

Syntax

SR:
  call slave.coordinates(2.4, 5.67);
  in coordinates (x, y);

MPI:
  #define COORDINATES_TAG 1
  #define SLAVE_ID 15

  float buf[2];
  buf[0] = 2.4;
  buf[1] = 5.67;
  MPI_SEND (buf, 2, MPI_FLOAT, SLAVE_ID, COORDINATES_TAG, MPI_COMM_WORLD);

  MPI_RECV (buf, 2, MPI_FLOAT, MPI_ANY_SOURCE, COORDINATES_TAG, MPI_COMM_WORLD, &status);