Part - II. Message Passing Interface. Dheeraj Bhardwaj


Part II. Dheeraj Bhardwaj, Department of Computer Science & Engineering, Indian Institute of Technology Delhi, 110016, India. http://www.cse.iitd.ac.in/~dheerajb 1

Outline: Basics of MPI; how to compile and execute MPI programs; MPI library calls used in the example program; MPI point-to-point communication; MPI advanced point-to-point communication; MPI collective communication and computations; MPI datatypes; MPI communication modes; MPI special features. 2

Is MPI Large or Small? MPI can be small: one can begin programming with just 6 MPI function calls. MPI_INIT initializes MPI; MPI_COMM_SIZE determines the number of processes; MPI_COMM_RANK determines the rank (label) of the calling process; MPI_SEND sends a message; MPI_RECV receives a message; MPI_FINALIZE terminates MPI. MPI can also be large: one can use any of the roughly 125 functions in MPI. 3
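
To make these six calls concrete, here is a minimal sketch (not part of the original slides) of a complete C program that uses only them: rank 0 sends one integer to rank 1. The tag value 0 and the message value 42 are arbitrary choices for illustration.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, value = 0;
    MPI_Status status;

    MPI_Init(&argc, &argv);                 /* initialize MPI */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* rank of this process */

    if (rank == 0 && size > 1) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("Process 1 received %d\n", value);
    }

    MPI_Finalize();                         /* terminate MPI */
    return 0;
}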

MPI Blocking Send and Receive - Blocking Send. A typical blocking send looks like send (dest, type, address, length), where dest is an integer identifier representing the process that is to receive the message; type is a nonnegative integer (in MPI terms, the tag) that the destination can use to selectively screen messages; and (address, length) describes a contiguous area in memory containing the message to be sent. 4

MPI Blocking Send and Receive - Point-to-Point Communications. Point-to-point communication is the sending and receiving of messages between pairs of processes. BLOCKING SEND (MPI_Send): returns only after the message data has been safely copied out of the send buffer; for a large message this typically means the corresponding RECEIVE has been issued and the message has been transferred. BLOCKING RECEIVE (MPI_Recv): returns only after the corresponding SEND has been issued and the message has been received. 5

MPI Blocking Send and Receive. For a large message (size greater than the threshold), most implementations of blocking send and receive use the following (rendezvous) procedure. Sender (S): MPI_SEND (blocking standard send) is issued, but the data transfer does not begin until word has arrived that the corresponding MPI_RECV has been posted; the sending task waits until the transfer from the source is complete. Receiver (R): MPI_RECV waits for the message, and the task continues when the data transfer into the user's buffer is complete. 6

MPI Non-Blocking Send and Receive. Non-blocking Receive (MPI_Irecv): does not wait for the message transfer to complete, but immediately returns control to the calling process. C: MPI_Isend (buf, count, dtype, dest, tag, comm, request); MPI_Irecv (buf, count, dtype, source, tag, comm, request); Fortran: MPI_Isend (buf, count, dtype, dest, tag, comm, request, ierror) MPI_Irecv (buf, count, dtype, source, tag, comm, request, ierror) 7

Non-Blocking Communications. Separate communication into three phases: (1) initiate the non-blocking communication; (2) do some work (perhaps involving other communications); (3) wait for the non-blocking communication to complete. The sketch below illustrates the three phases. 8
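
A hedged sketch of these three phases in C (not from the slides): each even rank exchanges one integer with the next odd rank. The pairing scheme and the use of MPI_Waitall are illustrative choices.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, sendval, recvval, partner;
    MPI_Request reqs[2];
    MPI_Status  stats[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    partner = (rank % 2 == 0) ? rank + 1 : rank - 1;  /* pair even/odd ranks */
    sendval = rank;

    if (partner >= 0 && partner < size) {
        /* Phase 1: initiate non-blocking communication */
        MPI_Irecv(&recvval, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(&sendval, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &reqs[1]);

        /* Phase 2: do some useful work while the transfer is in flight */

        /* Phase 3: wait for both operations to complete */
        MPI_Waitall(2, reqs, stats);
        printf("Rank %d received %d from rank %d\n", rank, recvval, partner);
    }

    MPI_Finalize();
    return 0;
}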

MPI Non-Blocking Send and Receive 9

MPI Non-Blocking Send and Receive. For a small message (size at or below the threshold), most implementations of non-blocking send and receive use the following (eager) procedure: the message is sent immediately and stored in a buffer on the receiving side. MPI_Wait checks whether a non-blocking operation has completed; in this case, the MPI_Wait on the sending side finds that the message has already been received. Sender (S): MPI_ISEND (non-blocking standard send), followed later by MPI_WAIT. Receiver (R): the transfer into a temporary buffer on the receiving node can be avoided if MPI_IRECV is posted early enough, and there is no delay if MPI_WAIT is called late enough. 10

MPI Non-Blocking Send and Receive. For a large message (size greater than the threshold), most implementations of non-blocking send and receive use the following procedure. The send is issued, but the data is not sent immediately; computation resumes after the send but is later halted by an MPI_Wait. MPI_Wait checks whether a non-blocking operation has completed; in this case, the MPI_Wait on the sending side sees that the message has not been sent yet. Sender (S): MPI_ISEND (non-blocking standard send) is issued, but the data transfer does not begin until word has arrived that the corresponding MPI_IRECV has been posted; the task waits in MPI_WAIT until the transfer from the source is complete. Receiver (R): MPI_IRECV followed by MPI_WAIT, with no interruption if the wait is late enough. 11

MPI Communication Modes. Synchronous mode: the same as standard mode, except the send will not complete until message delivery is guaranteed. Buffered mode: similar to standard mode, but completion is always independent of the matching receive, and the message may be buffered to ensure this. 12

MPI Buffered Send and Receive. If the programmer allocates some memory (buffer space) for temporary storage on the sending processor, a type of non-blocking send can be performed. Sender (S): MPI_BSEND (buffered send) copies the data to the user-supplied buffer and completes as soon as that transfer is done. Receiver (R): MPI_RECV; the receiving task waits until the message arrives. A sketch of attaching a buffer and using MPI_Bsend follows. 13
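
As a rough illustration of buffered mode (assuming at least two processes; the buffer size and message value are arbitrary):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size, value = 7, bufsize;
    char *buffer;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0 && size > 1) {
        /* attach user-supplied buffer space for buffered sends */
        bufsize = sizeof(int) + MPI_BSEND_OVERHEAD;
        buffer  = (char *) malloc(bufsize);
        MPI_Buffer_attach(buffer, bufsize);

        MPI_Bsend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);  /* completes locally */

        MPI_Buffer_detach(&buffer, &bufsize);  /* blocks until buffered data is sent */
        free(buffer);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("Rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}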

MPI Communication Modes
Sender mode / Notes:
Synchronous send - only completes when the receive has completed.
Buffered send - always completes (unless an error occurs), irrespective of the receiver.
Standard send - either synchronous or buffered.
Ready send - always completes (unless an error occurs), irrespective of whether the receive has completed.
Receive - completes when a message has arrived. 14

MPI Communication Modes - MPI Sender Modes
OPERATION / MPI CALL:
Standard send - MPI_SEND
Synchronous send - MPI_SSEND
Buffered send - MPI_BSEND
Ready send - MPI_RSEND
Receive - MPI_RECV 15

MPI Datatypes. Message type: a message contains a number of elements of some particular datatype. MPI datatypes: basic types, and derived types (vector, struct, others). Derived types can be built up from basic types. C types are different from Fortran types. 16

MPI Derived Datatypes - Contiguous Data. The simplest derived datatype consists of a number of contiguous items of the same datatype. C: int MPI_Type_contiguous (int count, MPI_Datatype oldtype, MPI_Datatype *newtype); Fortran: MPI_Type_contiguous (count, oldtype, newtype) integer count, oldtype, newtype 17
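
A small illustrative program (not from the slides) that builds a contiguous type of four doubles and uses it in a point-to-point exchange; it assumes at least two processes.

#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    double row[4] = {1.0, 2.0, 3.0, 4.0};
    MPI_Datatype rowtype;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* four contiguous doubles treated as one unit */
    MPI_Type_contiguous(4, MPI_DOUBLE, &rowtype);
    MPI_Type_commit(&rowtype);     /* must be committed before use */

    if (size > 1) {
        if (rank == 0)
            MPI_Send(row, 1, rowtype, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(row, 1, rowtype, 0, 0, MPI_COMM_WORLD, &status);
    }

    MPI_Type_free(&rowtype);
    MPI_Finalize();
    return 0;
}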

MPI Derived Datatypes 18

MPI Derived Datatypes - Constructing a Vector Datatype. C: int MPI_Type_vector (int count, int blocklength, int stride, MPI_Datatype oldtype, MPI_Datatype *newtype); Fortran: MPI_Type_vector (count, blocklength, stride, oldtype, newtype, ierror). Extent of a Datatype. C: int MPI_Type_extent (MPI_Datatype datatype, MPI_Aint *extent); Fortran: MPI_Type_extent (datatype, extent, ierror) integer datatype, extent, ierror 19
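
A typical use of MPI_Type_vector is sending one column of a row-major C matrix. The sketch below (not from the slides) assumes a 4x5 matrix, at least two processes, and column 2 as the column to send; all of these are arbitrary choices.

#include <mpi.h>

#define NROWS 4
#define NCOLS 5

int main(int argc, char **argv)
{
    int rank, size, i, j;
    double a[NROWS][NCOLS];      /* row-major storage in C */
    double col[NROWS];
    MPI_Datatype coltype;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (i = 0; i < NROWS; i++)
        for (j = 0; j < NCOLS; j++)
            a[i][j] = i * NCOLS + j;

    /* NROWS blocks of 1 double, separated by a stride of NCOLS doubles:
       this picks out one column of the matrix */
    MPI_Type_vector(NROWS, 1, NCOLS, MPI_DOUBLE, &coltype);
    MPI_Type_commit(&coltype);

    if (size > 1) {
        if (rank == 0)
            MPI_Send(&a[0][2], 1, coltype, 1, 0, MPI_COMM_WORLD);   /* send column 2 */
        else if (rank == 1)
            MPI_Recv(col, NROWS, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
    }

    MPI_Type_free(&coltype);
    MPI_Finalize();
    return 0;
}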

MPI Derived Datatypes 20

MPI Derived Datatypes - Constructing a Struct Datatype. C: int MPI_Type_struct (int count, int *array_of_blocklengths, MPI_Aint *array_of_displacements, MPI_Datatype *array_of_types, MPI_Datatype *newtype); Fortran: MPI_Type_struct (count, array_of_blocklengths, array_of_displacements, array_of_types, newtype, ierror) 21

MPI Derived Datatypes - Committing a Datatype. Once a datatype has been constructed, it must be committed before it is used; this is done using MPI_TYPE_COMMIT. C: int MPI_Type_commit (MPI_Datatype *datatype); Fortran: MPI_Type_commit (datatype, ierror) integer datatype, ierror 22
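
Putting construction and committing together, a hedged sketch for a made-up C struct with one int and three doubles follows. It uses MPI_Get_address and MPI_Type_create_struct, the current equivalents of the MPI_Address and MPI_Type_struct calls shown on these slides; the Particle type and its exchange between ranks 0 and 1 are illustrative assumptions.

#include <mpi.h>

typedef struct { int id; double coord[3]; } Particle;

int main(int argc, char **argv)
{
    int rank, size;
    Particle p = {0, {0.0, 0.0, 0.0}};
    int blocklens[2] = {1, 3};
    MPI_Aint displs[2], base;
    MPI_Datatype types[2] = {MPI_INT, MPI_DOUBLE};
    MPI_Datatype particle_type;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* displacements of each member relative to the start of the struct */
    MPI_Get_address(&p.id, &base);
    MPI_Get_address(&p.coord[0], &displs[1]);
    displs[0] = 0;
    displs[1] -= base;

    MPI_Type_create_struct(2, blocklens, displs, types, &particle_type);
    MPI_Type_commit(&particle_type);          /* commit before first use */

    if (size > 1) {
        if (rank == 0)
            MPI_Send(&p, 1, particle_type, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(&p, 1, particle_type, 0, 0, MPI_COMM_WORLD, &status);
    }

    MPI_Type_free(&particle_type);
    MPI_Finalize();
    return 0;
}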

MPI: Support for Regular Decompositions. Using the topology routines (e.g. MPI_Cart_create), the user can define a virtual topology. Why use the topology routines? They are simple to use (so why not?), and they allow the MPI implementation to provide a layout of processes with low expected contention (contention can matter). Remember, contention still matters; a good mapping can reduce contention effects. 23
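
An illustrative sketch (not from the slides) of creating a 2-D Cartesian virtual topology; the grid shape is chosen by MPI_Dims_create, the grid is non-periodic, and reorder is enabled so the implementation may remap ranks for a better layout.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int size, rank, dims[2] = {0, 0}, periods[2] = {0, 0}, coords[2];
    int left, right;
    MPI_Comm cart;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    MPI_Dims_create(size, 2, dims);                 /* choose a balanced 2-D grid */
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &cart);  /* reorder = 1 */

    MPI_Comm_rank(cart, &rank);
    MPI_Cart_coords(cart, rank, 2, coords);         /* my (row, col) in the grid */
    MPI_Cart_shift(cart, 1, 1, &left, &right);      /* neighbours along dimension 1 */

    printf("Rank %d is at (%d,%d); left=%d right=%d\n",
           rank, coords[0], coords[1], left, right);

    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}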

MPI Persistent Communication. Non-blocking operations (Isend, Irecv, Waitall) make communication/computation overlap effective. Persistent operations offer a potential additional saving: the MPI_Request is allocated once and reused. A variation of the example uses sendinit, recvinit, startall and waitall; another variant is startall(recvs), sendrecv/barrier, startall(rsends), waitall. Note that some vendor implementations have been buggy. 24
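
A hedged sketch of the sendinit/recvinit/startall/waitall pattern for a ring exchange repeated over many iterations; the ring pattern and the iteration count are illustrative choices, not taken from the slides.

#include <mpi.h>

#define NSTEPS 100

int main(int argc, char **argv)
{
    int rank, size, prev, next, sendval, recvval, step;
    MPI_Request reqs[2];
    MPI_Status  stats[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    prev = (rank - 1 + size) % size;       /* ring neighbours */
    next = (rank + 1) % size;

    /* set up the persistent requests once ... */
    MPI_Recv_init(&recvval, 1, MPI_INT, prev, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Send_init(&sendval, 1, MPI_INT, next, 0, MPI_COMM_WORLD, &reqs[1]);

    for (step = 0; step < NSTEPS; step++) {
        sendval = rank + step;
        MPI_Startall(2, reqs);             /* ... start them every iteration ... */
        MPI_Waitall(2, reqs, stats);       /* ... and wait for completion */
    }

    MPI_Request_free(&reqs[0]);
    MPI_Request_free(&reqs[1]);
    MPI_Finalize();
    return 0;
}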

Collective Communications. The sending and/or receiving of messages to/from groups of processes. A collective communication implies that all processes need to participate in the communication. It involves coordinated communication within a group of processes, and no message tags are used. In MPI, all collective routines block until they are locally complete. Two broad classes: data movement routines and global computation routines. 25

MPI Collective Communications. Collective communication: communications involving a group of processes, called by all processes in a communicator. Examples: barrier synchronization; broadcast, scatter, gather; global sum, global maximum, etc. 26

MPI Collective Communications - Characteristics of Collective Communication. Collective action over a communicator; all processes must communicate. Synchronization may or may not occur. All collective operations are blocking. No tags. Receive buffers must be exactly the right size. 27

MPI Collective Communications. Communication is coordinated among a group of processes. The group can be constructed by hand with the MPI group-manipulation routines or by using the MPI topology-definition routines. Different communicators are used instead of tags. There are no non-blocking collective operations (in MPI-1). Collective communication routines fall into three classes: synchronization, data movement, and collective computation. 28

MPI Collective Communications - Barrier. A barrier ensures that all processors reach a specified location within the code before continuing. C: int MPI_Barrier (MPI_Comm comm); Fortran: MPI_Barrier (comm, ierror) integer comm, ierror 29

MPI Collective Communications - Broadcast. A broadcast sends data from one processor to all other processors. C: int MPI_Bcast (void *buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm); Fortran: MPI_Bcast (buffer, count, datatype, root, comm, ierror) <type> buffer(*) integer count, datatype, root, comm, ierror 30
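
For illustration, a short program (not from the slides) in which the root broadcasts an integer parameter to all processes, with a barrier used as an explicit synchronization point; the variable name nsteps and its value are made up.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, nsteps;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        nsteps = 1000;                 /* e.g. read from an input file on the root */

    /* every process (root included) calls MPI_Bcast with the same root */
    MPI_Bcast(&nsteps, 1, MPI_INT, 0, MPI_COMM_WORLD);

    MPI_Barrier(MPI_COMM_WORLD);       /* optional synchronization point */
    printf("Rank %d: nsteps = %d\n", rank, nsteps);

    MPI_Finalize();
    return 0;
}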

MPI Collective Computations - Global Reduction Operations. Used to compute a result involving data distributed over a group of processes. Examples: global sum or product, global maximum or minimum, global user-defined operation. 31

MPI Collective Computations. Fortran: MPI_Reduce (sendbuf, recvbuf, count, datatype, op, root, comm, ierror) <type> sendbuf(*), recvbuf(*) integer count, datatype, op, root, comm, ierror. C: int MPI_Reduce (void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm); 32
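
As an illustration of MPI_Reduce (not from the slides), a sketch that sums an integer array of size 4 element-wise from every process onto the root; the choice of rank 0 as root and the local values are arbitrary.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, i;
    int local[4], global[4] = {0, 0, 0, 0};

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (i = 0; i < 4; i++)
        local[i] = rank + i;           /* each process contributes its own values */

    /* element-wise sum of all local arrays, result stored on rank 0 */
    MPI_Reduce(local, global, 4, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Global sums: %d %d %d %d\n", global[0], global[1], global[2], global[3]);

    MPI_Finalize();
    return 0;
}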

MPI Collective Computations - Collective Computation Operations
MPI Name / Operation:
MPI_LAND - Logical AND
MPI_LOR - Logical OR
MPI_LXOR - Logical exclusive OR (XOR)
MPI_BAND - Bitwise AND
MPI_BOR - Bitwise OR
MPI_BXOR - Bitwise exclusive OR 33

MPI Collective Computations - Collective Computation Operations
MPI Name / Operation:
MPI_MAX - Maximum
MPI_MIN - Minimum
MPI_PROD - Product
MPI_SUM - Sum
MPI_MAXLOC - Maximum and location 34

MPI Collective Computations - Reduction. A reduction compares or computes using a set of data stored on all processors and saves the final result on one specified processor. Example: global reduction (sum) of an integer array of size 4 on each processor, accumulating the result on processor P1. 35

MPI Collective Communication - Gather. Accumulate onto a single processor the data that resides on all processors. Example: gather an integer array of size 4 from each processor. 36

MPI Collective Communication - Scatter. Distribute a set of data from one processor to all other processors. Example: scatter an integer array of size 16 across 4 processors. 37
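
An illustrative scatter/gather round trip (not from the slides): the root distributes chunks of 4 integers, each process doubles its chunk as a stand-in for real local work, and the results are gathered back in rank order. With 4 processes this matches the 16-element example above.

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size, i;
    int *full = NULL, *gathered = NULL;
    int part[4], result[4];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                        /* root owns the full array */
        full     = (int *) malloc(4 * size * sizeof(int));
        gathered = (int *) malloc(4 * size * sizeof(int));
        for (i = 0; i < 4 * size; i++) full[i] = i;
    }

    /* give each process a chunk of 4 integers */
    MPI_Scatter(full, 4, MPI_INT, part, 4, MPI_INT, 0, MPI_COMM_WORLD);

    for (i = 0; i < 4; i++) result[i] = 2 * part[i];   /* local work */

    /* collect the chunks back onto the root, in rank order */
    MPI_Gather(result, 4, MPI_INT, gathered, 4, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) { free(full); free(gathered); }
    MPI_Finalize();
    return 0;
}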

MPI Collective Communication P0 A P0 A P1 Broadcast P1 A P2 P2 A P3 P3 A P0 A B C D P0 A P1 P2 P3 Scatter Gather P1 P2 P3 B C D Representation of collective data movement in MPI 38

MPI Collective Communication P0 A0 P0 P1 P2 A1 A2 Reduce (A,B,P1,SUM) P1 P2 =A0+A1+A2+A3 P3 A3 P3 P0 A0 P1 A1 P2 A2 P3 A3 Reduce (A,B,P2,MAX) 39 P0 P1 P2 P3 Representation of collective data movement in MPI = MAXIMUM

MPI Collective Communications & Computations: Allgather, Allgatherv, Allreduce, Alltoall, Alltoallv, Bcast, Gather, Gatherv, Reduce, Reduce_scatter, Scan, Scatter, Scatterv. The All- versions deliver results to all participating processes. The V-versions allow the chunks to have different, non-uniform sizes (Scatterv, Allgatherv, Gatherv). Allreduce, Reduce, Reduce_scatter, and Scan take both built-in and user-defined combination functions. 40

MPI Collective Communication. [Diagram: representation of collective data movement in MPI. Allgather: P0-P3 each contribute one item (A, B, C, D) and every process ends up with A, B, C, D. Alltoall: before the exchange P0 holds A0-A3, P1 holds B0-B3, P2 holds C0-C3, P3 holds D0-D3; afterwards Pi holds Ai, Bi, Ci, Di, i.e. the data blocks are transposed across processes.] 41

MPI Collective Communication - All-to-All. Performs a scatter and a gather from every processor to every other processor, so every processor ends up with values from all of them. Example: an All-to-All operation for an integer array of size 8 on 4 processors. 42
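
A sketch of MPI_Alltoall in C (not from the slides): with 4 processes and 2 integers per destination, each process sends and receives an array of 8 integers, matching the example above. The buffer contents are arbitrary.

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size, i;
    int *sendbuf, *recvbuf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* two integers destined for every process, including myself */
    sendbuf = (int *) malloc(2 * size * sizeof(int));
    recvbuf = (int *) malloc(2 * size * sizeof(int));
    for (i = 0; i < 2 * size; i++)
        sendbuf[i] = rank * 100 + i;

    /* block i of sendbuf goes to process i; block j of recvbuf comes from process j */
    MPI_Alltoall(sendbuf, 2, MPI_INT, recvbuf, 2, MPI_INT, MPI_COMM_WORLD);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}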

Features of MPI. Profiling: hooks allow users to intercept MPI calls to install their own tools. Environmental inquiry. Error control. Collective: both built-in and user-defined collective operations; a large number of data movement routines; subgroups defined directly or by topology. Application-oriented process topologies: built-in support for grids and graphs (uses groups). 43

Features of MPI. General: communicators combine context and group for message security; thread safety. Point-to-point communication: structured buffers and derived datatypes, heterogeneity; modes: normal (blocking and non-blocking), synchronous, ready (to allow access to fast protocols), buffered. 44

Features of MPI. Non-message-passing concepts included: active messages, threads. Non-message-passing concepts not included: process management, remote memory transfers, virtual shared memory. 45

Features of MPI - Positives. MPI is the de facto standard for message passing in a box. Performance was a high priority in the design. Simplified sending of messages. Rich set of collective functions. Does not require any daemon to start an application. No language-binding issues. 46

Features of MPI - Pros. Best scaling seen in practice. Simple memory model. Simple to understand conceptually. Can send messages using any kind of data; not limited to shared data. 47

Features of MPI - Cons. Debugging is not easy. Development of production codes is much more difficult and time consuming. Codes may be non-deterministic in nature when asynchronous communication is used. For non-contiguous data, one must either use derived datatypes, which are error-prone, or send many messages, which is expensive. 48

Features of MPI-2 - Positives. Non-blocking collective communications. One-sided communication: put/get/barrier to reduce synchronization points. Generalized requests (interrupt receive). Dynamic process spawning/deletion operations. MPI-I/O. Thread safety is ensured. Additional language bindings for Fortran 90/95 and C++. 49

Tuning Performance. General techniques: aggregation, decomposition, load balancing, changing the algorithm. MPI-specific tuning and pitfalls. 50

References 1. Gropp, W., Lusk, E. and Skjellum, A., Using MPI: Portable Parallel Programming with the Message-Passing Interface, The MIT Press, 1999. 2. Pacheco, P. S., Parallel Programming with MPI, Morgan Kaufmann Publishers, San Francisco, CA, 1997. 3. Kumar, V., Grama, A., Gupta, A. and Karypis, G., Introduction to Parallel Computing: Design and Analysis of Algorithms, Benjamin/Cummings, Redwood City, CA, 1994. 4. Gropp, W. and Lusk, R., Tuning MPI Applications for Peak Performance, Pittsburgh, 1996. 51