Basic MPI Communications (cont'd)

Basic MPI Communications

MPI provides two non-blocking routines:

    MPI_Isend(buf, cnt, type, dst, tag, comm, reqHandle)

- buf: source of the data to be sent
- cnt: number of data elements to be sent
- type: type of each data element to be sent
- dst: rank of the receiving process
- tag: application-specific identifier for the type of message sent
- comm: communicator to be used; determines which processes can receive the message (rank is relative to the communicator used)
- reqHandle: pointer to a request object used to find out when the send is complete

(The "I" in MPI_Isend or MPI_Irecv is for Initiate!)

Basic MPI Communications (cont'd)

    MPI_Irecv(buf, cnt, type, src, tag, comm, reqHandle)

- buf: buffer into which the received data will be placed
- cnt: number of data elements to be received
- type: type of each data element to be received
- src: rank of the sending process, or MPI_ANY_SOURCE to receive a message from anyone
- tag: identifier for the type of message received, or MPI_ANY_TAG to receive a message with any tag
- comm: communicator to be used; determines which processes can receive the message
- reqHandle: pointer to a request object used to find out when the recv is complete

Note: unlike the blocking MPI_Recv, there is no status parameter.

Basic MPI Communications (cont'd)

It is fine to use non-blocking communication primitives (allowing us to overlap communication and computation), but we must know when a send/recv completes:
- for a send, we need to know when we can change the buffer
- for a recv, we need to know when we can process the received data
This is accomplished using MPI_Wait.

Basic MPI Communications (cont'd)

    MPI_Wait(reqHandle, status)

- reqHandle: pointer to the request object returned by MPI_Isend or MPI_Irecv
- status: used to provide various bits of status information

MPI_Wait waits until the communication operation associated with the request object reqHandle has completed; status information is returned in status. Thus we see pairs of operations: an MPI_Isend or MPI_Irecv paired with MPI_Wait.
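Putting the two halves together, a minimal compilable sketch of the Isend/Wait and Irecv/Wait pairing (the ranks, message size, and tag here are illustrative, not taken from the course examples):

    /* Rank 0 sends an array to rank 1 using non-blocking calls.
       Compile with mpicc; run with at least 2 processes. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, data[4] = {1, 2, 3, 4};
        MPI_Request req;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            MPI_Isend(data, 4, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
            /* ... useful computation that does not touch data ... */
            MPI_Wait(&req, &status);   /* now safe to modify data */
        } else if (rank == 1) {
            MPI_Irecv(data, 4, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
            /* ... useful computation that does not read data ... */
            MPI_Wait(&req, &status);   /* now safe to use data */
            printf("rank 1 received %d %d %d %d\n",
                   data[0], data[1], data[2], data[3]);
        }

        MPI_Finalize();
        return 0;
    }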

Basic MPI Communications (cont'd)

    MPI_Isend(buf, sz, type, dst, tag, comm, &reqHndle);
    // do some useful computation here
    MPI_Wait(&reqHndle, &status);
    // safe to rewrite buf here

or:

    MPI_Irecv(buf, sz, type, src, tag, comm, &reqHndle);
    // do some useful computation here
    MPI_Wait(&reqHndle, &status);
    // process received data in buf here

MPI Collective Communications

In many applications, it is common to need to move multiple pieces of data in certain specific ways between the processes in the virtual machine, e.g. distribute copies of data to all nodes during initialization, or collect partial results from all nodes. MPI provides collective communications functions to support several such common patterns of data movement between processes. These are much easier to use than writing your own code without them.

MPI provides five basic routines for doing collective communications:

- MPI_Gather: collects data from each process and delivers it all to one MPI process
- MPI_Allgather: collects data from each process and delivers it all to each MPI process
- MPI_Bcast: sends a single piece of data from one process to all processes
- MPI_Scatter: takes data from one process and distributes it, in parts, to all processes
- MPI_Alltoall: takes selected data from each process and distributes it to all processes

[Figure: the collective communication patterns. Courtesy of Boston University: http://scv.bu.edu/tutorials/mpi/more/collectives.html]

    int MPI_Bcast(
        void         *message,  /* in-out */
        int           count,    /* in */
        MPI_Datatype  sendtype, /* in */
        int           root,     /* in */
        MPI_Comm      comm      /* in */
    )

MPI_Bcast is used to distribute copies of data to all MPI processes. E.g. each process holds a row of a matrix and we wish to scale the elements by a given value:
- One MPI process reads in the scaling value (parallel input from multiple processes isn't advised).
- That process uses MPI_Bcast to distribute it.
- Each process iterates through the elements in the row it stores, multiplying each by the scaling factor.
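Fleshed out into a compilable sketch (the row length C, the sample row contents, and the stand-in for reading the scaling factor are assumptions for illustration; the slide's own shorter snippet appears below):

    /* Rank 0 obtains a scaling factor, broadcasts it, and every rank
       scales its own row. */
    #include <mpi.h>
    #include <stdio.h>

    #define C 8   /* row length (assumed) */

    int main(int argc, char *argv[]) {
        int rank;
        float scale = 0.0f, row[C];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (int j = 0; j < C; j++)
            row[j] = j;                   /* each rank's row (sample data) */

        if (rank == 0)
            scale = 2.5f;                 /* stands in for reading the value */

        /* every rank calls MPI_Bcast; root 0 sends, the others receive */
        MPI_Bcast(&scale, 1, MPI_FLOAT, 0, MPI_COMM_WORLD);

        for (int j = 0; j < C; j++)
            row[j] *= scale;

        printf("rank %d scaled its row by %f\n", rank, scale);
        MPI_Finalize();
        return 0;
    }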

    float scale;
    if (rank == 0) {  /* master */
        /* read scaling factor into scale */
    }
    MPI_Bcast(&scale, 1, MPI_FLOAT, 0, MPI_COMM_WORLD);
    if (rank > 0) {   /* slaves */
        /* do scale x row */
    }

MPI_Bcast is executed on all MPI nodes - on node 0, scale is sent, on others it is received.

    MPI_Gather(
        void         *sendbuf,   /* in */
        int           sendcount, /* in */
        MPI_Datatype  sendtype,  /* in */
        void         *recvbuf,   /* out @ root */
        int           recvcount, /* in */
        MPI_Datatype  recvtype,  /* in */
        int           root,      /* in */
        MPI_Comm      comm       /* in */
    )

MPI_Gather is commonly used to collect partial results from MPI processes. E.g. each process has computed one row of a result matrix and we wish to collect all the rows on one node for output:
- Each MPI process (including the master) computes its row of the matrix.
- The MPI processes then use MPI_Gather to collect the rows from each process.
- The master process then outputs the result (again, I/O is normally limited to one machine).

    #define R numberOfRowsInTheMatrix
    #define C numberOfColumnsInTheMatrix
    int i, j;
    float vals[C], collectedvals[R][C];
    /* compute values (including process 0) */
    MPI_Gather(vals, C, MPI_FLOAT,
               collectedvals, C, MPI_FLOAT,   /* N.B. recvcount is the number of
                                                 elements to receive from EACH node */
               0, MPI_COMM_WORLD);
    if (rank == 0) {  /* master */
        for (i = 0; i < R; i++)
            for (j = 0; j < C; j++)
                ;  /* output the collected array value */
    }
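A self-contained variant of this gather step, with one row per process so the row count comes from the communicator size (the column count and fill-in values are assumptions):

    /* Each rank "computes" one row of an R x C matrix (R = number of
       processes); rank 0 gathers the full matrix and prints it. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define C 4   /* columns per row (assumed) */

    int main(int argc, char *argv[]) {
        int rank, R;
        float vals[C], *collectedvals = NULL;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &R);   /* one row per process */

        for (int j = 0; j < C; j++)
            vals[j] = rank * C + j;          /* this rank's row */

        if (rank == 0)                       /* receive buffer only needed at root */
            collectedvals = malloc(R * C * sizeof(float));

        MPI_Gather(vals, C, MPI_FLOAT, collectedvals, C, MPI_FLOAT,
                   0, MPI_COMM_WORLD);

        if (rank == 0) {
            for (int i = 0; i < R; i++) {
                for (int j = 0; j < C; j++)
                    printf("%6.1f ", collectedvals[i * C + j]);
                printf("\n");
            }
            free(collectedvals);
        }

        MPI_Finalize();
        return 0;
    }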

    MPI_Allgather(
        void         *sendbuf,   /* in */
        int           sendcount, /* in */
        MPI_Datatype  sendtype,  /* in */
        void         *recvbuf,   /* out */
        int           recvcount, /* in */
        MPI_Datatype  recvtype,  /* in */
        MPI_Comm      comm       /* in */
    )

MPI_Allgather is commonly used to distribute intermediate results to all processes. E.g. each process has computed one row of a result matrix and we need to distribute the entire result matrix to all processes for further calculations:
- Each MPI process computes its row.
- Each MPI process uses MPI_Allgather to collect the partial results from the other processes.
- Each process then has its own copy of the entire result matrix and can do whatever it needs to with it.

    #define R numberOfRowsInTheMatrix
    #define C numberOfColumnsInTheMatrix
    float vals[C], collectedvals[R][C];
    /* compute values (including process 0) */
    MPI_Allgather(vals, C, MPI_FLOAT,
                  collectedvals, C, MPI_FLOAT,
                  MPI_COMM_WORLD);          /* N.B. no root specified */
    /* All nodes now have the entire matrix */

    int MPI_Scatter(
        void         *sendbuf,   /* in @ root */
        int           sendcount, /* in */
        MPI_Datatype  sendtype,  /* in */
        void         *recvbuf,   /* out */
        int           recvcount, /* in */
        MPI_Datatype  recvtype,  /* in */
        int           root,      /* in */
        MPI_Comm      comm       /* in */
    )

MPI_Scatter is commonly used to distribute data subsets to the processes that will operate on them. E.g. one process needs to send each row of a matrix to the other processes for computation:
- The process computes/reads the matrix.
- It then uses MPI_Scatter (executed by each process) to send each row, in turn, to the other processes for subsequent processing in parallel.

    #define R numberOfRowsInTheMatrix
    #define C numberOfColumnsInTheMatrix
    float rows[C], ar[R][C];
    /* process 0 loads the array, ar */
    MPI_Scatter(ar, C, MPI_FLOAT, rows, C,
                MPI_FLOAT, 0, MPI_COMM_WORLD);
    /* Each node has its own row in rows */

    MPI_Alltoall(
        void         *sendbuf,   /* in */
        int           sendcount, /* in */
        MPI_Datatype  sendtype,  /* in */
        void         *recvbuf,   /* out */
        int           recvcount, /* in */
        MPI_Datatype  recvtype,  /* in */
        MPI_Comm      comm       /* in */
    )

MPI_Alltoall distributes equal shares of data originating at all nodes in the cluster to all other nodes in the cluster. Sounds powerful, but what is it good for?

Let's think about transposing a matrix, i.e. interchanging the rows and the columns:

    7 4 2         7 3 7
    3 1 9    T =  4 1 0
    7 0 5         2 9 5

This is a common operation:

    a b c d          a e i m
    e f g h     T =  b f j n
    i j k l          c g k o
    m n o p          d h l p
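This is where MPI_Alltoall fits: if a P x P matrix is distributed one row per process, a single MPI_Alltoall hands every process one row of the transpose. A minimal sketch (not the course's example; it assumes the number of processes equals the matrix dimension):

    /* Transpose a P x P matrix distributed one row per process. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]) {
        int rank, P;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &P);

        float *row = malloc(P * sizeof(float));   /* row 'rank' of A   */
        float *col = malloc(P * sizeof(float));   /* row 'rank' of A^T */
        for (int j = 0; j < P; j++)
            row[j] = rank * P + j;                /* sample values */

        /* Element j of each process's row goes to process j; afterwards
           process i holds element i of every original row, i.e. column i
           of A, which is row i of the transpose. */
        MPI_Alltoall(row, 1, MPI_FLOAT, col, 1, MPI_FLOAT, MPI_COMM_WORLD);

        printf("rank %d now holds row %d of the transpose\n", rank, rank);
        free(row); free(col);
        MPI_Finalize();
        return 0;
    }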

As an exercise, let's rewrite the vector sum operation to use MPI_Scatter and MPI_Gather to distribute the vector and gather the partial sums, respectively (one possible sketch is given after the Reductions introduction below). Example Online.

Reductions

MPI also provides reduction (i.e. aggregation) operators, similar to those we saw in OpenMP but based around collective communications. These are useful when we want to aggregate simple results together as we collect them from other processes, e.g. summing up a number of partial sums.
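One possible sketch of the Scatter/Gather vector-sum exercise above (the course's online example may differ; the vector length and fill-in values are assumed):

    /* Scatter a vector, sum the pieces locally, gather the partial sums. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int chunk = 4;                    /* elements per process (assumed) */
        int N = chunk * size;             /* total vector length            */
        float *vec = NULL, *psums = NULL, psum = 0.0f, sum = 0.0f;
        float *part = malloc(chunk * sizeof(float));

        if (rank == 0) {                  /* master fills the whole vector  */
            vec = malloc(N * sizeof(float));
            psums = malloc(size * sizeof(float));
            for (int i = 0; i < N; i++) vec[i] = 1.0f;
        }

        /* distribute one chunk to each process */
        MPI_Scatter(vec, chunk, MPI_FLOAT, part, chunk, MPI_FLOAT,
                    0, MPI_COMM_WORLD);

        for (int i = 0; i < chunk; i++)
            psum += part[i];              /* local partial sum */

        /* gather the partial sums back at the master and add them up */
        MPI_Gather(&psum, 1, MPI_FLOAT, psums, 1, MPI_FLOAT, 0, MPI_COMM_WORLD);
        if (rank == 0) {
            for (int i = 0; i < size; i++) sum += psums[i];
            printf("sum = %f\n", sum);
        }

        MPI_Finalize();
        return 0;
    }

The final gather-and-add step is exactly what MPI_Reduce, introduced next, collapses into a single call.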

Reductions (cont'd)

    MPI_Reduce(
        void         *operand,  /* in */
        void         *result,   /* out */
        int           count,    /* in */
        MPI_Datatype  dt,       /* in */
        MPI_Op        operator, /* in */
        int           root,     /* in */
        MPI_Comm      comm      /* in */
    )

Reductions (cont'd)

A simple example of the use of MPI_Reduce is, again, the computation of a sum of the elements in a vector. Each process is assigned a part of the vector to sum, and then the partial sums must be added together to give the final sum. Instead of explicitly sending each partial sum to the master process and adding them up, we can use reduction via MPI_Reduce.

Reductions (cont'd)

Pictorially:
- Master: distribute the vector.
- Slaves: each computes a partial sum over its part (e.g. sum0 over elements 1-25, sum1 over elements 26-50, ...).
- Master: MPI_Reduce computes sum = sum0 + sum1 + ...

Reductions (cont'd)

    int sum;   /* total sum   */
    int psum;  /* partial sum */
    /* compute partial sums */
    MPI_Reduce(&psum, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("Sum is %d\n", sum);

Reductions (cont'd)

As an exercise, let's rewrite the vector sum operation to replace the MPI_Gather that was used to gather the partial sums with MPI_Reduce, which will gather and sum them at once. Example Online.

Reductions (cont'd)

MPI also provides an MPI_Allreduce operation, which collects aggregated results at all nodes:

    MPI_Allreduce(
        void         *sendbuf,  /* in */
        void         *recvbuf,  /* out */
        int           count,    /* in */
        MPI_Datatype  datatype, /* in */
        MPI_Op        op,       /* in */
        MPI_Comm      comm      /* in */
    )
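A minimal sketch of MPI_Allreduce, here used to make a global sum visible on every rank (the contributed values are illustrative):

    /* Every rank contributes a value and every rank receives the sum. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, psum, sum;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        psum = rank + 1;                  /* this rank's partial result */

        /* like MPI_Reduce, but the result arrives at ALL ranks (no root) */
        MPI_Allreduce(&psum, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

        printf("rank %d sees sum = %d\n", rank, sum);
        MPI_Finalize();
        return 0;
    }

This is handy when every process needs the aggregated value to continue, e.g. a global error norm used in a convergence test.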