MPI - v Operations

Based on notes by Dr. David Cronk, Innovative Computing Lab, University of Tennessee

Collective Communication: Gather
A Gather operation has data from all processes collected, or gathered, at a central process, referred to as the root
Even the root process contributes data
The root process can be any process; it does not have to have any particular rank
Every process must pass the same value for the root argument
Each process must send the same amount of data
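To make these rules concrete, here is a minimal, self-contained C sketch (not from the original slides; the variable names and values are illustrative only) in which every rank contributes one int and rank 0 gathers the contributions in rank order:

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int myrank, nprocs;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        int value = 100 * myrank;               /* each process contributes one int   */
        int *all = NULL;
        if (myrank == 0)                        /* receive buffer only needed at root */
            all = malloc(nprocs * sizeof(int));

        /* every process, including the root, passes the same root (0) and
           sends the same amount of data (one MPI_INT) */
        MPI_Gather(&value, 1, MPI_INT, all, 1, MPI_INT, 0, MPI_COMM_WORLD);

        if (myrank == 0) {
            for (int i = 0; i < nprocs; i++)    /* data arrives in rank order */
                printf("rank %d sent %d\n", i, all[i]);
            free(all);
        }
        MPI_Finalize();
        return 0;
    }

Every rank, including the root, calls MPI_Gather with the same root and the same send count, exactly as the bullets above require.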

Collective Communication: Gather
MPI_GATHER (sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, root, comm, ierr)
Receive arguments are only meaningful at the root
recvcount is the number of elements received from each process
Data is received at the root in rank order
The root can use MPI_IN_PLACE for sendbuf: its data is assumed to already be in the correct place in recvbuf

[Figure: MPI_GATHER collecting the send buffers of P1 (the root), P2, P3, ... into the root's receive buffer]

MPI_Gather

Example 1 (WORKS):

    for (i = 0; i < 20; i++) {
        /* do some computation */
        tmp[i] = some value;
    }
    MPI_Gather(tmp, 20, MPI_INT, res, 20, MPI_INT, 0, MPI_COMM_WORLD);
    if (myrank == 0) {
        /* write out results */
    }

Example 2 (A OK: the root uses MPI_IN_PLACE; receive arguments at non-root ranks are ignored):

    int tmp[20];
    int res[320];

    for (i = 0; i < 20; i++) {
        /* do some computation */
        if (myrank == 0)
            res[i] = some value;
        else
            tmp[i] = some value;
    }
    if (myrank == 0) {
        MPI_Gather(MPI_IN_PLACE, 20, MPI_INT, res, 20, MPI_INT, 0, MPI_COMM_WORLD);
        /* write out results */
    } else {
        /* recv arguments are not meaningful here, so a mismatched
           recvcount/recvtype does no harm */
        MPI_Gather(tmp, 20, MPI_INT, tmp, 320, MPI_REAL, 0, MPI_COMM_WORLD);
    }

Collective Communication: Gatherv
MPI_GATHERV (sendbuf, sendcount, sendtype, recvbuf, recvcounts, displs, recvtype, root, comm, ierr)
Vector variant of MPI_GATHER
Allows a varying amount of data from each process, and allows the root to specify where the data from each process is placed
No portion of the receive buffer may be written more than once
The amount of data specified to be received at the root must match the amount of data sent by the non-roots
Displacements are in terms of recvtype
MPI_IN_PLACE may be used by the root

Collective Communication: Gatherv (cont)
[Figure: MPI_GATHERV with P1 as root; counts = {1, 2, 3, 4} and displs = {9, 7, 4, 0}, so the blocks from P1-P4 land at different offsets in the root's receive buffer]
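The figure above can be turned into code. The following hedged C sketch (not from the slides) uses the same counts {1, 2, 3, 4} and displacements {9, 7, 4, 0}; it assumes exactly four processes, that MPI is already initialized, and that myrank holds the caller's rank:

    /* assumes exactly 4 processes, as in the figure */
    int counts[4] = {1, 2, 3, 4};      /* elements contributed by P1..P4          */
    int displs[4] = {9, 7, 4, 0};      /* where each block lands, in recvtype units */
    int sbuff[4];                      /* at most 4 elements sent per process      */
    int rbuff[10];                     /* 1+2+3+4 = 10 elements at the root        */

    for (int i = 0; i < counts[myrank]; i++)
        sbuff[i] = myrank;             /* rank r sends r+1 copies of r */

    /* each process sends counts[myrank] elements; the root places the
       incoming blocks at displs[0..3] of rbuff */
    MPI_Gatherv(sbuff, counts[myrank], MPI_INT,
                rbuff, counts, displs, MPI_INT, 0, MPI_COMM_WORLD);

Because displacements are given in units of recvtype, the four blocks fill rbuff[0..9] without gaps or overlap, just in reverse rank order.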

Collective Communication: Gatherv (cont)
[Figure: per-process send buffers (sbuff) gathered into the root's receive buffer (rbuff); blocks of 100 elements are placed at a stride of 105]

    stride = 105;
    root = 0;
    for (i = 0; i < nprocs; i++) {
        displs[i] = i * stride;
        counts[i] = 100;
    }
    MPI_Gatherv(sbuff, 100, MPI_INT, rbuff, counts, displs, MPI_INT,
                root, MPI_COMM_WORLD);

Collective Communication: Gatherv (cont)
Each rank contributes myrank+1 elements, and the root packs the variable-sized blocks contiguously in rank order:

    CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
    CALL MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)
    scount = myrank + 1
    displs(1)  = 0
    rcounts(1) = 1
    DO i = 2, nprocs
       displs(i)  = displs(i-1) + i - 1
       rcounts(i) = i
    ENDDO
    CALL MPI_GATHERV(sbuff, scount, MPI_INTEGER, rbuff, rcounts, displs, &
                     MPI_INTEGER, root, MPI_COMM_WORLD, ierr)

Collective Communication: Scatter
MPI_SCATTER (sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, root, comm, ierr)
Opposite of MPI_GATHER
Send arguments are only meaningful at the root
The root can use MPI_IN_PLACE for recvbuf

[Figure: MPI_SCATTER splitting buffer B (on the root) into equal blocks, one delivered to each of P1-P4]

MPI_SCATTER

    IF (MYPE .EQ. ROOT) THEN
       OPEN (25, FILE='filename')
       READ (25, *) nprocs, nboxes
       READ (25, *) ((mat(i,j), i = 1, nboxes), j = 1, nprocs)
       CLOSE (25)
    ENDIF
    CALL MPI_BCAST (nboxes, 1, MPI_INTEGER, ROOT, MPI_COMM_WORLD, ierr)
    CALL MPI_SCATTER (mat, nboxes, MPI_INTEGER, lboxes, nboxes, MPI_INTEGER, &
                      ROOT, MPI_COMM_WORLD, ierr)
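For comparison with the Fortran file-reading example, here is a hedged C sketch (not from the slides) of the same basic pattern: the root fills one array and MPI_SCATTER hands each rank, the root included, an equal block. It assumes MPI is initialized, myrank/nprocs are set, and <stdlib.h> is included for malloc:

    #define BLOCK 4
    int *big = NULL;                     /* send buffer, meaningful only at root */
    int local[BLOCK];                    /* every rank receives BLOCK ints       */

    if (myrank == 0) {
        big = malloc(nprocs * BLOCK * sizeof(int));
        for (int i = 0; i < nprocs * BLOCK; i++)
            big[i] = i;                  /* element i goes to rank i/BLOCK */
    }

    /* send arguments (big, BLOCK, MPI_INT) are only looked at on the root */
    MPI_Scatter(big, BLOCK, MPI_INT, local, BLOCK, MPI_INT, 0, MPI_COMM_WORLD);

    if (myrank == 0)
        free(big);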

Collective Communication: Scatterv
MPI_SCATTERV (sendbuf, sendcounts, displs, sendtype, recvbuf, recvcount, recvtype, root, comm, ierr)
Opposite of MPI_GATHERV
Send arguments are only meaningful at the root
The root can use MPI_IN_PLACE for recvbuf
No location of the sendbuf may be read more than once

Collective Communication: Scatterv (cont)
[Figure: MPI_SCATTERV distributing buffer B (on the root) to P1-P4; counts = {1, 2, 3, 4} and displs = {9, 7, 4, 0}, so each process receives a different-sized block taken from a different offset in B]
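A hedged C sketch of the Scatterv figure above (not from the slides): the root uses counts {1, 2, 3, 4} and displacements {9, 7, 4, 0} to hand each of four ranks a different-sized piece of its buffer B. It assumes exactly four processes and that MPI is initialized with myrank set:

    /* assumes exactly 4 processes; only the root's send arguments matter */
    int scounts[4] = {1, 2, 3, 4};   /* elements sent to P1..P4                */
    int displs[4]  = {9, 7, 4, 0};   /* offsets into B, in units of sendtype   */
    int B[10];                       /* send buffer, used only at the root     */
    int rbuf[4];                     /* each rank receives scounts[myrank] ints */

    if (myrank == 0)
        for (int i = 0; i < 10; i++)
            B[i] = i;

    MPI_Scatterv(B, scounts, displs, MPI_INT,
                 rbuf, scounts[myrank], MPI_INT, 0, MPI_COMM_WORLD);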

MPI_SCATTERV

    ! mnb = max number of boxes
    IF (MYPE .EQ. ROOT) THEN
       OPEN (25, FILE='filename')
       READ (25, *) nprocs
       READ (25, *) (nboxes(i), i = 1, nprocs)
       READ (25, *) ((mat(i,j), i = 1, nboxes(j)), j = 1, nprocs)
       CLOSE (25)
       DO i = 1, nprocs
          displs(i) = (i-1) * mnb
       ENDDO
    ENDIF
    CALL MPI_SCATTER (nboxes, 1, MPI_INTEGER, nb, 1, MPI_INTEGER, ROOT, &
                      MPI_COMM_WORLD, ierr)
    CALL MPI_SCATTERV (mat, nboxes, displs, MPI_INTEGER, lboxes, nb, &
                       MPI_INTEGER, ROOT, MPI_COMM_WORLD, ierr)

Collective Communication: Allgather
MPI_ALLGATHER (sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, comm, ierr)
Same as MPI_GATHER, except all processes get the result
MPI_IN_PLACE may be used for sendbuf on all processes
Equivalent to a gather followed by a bcast
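The Allgather slide has no code, so here is a minimal hedged C sketch (not from the slides), assuming MPI is initialized, myrank is set, and there are at most 64 processes:

    int value = myrank * myrank;   /* each rank's contribution              */
    int all[64];                   /* assumes nprocs <= 64 for this sketch  */

    /* like MPI_Gather, but the gathered array is delivered to every rank,
       so there is no root argument */
    MPI_Allgather(&value, 1, MPI_INT, all, 1, MPI_INT, MPI_COMM_WORLD);

    /* every rank can now read all[0..nprocs-1] */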

Collective Communication: Allgatherv
MPI_ALLGATHERV (sendbuf, sendcount, sendtype, recvbuf, recvcounts, displs, recvtype, comm, ierr)
Same as MPI_GATHERV, except all processes get the result
MPI_IN_PLACE may be used for sendbuf on all processes
Only similar to a gatherv followed by a bcast: if there are holes in the receive buffer, the bcast would overwrite the holes
displs need not be the same on each PE

Allgatherv

    int mycount;    /* initialized to the number of local ints */
    int counts[4];  /* initialized to the proper values        */
    int displs[4];

    displs[0] = 0;
    for (i = 1; i < 4; i++)
        displs[i] = displs[i-1] + counts[i-1];
    MPI_Allgatherv(sbuff, mycount, MPI_INT, rbuff, counts, displs, MPI_INT,
                   MPI_COMM_WORLD);

Allgatherv
[Figure: every process's send buffer (sbuff) gathered into every process's receive buffer (rbuff); blocks of 100 elements are placed at a stride of 105]

    stride = 105;
    root = 0;
    for (i = 0; i < nprocs; i++) {
        displs[i] = i * stride;
        counts[i] = 100;
    }
    MPI_Allgatherv(sbuff, 100, MPI_INT, rbuff, counts, displs, MPI_INT,
                   MPI_COMM_WORLD);

Allgatherv
[Figure: the gathered blocks p1-p4 appear rotated by a different amount in each process's receive buffer]

    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    for (i = 0; i < nprocs; i++) {
        counts[i] = num_elements;
        x = (myrank + i) % nprocs;
        displs[x] = i * num_elements;   /* each rank places rank x's block at a different offset */
    }
    MPI_Allgatherv(sbuff, num_elements, MPI_INT, rbuff, counts, displs, MPI_INT,
                   MPI_COMM_WORLD);

Collective Communication: Alltoall (scatter/gather)
MPI_ALLTOALL (sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, comm, ierr)

Collective Communication: Alltoallv
MPI_ALLTOALLV (sendbuf, sendcounts, sdispls, sendtype, recvbuf, recvcounts, rdispls, recvtype, comm, ierr)
Same as MPI_ALLTOALL, but the vector variant
Can specify how many blocks to send to each processor, location of blocks to send, how many blocks to receive from each processor, and where to place the received blocks
No location in the sendbuf can be read more than once and no location in the recvbuf can be written to more than once
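Neither slide shows a call, so here is a hedged C sketch (not from the slides) of MPI_ALLTOALL and the equivalent MPI_ALLTOALLV call with uniform counts; it assumes MPI is initialized, myrank/nprocs are set, and nprocs <= 16:

    int sbuf[16], rbuf[16];              /* one int per destination / source   */
    int counts[16], displs[16];

    for (int i = 0; i < nprocs; i++) {
        sbuf[i]   = myrank * 100 + i;    /* block i goes to rank i             */
        counts[i] = 1;
        displs[i] = i;                   /* contiguous, one element per rank   */
    }

    /* every rank sends block i of sbuf to rank i and receives rank j's
       block into rbuf[j] */
    MPI_Alltoall(sbuf, 1, MPI_INT, rbuf, 1, MPI_INT, MPI_COMM_WORLD);

    /* the vector form expresses the same exchange, but counts and
       displacements could differ per destination/source */
    MPI_Alltoallv(sbuf, counts, displs, MPI_INT,
                  rbuf, counts, displs, MPI_INT, MPI_COMM_WORLD);

With per-destination counts[] and displs[] the vector form can exchange differently sized, non-contiguous blocks, which is exactly what the bullets above describe.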

Collective Communication: Alltoallw
MPI_ALLTOALLW (sendbuf, sendcounts, sdispls, sendtypes, recvbuf, recvcounts, rdispls, recvtypes, comm, ierr)
Same as MPI_ALLTOALLV, except different datatypes can be specified for the data scattered as well as for the data gathered
Can specify how many blocks to send to each processor, the location and type of the blocks to send, how many blocks to receive from each processor, the type of the blocks received, and where to place the received blocks
Displacements are now in terms of bytes rather than types
No location in the sendbuf can be read more than once and no location in the recvbuf can be written to more than once
(A small C sketch follows the Fortran examples below.)

Example

    subroutine System_Change_init ( dq, ierror )
    !---------------------------------------------------------------------------
    ! Subroutine for data exchange on all six boundaries.
    !
    ! This routine initiates the operations; they have to be finished by a
    ! matching Wait/Waitall.
    !---------------------------------------------------------------------------
       USE globale_daten
       USE comm
       implicit none
       include 'mpif.h'

       double precision, dimension (0:n1+1,0:n2+1,0:n3+1,nc) :: dq
       integer :: ierror

       ! local variables
       integer :: handnum, info, position
       integer :: j, k, n, ni
       integer :: size2, size4, size6
       integer, dimension (MPI_STATUS_SIZE,6) :: sendstatusfeld
       integer, dimension (MPI_STATUS_SIZE)   :: status
       double precision, allocatable, dimension (:) :: global_dq   ! Fortran 90

       size2 = (n2+2)*(n3+2)*nc * SIZE_OF_REALx
       size4 = (n1+2)*(n3+2)*nc * SIZE_OF_REALx
       size6 = (n1+2)*(n2+2)*nc * SIZE_OF_REALx

       ! post a receive for each boundary that has a neighbour; otherwise
       ! mark the request as null so the later Waitall can skip it
       if (.not. rand_ab) then
          call MPI_IRECV(dq(1,1,1,1), recvcount(tid_io), recvtype(tid_io), &
                         tid_io, 10001, MPI_COMM_WORLD, recvhandle(1), info)
       else
          recvhandle(1) = MPI_REQUEST_NULL
       endif

       if (.not. rand_sing) then
          call MPI_IRECV(dq(1,1,1,1), recvcount(tid_iu), recvtype(tid_iu), &
                         tid_iu, 10002, MPI_COMM_WORLD, recvhandle(2), info)
       else
          recvhandle(2) = MPI_REQUEST_NULL
       endif

Example (cont)

       if (.not. rand_zu) then
          call MPI_IRECV(dq(1,1,1,1), recvcount(tid_jo), recvtype(tid_jo), &
                         tid_jo, 10003, MPI_COMM_WORLD, recvhandle(3), info)
       else
          recvhandle(3) = MPI_REQUEST_NULL
       endif

       if (.not. rand_festk) then
          call MPI_IRECV(dq(1,1,1,1), recvcount(tid_ju), recvtype(tid_ju), &
                         tid_ju, 10004, MPI_COMM_WORLD, recvhandle(4), info)
       else
          recvhandle(4) = MPI_REQUEST_NULL
       endif

       if (.not. rand_symo) then
          call MPI_IRECV(dq(1,1,1,1), recvcount(tid_ko), recvtype(tid_ko), &
                         tid_ko, 10005, MPI_COMM_WORLD, recvhandle(5), info)
       else
          recvhandle(5) = MPI_REQUEST_NULL
       endif

       if (.not. rand_symu) then
          call MPI_IRECV(dq(1,1,1,1), recvcount(tid_ku), recvtype(tid_ku), &
                         tid_ku, 10006, MPI_COMM_WORLD, recvhandle(6), info)
       else
          recvhandle(6) = MPI_REQUEST_NULL
       endif

       ! now post the matching sends
       if (.not. rand_sing) then
          call MPI_ISEND(dq(1,1,1,1), sendcount(tid_iu), sendtype(tid_iu), &
                         tid_iu, 10001, MPI_COMM_WORLD, sendhandle(1), info)
       else
          sendhandle(1) = MPI_REQUEST_NULL
       endif

       if (.not. rand_ab) then
          call MPI_ISEND(dq(1,1,1,1), sendcount(tid_io), sendtype(tid_io), &
                         tid_io, 10002, MPI_COMM_WORLD, sendhandle(2), info)
       else
          sendhandle(2) = MPI_REQUEST_NULL
       endif

       if (.not. rand_festk) then
          call MPI_ISEND(dq(1,1,1,1), sendcount(tid_ju), sendtype(tid_ju), &
                         tid_ju, 10003, MPI_COMM_WORLD, sendhandle(3), info)
       else
          sendhandle(3) = MPI_REQUEST_NULL
       endif

       if (.not. rand_zu) then
          call MPI_ISEND(dq(1,1,1,1), sendcount(tid_jo), sendtype(tid_jo), &
                         tid_jo, 10004, MPI_COMM_WORLD, sendhandle(4), info)
       else
          sendhandle(4) = MPI_REQUEST_NULL
       endif

       if (.not. rand_symu) then
          call MPI_ISEND(dq(1,1,1,1), sendcount(tid_ku), sendtype(tid_ku), &
                         tid_ku, 10005, MPI_COMM_WORLD, sendhandle(5), info)
       else
          sendhandle(5) = MPI_REQUEST_NULL
       endif

       if (.not. rand_symo) then
          call MPI_ISEND(dq(1,1,1,1), sendcount(tid_ko), sendtype(tid_ko), &
                         tid_ko, 10006, MPI_COMM_WORLD, sendhandle(6), info)
       else
          sendhandle(6) = MPI_REQUEST_NULL
       endif

       ! ... Waitall for the Isends / we have to force them to finish ...
       call MPI_WAITALL(6, sendhandle, sendstatusfeld, info)

       ierror = 0
       return
    end

Example (Alltoallw)

    subroutine System_Change ( dq, ierror )
    !---------------------------------------------------------------------------
    ! Subroutine for data exchange on all six boundaries.
    !
    ! Uses the MPI-2 function MPI_Alltoallw.
    !---------------------------------------------------------------------------
       USE globale_daten
       implicit none
       include 'mpif.h'

       double precision, dimension (0:n1+1,0:n2+1,0:n3+1,nc) :: dq
       integer :: ierror

       call MPI_Alltoallw ( dq(1,1,1,1), sendcount, senddisps, sendtype, &
                            dq(1,1,1,1), recvcount, recvdisps, recvtype, &
                            MPI_COMM_WORLD, ierror )

    end subroutine System_Change
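The two Fortran subroutines above rely on derived datatypes and count/displacement arrays set up elsewhere in the application. As a smaller, self-contained illustration of the one point that distinguishes MPI_ALLTOALLW, per-peer datatypes and displacements measured in bytes, here is a hedged C sketch (not from the slides) that reproduces a plain all-to-all of one int per rank; it assumes MPI is initialized, myrank/nprocs are set, and nprocs <= 16:

    int sbuf[16], rbuf[16];                    /* assumes nprocs <= 16           */
    int counts[16], displs[16];
    MPI_Datatype types[16];

    for (int i = 0; i < nprocs; i++) {
        sbuf[i]   = myrank * 100 + i;
        counts[i] = 1;
        displs[i] = i * (int)sizeof(int);      /* displacements are in BYTES     */
        types[i]  = MPI_INT;                   /* one (possibly different) datatype per peer */
    }

    MPI_Alltoallw(sbuf, counts, displs, types,
                  rbuf, counts, displs, types, MPI_COMM_WORLD);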