Parallel Computing: MPI Collective Communication

Parallel Computing: MPI Collective Communication
Thorsten Grahs, 18 May 2015

Table of contents
- Collective Communication
- Communicator
- Intercommunicator

Collective Communication
- Communication involving a group of processes
- The collective group is selected by a suitable communicator
- All participating members issue an identical call; there are no tags
- Collective communication does not necessarily involve all processes (i.e. it is not automatically global communication)

Collective Communication
- The amount of data sent must exactly match the amount of data received
- Collective routines are collective across an entire communicator and must be called in the same order by all processes within the communicator
- Collective routines are all blocking: the buffer can be reused upon return
- Collective routines may return as soon as the calling process's participation is complete
- Collective and point-to-point communication must not be mixed

Collective Communication functions
- Barrier operation MPI_Barrier(): all tasks wait for each other
- Broadcast operation MPI_Bcast(): one task sends to all
- Accumulation operation MPI_Reduce(): one task combines (reduces) data distributed over the tasks
- Gather operation MPI_Gather(): one task collects/gathers data
- Scatter operation MPI_Scatter(): one task scatters data (e.g. a vector)

Multi-task functions
- Multi-broadcast operation MPI_Allgather(): all participating tasks make their data available to all other participating tasks
- Multi-accumulation operation MPI_Allreduce(): all participating tasks receive the result of the operation
- Total exchange MPI_Alltoall(): each involved task sends to and receives from all others
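MPI_Allreduce gets no dedicated example later in the deck, so here is a minimal sketch (my own illustration, not from the slides): every task contributes one double and every task receives the global sum.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    double local, global;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    local = (double)(rank + 1);   /* each task contributes one value */

    /* like MPI_Reduce, but every task receives the result */
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    printf("rank %d: global sum = %f\n", rank, global);
    MPI_Finalize();
    return 0;
}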

Synchronisation
Barrier operation MPI_Barrier(comm)
- All tasks in comm wait for each other at a barrier
- The only collective routine that provides explicit synchronisation
- Returns at any process only after all processes have entered the call
- A barrier can be used to ensure that all processes have reached a certain point in the computation
- Mostly used to synchronise a sequence of tasks (e.g. for debugging)
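A minimal sketch of the synchronisation use just described (my own illustration; the local work is only indicated):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* ... some local work ... */

    /* make sure every process has finished its local work before rank 0 reports;
       the same pattern is common around timers and debugging output */
    MPI_Barrier(MPI_COMM_WORLD);
    if (rank == 0)
        printf("all %d processes passed the barrier\n", size);

    MPI_Finalize();
    return 0;
}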

Example: MPI_Barrier
(figure) The tasks wait for each other at the barrier; an MPI_Isend has not yet completed, so its data cannot be accessed yet.

Broadcast operation
MPI_Bcast(buffer, count, datatype, root, communicator)
- All processes in the communicator use the same function call
- The data of the process with rank root are distributed to all processes in the communicator
- The call is blocking, but does not imply synchronisation
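A minimal sketch (not from the slides), assuming rank 0 is the root and a single integer parameter is to be distributed:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, n = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        n = 42;                    /* only the root has the value initially */

    /* every process, including the root, issues the identical call */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    printf("rank %d: n = %d\n", rank, n);
    MPI_Finalize();
    return 0;
}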

Accumulation operation
MPI_Reduce(sendbf, recvbf, count, type, op, master, comm)
- The process with rank master receives the result
- op is the reduction operation (e.g. summation)
- All processes involved put their local data into sendbf
- The master collects the combined result in recvbf

Reduce operation
Pre-defined operations:
  MPI_MAX     maximum
  MPI_MAXLOC  maximum and index of the maximum
  MPI_MIN     minimum
  MPI_SUM     summation
  MPI_PROD    product
  MPI_LXOR    logical exclusive OR
  MPI_BXOR    bitwise exclusive OR
  ...

Example: Reduce
Summation of the local partial results (teil) into s on rank 0:
MPI_Reduce(teil, s, 1, MPI_DOUBLE, MPI_SUM, 0, comm)
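A minimal runnable sketch of such a sum reduction (illustrative only; here the local partial result is simply derived from the rank):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    double teil, s = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    teil = (double)rank;       /* local partial result */

    /* combine all partial results with MPI_SUM; rank 0 receives the sum in s */
    MPI_Reduce(&teil, &s, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %f\n", s);

    MPI_Finalize();
    return 0;
}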

Gather operation
MPI_Gather(sbf, scount, stype, rbf, rcount, rtype, ma, comm)
- sbf: local send buffer
- rbf: receive buffer on the master ma
- Each process sends scount elements of data type stype to the master ma; rcount and rtype describe the data received from each single process
- The order of the data in rbf corresponds to the rank order in the communicator comm
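A minimal sketch (not from the slides): each process sends one integer, which rank 0 gathers in rank order.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size, sbf;
    int *rbf = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    sbf = rank * rank;                        /* local contribution */
    if (rank == 0)
        rbf = malloc(size * sizeof(int));     /* only the root needs a receive buffer */

    /* rcount = 1 refers to the amount received from each single process */
    MPI_Gather(&sbf, 1, MPI_INT, rbf, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int i = 0; i < size; i++)
            printf("from rank %d: %d\n", i, rbf[i]);
        free(rbf);
    }
    MPI_Finalize();
    return 0;
}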

Scatter operation
MPI_Scatter(sbf, scount, stype, rbf, rcount, rtype, ma, comm)
- The master ma distributes/scatters the data in sbf
- Each process receives its sub-block of sbf (rcount elements of rtype) in its local receive buffer rbf
- The master ma also sends to itself
- The order of the data blocks in sbf corresponds to the rank order in the communicator comm

Example: Scatter
Three processes involved in comm
Send buffer:    int sbuf[6] = {3, 14, 15, 92, 65, 35};
Receive buffer: int rbuf[2];
The function call
  MPI_Scatter(sbuf, 2, MPI_INT, rbuf, 2, MPI_INT, 0, comm);
leads to the following distribution:
  Process   rbuf
  0         {3, 14}
  1         {15, 92}
  2         {65, 35}
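The same example as a minimal runnable program, assuming it is started with exactly 3 processes and using MPI_COMM_WORLD in place of comm:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    int sbuf[6] = {3, 14, 15, 92, 65, 35};   /* only significant on the root */
    int rbuf[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* rank 0 scatters two ints to every process (including itself) */
    MPI_Scatter(sbuf, 2, MPI_INT, rbuf, 2, MPI_INT, 0, MPI_COMM_WORLD);

    printf("process %d: rbuf = {%d, %d}\n", rank, rbuf[0], rbuf[1]);

    MPI_Finalize();
    return 0;
}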

Example Scatter-Gather: Averaging

/* rand_nums, elements_per_proc, world_rank, world_size and the helpers
   create_rand_nums() / compute_avg() are defined in the surrounding program */
float *rand_nums = NULL;
if (world_rank == 0)
    rand_nums = create_rand_nums(elements_per_proc * world_size);

// Create a buffer that will hold a subset of the random numbers
float *sub_rand_nums = malloc(sizeof(float) * elements_per_proc);

// Scatter the random numbers to all processes
MPI_Scatter(rand_nums, elements_per_proc, MPI_FLOAT, sub_rand_nums,
            elements_per_proc, MPI_FLOAT, 0, MPI_COMM_WORLD);

// Compute the average of your subset
float sub_avg = compute_avg(sub_rand_nums, elements_per_proc);

// Gather all partial averages down to the root process
float *sub_avgs = NULL;
if (world_rank == 0)
    sub_avgs = malloc(sizeof(float) * world_size);
MPI_Gather(&sub_avg, 1, MPI_FLOAT, sub_avgs, 1, MPI_FLOAT, 0, MPI_COMM_WORLD);

// Compute the total average of all numbers (root only)
if (world_rank == 0) {
    float avg = compute_avg(sub_avgs, world_size);   /* avg now holds the overall average */
}

Multi-broadcast operation
MPI_Allgather(sbuf, scount, stype, rbuf, rcount, rtype, comm)
- The data in the local sbuf of every process are gathered into rbuf on all processes
- No root/master argument is needed, since all processes receive the same data
- MPI_Allgather corresponds to an MPI_Gather followed by an MPI_Bcast

Example Allgather: Averaging

// Gather all partial averages to all the processes
float *sub_avgs = (float *)malloc(sizeof(float) * world_size);
MPI_Allgather(&sub_avg, 1, MPI_FLOAT, sub_avgs, 1, MPI_FLOAT,
              MPI_COMM_WORLD);

// Compute the total average of all numbers (now possible on every process)
float avg = compute_avg(sub_avgs, world_size);

Output:
/home/th/: mpirun -n 4 ./average 100
Avg of all elements from proc 1 is 0.479736
Avg of all elements from proc 3 is 0.479736
Avg of all elements from proc 0 is 0.479736
Avg of all elements from proc 2 is 0.479736

Total exchange
MPI_Alltoall(sbuf, scount, stype, rbuf, rcount, rtype, comm)
Matrix view:
- Before MPI_Alltoall, process k holds row k of the matrix
- After MPI_Alltoall, process k holds column k of the matrix
- MPI_Alltoall corresponds to an MPI_Gather followed by an MPI_Scatter
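A minimal sketch of this matrix view (not from the slides), with one matrix element per process pair: each process starts with one row of a size x size matrix and ends up with one column.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int *row = malloc(size * sizeof(int));   /* row `rank` of the matrix */
    int *col = malloc(size * sizeof(int));   /* will hold column `rank`  */

    for (int j = 0; j < size; j++)
        row[j] = rank * size + j;            /* matrix entry A(rank, j) */

    /* element j of row goes to process j; element i of col comes from process i */
    MPI_Alltoall(row, 1, MPI_INT, col, 1, MPI_INT, MPI_COMM_WORLD);

    printf("rank %d received column %d:", rank, rank);
    for (int i = 0; i < size; i++)
        printf(" %d", col[i]);               /* col[i] = A(i, rank) */
    printf("\n");

    free(row); free(col);
    MPI_Finalize();
    return 0;
}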

Variable exchange operations
Variable scatter & gather variants: MPI_Scatterv & MPI_Gatherv
The following may vary per process:
- the number of data elements distributed to the individual process
- the position of its data block in the send buffer sbuf

Variable Scatter & Gather
Variable scatter:
MPI_Scatterv(sbf, scount, displs, styp, rbf, rcount, rtyp, ma, comm)
- scount[i] contains the number of data elements to be sent to process i
- displs[i] gives the start of the data block for process i, relative to sbf (in elements of styp)
Variable gather:
MPI_Gatherv(sbuf, scount, styp, rbuf, rcount, displs, rtyp, ma, comm)
Variable variants also exist for Allgather and Alltoall (MPI_Allgatherv, MPI_Alltoallv)

Example MPI_Scatterv

/* Initialising */
if (myrank == root) init(sbuf, N);

/* Splitting work and data */
MPI_Comm_size(comm, &size);
Nopt = N / size;
Rest = N - Nopt * size;
displs[0] = 0;
for (i = 0; i < size; i++) {
    scount[i] = Nopt;
    if (i > 0) displs[i] = displs[i-1] + scount[i-1];   /* displacements in elements, not bytes */
    if (Rest > 0) { scount[i]++; Rest--; }
}

/* Distributing data */
MPI_Scatterv(sbuf, scount, displs, MPI_DOUBLE, rbuf,
             scount[myrank], MPI_DOUBLE, root, comm);

Comparison between BLAS & Reduce
Matrix-vector multiplication

Example comparison
Compare different approaches for y = A x with A ∈ R^(N x M) (N rows, M columns):
- row-wise distribution, using a BLAS routine
- column-wise distribution, using a reduction operation

Example row-wise
(figure) Row-wise distribution; the result vector y is distributed over the processes.

Example row-wise: BLAS building block
Matrix-vector multiplication with a BLAS (Basic Linear Algebra Subprograms) kernel, cf. dgemv:

void local_mv(int n, int m, double *y, const double *a, int lda, const double *x)
{
    int i, j;
    double s;
    /* partial sums -- local operation */
    for (i = 0; i < m; i++) {
        s = 0.0;
        for (j = 0; j < n; j++)
            s += a[i*lda + j] * x[j];
        y[i] = s;
    }
}

Timing:
  arithmetic     2 N M T_a
  memory access  x: M T_m(N,1),  y: T_m(M,1),  A: M T_m(N,1)

Example row-wise: vector
Task:
- Initial distribution: all data reside at process 0
- The result vector y is expected at process 0

Example row-wise: matrix
Operations:
- Distribute x to all processes: MPI_Bcast, cost (p-1) T_k(N)
- Distribute the rows of A: MPI_Scatter, cost (p-1) T_k(M N)

Example row-wise: results
  Arithmetic      2 N M T_a
  Communication   (p-1) [T_k(N) + T_k(M N) + T_k(M)]
  Memory access   2 M T_m(N,1) + T_m(M,1)

Example column-wise
Task:
- Column-wise distribution of A
- The solution vector is assembled by a reduction operation

Example column-wise: vector
Distributing vector x: MPI_Scatter, cost (p-1) T_k(M)

Example column-wise: matrix
Distributing matrix A:
- Packing the column blocks into a buffer: N T_m(M,1) + M T_m(N,1)
- Sending: (p-1) T_k(M N)

Example column-wise: result
Assembling vector y with MPI_Reduce
Cost of the reduction of y: log2(p) (T_k(N) + N T_a + 2 T_m(N,1))
  Arithmetic      2 N M T_a
  Communication   (p-1) [T_k(M) + T_k(M N)] + log2(p) T_k(N)
  Memory access   N T_m(M,1) + M T_m(N,1) + 2 log2(p) T_m(N,1)
The column-wise algorithm is slightly faster.
Parallelisation only pays off if the corresponding data distribution is already available before the algorithm starts.

Communicator

Communicators
Motivation: a communicator distinguishes different communication contexts
- Conflict-free organisation of process groups
- Integration of third-party software
- Example: distinction between library functions and application code
Predefined communicators:
- MPI_COMM_WORLD
- MPI_COMM_SELF
- MPI_COMM_NULL

Duplicate communicators
MPI_Comm_dup(MPI_Comm comm, MPI_Comm *newcomm);
- Creates a copy newcomm of comm
- Identical process group, but a separate communication context
- Allows a clear delineation/characterisation of process groups
Example:
MPI_Comm myworld;
...
MPI_Comm_dup(MPI_COMM_WORLD, &myworld);

Splitting communicators
MPI_Comm_split(MPI_Comm comm, int color, int key, MPI_Comm *newcomm);
- Divides the communicator comm into multiple communicators with disjoint process groups
- MPI_Comm_split has to be called by all processes in comm
- Processes with the same value of color form a new communicator group
- The key argument determines the rank order within each new group

Example: Splitting a communicator

int size, rank, i, j;
MPI_Comm comm1, comm2, newcomm;
MPI_Comm_size(comm, &size);
MPI_Comm_rank(comm, &rank);
i = rank % 3;
j = size - rank;
if (i == 0)
    MPI_Comm_split(comm, MPI_UNDEFINED, 0, &newcomm);
else if (i == 1)
    MPI_Comm_split(comm, i, j, &comm1);
else
    MPI_Comm_split(comm, i, j, &comm2);

For color MPI_UNDEFINED the call returns the null handle MPI_COMM_NULL.

Example: Splitting a communicator

MPI_COMM_WORLD:
  Rank    P0  P1  P2  P3  P4  P5  P6  P7  P8
  color   -   1   2   -   1   2   -   1   2
  key     8   7   6   5   4   3   2   1   0

Resulting groups:
  comm1:  P1  P4  P7   (new ranks 2, 1, 0)
  comm2:  P2  P5  P8   (new ranks 2, 1, 0)
  P0, P3, P6 (color MPI_UNDEFINED) obtain MPI_COMM_NULL
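A minimal runnable sketch of this example (my own illustration; start it with 9 processes to reproduce the table, color and key follow the code above):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, i, j, newrank;
    MPI_Comm newcomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    i = rank % 3;                 /* color: 0 -> MPI_UNDEFINED, 1 -> comm1, 2 -> comm2 */
    j = size - rank;              /* key: reverses the rank order inside each group */

    MPI_Comm_split(MPI_COMM_WORLD, (i == 0) ? MPI_UNDEFINED : i, j, &newcomm);

    if (newcomm != MPI_COMM_NULL) {
        MPI_Comm_rank(newcomm, &newrank);
        printf("world rank %d -> color %d, new rank %d\n", rank, i, newrank);
        MPI_Comm_free(&newcomm);
    } else {
        printf("world rank %d obtained MPI_COMM_NULL\n", rank);
    }

    MPI_Finalize();
    return 0;
}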

Freeing a communicator
Clean up with MPI_Comm_free(MPI_Comm *comm);
- Deletes the communicator comm; the resources occupied by comm are released by MPI
- After the call the communicator handle has the value MPI_COMM_NULL
- MPI_Comm_free has to be called by all processes that belong to comm

Grouping communicators
MPI_Comm_group(MPI_Comm comm, MPI_Group *grp)
Creates a process group from a communicator.
More group constructors:
- MPI_Comm_create: generates a communicator from a group
- MPI_Group_incl: includes processes in a group
- MPI_Group_excl: excludes processes from a group
- MPI_Group_range_incl: forms a group from a simple pattern
- MPI_Group_range_excl: excludes processes from a group by a simple pattern

Example: create a group
Group grp = (a,b,c,d,e,f,g), n = 3, rank = [5,0,2]

MPI_Group_incl(grp, n, rank, &newgrp)
- Include in the new group newgrp the n = 3 processes given by rank = [5,0,2]
- newgrp = (f,a,c)

MPI_Group_excl(grp, n, rank, &newgrp)
- Exclude from grp the n = 3 processes given by rank = [5,0,2]
- newgrp = (b,d,e,g)

Example: create a group II
Group grp = (a,b,c,d,e,f,g,h,i,j), n = 3, ranges = [[6,7,1],[1,6,2],[0,9,4]]
Each range is a triple [start, end, stride]

MPI_Group_range_incl(grp, 3, ranges, &newgrp)
- Include in the new group newgrp the ranks covered by the three range triples [[6,7,1],[1,6,2],[0,9,4]]
- newgrp = (g,h,b,d,f,a,e,i)

MPI_Group_range_excl(grp, 3, ranges, &newgrp)
- Exclude from grp the ranks covered by the three range triples
- newgrp = (c,j)
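A minimal sketch (not from the slides) tying groups back to communicators: a sub-group is built from the explicitly listed ranks 5, 0, 2, as in the first group example, and turned into a communicator with MPI_Comm_create (run with at least 6 processes):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    int ranks[3] = {5, 0, 2};
    MPI_Group world_grp, newgrp;
    MPI_Comm newcomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Comm_group(MPI_COMM_WORLD, &world_grp);        /* group of MPI_COMM_WORLD */
    MPI_Group_incl(world_grp, 3, ranks, &newgrp);      /* sub-group of ranks 5, 0, 2 */
    MPI_Comm_create(MPI_COMM_WORLD, newgrp, &newcomm); /* collective over MPI_COMM_WORLD */

    if (newcomm != MPI_COMM_NULL) {                    /* only group members get a real communicator */
        int newrank;
        MPI_Comm_rank(newcomm, &newrank);
        printf("world rank %d has rank %d in the new communicator\n", rank, newrank);
        MPI_Comm_free(&newcomm);
    }

    MPI_Group_free(&newgrp);
    MPI_Group_free(&world_grp);
    MPI_Finalize();
    return 0;
}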

Operations on communicator groups
More grouping functions:
- Merging groups:          MPI_Group_union
- Intersection of groups:  MPI_Group_intersection
- Difference of groups:    MPI_Group_difference
- Comparing groups:        MPI_Group_compare
- Deleting/freeing groups: MPI_Group_free
- Size of a group:         MPI_Group_size
- Rank within a group:     MPI_Group_rank
- ...

Intercommunicator

Intercommunicator
Intracommunicator:
- Up to now we have only handled communication inside a single group of processes
- This communication takes place inside (intra) a communicator
Intercommunicator:
- A communicator that establishes a context between two groups
- Intercommunicators are associated with two groups of disjoint processes
- Intercommunicators are associated with a remote group and a local group
- The target process (destination for a send, source for a receive) is addressed by its rank in the remote group
- A communicator is either intra or inter, never both

Create an intercommunicator
MPI_Intercomm_create(local_comm, local_bridge, bridge_comm, remote_bridge, tag, &newcomm)
- local_comm: local intracommunicator (handle)
- local_bridge: rank of a distinguished (bridge-head) process in local_comm (integer)
- bridge_comm: peer communicator containing both bridge heads (e.g. MPI_COMM_WORLD); over it local_comm is connected to the remote group in the newly built intercommunicator newcomm
- remote_bridge: rank of the remote group's bridge-head process in bridge_comm

Communication between the groups
The function uses point-to-point communication with the specified tag between the two processes defined as bridge heads.
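A minimal, self-contained sketch (my own illustration, complementing the example on the next slides, which only builds and frees intercommunicators): the world is split into an even and an odd group, an intercommunicator is created between them, and the two bridge heads exchange one integer. Note that destination and source ranks refer to the remote group.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int wrank, lrank, value = 0;
    MPI_Comm localcomm, intercomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &wrank);

    /* two disjoint groups: even and odd world ranks (needs >= 2 processes) */
    int color = wrank % 2;
    MPI_Comm_split(MPI_COMM_WORLD, color, wrank, &localcomm);

    /* bridge heads: world rank 0 (even group) and world rank 1 (odd group) */
    int remote_bridge = (color == 0) ? 1 : 0;
    MPI_Intercomm_create(localcomm, 0, MPI_COMM_WORLD, remote_bridge,
                         99, &intercomm);

    MPI_Comm_rank(intercomm, &lrank);   /* rank within the local group */

    /* destination/source ranks refer to the remote group */
    if (color == 0 && lrank == 0) {
        value = 17;
        MPI_Send(&value, 1, MPI_INT, 0, 0, intercomm);   /* to rank 0 of the odd group */
    } else if (color == 1 && lrank == 0) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, intercomm, MPI_STATUS_IGNORE);
        printf("odd-group leader received %d from the even group\n", value);
    }

    MPI_Comm_free(&intercomm);
    MPI_Comm_free(&localcomm);
    MPI_Finalize();
    return 0;
}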

Example

int main(int argc, char **argv)
{
    MPI_Comm myComm;        /* intra-communicator of the local sub-group */
    MPI_Comm myFirstComm;   /* inter-communicator */
    MPI_Comm mySecondComm;  /* second inter-communicator (group 1 only) */
    int memberKey, rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* User code must generate memberKey in the range [0, 1, 2] */
    memberKey = rank % 3;

    /* Build intra-communicator for the local sub-group */
    MPI_Comm_split(MPI_COMM_WORLD, memberKey, rank, &myComm);

Example (continued)

    /* Build inter-communicators. Tags are hard-coded. */
    if (memberKey == 0)
    {   /* Group 0 communicates with group 1. */
        MPI_Intercomm_create(myComm, 0, MPI_COMM_WORLD, 1,
                             01, &myFirstComm);
    }
    else if (memberKey == 1)
    {   /* Group 1 communicates with groups 0 and 2. */
        MPI_Intercomm_create(myComm, 0, MPI_COMM_WORLD, 0,
                             01, &myFirstComm);
        MPI_Intercomm_create(myComm, 0, MPI_COMM_WORLD, 2,
                             12, &mySecondComm);
    }
    else if (memberKey == 2)
    {   /* Group 2 communicates with group 1. */
        MPI_Intercomm_create(myComm, 0, MPI_COMM_WORLD, 1,
                             12, &mySecondComm);
    }

Example (continued)

    /* Do work ... */

    switch (memberKey)   /* free the communicators each group actually created */
    {
    case 0:
        MPI_Comm_free(&myFirstComm);
        break;
    case 1:
        MPI_Comm_free(&myFirstComm);
        MPI_Comm_free(&mySecondComm);
        break;
    case 2:
        MPI_Comm_free(&mySecondComm);
        break;
    }

    MPI_Finalize();
}

Motivation: Intercommunicator
Used for:
- meta-computing
- cloud computing
- coupling components with low bandwidth between them, e.g. cluster <-> PC
The bridge-head process controls the communication with the remote computer.