Intermediate MPI features
- Advanced message passing
- Collective communication
- Topologies
- Group communication

Forms of message passing (1)
Communication modes:
- Standard: system decides whether message is buffered
- Buffered: user explicitly controls buffering
- Synchronous: send waits for matching receive
- Ready: send may be started only if matching receive has already been posted
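In C these four modes map onto MPI_Send, MPI_Bsend, MPI_Ssend, and MPI_Rsend. A minimal sketch, assuming a valid destination rank and an illustrative tag of 0:

#include <mpi.h>
#include <stdlib.h>

void send_modes(int data, int dest, MPI_Comm comm)
{
    /* Standard: MPI decides whether to buffer the message. */
    MPI_Send(&data, 1, MPI_INT, dest, 0, comm);

    /* Buffered: the user must attach a buffer first. */
    int bufsize = sizeof(int) + MPI_BSEND_OVERHEAD;
    void *buf = malloc(bufsize);
    MPI_Buffer_attach(buf, bufsize);
    MPI_Bsend(&data, 1, MPI_INT, dest, 0, comm);
    MPI_Buffer_detach(&buf, &bufsize);
    free(buf);

    /* Synchronous: completes only after the matching receive has started. */
    MPI_Ssend(&data, 1, MPI_INT, dest, 0, comm);

    /* Ready: erroneous unless the matching receive is already posted. */
    MPI_Rsend(&data, 1, MPI_INT, dest, 0, comm);
}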
Forms of message passing (2)
Non-blocking communication:
- When a blocking send returns, the memory buffer can be reused
- A blocking receive waits for a message
- A non-blocking send returns immediately (dangerous)
- Non-blocking receive through MPI_IPROBE

Non-blocking receive:
- MPI_IPROBE: check for a pending message
- MPI_PROBE: wait for a pending message
- MPI_GET_COUNT: number of data elements in a message

MPI_PROBE(source, tag, comm, &status) => status
MPI_GET_COUNT(&status, datatype, &count) => message size
status.MPI_SOURCE => identity of sender
status.MPI_TAG => tag of message
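Besides probing, MPI also offers fully non-blocking transfers via MPI_Isend/MPI_Irecv completed by MPI_Wait; these are not shown on the slide, so the following is only an illustrative sketch of an exchange between two ranks:

#include <mpi.h>

/* Non-blocking exchange between ranks 0 and 1 (assumes >= 2 processes). */
void nonblocking_exchange(int rank, MPI_Comm comm)
{
    int sendval = rank, recvval;
    int peer = 1 - rank;
    MPI_Request reqs[2];

    /* Both calls return immediately; the buffers must stay untouched
       until MPI_Waitall reports that the operations have completed. */
    MPI_Irecv(&recvval, 1, MPI_INT, peer, 0, comm, &reqs[0]);
    MPI_Isend(&sendval, 1, MPI_INT, peer, 0, comm, &reqs[1]);

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
}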
Example: Check for Pending Message

int buf[1], flag, source, minimum;
MPI_Status status;

while (...) {
    MPI_Iprobe(MPI_ANY_SOURCE, NEW_MINIMUM, comm, &flag, &status);
    if (flag) {
        /* handle new minimum */
        source = status.MPI_SOURCE;
        MPI_Recv(buf, 1, MPI_INT, source, NEW_MINIMUM, comm, &status);
        minimum = buf[0];
    }
    ... /* compute */
}

Example: Receiving a Message of Unknown Size

int count, *buf, source;
MPI_Status status;

MPI_Probe(MPI_ANY_SOURCE, 0, comm, &status);
source = status.MPI_SOURCE;
MPI_Get_count(&status, MPI_INT, &count);
buf = malloc(count * sizeof(int));
MPI_Recv(buf, count, MPI_INT, source, 0, comm, &status);
Global Operations - Collective Communication
Coordinated communication involving all processes
Functions:
- MPI_BARRIER: synchronize all processes
- MPI_BCAST: send data to all processes
- MPI_GATHER: gather data from all processes
- MPI_SCATTER: scatter data to all processes
- MPI_REDUCE: reduction operation
- MPI_ALLREDUCE: reduction, all processes get result

Barrier
MPI_Barrier(comm)
- Synchronizes a group of processes
- All processes block until all have reached the barrier
- Often invoked at the end of a loop in iterative algorithms
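Scatter and gather are not covered in detail on the following slides, so here is a minimal sketch, assuming root rank 0 and one int per process; full and result are illustrative root-side arrays with one element per rank:

#include <mpi.h>

/* Root 0 scatters one int to each rank; each rank doubles its value
   locally; root 0 gathers the results back in rank order. */
void scatter_gather(int *full, int *result, MPI_Comm comm)
{
    int mine;
    MPI_Scatter(full, 1, MPI_INT, &mine, 1, MPI_INT, 0, comm);
    mine *= 2;                                  /* local work */
    MPI_Gather(&mine, 1, MPI_INT, result, 1, MPI_INT, 0, comm);
}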
Broadcast operation
Sends the same data packet to all processes in the communicator

MPI_Bcast(void *buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm)
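Note that every rank calls MPI_Bcast with the same root argument; on the root the buffer is read, on all other ranks it is written. A minimal sketch with an illustrative parameter value:

#include <mpi.h>

/* Rank 0 determines a parameter, then broadcasts it to every process. */
int bcast_param(MPI_Comm comm)
{
    int rank, param = 0;
    MPI_Comm_rank(comm, &rank);
    if (rank == 0)
        param = 42;                      /* illustrative value */
    MPI_Bcast(&param, 1, MPI_INT, 0, comm);
    return param;                        /* now identical on all ranks */
}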
Reduction
- Combine values provided by different processes
- Result sent to one process (MPI_Reduce) or all processes (MPI_Allreduce)
- Used with commutative and associative operators: MAX, MIN, +, x, AND, OR

MPI_Reduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm)

Example 1: Global minimum operation
MPI_REDUCE(inbuf, outbuf, 2, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD)
- outbuf[0] = minimum over inbuf[0]'s
- outbuf[1] = minimum over inbuf[1]'s
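The same example as compilable C, with the MPI_Allreduce variant added for comparison; the function wrapper is illustrative:

#include <mpi.h>

/* Element-wise global minimum over two ints per rank. */
void global_min(int inbuf[2], int outbuf[2], MPI_Comm comm)
{
    /* Only rank 0 receives the minima... */
    MPI_Reduce(inbuf, outbuf, 2, MPI_INT, MPI_MIN, 0, comm);

    /* ...or all ranks do, with the same arguments minus the root: */
    MPI_Allreduce(inbuf, outbuf, 2, MPI_INT, MPI_MIN, comm);
}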
Reduce operation
[Figure: data flow of the reduce operation; image not recoverable]
SOR communication scheme
- Each CPU communicates with its left & right neighbors (if they exist)
- Also need to determine the convergence criterion

Expressing SOR in MPI
- Use a ring topology
- Each processor exchanges rows with its left/right neighbors
- Use MPI_ALLREDUCE to determine whether the grid has changed less than epsilon during the last iteration (see the sketch below)
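A sketch of one iteration's communication, assuming a row-wise block distribution of the grid; the buffer names (top, bottom, from_above, from_below), the row length N, and the error variables are illustrative:

#include <mpi.h>

/* Exchange border rows with the two neighbors, then agree on convergence. */
void sor_exchange(double *top, double *bottom, double *from_above,
                  double *from_below, int N, double localerr,
                  int rank, int nprocs, MPI_Comm comm)
{
    int up   = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;
    int down = (rank < nprocs - 1) ? rank + 1 : MPI_PROC_NULL;
    double globalerr;

    /* Sendrecv avoids deadlock; MPI_PROC_NULL turns the edge cases
       at the first and last rank into no-ops. */
    MPI_Sendrecv(top, N, MPI_DOUBLE, up, 0,
                 from_below, N, MPI_DOUBLE, down, 0,
                 comm, MPI_STATUS_IGNORE);
    MPI_Sendrecv(bottom, N, MPI_DOUBLE, down, 1,
                 from_above, N, MPI_DOUBLE, up, 1,
                 comm, MPI_STATUS_IGNORE);

    /* Every rank learns the maximum error: converged iff globalerr < epsilon. */
    MPI_Allreduce(&localerr, &globalerr, 1, MPI_DOUBLE, MPI_MAX, comm);
}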
[Figure: SOR message flow - send iteration number, send data, exchange border data, check convergence error, collect result]

Topologies
E.g. a Cartesian coordinate system:

int MPI_Cart_create(MPI_Comm comm_old, int ndims, int *dims, int *periods, int reorder, MPI_Comm *comm_cart)

Makes a new communicator to which topology information has been attached.
Input parameters:
- comm_old: input communicator (handle)
- ndims: number of dimensions of the Cartesian grid (integer)
- dims: integer array of size ndims specifying the number of processes in each dimension
- periods: logical array of size ndims specifying whether the grid is periodic (true) or not (false) in each dimension
- reorder: ranking may be reordered (true) or not (false) (logical)
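For example, the SOR ring of the previous slides can be expressed as a 1-D Cartesian topology; MPI_Cart_shift (standard MPI, though not described on the slide) then yields the neighbor ranks:

#include <mpi.h>

/* Build a 1-D non-periodic Cartesian topology (a chain) and look up
   the left and right neighbors of the calling rank. */
void make_chain(MPI_Comm comm, MPI_Comm *chain, int *left, int *right)
{
    int nprocs, dims[1], periods[1] = {0};     /* 0 = not periodic */
    MPI_Comm_size(comm, &nprocs);
    dims[0] = nprocs;

    MPI_Cart_create(comm, 1, dims, periods, 1 /* reorder */, chain);

    /* Neighbors at the ends of the chain come back as MPI_PROC_NULL. */
    MPI_Cart_shift(*chain, 0, 1, left, right);
}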
Group communication
Several groups can be created in an MPI program.

int MPI_Comm_create(MPI_Comm comm, MPI_Group group, MPI_Comm *comm_out)

Creates a new communicator.
Input parameters:
- comm: communicator (handle)
- group: group, which is a subset of the group of comm (handle)
A usage sketch follows after the list below.

Other features
- Parallel I/O
- Threads
- Complex data types
- Shared memory operations
- Dynamic process management
- Error handling
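Returning to group communication: a minimal sketch that derives the group from an existing communicator via MPI_Comm_group and MPI_Group_incl (both standard MPI, though not listed on the slide); the even-ranks split and the 64-rank limit are illustrative:

#include <mpi.h>

/* Create a communicator containing only the even-numbered ranks.
   Ranks outside the group receive MPI_COMM_NULL. */
void make_even_comm(MPI_Comm comm, MPI_Comm *evencomm)
{
    MPI_Group world_group, even_group;
    int nprocs, i, n = 0, ranks[64];     /* 64: illustrative limit */

    MPI_Comm_size(comm, &nprocs);
    MPI_Comm_group(comm, &world_group);
    for (i = 0; i < nprocs && n < 64; i += 2)
        ranks[n++] = i;

    MPI_Group_incl(world_group, n, ranks, &even_group);
    MPI_Comm_create(comm, even_group, evencomm);

    MPI_Group_free(&even_group);
    MPI_Group_free(&world_group);
}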