Parallel Programming Using MPI (Message Passing Interface)
Message Passing Model

  A simple implementation of the task/channel model:
    Task    -> Process
    Channel -> Message
  Suitable for a multicomputer.
  The number of processes is specified at startup and remains constant.
  All processes execute the same code; each has a unique ID (rank), which allows different paths inside the code.
  Processes alternately perform computations and communications.
Implementation

  [Diagram: on each computer node a process calls Send()/Recv(); a daemon on each node relays the messages between nodes.]
Launching

  > mpiexec -n <np> <executable> <args>

  Other options:
    -wdir <working directory>
    -host <hostname>
    Several hosts:
      -hosts <n host1 host2 host3 ... hostn>
      -hosts <n host1 m1 host2 m2 host3 m3 ... hostn mn>
      -machinefile <filename> (one host per line; lines starting with # are comments)
  Several executables, example:
    > mpiexec -n 1 foo master : -n 8 worker
Building a program

  Usually there are bindings and libraries for C, C++ and Fortran.
  Function declarations for C and C++ are in an include file: mpi.h
  When compiling it is necessary to link to the library containing the implementation:
    the name and the type (static or dynamic) are implementation dependent (usually libmpi.a or mpi.lib).
  Some installations provide a compilation script named mpicc:
    > mpicc myprog.c -o myprog
Initializing and finalizing

  The first call to the MPI library must be MPI_Init().
  The last call must be MPI_Finalize().

    #include <mpi.h>

    int main(int argc, char *argv[]) {
        MPI_Init(&argc, &argv);
        // other MPI calls
        MPI_Finalize();
        return 0;
    }
Communicators

  Communicator: a set of processes that can exchange messages among themselves.
  MPI_COMM_WORLD:
    the default communicator, already built;
    includes all processes.
  It is possible to create new communicators.
  [Diagram: the communicator MPI_COMM_WORLD containing six processes with ranks 0 to 5.]
Number and rank of the MPI processes

  Each process can determine the total number of processes in a communicator:
    int MPI_Comm_size(MPI_Comm comm, int *size);
  Each process can obtain its own identifier or rank in a communicator:
    int MPI_Comm_rank(MPI_Comm comm, int *rank);
    ranks begin at 0
  The time elapsed from a common point in the past can be obtained with:
    double MPI_Wtime(void);
    the returned time is in seconds

    double elapsedtime;
    elapsedtime = -MPI_Wtime();
    // ... work to be timed ...
    elapsedtime += MPI_Wtime();
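  A minimal complete program combining these calls might look like the following sketch (the workload being timed is left as a placeholder):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, size;
        double elapsedtime;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        elapsedtime = -MPI_Wtime();      /* start timing */
        /* ... computation to be timed would go here ... */
        elapsedtime += MPI_Wtime();      /* stop timing */

        printf("Process %d of %d took %f s\n", rank, size, elapsedtime);

        MPI_Finalize();
        return 0;
    }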
Send and receive a message

  To send a message to another process:
    int MPI_Send(void *buf, int count, MPI_Datatype dtype, int dest, int tag, MPI_Comm comm);
  Usual predefined datatypes: MPI_CHAR, MPI_INT, MPI_SHORT, MPI_LONG, MPI_LONG_LONG, MPI_FLOAT, MPI_DOUBLE, MPI_LONG_DOUBLE, MPI_WCHAR, MPI_BYTE, MPI_PACKED
  To receive a message:
    int MPI_Recv(void *buf, int count, MPI_Datatype dtype, int source, int tag, MPI_Comm comm, MPI_Status *status);
    to receive from any source use MPI_ANY_SOURCE
    to receive with any tag use MPI_ANY_TAG
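  A sketch of the point-to-point pattern (the payload and the tag value 0 are arbitrary choices): every process with rank > 0 sends one integer to process 0, which receives them in arrival order using MPI_ANY_SOURCE:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, size, value, i;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank != 0) {
            value = rank * rank;                 /* arbitrary payload */
            MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        } else {
            for (i = 1; i < size; i++) {         /* one message per worker */
                MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                         MPI_COMM_WORLD, &status);
                printf("Got %d from process %d\n", value, status.MPI_SOURCE);
            }
        }

        MPI_Finalize();
        return 0;
    }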
Collective operations

  Can involve more than 2 processes:
    Barrier - synchronization
    Broadcast - data from one to all processes
    Gather - collects data from all processes and joins it
    Scatter - sends a different piece of data from one to all
    Allgather - the same as gather, but everyone receives all the data
    Alltoall - a combination of gather and scatter involving all processes; also known as complete exchange
    Reduce - a global operation from all to one
    Allreduce - everyone receives the result
    ReduceScatter - combines a reduction with a scatter
    Scan - process i receives the reduction of processes 0..i
Barrier

  int MPI_Barrier(MPI_Comm comm);
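  A typical use, sketched below with a placeholder workload, is synchronizing all processes around a timed region so that the measurement starts and ends at common instants:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank;
        double t;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);   /* all processes start timing together */
        t = -MPI_Wtime();
        /* ... parallel work to be timed ... */
        MPI_Barrier(MPI_COMM_WORLD);   /* wait for the slowest process */
        t += MPI_Wtime();

        if (rank == 0) printf("Elapsed: %f s\n", t);

        MPI_Finalize();
        return 0;
    }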
Broadcast

  int MPI_Bcast(void *buf, int count, MPI_Datatype dtype, int root, MPI_Comm comm);
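  A minimal sketch (the value 100 and root rank 0 are arbitrary choices): the root initializes a variable and the broadcast makes it available in every process:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, n = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) n = 100;   /* only the root has the value initially */

        /* after the call, n in every process holds the root's value */
        MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

        printf("Process %d: n = %d\n", rank, n);

        MPI_Finalize();
        return 0;
    }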
Gather

  int MPI_Gather(void *sendbuf, int sendcount, MPI_Datatype sdtype, void *recvbuf, int recvcount, MPI_Datatype rctype, int root, MPI_Comm comm);
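  A minimal sketch where each process contributes its own rank and process 0 gathers the values, ordered by rank, into a dynamically allocated array (note that recvcount is the count received from each process, not the total):

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]) {
        int rank, size, i;
        int *all = NULL;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0)                      /* only the root needs the receive buffer */
            all = malloc(size * sizeof(int));

        /* each process sends one int; the root stores them ordered by rank */
        MPI_Gather(&rank, 1, MPI_INT, all, 1, MPI_INT, 0, MPI_COMM_WORLD);

        if (rank == 0) {
            for (i = 0; i < size; i++) printf("%d ", all[i]);
            printf("\n");
            free(all);
        }

        MPI_Finalize();
        return 0;
    }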
Allgather

  int MPI_Allgather(void *sbuf, int scount, MPI_Datatype sdtype, void *rbuf, int rcount, MPI_Datatype rctype, MPI_Comm comm);
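  The same gathering pattern as a sketch, but without a root: every process ends up with the full array:

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]) {
        int rank, size, i;
        int *all;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        all = malloc(size * sizeof(int));   /* every process needs the full buffer */

        /* like MPI_Gather, but the gathered array ends up in every process */
        MPI_Allgather(&rank, 1, MPI_INT, all, 1, MPI_INT, MPI_COMM_WORLD);

        printf("Process %d sees:", rank);
        for (i = 0; i < size; i++) printf(" %d", all[i]);
        printf("\n");

        free(all);
        MPI_Finalize();
        return 0;
    }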
Scatter

  int MPI_Scatter(void *sendbuf, int sendcount, MPI_Datatype sdtype, void *recvbuf, int recvcount, MPI_Datatype rctype, int root, MPI_Comm comm);
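  A minimal sketch (the values 10*i are arbitrary): the root fills an array with one element per process and the scatter delivers element i to process i:

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]) {
        int rank, size, i, mine;
        int *data = NULL;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {                     /* only the root fills the send buffer */
            data = malloc(size * sizeof(int));
            for (i = 0; i < size; i++) data[i] = 10 * i;   /* arbitrary values */
        }

        /* element i of the root's buffer goes to process i */
        MPI_Scatter(data, 1, MPI_INT, &mine, 1, MPI_INT, 0, MPI_COMM_WORLD);

        printf("Process %d received %d\n", rank, mine);

        if (rank == 0) free(data);
        MPI_Finalize();
        return 0;
    }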
AlltoAll

  int MPI_Alltoall(void *sbuf, int scount, MPI_Datatype sdtype, void *rbuf, int rcount, MPI_Datatype rctype, MPI_Comm comm);
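  A minimal sketch of the complete exchange (the 100*rank + i values are arbitrary): element i of each process's send buffer travels to process i:

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]) {
        int rank, size, i;
        int *sbuf, *rbuf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        sbuf = malloc(size * sizeof(int));
        rbuf = malloc(size * sizeof(int));
        for (i = 0; i < size; i++)
            sbuf[i] = 100 * rank + i;        /* element i is destined for process i */

        /* after the call, rbuf[j] on process i holds sbuf[i] of process j */
        MPI_Alltoall(sbuf, 1, MPI_INT, rbuf, 1, MPI_INT, MPI_COMM_WORLD);

        printf("Process %d got:", rank);
        for (i = 0; i < size; i++) printf(" %d", rbuf[i]);
        printf("\n");

        free(sbuf); free(rbuf);
        MPI_Finalize();
        return 0;
    }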
Reduce

  int MPI_Reduce(void *sbuf, void *rbuf, int count, MPI_Datatype dtype, MPI_Op op, int root, MPI_Comm comm);

  Operations: MPI_MAX, MPI_MIN, MPI_SUM, MPI_PROD, MPI_LAND, MPI_BAND, MPI_LOR, MPI_BOR, MPI_LXOR, MPI_BXOR, MPI_MAXLOC, MPI_MINLOC
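  A minimal sketch summing the ranks of all processes, with the result delivered only to the root:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, sum;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* add the rank values of all processes; only the root gets the result */
        MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0) printf("Sum of all ranks: %d\n", sum);

        MPI_Finalize();
        return 0;
    }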
AllReduce

  int MPI_Allreduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm);

  All processes receive the reduction result.
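  A minimal sketch (the local value stands in for, e.g., a locally computed error): every process obtains the global maximum, as needed for instance in a convergence test:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank;
        double local, global;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        local = (double)rank;   /* stand-in for a locally computed value */

        /* every process obtains the maximum over all processes */
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD);

        printf("Process %d: global max = %f\n", rank, global);

        MPI_Finalize();
        return 0;
    }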
ReduceScatter

  int MPI_Reduce_scatter(void *sendbuf, void *recvbuf, int *recvcounts, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm);

  recvcounts must be an array of size n (= number of processes in the communicator)
  sendbuf in every process has recvcounts[0] + ... + recvcounts[n-1] elements
  After an element-wise reduction over the whole buffer, a scatter distributes the result: each process i receives its segment of recvcounts[i] elements
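  A minimal sketch with recvcounts[i] = 1 for every process: the send buffers are summed element-wise and process i keeps element i of the result:

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]) {
        int rank, size, i, result;
        int *sbuf, *counts;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        sbuf = malloc(size * sizeof(int));
        counts = malloc(size * sizeof(int));
        for (i = 0; i < size; i++) {
            sbuf[i] = rank + i;   /* arbitrary contribution */
            counts[i] = 1;        /* each process receives one element of the result */
        }

        /* element-wise sum over all send buffers, then element i goes to process i */
        MPI_Reduce_scatter(sbuf, &result, counts, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

        printf("Process %d holds %d\n", rank, result);

        free(sbuf); free(counts);
        MPI_Finalize();
        return 0;
    }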
Scan and Exscan

  int MPI_Scan(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm);
  int MPI_Exscan(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm);

  Scan does a reduction and puts in process i the reduction of processes 0..i (0..i-1 with Exscan, the exclusive scan)
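  A minimal sketch computing a prefix sum of the ranks (with MPI_Exscan the result on process 0 would be undefined):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, prefix;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* process i receives 0 + 1 + ... + i */
        MPI_Scan(&rank, &prefix, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

        printf("Process %d: prefix sum = %d\n", rank, prefix);

        MPI_Finalize();
        return 0;
    }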