Masterpraktikum - Scientific Computing, High Performance Computing Message Passing Interface (MPI) and CG-method Michael Bader Alexander Heinecke Technische Universität München, Germany
Outline

- MPI Hello World
- P2P communication
- Collective operations
- Virtual topologies and communicators
- CG-method in a nutshell
Hello World

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello World! (rank %d of %d)\n", rank, size);
    MPI_Finalize();
    return 0;
}

compile: mpicc -o hello hello.c
execute: mpirun -np <number of processes> ./hello

int MPI_Comm_size(MPI_Comm comm, int *size)
Returns the number of processes in the communicator.
MPI_COMM_WORLD: predefined standard communicator; includes all processes of a parallel application.

int MPI_Comm_rank(MPI_Comm comm, int *rank)
Returns the rank (process number) of the executing process.
Point-to-Point Communication

MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm communicator);
MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm communicator, MPI_Status *status);

- Blocking operations (they return when the buffer can be reused)
- rank (dest/source) and tag of send and receive call must match
- Wildcards for receive calls: MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_STATUS_IGNORE
- Messages between the same sender/receiver pair do not overtake each other (order preservation)
MPI Datatypes

MPI datatype        C datatype
MPI_CHAR            signed char
MPI_SHORT           signed short int
MPI_INT             signed int
MPI_LONG            signed long int
MPI_UNSIGNED_CHAR   unsigned char
MPI_UNSIGNED        unsigned int
...
MPI_FLOAT           float
MPI_DOUBLE          double
Point-to-Point Communication

Example: ring.c

...
int rank, size, dest, src;
double *s_buf, *r_buf;
MPI_Status status;
...
dest = (rank + 1) % size;
src  = (rank - 1 + size) % size;
MPI_Send(s_buf, 2, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
MPI_Recv(r_buf, 2, MPI_DOUBLE, src, 0, MPI_COMM_WORLD, &status);
...

Deadlock! Every process first calls the blocking MPI_Send; if the messages are not buffered internally, no process ever reaches its MPI_Recv.
Non-blocking Communication

MPI_Isend(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm communicator, MPI_Request *request);
MPI_Irecv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm communicator, MPI_Request *request);

- Returns immediately
- Separates communication into three phases:
  (1) initiate communication
  (2) do something else
  (3) wait for communication to complete
- The MPI_Request object is used to test for / wait on completion.
Non-blocking Communication

MPI_Wait(MPI_Request *request, MPI_Status *status);
Waits until the pending communication is finished.
MPI_Test(MPI_Request *request, int *flag, MPI_Status *status);
Tests whether the pending communication is finished.

Other routines: MPI_Waitall, MPI_Testall, MPI_Waitany, MPI_Testany, MPI_Waitsome, MPI_Testsome
Collective Operations

Three types of collective operations:
- Synchronization (MPI_Barrier, ...)
- Communication (MPI_Bcast, ...)
- Reduction (MPI_Allreduce, ...)

- Must be executed by all processes of the communicator
- All collective operations are blocking operations
- MPI 3.0 will contain non-blocking collective operations
Collective Operations

MPI_Barrier(MPI_Comm comm);
Blocks until all processes of the communicator have reached the barrier.
Collective Operations

MPI_Bcast(void *buf, int count, MPI_Datatype dtype, int root, MPI_Comm comm);
Collective Operations

MPI_Gather(void *sendbuf, int sendcnt, MPI_Datatype sendtype, void *recvbuf, int recvcnt, MPI_Datatype recvtype, int root, MPI_Comm comm);
MPI_Scatter(void *sendbuf, int sendcnt, MPI_Datatype sendtype, void *recvbuf, int recvcnt, MPI_Datatype recvtype, int root, MPI_Comm comm);
Collective Operations

MPI_Alltoall(void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, MPI_Comm comm);
Collective Operations

MPI_Reduce(void *sbuf, void *rbuf, int count, MPI_Datatype dtype, MPI_Op op, int root, MPI_Comm comm);

Accumulates the elements in sbuf and delivers the result to process root.
MPI_Op is a reduction operation handle. Possible values:
- MPI_MAX (maximum)
- MPI_MIN (minimum)
- MPI_SUM (sum)
- MPI_PROD (product)
- MPI_BAND (bitwise AND)
- ...

Similar routines: MPI_Allreduce, MPI_Reduce_scatter
Virtual Topologies

The processes of a communicator (e.g. MPI_COMM_WORLD) can be mapped to:
- a Cartesian topology
- a graph topology

- Allows convenient process naming with Cartesian process coordinates.
- May lead to better performance (network-aware programming).
Virtual Topologies

MPI_Cart_create(MPI_Comm comm_old, int ndims, int *dims, int *periods, int reorder, MPI_Comm *comm_cart);
Creates a communicator with a Cartesian topology.
MPI_Cart_sub(MPI_Comm comm, int *remain_dims, MPI_Comm *newcomm);
Cuts a grid up into slices.
Virtual Topologies

MPI_Cart_rank(MPI_Comm comm, int *coords, int *rank);
Converts grid coordinates into a process rank.
MPI_Cart_coords(MPI_Comm comm, int rank, int maxdims, int *coords);
Returns the grid coordinates of process rank.
Other useful routines

double MPI_Wtime();
Returns the elapsed wall-clock time (in seconds) on the calling processor.
Resources

MPI 2.2 Standard: http://www.mpi-forum.org/docs/mpi-2.2/mpi22-report.pdf
List of MPI routines: http://mpi.deino.net/mpi_functions/
Solving Systems of Linear Equations
Solving Systems of Linear Equations

Given: Ax = b, A regular, b known

Direct methods:
- Gauß elimination
- LU decomposition
- QR decomposition

Iterative methods:
- Splitting methods (Jacobi, Gauß-Seidel, SOR)
- Projection methods: CG, GMRES, BiCGSTAB
CG Method

Method to solve SLEs with a symmetric, positive definite system matrix.
Details: "An Introduction to the Conjugate Gradient Method Without the Agonizing Pain" by Jonathan Richard Shewchuk

Idea: minimize
F(x) = 1/2 (Ax, x) - (b, x)
∇F(x) = 1/2 (A + A^T) x - b = Ax - b   (since A = A^T)
Definition of the residual: r_i = b - A x_i
CG Method - Code
Laplace Equation

Here: x ∈ R^2
Solve: Δf(x) = 0

- we have Dirichlet boundary conditions
- we employ a regular full grid
- we do not construct the matrix; we use an implicitly given operator
- finite differences, e.g. in 1D:
  f''(x) ≈ (f(x+h) - 2 f(x) + f(x-h)) / h^2