Parallel Computing: MPI. Christoph Beetz. September 7, 2010.


1 Christoph Beetz, Theoretical Physics I, Ruhr-Universität Bochum. September 7, 2010

2 Overview: Introduction. What is MPI? Basic MPI. From serial to parallel code.

3 Why parallelize? The calculation is too big to fit in the memory of one machine: distribute the domain over several machines. The calculation takes too much time with only one CPU: speed it up with many CPUs.

4 Parallel Computer Architectures. SMP: Symmetric Multiprocessor; all processors use shared system resources such as memory and I/O.

5 Parallel Computer Architectures. MPP: Massively Parallel Processors; consists of nodes connected by a high-speed network.

6 Parallel Computer Architectures. Mixed SMP and MPP; consists of nodes that each contain several cores.

7 Parallel Computer Architectures. Hardware examples (name, CPUs, CPUs per node, memory per node): Paddy (portable), Euler, DaVinci (with GPUs per node), BlueGene/P, RoadRunner (Opteron + Cell), Jaguar (16 GB per node).

8 Software for SMP: multithreaded programs.

9 Software for SMP: multithreaded programs, e.g. POSIX Threads (pthreads) and OpenMP (www.openmp.org).

10 Software for MPP: local serial programs with communication.

11 Software for MPP: local serial programs with communication, e.g. High Performance Fortran (HPF) and MPI (the main focus of this lecture), with implementations such as MPICH and OpenMPI.

12 Heat transfer problem: solve the heat equation in two dimensions on a Cartesian grid with periodic boundary conditions: ∂u/∂t = a ∆u (1)
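The explicit Euler step used in the code below is only stable if the time step respects the standard diffusion limit, roughly dt <= min(dx,dy)^2 / (4a); the printed quantity cfl = dt/min(dx,dy)^2 is a quick check against this critical value.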

13 Heat transfer: serial code

   program heat
     implicit none
     integer, parameter :: mx=32, my=32
     real(kind=8) :: u(0:mx+1,0:my+1)
     real(kind=8) :: unp1(0:mx+1,0:my+1)
     integer :: i,j,ic
     real(kind=8) :: x,y,xb,xe,yb,ye,dx,dy
     real(kind=8) :: ax,ay,pi,time,dt

     ax = 1.
     ay = 1.
     pi = 4.*atan(1.d0)
     dt = 1.d-4                 ! assumed value; must stay below the critical (stability) value
     xb = 0.d0
     xe = 1.
     yb = 0.d0
     ye = 1.
     dx = (xe-xb)/mx
     dy = (ye-yb)/my
     write(*,'(a,e12.6)') 'cfl = ', dt/min(dx,dy)**2

     ! initial condition: a disc of value 1 around (0.5,0.5)
     u = 0.
     do j = 0,my+1
        y = yb + j*dy
        do i = 0,mx+1
           x = xb + i*dx
           if ((x-0.5)**2+(y-0.5)**2 .lt. 0.2) then
              u(i,j) = 1.
           end if
        end do
     end do

     time = 0.
     ic = 0
     call VTKout(u,mx,my)

     ! --> simple Euler step
     do while (time .le. 1000*dt)
        ! --> heat equation
        unp1(1:mx,1:my) = u(1:mx,1:my)                                   &
             + dt*((u(2:mx+1,1:my)-2.*u(1:mx,1:my)+u(0:mx-1,1:my))/dx**2 &
             +     (u(1:mx,2:my+1)-2.*u(1:mx,1:my)+u(1:mx,0:my-1))/dy**2)
        call boundary(unp1,mx,my)
        u = unp1
        time = time+dt
        ic = ic+1
        ! write(*,*) 'time =', time
        if (ic > 10) then
           ic = 0
           call VTKout(u,mx,my)
        end if
     end do
   end program heat

   subroutine boundary(u,mx,my)
     implicit none
     integer :: mx,my
     real(kind=8) :: u(0:mx+1,0:my+1)
     integer :: i,j
     do i=1,mx
        u(i,0   ) = u(i,my)
        u(i,my+1) = u(i,1 )
     end do
     do j=0,my+1
        u(0,j   ) = u(mx,j)
        u(mx+1,j) = u(1,j )
     end do
   end subroutine boundary

14 Serial code: the grid is stored as real(kind=8) :: u(0:mx+1,0:my+1), i.e. the physical domain 1:mx, 1:my plus one boundary layer on each side.

15 Serial code, 1st step: computation
   unp1(1:mx,1:my) = u(1:mx,1:my)                                   &
        + dt*((u(2:mx+1,1:my)-2.*u(1:mx,1:my)+u(0:mx-1,1:my))/dx**2 &
        +     (u(1:mx,2:my+1)-2.*u(1:mx,1:my)+u(1:mx,0:my-1))/dy**2)

16 Serial code, 2nd step: boundary exchange
   call boundary(unp1,mx,my)

17 Parallel code: the grid. Start the program on four processes with: mpirun -np 4 heatmpi
   call MPI_INIT(ierror)
   call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierror)
   call MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierror)

18 Parallel code: the grid
   ! GLOBAL FIELD DIMENSIONS
   integer, parameter :: nx=1024, ny=1024
   ! LOCAL FIELD DIMENSIONS
   integer :: mx,my
   integer :: nprocsxy(2)

   nprocsxy(1) = int(sqrt(dble(nprocs)))
   nprocsxy(2) = nprocs/nprocsxy(1)
   if (mod(nx,nprocsxy(1)) .ne. 0) stop
   mx = nx/nprocsxy(1)
   my = ny/nprocsxy(2)
The fields must be allocatable now: their size (1:mx,1:my) depends on the number of processes.

19 Parallel code, 1st step: computation. Each process performs exactly the same computation as in the serial case:
   unp1(1:mx,1:my) = u(1:mx,1:my)                                   &
        + dt*((u(2:mx+1,1:my)-2.*u(1:mx,1:my)+u(0:mx-1,1:my))/dx**2 &
        +     (u(1:mx,2:my+1)-2.*u(1:mx,1:my)+u(1:mx,0:my-1))/dy**2)

20 Parallel code, 2nd step: boundary exchange. Communication is absolutely necessary here: each process has to send its boundary layer to its upper and lower neighbour in each direction. In one direction the data to be sent is not contiguous in memory.
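As a minimal sketch (not taken from the lecture code), the exchange in the non-contiguous direction could look as follows; the neighbour ranks left and right and the buffers sbuf/rbuf are illustrative names, and the boundary column is packed into a contiguous buffer before sending:

   ! x-direction: the boundary column u(mx,1:my) is not contiguous in memory
   real(kind=8) :: sbuf(my), rbuf(my)
   integer :: status(MPI_STATUS_SIZE)

   sbuf(1:my) = u(mx,1:my)                 ! pack the boundary column
   call MPI_SENDRECV(sbuf, my, MPI_DOUBLE_PRECISION, right, 0, &
                     rbuf, my, MPI_DOUBLE_PRECISION, left,  0, &
                     MPI_COMM_WORLD, status, ierror)
   u(0,1:my) = rbuf(1:my)                  ! data from the left neighbour fills the left ghost column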

21 What is MPI? MPI is a tool to consolidate what parallelization has separated: a library for communication between processes, available for Fortran / C / C++, and portable.

22 Basic MPI. Over 150 subroutines, but only about a dozen are needed to parallelize a code. Classification: Environment Management, Collective Data Transfer, Point-to-Point Data Transfer, ...

23 Environment Management: a minimal MPI program in C++

   #include <mpi.h>
   #include <iostream>

   int main(int argc, char** argv)
   {
     MPI_Init(&argc, &argv);
     int nprocs, myrank;
     MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
     MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
     if (myrank == 0) {
       std::cout << nprocs << " processes started." << std::endl;
     }
     std::cout << "Hello from process " << myrank << std::endl;
     MPI_Finalize();
     return 0;
   }

24 mpi.h provides parameters like MPI_COMM_WORLD and variable types like MPI_INTEGER.

25 MPI_Init has to be called once and only once, before any other MPI subroutine, to initialize the MPI environment.

26 In C/C++ all output parameters have to be passed as pointers (e.g. &argc, &argv, &nprocs, &myrank).

27 With communicators you can define groups of processes. MPI_COMM_WORLD is the communicator that contains all processes.

28 MPI_Comm_size returns the number of processes in the variable nprocs.

29 MPI_Comm_rank returns the identification number (rank) of the calling process in the variable myrank.

30 All processes run the same code. Different behaviour can be achieved by using the rank of the process.

31 MPI_Finalize terminates the MPI environment.

32 Sample output with two processes:
   2 processes started.
   Hello from process 1
   Hello from process 0
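Such a program is typically built with the compiler wrapper of the MPI implementation and started with mpirun; the exact wrapper name (here mpicxx, as shipped by MPICH and OpenMPI) depends on the installation:
   mpicxx hello.cpp -o hello
   mpirun -np 2 ./hello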

33 Environment Management: a minimal MPI program in Fortran

   PROGRAM hello
     IMPLICIT NONE
     INCLUDE 'mpif.h'
     INTEGER :: ierror, nprocs, myrank
     CALL MPI_INIT(ierror)
     CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierror)
     CALL MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierror)
     IF (myrank==0) PRINT *, nprocs, ' processes started'
     PRINT *, 'Hello from process ', myrank
     CALL MPI_FINALIZE(ierror)
   END PROGRAM hello

Use the Fortran include file 'mpif.h'.

34 The return code is given in the last argument. Note that calling an MPI routine with the wrong number of arguments results in a segmentation fault at run time.

35 Environment Management: MPI_BARRIER. Blocks all processes until the last one calls this routine.
   CALL MPI_BARRIER(communicator, ierror)
It can be used to serialize parts of a parallel program, e.g. to write to a file in the order of the processes:
   DO i = 0, nprocs-1
      IF (i == myrank) THEN
         ! do some I/O routines
      END IF
      CALL MPI_BARRIER(MPI_COMM_WORLD, ierror)
   END DO

36 Collective Communication: MPI_BCAST. Broadcasts data from one process to all processes in the communicator.
   CALL MPI_BCAST(buffer, count, datatype, root, communicator, ierror)
   buffer    first element of the send / receive data
   count     number of elements to be sent
   datatype  datatype of the elements in buffer
   root      rank of the process that sends the data
   comm      communicator of the processes the data will be sent to
   ierror    return value

37 Collective Communication: MPI_BCAST example.
   IF (myrank == 0) THEN
      buffer = 42
   ELSE
      buffer = 0
   END IF
   CALL MPI_BCAST(buffer, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierror)
   ! afterwards buffer holds 42 on every process

38 Collective Communication: MPI_GATHER. Collects individual messages from each process at the root process.
   CALL MPI_GATHER(sendbuf, sendcnt, sendtype, recvbuf, recvcnt, recvtype, root, comm, ierror)
   sendbuf   first element of the send data
   sendcnt   number of elements to be sent
   sendtype  datatype of the elements in sendbuf
   recvbuf   buffer where the data is collected; must not overlap sendbuf
   recvcnt   number of elements received from each single process
   recvtype  datatype of the elements in recvbuf
   root      rank of the process that collects the data
   comm      communicator of the processes
   ierror    return value

39 Collective Communication: MPI_GATHER example (three processes, root = 2).
   INTEGER :: irecv(3)
   INTEGER :: isend
   isend = myrank + 1
   CALL MPI_GATHER(isend, 1, MPI_INTEGER, irecv, 1, MPI_INTEGER, 2, &
                   MPI_COMM_WORLD, ierror)
   ! recvcnt counts the elements received from each process (here 1);
   ! on the root, irecv then holds 1, 2, 3

40 Collective Communication: MPI_REDUCE. Reduces data from all processes to one result via a reduction operation.
   CALL MPI_REDUCE(sendbuf, recvbuf, count, datatype, op, root, communicator, ierror)
   sendbuf   first element of the send data buffer
   recvbuf   first element of the receive data buffer
   count     number of elements to be sent
   datatype  datatype of the elements in the buffers
   op        reduction operation
   root      rank of the process that receives the data
   comm      communicator of the processes the data will be reduced from
   ierror    return value

41 Reduction operations: MPI_SUM (sum), MPI_PROD (product), MPI_MAX (maximum), MPI_MIN (minimum). The call writes the chosen reduction of sendbuf over all processes to recvbuf of process root.

42 Collective Communication: MPI_REDUCE example.
   CALL MPI_REDUCE(sendbuf, recvbuf, 1, MPI_INTEGER, MPI_SUM, 1, MPI_COMM_WORLD, ierror)
   ! the sum of sendbuf over all processes ends up in recvbuf on process 1

43 Collective Communication: MPI_ALLREDUCE / MPI_ALLGATHER. Same as MPI_REDUCE and MPI_GATHER, but the result is stored on every process.
   CALL MPI_ALLREDUCE(sendbuf, recvbuf, count, datatype, op, communicator, ierror)
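A short usage sketch (variable names assumed for illustration), e.g. to make all processes agree on the smallest local time step:
   real(kind=8) :: dt_local, dt_global
   call MPI_ALLREDUCE(dt_local, dt_global, 1, MPI_DOUBLE_PRECISION, &
                      MPI_MIN, MPI_COMM_WORLD, ierror)
   ! dt_global is now identical on all processes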

44 Collective Communication: MPI_ALLTOALL. Sends a distinct message from each process to every process: process i sends the j-th block of its send buffer to process j, and process j stores the data from process i in the i-th block of its receive buffer.
   CALL MPI_ALLTOALL(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, comm, ierror)
   sendbuf    first element of the data to be sent
   sendcount  number of elements to be sent to each process
   sendtype   data type
   recvbuf    starting address of the data to be received
   recvcount  number of elements received from any process
   recvtype   data type
   comm       communicator of the processes the data will be sent to
   ierror     return value

45 Collective Communication: MPI_ALLTOALL example.
   CALL MPI_ALLTOALL(sendbuf, 3, MPI_INTEGER, recvbuf, 3, MPI_INTEGER, MPI_COMM_WORLD, ierror)

46 Point-to-Point Communication: MPI_SEND. Used to send data to one specific process.
   CALL MPI_SEND(buffer, count, datatype, dest, tag, communicator, ierror)
   buffer    first element of the data to be sent
   count     number of elements in buffer
   datatype  datatype of the elements in buffer
   dest      rank of the destination process
   tag       message tag; any non-negative integer
   comm      communicator of the processes the data will be sent to
   ierror    return value

47 Point-to-Point Communication: MPI_RECV. Used to receive data from one specific process.
   CALL MPI_RECV(buffer, count, datatype, source, tag, communicator, status, ierror)
   buffer    the received data will be stored here
   count     number of elements to be received
   datatype  datatype of the elements in buffer
   source    rank of the sending process
   tag       tag of the message, or MPI_ANY_TAG
   comm      communicator of the processes the data will be sent to
   status    INTEGER array of size MPI_STATUS_SIZE
   ierror    return value

48 Point-to-Point Communication: MPI_SEND / MPI_RECV example. Send an INTEGER value from process 0 to process 1.
   INTEGER :: B
   INTEGER, DIMENSION(MPI_STATUS_SIZE) :: status
   ...
   B = 0
   IF (myrank == 0) THEN
      B = 42
      CALL MPI_SEND(B, 1, MPI_INTEGER, 1, 0, MPI_COMM_WORLD, ierror)
   ELSE IF (myrank == 1) THEN
      CALL MPI_RECV(B, 1, MPI_INTEGER, 0, MPI_ANY_TAG, MPI_COMM_WORLD, status, ierror)
   END IF
   ...

49 Point-to-Point Communication: MPI_SENDRECV. Send and receive data in one combined operation.
   CALL MPI_SENDRECV(sendbuf, sendcount, sendtype, dest, sendtag,   &
                     recvbuf, recvcount, recvtype, source, recvtag, &
                     comm, status, ierror)
The first five arguments are the parameters of the send part.

50 The next five arguments (recvbuf ... recvtag) are the parameters of the receive part, followed by comm, status and ierror.

51 Point-to-Point Communication: MPI_SENDRECV example. Pass data around a ring of processes.
   src  = myrank - 1
   dest = myrank + 1
   if (myrank == 0)        src  = nprocs - 1
   if (myrank == nprocs-1) dest = 0
   call MPI_SENDRECV(sendbuf, 1, MPI_INTEGER, dest, 1,          &
                     recvbuf, 1, MPI_INTEGER, src, MPI_ANY_TAG, &
                     MPI_COMM_WORLD, stat, ierror)

52 Point-to-Point Communication: what happens on send and receive.
   Sending side: copy the data into the user buffer; the call to MPI_SEND copies the user buffer into a system buffer; the system buffer is sent to the receiving process.
   Receiving side: the call to MPI_RECV receives the data into a system buffer and copies it into the user buffer; the data in the user buffer can then be used.

53 Point-to-Point Communication: blocking / nonblocking.
   Blocking: the procedure blocks until the copy to the system buffer has completed; this applies to all collective operations and to MPI_SEND, MPI_RECV.
   Nonblocking: the procedure returns immediately; MPI_ISEND, MPI_IRECV. Exceedingly effective with a communication co-processor.

54 Point-to-Point Communication: MPI_IRECV / MPI_ISEND / MPI_WAIT / MPI_WAITALL.
   MPI_ISEND(buffer, count, datatype, dest, tag, communicator, request, ierror)
   MPI_IRECV(buffer, count, datatype, source, tag, communicator, request, ierror)
   MPI_WAIT(request, status, ierror)
   MPI_WAITALL(count, request, status, ierror)
   request  communication handle
   status   INTEGER array of size MPI_STATUS_SIZE
MPI_WAIT blocks the program until the operation belonging to request is finished. MPI_ISEND / MPI_IRECV followed immediately by MPI_WAIT is the same as a blocking operation. You can mix blocking with nonblocking operations, e.g. send with MPI_ISEND and receive with MPI_RECV. MPI_WAITALL waits for several MPI_ISEND / MPI_IRECV operations; request then has to be an array.

55 Point-to-Point Communication: MPI_IRECV / MPI_ISEND / MPI_WAITALL example.
   left  = myrank - 1
   right = myrank + 1
   if (myrank == 0)        left  = nprocs - 1
   if (myrank == nprocs-1) right = 0
   call MPI_IRECV(lRank, 1, MPI_INTEGER, left,  MPI_ANY_TAG, MPI_COMM_WORLD, req(1), ierror)
   call MPI_IRECV(rRank, 1, MPI_INTEGER, right, MPI_ANY_TAG, MPI_COMM_WORLD, req(2), ierror)
   call MPI_ISEND(myRank, 1, MPI_INTEGER, right, myrank, MPI_COMM_WORLD, req(3), ierror)
   call MPI_ISEND(myRank, 1, MPI_INTEGER, left,  myrank, MPI_COMM_WORLD, req(4), ierror)
   call MPI_WAITALL(4, req, stat, ierror)
Output with four processes:
   MyRank= 0  left Rank= 3  right Rank= 1
   MyRank= 1  left Rank= 0  right Rank= 2
   MyRank= 2  left Rank= 1  right Rank= 3
   MyRank= 3  left Rank= 2  right Rank= 0

56 Point-to-Point Communication: bidirectional communication. Three cases:
   1. Both processes call the send routine first, then receive.
   2. Both processes call the receive routine first, then send.
   3. One process calls receive first and then send; the other calls them in the opposite order.

57 Point-to-Point Communication: first send, then receive.
   Blocking: the program returns from the send when the copy to the system buffer is finished. This can cause deadlocks if the system buffer is smaller than the send buffer.
   Nonblocking: both processes call MPI_ISEND(.., req, ...), MPI_RECV, MPI_WAIT(.., req, ..); free from deadlock.

58 Point-to-Point Communication: first receive, then send.
   Blocking: a blocking receive only returns when the copy from the system buffer has completed; deadlock on any machine.
   Nonblocking: both processes call MPI_IRECV(.., req, ...), MPI_SEND, MPI_WAIT(.., req, ..); free from deadlock.

59 Point-to-Point Communication: receive/send and send/receive. Always safe, with blocking or nonblocking subroutines. Considering performance and the avoidance of deadlocks, use:
   if (myrank == 0) then
      call MPI_ISEND(sendbuf, ..., ireq1, ...)
      call MPI_IRECV(recvbuf, ..., ireq2, ...)
   elseif (myrank == 1) then
      call MPI_IRECV(recvbuf, ..., ireq1, ...)
      call MPI_ISEND(sendbuf, ..., ireq2, ...)
   endif
   call MPI_WAIT(ireq1, ...)
   call MPI_WAIT(ireq2, ...)

60 Guidelines: whenever possible use collective operations; mind the succession of sends and receives; don't reuse a buffer until the operation is finished; don't mix nonblocking point-to-point with collective operations; care about scalability.
Example: summing the ranks by passing a partial sum along a chain
   IF (myrank == 0) THEN
      CALL MPI_SEND(myrank, .., myrank+1, ..)
      CALL MPI_RECV(sum, .., nproc-1, ..)
      PRINT *, 'The sum is ', sum
   ELSE IF (myrank == nproc-1) THEN
      CALL MPI_RECV(sum, .., myrank-1, ..)
      sum = sum + myrank
      CALL MPI_SEND(sum, .., 0, ..)
   ELSE
      CALL MPI_RECV(sum, .., myrank-1, ..)
      sum = sum + myrank
      CALL MPI_SEND(sum, .., myrank+1, ..)
   END IF
is much simpler and faster as a single collective:
   CALL MPI_REDUCE(myrank, sum, ..., MPI_SUM, 0, ...)
   IF (myrank == 0) PRINT *, 'The sum is ', sum

61 Guidelines (continued): mind the succession of sends and receives.
   CPU0                                  CPU1
   MPI_ISEND(.., cpu1, .., tag1, ..)     MPI_RECV(.., cpu0, .., tag4, ..)
   MPI_ISEND(.., cpu1, .., tag2, ..)     MPI_RECV(.., cpu0, .., tag3, ..)
   MPI_ISEND(.., cpu1, .., tag3, ..)     MPI_RECV(.., cpu0, .., tag2, ..)
   MPI_ISEND(.., cpu1, .., tag4, ..)     MPI_RECV(.., cpu0, .., tag1, ..)
   Receiving the messages in reverse order will allocate a huge amount of buffer memory; a system crash is possible.

62 Guidelines (continued): don't reuse a buffer until the operation is finished.
   MPI_ISEND(buffer, .., req, ..)        MPI_IRECV(buffer, .., req, ..)
   ...                                   ...
   buffer(0) = something                 z = buffer(0)
   ...                                   ...
   MPI_WAIT(req, ierror)                 MPI_WAIT(req, ierror)
   Touching the buffer before MPI_WAIT results in a data race and wrong results.

63 Guidelines (continued): don't mix nonblocking point-to-point with collective operations.
   CPU0                                  CPU1
   MPI_ISEND(buffer, .., req, ..)        MPI_RECV(buffer, ...)
   ...                                   ...
   MPI_BARRIER(..)                       MPI_BARRIER(..)
   ...
   MPI_WAIT(req, ierror)
   This is on the ragged edge of legality. Will your program survive it?

64 Guidelines (continued): care about scalability.
   CPU1 to CPUn: MPI_SEND(buffer, .., CPU0, ..)
   CPU0:         do i = 1, n
                    MPI_RECV(buffer, .., CPUi, ..)
                 end do
   Not scalable! Don't write such code.
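A sketch of the scalable alternative using the collective introduced above; the buffer names and counts are illustrative only:
   ! every process, including CPU0, makes one call; CPU0 is the root
   call MPI_GATHER(buffer, 1, MPI_INTEGER, recvbuf, 1, MPI_INTEGER, 0, &
                   MPI_COMM_WORLD, ierror)
   ! on the root, recvbuf must provide one slot per process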

65 Parallelization strategy for speed-up:
   Increase the fraction of the program that can be parallelized.
   Balance the workload; sometimes hard, but worth it.
   Minimize the communication time by decreasing the amount of data transmitted and by decreasing the number of times data is transmitted.
   Minimize cache misses in loops.

66 Decrease the amount of data transmitted.

67 Decrease the number of times data is transmitted:
   Prefer a domain decomposition with contiguous boundaries.
   Copy non-contiguous memory into a contiguous buffer and send it in one operation.
   Or use a derived data type, which may however not be optimal (see the sketch below).
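As a sketch of the derived-datatype variant, assuming the field u(0:mx+1,0:my+1) from the heat-transfer example, MPI_TYPE_VECTOR can describe the strided boundary column so it is sent in a single operation (right is an assumed neighbour rank):
   integer :: coltype
   ! my blocks of 1 element each, separated by a stride of mx+2 elements
   ! (the extent of the first dimension of u, including the two ghost cells)
   call MPI_TYPE_VECTOR(my, 1, mx+2, MPI_DOUBLE_PRECISION, coltype, ierror)
   call MPI_TYPE_COMMIT(coltype, ierror)
   ! send the non-contiguous boundary u(mx,1:my) in one call
   call MPI_SEND(u(mx,1), 1, coltype, right, 0, MPI_COMM_WORLD, ierror)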

68 Minimize cache misses.
   do j = jstart, jend
      do i = 1, n
         a(i,j) = b(i,j) + c(i,j)
      end do
   end do
   Splitting the domain in j keeps the inner loop running over the contiguous index i and thus minimizes cache misses.

69 Minimize cache misses. Compare the loop above with a decomposition in i:
   do j = 1, n
      do i = istart, iend
         a(i,j) = b(i-1,j) + b(i+1,j)
      end do
   end do
   Again, choose the decomposition that minimizes cache misses; but keep in mind that communication latency is much larger than memory latency.

70 Virtual topologies: MPI_CART_CREATE. Makes a new communicator to which the topology information of a Cartesian grid has been attached.
   MPI_CART_CREATE(comm, dim, procperdim, periods, reorder, newcomm, ierror)
   comm        input communicator
   dim         number of dimensions of the Cartesian grid
   procperdim  number of processes in each dimension, integer(dim)
   periods     grid is periodic (true) or not (false), logical(dim)
   reorder     ranking of processes may be reordered (true) or not (false)
   newcomm     new communicator
   ierror      return value

71 Virtual topologies: MPI_CART_CREATE example (24 processes on a periodic 4 x 2 x 3 grid).
   procs   = (/ 4, 2, 3 /)
   periods = (/ .true., .true., .true. /)
   call MPI_CART_CREATE(MPI_COMM_WORLD, 3, procs, periods, &
                        .true., comm3d, ierror)

72 Virtual topologies: MPI_CART_SHIFT. Returns shifted source and destination ranks.
   MPI_CART_SHIFT(comm, direction, displ, source, dest, ierror)
   comm       communicator with Cartesian structure
   direction  coordinate dimension of the shift
   displ      displacement (>0: upwards, <0: downwards)
   source     rank of the source process
   dest       rank of the destination process
   ierror     return value
   If the dimension specified by direction is non-periodic, off-end shifts result in the value MPI_PROC_NULL.

73 Virtual topologies: MPI_CART_COORDS. Translates a process rank in a communicator into Cartesian coordinates.
   MPI_CART_COORDS(comm, rank, maxdims, coords, ierror)
   comm     communicator with Cartesian structure
   rank     rank of the process within the group of comm
   maxdims  number of dimensions of the Cartesian grid
   coords   Cartesian coordinates of the process
   ierror   return value

74 Virtual topologies example: dimension = 3, 24 processes.
   call MPI_CART_CREATE(MPI_COMM_WORLD, dim, nproc, periods, .false., &
                        comm3d, ierror)
   call MPI_CART_COORDS(comm3d, myrank, dim, coords, ierror)
   call MPI_CART_RANK(comm3d, coords, rank3d, ierror)
   call MPI_CART_SHIFT(comm3d, 0, 1, left, right, ierror)
   call MPI_CART_SHIFT(comm3d, 1, 1, down, up, ierror)
   call MPI_CART_SHIFT(comm3d, 2, 1, south, north, ierror)
   Output of process 13:
   I am proc 13: coords(1)= 2, coords(2)= 0, coords(3)= 1
   I am proc 13: left = 7, right = 19, down = 16, up = 16, north = 14, south = 12

75 MPI analysis tool MPE. Method 0: copy to buffer; irecv, isend, waitall.

76 MPI analysis tool MPE. Method 1: copy to buffer; (irecv, isend / isend, irecv), waitall.

77 MPI analysis tool MPE. Method 2: copy to buffer; sendrecv.

78 racoon II: scaling of a magnetohydrodynamic turbulence simulation. Speedup: how much faster the computation is on x processors than on 256. The plot shows speedup versus number of cores for mixed hard/weak scaling of the driven compmhd3d run at 512³ resolution with different block sizes (BS).
