Parallel Computing: MPI. Christoph Beetz. September 7, 2010.


1 Christoph Beetz, Theoretical Physics I, Ruhr-Universität Bochum. September 7, 2010

2 Overview: Introduction. What is MPI? Basic MPI. From serial to parallel code.

3 Why parallelize? The calculation is too big to fit in the memory of one machine: distribute the domain over several machines. The calculation takes too much time with only one CPU: speed it up with many CPUs.

4 Parallel Computer Architectures. SMP: Symmetric Multiprocessor; all processors use shared system resources such as memory and I/O.

5 Parallel Computer Architectures. MPP: Massively Parallel Processors; consists of nodes connected by a high-speed network.

6 Parallel Computer Architectures. Mixed SMP and MPP; consists of nodes that each contain several cores.

7 Parallel Computer Architectures. Hardware examples (name, CPUs, CPUs per node, memory per node): Paddy (portable), Euler, DaVinci (with GPUs per node), BlueGene/P, RoadRunner (Opteron + Cell), Jaguar (16 GB per node).

8 Software for SMP: multithreaded programs.

9 Software for SMP: multithreaded programs, e.g. POSIX Threads (pthreads) and OpenMP (www.openmp.org).

10 Software for MPP: local serial programs with communication.

11 Software for MPP: local serial programs with communication, e.g. High Performance Fortran (HPF) and MPI (the main focus of this lecture), with implementations such as MPICH and OpenMPI.

12 Heat transfer problem: solve the heat equation in two dimensions on a Cartesian grid with periodic boundary conditions: ∂u/∂t = a ∆u (1)
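The explicit Euler step used in the code below is only stable if the time step respects the standard diffusion limit, roughly dt <= min(dx,dy)^2 / (4a); the printed quantity cfl = dt/min(dx,dy)^2 is a quick check against this critical value.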

13 Heat transfer: serial code

   program heat
     implicit none
     integer, parameter :: mx=32, my=32
     real(kind=8) :: u(0:mx+1,0:my+1)
     real(kind=8) :: unp1(0:mx+1,0:my+1)
     integer :: i,j,ic
     real(kind=8) :: x,y,xb,xe,yb,ye,dx,dy
     real(kind=8) :: ax,ay,pi,time,dt

     ax = 1.
     ay = 1.
     pi = 4.*atan(1.d0)
     dt = 1.d-4                 ! assumed value; must stay below the critical (stability) value
     xb = 0.d0
     xe = 1.
     yb = 0.d0
     ye = 1.
     dx = (xe-xb)/mx
     dy = (ye-yb)/my
     write(*,'(a,e12.6)') 'cfl = ', dt/min(dx,dy)**2

     ! initial condition: a disc of value 1 around (0.5,0.5)
     u = 0.
     do j = 0,my+1
        y = yb + j*dy
        do i = 0,mx+1
           x = xb + i*dx
           if ((x-0.5)**2+(y-0.5)**2 .lt. 0.2) then
              u(i,j) = 1.
           end if
        end do
     end do

     time = 0.
     ic = 0
     call VTKout(u,mx,my)

     ! --> simple Euler step
     do while (time .le. 1000*dt)
        ! --> heat equation
        unp1(1:mx,1:my) = u(1:mx,1:my)                                   &
             + dt*((u(2:mx+1,1:my)-2.*u(1:mx,1:my)+u(0:mx-1,1:my))/dx**2 &
             +     (u(1:mx,2:my+1)-2.*u(1:mx,1:my)+u(1:mx,0:my-1))/dy**2)
        call boundary(unp1,mx,my)
        u = unp1
        time = time+dt
        ic = ic+1
        ! write(*,*) 'time =', time
        if (ic > 10) then
           ic = 0
           call VTKout(u,mx,my)
        end if
     end do
   end program heat

   subroutine boundary(u,mx,my)
     implicit none
     integer :: mx,my
     real(kind=8) :: u(0:mx+1,0:my+1)
     integer :: i,j
     do i=1,mx
        u(i,0   ) = u(i,my)
        u(i,my+1) = u(i,1 )
     end do
     do j=0,my+1
        u(0,j   ) = u(mx,j)
        u(mx+1,j) = u(1,j )
     end do
   end subroutine boundary

14 Serial code: the grid is stored as real(kind=8) :: u(0:mx+1,0:my+1), i.e. the physical domain 1:mx, 1:my plus one boundary layer on each side.

15 Serial code, 1st step: computation
   unp1(1:mx,1:my) = u(1:mx,1:my)                                   &
        + dt*((u(2:mx+1,1:my)-2.*u(1:mx,1:my)+u(0:mx-1,1:my))/dx**2 &
        +     (u(1:mx,2:my+1)-2.*u(1:mx,1:my)+u(1:mx,0:my-1))/dy**2)

16 Serial code, 2nd step: boundary exchange
   call boundary(unp1,mx,my)

17 Parallel code: the grid. Start the program on four processes with: mpirun -np 4 heatmpi
   call MPI_INIT(ierror)
   call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierror)
   call MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierror)

18 Parallel code: the grid
   ! GLOBAL FIELD DIMENSIONS
   integer, parameter :: nx=1024, ny=1024
   ! LOCAL FIELD DIMENSIONS
   integer :: mx,my
   integer :: nprocsxy(2)

   nprocsxy(1) = int(sqrt(dble(nprocs)))
   nprocsxy(2) = nprocs/nprocsxy(1)
   if (mod(nx,nprocsxy(1)) .ne. 0) stop
   mx = nx/nprocsxy(1)
   my = ny/nprocsxy(2)
The fields must be allocatable now: their size (1:mx,1:my) depends on the number of processes.

19 Parallel code, 1st step: computation. Each process performs exactly the same computation as in the serial case:
   unp1(1:mx,1:my) = u(1:mx,1:my)                                   &
        + dt*((u(2:mx+1,1:my)-2.*u(1:mx,1:my)+u(0:mx-1,1:my))/dx**2 &
        +     (u(1:mx,2:my+1)-2.*u(1:mx,1:my)+u(1:mx,0:my-1))/dy**2)

20 Parallel code, 2nd step: boundary exchange. Communication is absolutely necessary here: each process has to send its boundary layer to its upper and lower neighbour in each direction. In one direction the data to be sent is not contiguous in memory.
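As a minimal sketch (not taken from the lecture code), the exchange in the non-contiguous direction could look as follows; the neighbour ranks left and right and the buffers sbuf/rbuf are illustrative names, and the boundary column is packed into a contiguous buffer before sending:

   ! x-direction: the boundary column u(mx,1:my) is not contiguous in memory
   real(kind=8) :: sbuf(my), rbuf(my)
   integer :: status(MPI_STATUS_SIZE)

   sbuf(1:my) = u(mx,1:my)                 ! pack the boundary column
   call MPI_SENDRECV(sbuf, my, MPI_DOUBLE_PRECISION, right, 0, &
                     rbuf, my, MPI_DOUBLE_PRECISION, left,  0, &
                     MPI_COMM_WORLD, status, ierror)
   u(0,1:my) = rbuf(1:my)                  ! data from the left neighbour fills the left ghost column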

21 What is MPI? MPI is a tool to consolidate what parallelization has separated: a library for communication between processes, available for Fortran / C / C++, and portable.

22 Basic MPI. Over 150 subroutines, but only about a dozen are needed to parallelize a code. Classification: Environment Management, Collective Data Transfer, Point-to-Point Data Transfer, ...

23 Environment Management: a minimal MPI program in C++

   #include <mpi.h>
   #include <iostream>

   int main(int argc, char** argv)
   {
     MPI_Init(&argc, &argv);
     int nprocs, myrank;
     MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
     MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
     if (myrank == 0) {
       std::cout << nprocs << " processes started." << std::endl;
     }
     std::cout << "Hello from process " << myrank << std::endl;
     MPI_Finalize();
     return 0;
   }

24 mpi.h provides parameters like MPI_COMM_WORLD and variable types like MPI_INTEGER.

25 MPI_Init has to be called once and only once, before any other MPI subroutine, to initialize the MPI environment.

26 In C/C++ all output parameters have to be passed as pointers (e.g. &argc, &argv, &nprocs, &myrank).

27 With communicators you can define groups of processes. MPI_COMM_WORLD is the communicator that contains all processes.

28 MPI_Comm_size returns the number of processes in the variable nprocs.

29 MPI_Comm_rank returns the identification number (rank) of the calling process in the variable myrank.

30 All processes run the same code. Different behaviour can be achieved by using the rank of the process.

31 MPI_Finalize terminates the MPI environment.

32 Sample output with two processes:
   2 processes started.
   Hello from process 1
   Hello from process 0
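Such a program is typically built with the compiler wrapper of the MPI implementation and started with mpirun; the exact wrapper name (here mpicxx, as shipped by MPICH and OpenMPI) depends on the installation:
   mpicxx hello.cpp -o hello
   mpirun -np 2 ./hello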

33 Environment Management: a minimal MPI program in Fortran

   PROGRAM hello
     IMPLICIT NONE
     INCLUDE 'mpif.h'
     INTEGER :: ierror, nprocs, myrank
     CALL MPI_INIT(ierror)
     CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierror)
     CALL MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierror)
     IF (myrank==0) PRINT *, nprocs, ' processes started'
     PRINT *, 'Hello from process ', myrank
     CALL MPI_FINALIZE(ierror)
   END PROGRAM hello

Use the Fortran include file 'mpif.h'.

34 The return code is given in the last argument. Note that calling an MPI routine with the wrong number of arguments results in a segmentation fault at run time.

35 Environment Management: MPI_BARRIER. Blocks all processes until the last one calls this routine.
   CALL MPI_BARRIER(communicator, ierror)
It can be used to serialize parts of a parallel program, e.g. to write to a file in the order of the processes:
   DO i = 0, nprocs-1
      IF (i == myrank) THEN
         ! do some I/O routines
      END IF
      CALL MPI_BARRIER(MPI_COMM_WORLD, ierror)
   END DO

36 Collective Communication: MPI_BCAST. Broadcasts data from one process to all processes in the communicator.
   CALL MPI_BCAST(buffer, count, datatype, root, communicator, ierror)
   buffer    first element of the send / receive data
   count     number of elements to be sent
   datatype  datatype of the elements in buffer
   root      rank of the process that sends the data
   comm      communicator of the processes the data will be sent to
   ierror    return value

37 Collective Communication: MPI_BCAST example.
   IF (myrank == 0) THEN
      buffer = 42
   ELSE
      buffer = 0
   END IF
   CALL MPI_BCAST(buffer, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierror)
   ! afterwards buffer holds 42 on every process

38 Collective Communication: MPI_GATHER. Collects individual messages from each process at the root process.
   CALL MPI_GATHER(sendbuf, sendcnt, sendtype, recvbuf, recvcnt, recvtype, root, comm, ierror)
   sendbuf   first element of the send data
   sendcnt   number of elements to be sent
   sendtype  datatype of the elements in sendbuf
   recvbuf   buffer where the data is collected; must not overlap sendbuf
   recvcnt   number of elements received from each single process
   recvtype  datatype of the elements in recvbuf
   root      rank of the process that collects the data
   comm      communicator of the processes
   ierror    return value

39 Collective Communication: MPI_GATHER example (three processes, root = 2).
   INTEGER :: irecv(3)
   INTEGER :: isend
   isend = myrank + 1
   CALL MPI_GATHER(isend, 1, MPI_INTEGER, irecv, 1, MPI_INTEGER, 2, &
                   MPI_COMM_WORLD, ierror)
   ! recvcnt counts the elements received from each process (here 1);
   ! on the root, irecv then holds 1, 2, 3

40 Collective Communication: MPI_REDUCE. Reduces data from all processes to one result via a reduction operation.
   CALL MPI_REDUCE(sendbuf, recvbuf, count, datatype, op, root, communicator, ierror)
   sendbuf   first element of the send data buffer
   recvbuf   first element of the receive data buffer
   count     number of elements to be sent
   datatype  datatype of the elements in the buffers
   op        reduction operation
   root      rank of the process that receives the data
   comm      communicator of the processes the data will be reduced from
   ierror    return value

41 Reduction operations: MPI_SUM (sum), MPI_PROD (product), MPI_MAX (maximum), MPI_MIN (minimum). The call writes the chosen reduction of sendbuf over all processes to recvbuf of process root.

42 Collective Communication: MPI_REDUCE example.
   CALL MPI_REDUCE(sendbuf, recvbuf, 1, MPI_INTEGER, MPI_SUM, 1, MPI_COMM_WORLD, ierror)
   ! the sum of sendbuf over all processes ends up in recvbuf on process 1

43 Collective Communication: MPI_ALLREDUCE / MPI_ALLGATHER. Same as MPI_REDUCE and MPI_GATHER, but the result is stored on every process.
   CALL MPI_ALLREDUCE(sendbuf, recvbuf, count, datatype, op, communicator, ierror)
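A short usage sketch (variable names assumed for illustration), e.g. to make all processes agree on the smallest local time step:
   real(kind=8) :: dt_local, dt_global
   call MPI_ALLREDUCE(dt_local, dt_global, 1, MPI_DOUBLE_PRECISION, &
                      MPI_MIN, MPI_COMM_WORLD, ierror)
   ! dt_global is now identical on all processes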

44 Collective Communication: MPI_ALLTOALL. Sends a distinct message from each process to every process: process i sends the j-th block of its send buffer to process j, and process j stores the data from process i in the i-th block of its receive buffer.
   CALL MPI_ALLTOALL(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, comm, ierror)
   sendbuf    first element of the data to be sent
   sendcount  number of elements to be sent to each process
   sendtype   data type
   recvbuf    starting address of the data to be received
   recvcount  number of elements received from any process
   recvtype   data type
   comm       communicator of the processes the data will be sent to
   ierror     return value

45 Collective Communication: MPI_ALLTOALL example.
   CALL MPI_ALLTOALL(sendbuf, 3, MPI_INTEGER, recvbuf, 3, MPI_INTEGER, MPI_COMM_WORLD, ierror)

46 Point-to-Point Communication: MPI_SEND. Used to send data to one specific process.
   CALL MPI_SEND(buffer, count, datatype, dest, tag, communicator, ierror)
   buffer    first element of the data to be sent
   count     number of elements in buffer
   datatype  datatype of the elements in buffer
   dest      rank of the destination process
   tag       message tag; any non-negative integer
   comm      communicator of the processes the data will be sent to
   ierror    return value

47 Point-to-Point Communication: MPI_RECV. Used to receive data from one specific process.
   CALL MPI_RECV(buffer, count, datatype, source, tag, communicator, status, ierror)
   buffer    the received data will be stored here
   count     number of elements to be received
   datatype  datatype of the elements in buffer
   source    rank of the sending process
   tag       tag of the message, or MPI_ANY_TAG
   comm      communicator of the processes the data will be sent to
   status    INTEGER array of size MPI_STATUS_SIZE
   ierror    return value

48 Point-to-Point Communication: MPI_SEND / MPI_RECV example. Send an INTEGER value from process 0 to process 1.
   INTEGER :: B
   INTEGER, DIMENSION(MPI_STATUS_SIZE) :: status
   ...
   B = 0
   IF (myrank == 0) THEN
      B = 42
      CALL MPI_SEND(B, 1, MPI_INTEGER, 1, 0, MPI_COMM_WORLD, ierror)
   ELSE IF (myrank == 1) THEN
      CALL MPI_RECV(B, 1, MPI_INTEGER, 0, MPI_ANY_TAG, MPI_COMM_WORLD, status, ierror)
   END IF
   ...

49 Point-to-Point Communication: MPI_SENDRECV. Send and receive data in one combined operation.
   CALL MPI_SENDRECV(sendbuf, sendcount, sendtype, dest, sendtag,   &
                     recvbuf, recvcount, recvtype, source, recvtag, &
                     comm, status, ierror)
The first five arguments are the parameters of the send part.

50 The next five arguments (recvbuf ... recvtag) are the parameters of the receive part, followed by comm, status and ierror.

51 Point-to-Point Communication: MPI_SENDRECV example. Pass data around a ring of processes.
   src  = myrank - 1
   dest = myrank + 1
   if (myrank == 0)        src  = nprocs - 1
   if (myrank == nprocs-1) dest = 0
   call MPI_SENDRECV(sendbuf, 1, MPI_INTEGER, dest, 1,          &
                     recvbuf, 1, MPI_INTEGER, src, MPI_ANY_TAG, &
                     MPI_COMM_WORLD, stat, ierror)

52 Point-to-Point Communication: what happens on send and receive.
   Sending side: copy the data into the user buffer; the call to MPI_SEND copies the user buffer into a system buffer; the system buffer is sent to the receiving process.
   Receiving side: the call to MPI_RECV receives the data into a system buffer and copies it into the user buffer; the data in the user buffer can then be used.

53 Point-to-Point Communication: blocking / nonblocking.
   Blocking: the procedure blocks until the copy to the system buffer has completed; this applies to all collective operations and to MPI_SEND, MPI_RECV.
   Nonblocking: the procedure returns immediately; MPI_ISEND, MPI_IRECV. Exceedingly effective with a communication co-processor.

54 Point-to-Point Communication: MPI_IRECV / MPI_ISEND / MPI_WAIT / MPI_WAITALL.
   MPI_ISEND(buffer, count, datatype, dest, tag, communicator, request, ierror)
   MPI_IRECV(buffer, count, datatype, source, tag, communicator, request, ierror)
   MPI_WAIT(request, status, ierror)
   MPI_WAITALL(count, request, status, ierror)
   request  communication handle
   status   INTEGER array of size MPI_STATUS_SIZE
MPI_WAIT blocks the program until the operation belonging to request is finished. MPI_ISEND / MPI_IRECV followed immediately by MPI_WAIT is the same as a blocking operation. You can mix blocking with nonblocking operations, e.g. send with MPI_ISEND and receive with MPI_RECV. MPI_WAITALL waits for several MPI_ISEND / MPI_IRECV operations; request then has to be an array.

55 Point-to-Point Communication: MPI_IRECV / MPI_ISEND / MPI_WAITALL example.
   left  = myrank - 1
   right = myrank + 1
   if (myrank == 0)        left  = nprocs - 1
   if (myrank == nprocs-1) right = 0
   call MPI_IRECV(lRank, 1, MPI_INTEGER, left,  MPI_ANY_TAG, MPI_COMM_WORLD, req(1), ierror)
   call MPI_IRECV(rRank, 1, MPI_INTEGER, right, MPI_ANY_TAG, MPI_COMM_WORLD, req(2), ierror)
   call MPI_ISEND(myRank, 1, MPI_INTEGER, right, myrank, MPI_COMM_WORLD, req(3), ierror)
   call MPI_ISEND(myRank, 1, MPI_INTEGER, left,  myrank, MPI_COMM_WORLD, req(4), ierror)
   call MPI_WAITALL(4, req, stat, ierror)
Output with four processes:
   MyRank= 0  left Rank= 3  right Rank= 1
   MyRank= 1  left Rank= 0  right Rank= 2
   MyRank= 2  left Rank= 1  right Rank= 3
   MyRank= 3  left Rank= 2  right Rank= 0

56 Point-to-Point Communication: bidirectional communication. Three cases:
   1. Both processes call the send routine first, then receive.
   2. Both processes call the receive routine first, then send.
   3. One process calls receive first and then send; the other calls them in the opposite order.

57 Point-to-Point Communication: first send, then receive.
   Blocking: the program returns from the send when the copy to the system buffer is finished. This can cause deadlocks if the system buffer is smaller than the send buffer.
   Nonblocking: both processes call MPI_ISEND(.., req, ...), MPI_RECV, MPI_WAIT(.., req, ..); free from deadlock.

58 Point-to-Point Communication: first receive, then send.
   Blocking: a blocking receive only returns when the copy from the system buffer has completed; deadlock on any machine.
   Nonblocking: both processes call MPI_IRECV(.., req, ...), MPI_SEND, MPI_WAIT(.., req, ..); free from deadlock.

59 Point-to-Point Communication: receive/send and send/receive. Always safe, with blocking or nonblocking subroutines. Considering performance and the avoidance of deadlocks, use:
   if (myrank == 0) then
      call MPI_ISEND(sendbuf, ..., ireq1, ...)
      call MPI_IRECV(recvbuf, ..., ireq2, ...)
   elseif (myrank == 1) then
      call MPI_IRECV(recvbuf, ..., ireq1, ...)
      call MPI_ISEND(sendbuf, ..., ireq2, ...)
   endif
   call MPI_WAIT(ireq1, ...)
   call MPI_WAIT(ireq2, ...)

60 Guidelines: whenever possible use collective operations; mind the succession of sends and receives; don't reuse a buffer until the operation is finished; don't mix nonblocking point-to-point with collective operations; care about scalability.
Example: summing the ranks by passing a partial sum along a chain
   IF (myrank == 0) THEN
      CALL MPI_SEND(myrank, .., myrank+1, ..)
      CALL MPI_RECV(sum, .., nproc-1, ..)
      PRINT *, 'The sum is ', sum
   ELSE IF (myrank == nproc-1) THEN
      CALL MPI_RECV(sum, .., myrank-1, ..)
      sum = sum + myrank
      CALL MPI_SEND(sum, .., 0, ..)
   ELSE
      CALL MPI_RECV(sum, .., myrank-1, ..)
      sum = sum + myrank
      CALL MPI_SEND(sum, .., myrank+1, ..)
   END IF
is much simpler and faster as a single collective:
   CALL MPI_REDUCE(myrank, sum, ..., MPI_SUM, 0, ...)
   IF (myrank == 0) PRINT *, 'The sum is ', sum

61 Guidelines (continued): mind the succession of sends and receives.
   CPU0                                  CPU1
   MPI_ISEND(.., cpu1, .., tag1, ..)     MPI_RECV(.., cpu0, .., tag4, ..)
   MPI_ISEND(.., cpu1, .., tag2, ..)     MPI_RECV(.., cpu0, .., tag3, ..)
   MPI_ISEND(.., cpu1, .., tag3, ..)     MPI_RECV(.., cpu0, .., tag2, ..)
   MPI_ISEND(.., cpu1, .., tag4, ..)     MPI_RECV(.., cpu0, .., tag1, ..)
   Receiving the messages in reverse order will allocate a huge amount of buffer memory; a system crash is possible.

62 Guidelines (continued): don't reuse a buffer until the operation is finished.
   MPI_ISEND(buffer, .., req, ..)        MPI_IRECV(buffer, .., req, ..)
   ...                                   ...
   buffer(0) = something                 z = buffer(0)
   ...                                   ...
   MPI_WAIT(req, ierror)                 MPI_WAIT(req, ierror)
   Touching the buffer before MPI_WAIT results in a data race and wrong results.

63 Guidelines (continued): don't mix nonblocking point-to-point with collective operations.
   CPU0                                  CPU1
   MPI_ISEND(buffer, .., req, ..)        MPI_RECV(buffer, ...)
   ...                                   ...
   MPI_BARRIER(..)                       MPI_BARRIER(..)
   ...
   MPI_WAIT(req, ierror)
   This is on the ragged edge of legality. Will your program survive it?

64 Guidelines (continued): care about scalability.
   CPU1 to CPUn: MPI_SEND(buffer, .., CPU0, ..)
   CPU0:         do i = 1, n
                    MPI_RECV(buffer, .., CPUi, ..)
                 end do
   Not scalable! Don't write such code.
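A sketch of the scalable alternative using the collective introduced above; the buffer names and counts are illustrative only:
   ! every process, including CPU0, makes one call; CPU0 is the root
   call MPI_GATHER(buffer, 1, MPI_INTEGER, recvbuf, 1, MPI_INTEGER, 0, &
                   MPI_COMM_WORLD, ierror)
   ! on the root, recvbuf must provide one slot per process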

65 Parallelization strategy for speed-up:
   Increase the fraction of the program that can be parallelized.
   Balance the workload; sometimes hard, but worth it.
   Minimize the communication time by decreasing the amount of data transmitted and by decreasing the number of times data is transmitted.
   Minimize cache misses in loops.

66 Decrease the amount of data transmitted.

67 Decrease the number of times data is transmitted:
   Prefer a domain decomposition with contiguous boundaries.
   Copy non-contiguous memory into a contiguous buffer and send it in one operation.
   Or use a derived data type, which may however not be optimal (see the sketch below).
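As a sketch of the derived-datatype variant, assuming the field u(0:mx+1,0:my+1) from the heat-transfer example, MPI_TYPE_VECTOR can describe the strided boundary column so it is sent in a single operation (right is an assumed neighbour rank):
   integer :: coltype
   ! my blocks of 1 element each, separated by a stride of mx+2 elements
   ! (the extent of the first dimension of u, including the two ghost cells)
   call MPI_TYPE_VECTOR(my, 1, mx+2, MPI_DOUBLE_PRECISION, coltype, ierror)
   call MPI_TYPE_COMMIT(coltype, ierror)
   ! send the non-contiguous boundary u(mx,1:my) in one call
   call MPI_SEND(u(mx,1), 1, coltype, right, 0, MPI_COMM_WORLD, ierror)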

68 Minimize cache misses.
   do j = jstart, jend
      do i = 1, n
         a(i,j) = b(i,j) + c(i,j)
      end do
   end do
   Splitting the domain in j keeps the inner loop running over the contiguous index i and thus minimizes cache misses.

69 Minimize cache misses. Compare the loop above with a decomposition in i:
   do j = 1, n
      do i = istart, iend
         a(i,j) = b(i-1,j) + b(i+1,j)
      end do
   end do
   Again, choose the decomposition that minimizes cache misses; but keep in mind that communication latency is much larger than memory latency.

70 Virtual topologies: MPI_CART_CREATE. Makes a new communicator to which the topology information of a Cartesian grid has been attached.
   MPI_CART_CREATE(comm, dim, procperdim, periods, reorder, newcomm, ierror)
   comm        input communicator
   dim         number of dimensions of the Cartesian grid
   procperdim  number of processes in each dimension, integer(dim)
   periods     grid is periodic (true) or not (false), logical(dim)
   reorder     ranking of processes may be reordered (true) or not (false)
   newcomm     new communicator
   ierror      return value

71 Virtual topologies: MPI_CART_CREATE example (24 processes on a periodic 4 x 2 x 3 grid).
   procs   = (/ 4, 2, 3 /)
   periods = (/ .true., .true., .true. /)
   call MPI_CART_CREATE(MPI_COMM_WORLD, 3, procs, periods, &
                        .true., comm3d, ierror)

72 Virtual topologies: MPI_CART_SHIFT. Returns shifted source and destination ranks.
   MPI_CART_SHIFT(comm, direction, displ, source, dest, ierror)
   comm       communicator with Cartesian structure
   direction  coordinate dimension of the shift
   displ      displacement (>0: upwards, <0: downwards)
   source     rank of the source process
   dest       rank of the destination process
   ierror     return value
   If the dimension specified by direction is non-periodic, off-end shifts result in the value MPI_PROC_NULL.

73 Virtual topologies: MPI_CART_COORDS. Translates a process rank in a communicator into Cartesian coordinates.
   MPI_CART_COORDS(comm, rank, maxdims, coords, ierror)
   comm     communicator with Cartesian structure
   rank     rank of the process within the group of comm
   maxdims  number of dimensions of the Cartesian grid
   coords   Cartesian coordinates of the process
   ierror   return value

74 Virtual topologies example: dimension = 3, 24 processes.
   call MPI_CART_CREATE(MPI_COMM_WORLD, dim, nproc, periods, .false., &
                        comm3d, ierror)
   call MPI_CART_COORDS(comm3d, myrank, dim, coords, ierror)
   call MPI_CART_RANK(comm3d, coords, rank3d, ierror)
   call MPI_CART_SHIFT(comm3d, 0, 1, left, right, ierror)
   call MPI_CART_SHIFT(comm3d, 1, 1, down, up, ierror)
   call MPI_CART_SHIFT(comm3d, 2, 1, south, north, ierror)
   Output of process 13:
   I am proc 13: coords(1)= 2, coords(2)= 0, coords(3)= 1
   I am proc 13: left = 7, right = 19, down = 16, up = 16, north = 14, south = 12

75 MPI analysis tool MPE. Method 0: copy to buffer; irecv, isend, waitall.

76 MPI analysis tool MPE. Method 1: copy to buffer; (irecv, isend / isend, irecv), waitall.

77 MPI analysis tool MPE. Method 2: copy to buffer; sendrecv.

78 racoon II: scaling of a magnetohydrodynamic turbulence simulation. Speedup: how much faster the computation is on x processors than on 256. The plot shows speedup versus number of cores for mixed hard/weak scaling of the driven compmhd3d run at 512³ resolution with different block sizes (BS).
