ASTROPHYSIKALISCHES INSTITUT POTSDAM AIP. Helmholtz school. Introduction to MPI. Stefan Gottlöber

Topics
- Basics of parallel programming
- Calculation of π (an example of the basic structure of MPI programs and of the possible combination with OpenMP)
- Direct integration (an example of message passing in MPI programs and of their scaling; developing this MPI program could be your exercise during the first week)
- ART-MPI (an example of a more elaborate program)
Potsdam, July

Modern methods in science
Numerical simulations are used to prove or disprove observations because experiments
- are impossible (astrophysics)
- are too expensive
- are too time consuming
- ...

Methods of parallelization
- OpenMP: needs a computer with shared memory (JUMP at NIC, 107 Gb; NASA's COLUMBIA)
- MPI: works on distributed memory (in general more memory available)

What is OpenMP?
If you don't know it already, you will learn about OpenMP tomorrow during Anatoly's lecture.

What is MPI?
Message Passing Interface: libraries designed to be a standard for parallel computing on distributed memory.
Goal: to be practical, portable, efficient, and flexible.

MPI history
- 1980s to early 1990s: distributed-memory parallel computing develops; the need for a standard arises
- April 1992: Workshop on Standards for Message Passing in a Distributed Memory Environment
- November 1992: meeting in Minneapolis, MPI draft proposal (MPI1)
- November 1993: Supercomputing 93, draft MPI standard
- 1995: MPI1 standard
- 1997: MPI2 standard

OpenMP vs MPI: Overview (figure)

How to install MPI?
The MPI home page is maintained at Argonne National Laboratory. Standards, archives, documentation and links to implementations are available there.
MPI is a library of
- subroutines for Fortran
- functions for C
- classes and methods for C++

How to install MPI?
User programs are compiled as usual and then linked with the appropriate MPI libraries. Implementations are:
- MPICH is available from Argonne National Laboratory. It is free, easily downloaded, and can be installed at the user level (i.e., without superuser privileges). Subroutines are provided for Fortran 90, C and C++. The CH in MPICH stands for Chameleon, a symbol of adaptability to one's environment and thus of portability. Chameleons are fast, and from the beginning a secondary goal was to give up as little efficiency as possible for the portability.
- LAM/MPI is available from Indiana University. LAM stands for Local Area Multicomputer.
- WMPI II is a commercial (but free to academics) implementation for Windows.

Which tasks can be parallelized by MPI?
Trivially parallel programs
- parameter studies
- analysis of many time steps
- image processing
Independent tasks
- N-body interaction
- halo finding and treatment
- density field (smoothed)
Problems in both cases
- Are the tasks scalable over many CPUs? Over how many?
- Load balance (do all CPUs work, or do many lie idle?)

Examples: Trivially parallel programs
- Calculation of π by different methods (one CPU competes with the others: which method is faster or more accurate?). This little task is an excellent exercise for the combination of OpenMP and MPI.
- Evolution of many clusters of galaxies (Hitachi project: 8 nodes with 8 processors on each node; 8 MPI processes, each with 8 OpenMP threads)

Calculation of π
      IMPLICIT REAL*8 (A-H,O-Z)
      IMPLICIT INTEGER*4 (I-N)
      include 'mpif.h'
      N =
      pii =                          ! Num. Rec. p. 914
      CALL mpi_init(ierr)
      CALL mpi_comm_size(mpi_comm_world, msize, ierr )
      CALL mpi_comm_rank(mpi_comm_world, mrank, ierr )
      CALL Calc_PI(pi,N, mrank)
      ...
      CALL mpi_finalize(ierr)
      end
      SUBROUTINE Calc_PI(pi,N, mrank)
      ...

On mpif.h
/* -*- Mode: Fortran; -*- */
!
! (C) 2001 by Argonne National Laboratory.
! See COPYRIGHT in top-level directory.
!
! DO NOT EDIT
! This file created by buildiface
!
      INTEGER MPI_SOURCE, MPI_TAG, MPI_ERROR
      PARAMETER (MPI_SOURCE=3,MPI_TAG=4,MPI_ERROR=5)
      ...
Your compiler will see that file if you have the right environment:
      source /opt/env/pgi-mpich sh

mpi_init
      CALL mpi_init(ierr)
Initializes the MPI execution environment. This function must be called in every MPI program, must be called before any other MPI function, and must be called only once in an MPI program.

mpi_comm_size
      CALL mpi_comm_size(mpi_comm_world, msize, ierr )
Determines the number of processes msize in the group associated with a communicator. Generally used with the communicator MPI_COMM_WORLD to determine the number of processes being used by your application.

What is MPI_COMM_WORLD?
MPI uses objects called communicators and groups to define which collection of processes may communicate with each other. Most MPI routines require you to specify a communicator as an argument. MPI_COMM_WORLD is the predefined communicator which includes all of your MPI processes.

MPI_COMM_WORLD extension (figure)

mpi_comm_rank
      CALL mpi_comm_rank(mpi_comm_world, mrank, ierr )
Determines the rank mrank of the calling process within the communicator. Initially, each process is assigned a unique integer rank between 0 and (number of processors - 1) within the communicator MPI_COMM_WORLD. This rank is often referred to as a task ID. If a process becomes associated with other communicators, it will have a unique rank within each of these as well.

mpi_finalize
      CALL mpi_finalize(ierr)
Terminates the MPI execution environment. This function should be the last MPI routine called in every MPI program; no other MPI routines may be called after it.

Coming back to the calculation of π
      include 'mpif.h'
      N =
      pii =                          ! Num. Rec. p. 914
      CALL mpi_init(ierr)
      CALL mpi_comm_size(mpi_comm_world, msize, ierr )
      CALL mpi_comm_rank(mpi_comm_world, mrank, ierr )
      CALL Calc_PI(pi,N, mrank)
      ...
      CALL mpi_finalize(ierr)
      end
      SUBROUTINE Calc_PI(pi,N, mrank)
Now you can use different series on different processors to calculate π and check speed, convergence, ...
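
The body of Calc_PI is not shown on the slides. The following is only a sketch of what "different series on different processors" could look like: the choice of series (Leibniz on even ranks, the Basel sum on odd ranks), the value of n, and the output format are assumptions made for illustration.

```fortran
      PROGRAM pi_series
c     Sketch only: the series used here are illustrative and are
c     not taken from the original Calc_PI routine.
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INTEGER ierr, msize, mrank, n
      REAL*8 pi
      n = 1000000
      CALL mpi_init(ierr)
      CALL mpi_comm_size(MPI_COMM_WORLD, msize, ierr)
      CALL mpi_comm_rank(MPI_COMM_WORLD, mrank, ierr)
      CALL calc_pi(pi, n, mrank)
      WRITE(*,*) 'rank', mrank, ' of', msize, ': pi =', pi
      CALL mpi_finalize(ierr)
      END

      SUBROUTINE calc_pi(pi, n, mrank)
c     even ranks: Leibniz series  pi = 4*sum (-1)**k/(2k+1)
c     odd ranks:  Basel series    pi = sqrt(6*sum 1/k**2)
      IMPLICIT NONE
      INTEGER n, mrank, k
      REAL*8 pi, s
      s = 0.0d0
      IF (MOD(mrank,2) .EQ. 0) THEN
         DO k = 0, n-1
            s = s + (-1.0d0)**k / (2.0d0*k + 1.0d0)
         ENDDO
         pi = 4.0d0*s
      ELSE
         DO k = 1, n
            s = s + 1.0d0/(DBLE(k)*DBLE(k))
         ENDDO
         pi = SQRT(6.0d0*s)
      ENDIF
      RETURN
      END
```

No communication is needed here: each process works on its own and only the rank decides which branch is taken.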

Important note
Within MPI programs, do not use statements like the following:
      IF (error.gt. error_max) STOP 'increase accuracy'
Only one node would stop. When parallelizing your serial program, replace such lines by something similar to
      IF (error.gt. error_max) THEN
         write(*,*) 'increase accuracy'
         call mpi_abort(mpi_comm_world,ierr1,ierr2)
         STOP
      ENDIF
which terminates all MPI processes associated with the communicator. In most MPI implementations it terminates ALL processes regardless of the communicator specified.

Summary: structure of MPI programs (figure)

First ART MPI code: Evolution of many clusters of galaxies
c     ====================================================
c
c     Adaptive Refinement Tree (ART) N-body solver
c
c     Version 3 - February 1997
c
c     Andrey Kravtsov, Anatoly Klypin, Alexei Khokhlov
c
c     ====================================================
c
c     this is a simple test version for MPI
c     changes only in ART_Main.f and ART_IO.f
c
      program ART
      include 'mpif.h'

c     ... some more initialisation for ART
      CALL mpi_init(ierr)
      CALL mpi_comm_size(mpi_comm_world, isize, ierr )
      CALL mpi_comm_rank(mpi_comm_world, irank, ierr )
c     ... the main ART program
c     ... read data
c     ... do loop n steps
c     ...    integrate one step
c     ...    decide whether results should be written to disk
c     ... enddo
      CALL mpi_finalize(ierr)
      STOP
      END

c
      SUBROUTINE construct_name(name,in1,jn1)
c
c     purpose: construct file names for the output from different
c              nodes into different directories
c
      include 'mpif.h'
      CHARACTER*120 name,tmp3
      CHARACTER*5 tmp1
      CHARACTER*1 tmp2
      tmp1 = 'node_'
      tmp2 = '/'
      CALL mpi_comm_rank(mpi_comm_world, i_node, ierr )
      tmp3 = name
      CALL get_name(tmp3,in1,jn1)
      write(name,'(a,i1,a,a)') tmp1,i_node,tmp2,tmp3(in1:jn1)
      CALL get_name(name,in1,jn1)
      write(*,*) name(in1:jn1)
      END
That's all of the changes! Each MPI process reads its own data (one cluster of galaxies) and integrates it completely independently of the other tasks. No communication. You will run into problems if there is any STOP in the code. Also, nothing is done here about load balance: less massive clusters will finish earlier than more massive ones.

Examples: Independent tasks
N-body code: the interaction between any two particles does not depend on all the other particles.
- Straightforward parallelization (for example the direct integration code): more communication
- Parallelization of tasks in different sub-volumes (for example the MPI version of ART): less communication, but problems with load balance

Direct integration
- N particles with position $x$ and velocity $v$
- move all particles to a new position after $\Delta t$
- use the leap-frog scheme
- calculate the movement for a subset of $N_p \approx N/N_{CPU}$ particles on each of the $N_{CPU}$ processors
- simple to parallelize; however, all nodes need to know all positions and velocities (not really a disadvantage on present-day computers with large memory)

Leap frog scheme
Define positions $x$ and forces at time $t$, time step $n$. Define velocities $v$ at time $t + \Delta t/2$, time step $n + 1/2$. Then we have for particle $i$
$$x_i^{n+1} = x_i^n + v_i^{n+1/2}\,\Delta t \qquad (1)$$
$$v_i^{n+1/2} = v_i^{n-1/2} + F_i(x_i^n)\,\Delta t/m \qquad (2)$$
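
To make the order of the two updates concrete, here is a minimal serial sketch of equations (1) and (2). It is not part of the lecture code: the single particle, the unit mass, the harmonic force F = -x, the step size and the number of steps are all assumptions chosen so that the result can be compared with the exact solution x(t) = cos(t).

```fortran
      PROGRAM leapfrog
c     Sketch under assumptions: one particle, unit mass, harmonic
c     force F = -x, so that the exact solution is x(t) = cos(t).
      IMPLICIT NONE
      INTEGER n, nsteps
      REAL*8 x, v, f, dt, rmass
      rmass  = 1.0d0
      dt     = 1.0d-2
      nsteps = 1000
c     initial conditions: x at t0 = 0, v at t0 - dt/2
      x = 1.0d0
      v = SIN(0.5d0*dt)
      DO n = 1, nsteps
c        equation (2): advance v from step n-1/2 to n+1/2
         f = -x
         v = v + f*dt/rmass
c        equation (1): advance x from step n to n+1
         x = x + v*dt
      ENDDO
      WRITE(*,*) 'leap frog x =', x
      WRITE(*,*) 'exact     x =', COS(nsteps*dt)
      END
```

The velocity is updated first because it lives half a step behind the position; the force is always evaluated at the current position.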

Initial conditions
To start the integration, we need the initial positions $x$ of all particles and their velocities $v$ at two separate times: $x(t_0)$ and $v(t_0 - \Delta t/2)$. See Anatoly's lecture about initial conditions (PMstartM.f).

Accuracy of the leap frog scheme
$$x_i^{n+1} = x_i^n + v_i^{n+1/2}\,\Delta t \qquad (3)$$
$$v_i^{n+1/2} = v_i^{n-1/2} + F_i(x_i^n)\,\Delta t/m \qquad (4)$$
Substitute $v_i^{n-1/2}$ in the second equation using the first (written for the previous step):
$$v_i^{n+1/2} = (x_i^n - x_i^{n-1})/\Delta t + F_i(x_i^n)\,\Delta t/m \qquad (5)$$
Substitute back into the first equation:
$$x_i^{n+1} = x_i^n + (x_i^n - x_i^{n-1}) + F_i(x_i^n)\,(\Delta t)^2/m \qquad (6)$$
We get the central difference formula for $F = ma$:
$$\frac{x_i^{n+1} - 2x_i^n + x_i^{n-1}}{\Delta t^2} = F_i(x_i^n)/m \qquad (7)$$

Accuracy of the leap frog scheme
Let us assume that $X$ is the true solution:
$$\frac{X_i^{n+1} - 2X_i^n + X_i^{n-1}}{\Delta t^2} = F_i(X_i^n)/m + \delta \qquad (8)$$
Insert the Taylor expansions for $X_i^{n+1}$ and $X_i^{n-1}$, thus
$$X_i^{n+1} - 2X_i^n + X_i^{n-1} = \Delta t^2\,\frac{d^2X}{dt^2} + \frac{\Delta t^4}{12}\,\frac{d^4X}{dt^4} + \dots \qquad (9)$$
Substitute back and get the truncation error $O(\Delta t^2)$:
$$\delta = \frac{\Delta t^2}{12}\,\frac{d^4X}{dt^4} + \dots \qquad (10)$$

Consistency of the leap frog scheme
As $\Delta t \to 0$ the difference equation converges to the differential equation
$$\frac{d^2X}{dt^2} = F(X)/m \qquad (11)$$
The scheme is also a symplectic method and time symmetric: it has the same accuracy for negative $\Delta t$.

Truncation error vs. round-off error
Truncation error
- can be reduced by a smaller step $\Delta t$
- can be reduced by a higher-order algorithm
- is not related to round-off error
Round-off error
- comes from the representation of real numbers with a finite number of bits
- can be reduced by higher precision (64 bit, REAL*8)
- can also be reduced by careful ordering of operations
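
The last point, ordering of operations, can be tried out with a small experiment. The sketch below is an illustration that is not taken from the lecture: it sums 1/k**2 in single precision once starting with the largest terms and once starting with the smallest, and prints a REAL*8 sum for comparison. One would typically expect the sum that starts with the smallest terms to lie closer to the reference.

```fortran
      PROGRAM roundoff
c     Sketch: sum 1/k**2 in single precision in two orders.
      IMPLICIT NONE
      INTEGER k, n
      REAL*4 sfwd, sbwd
      REAL*8 sref
      n = 1000000
      sfwd = 0.0
      sbwd = 0.0
      sref = 0.0d0
      DO k = 1, n
         sfwd = sfwd + 1.0/(REAL(k)*REAL(k))
         sref = sref + 1.0d0/(DBLE(k)*DBLE(k))
      ENDDO
      DO k = n, 1, -1
         sbwd = sbwd + 1.0/(REAL(k)*REAL(k))
      ENDDO
      WRITE(*,*) 'forward  (largest terms first): ', sfwd
      WRITE(*,*) 'backward (smallest terms first):', sbwd
      WRITE(*,*) 'REAL*8 reference:               ', sref
      END
```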

nbody_par.f
- Reading by root
- Distribution of tasks
- Load balance: $N_p$ particles per processor out of $N$ particles
- Broadcast to all processors
- Move particles on each processor
- Distribute moved particles to all processors
- Root writes to disk

nbody_par.f
      INTEGER Np_on_rank(maxrank+1)
      ...
      CALL MPI_COMM_RANK( MPI_COMM_WORLD, mrank, ierr )
      mroot=0
      IF(mrank.eq. mroot) THEN
         ... read the data
      ENDIF
      Np_per_process = N/msize
Write the first and last particle number for each processor into the integer array Np_on_rank(maxrank+1). Note that in this construction Np_per_process * msize is not necessarily equal to N, so the last CPU may get (much) more particles than the others, which means bad load balance.
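
One common way around this imbalance is to hand the remainder MOD(N, msize) out one particle at a time. The sketch below is an assumption about how this could be done, not the lecture's code: it fills Np_on_rank with cumulative offsets (entry i is the number of particles on ranks before rank i-1), which is the layout the Send_Receive routine of the direct integration code relies on. N, msize and maxrank are set to hypothetical values.

```fortran
      PROGRAM balance
c     Sketch: spread N particles over msize processes so that the
c     counts differ by at most one. Values below are hypothetical.
      IMPLICIT NONE
      INTEGER maxrank
      PARAMETER (maxrank = 64)
      INTEGER Np_on_rank(maxrank+1)
      INTEGER N, msize, i, nbase, nrest
      N = 200
      msize = 16
      nbase = N/msize
      nrest = MOD(N, msize)
c     Np_on_rank(i) = number of particles on all ranks below i-1
      Np_on_rank(1) = 0
      DO i = 1, msize
         IF (i .LE. nrest) THEN
            Np_on_rank(i+1) = Np_on_rank(i) + nbase + 1
         ELSE
            Np_on_rank(i+1) = Np_on_rank(i) + nbase
         ENDIF
      ENDDO
      DO i = 1, msize
         WRITE(*,*) 'rank', i-1, ' gets',
     +        Np_on_rank(i+1) - Np_on_rank(i), ' particles'
      ENDDO
      END
```

With N = 200 and msize = 16 the first eight ranks get 13 particles and the remaining eight get 12, instead of fifteen ranks with 12 and one with 20.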

nbody_par.f
Now the root process has all the necessary information. It has to be distributed to all the other processors. Root has to tell them which processor has which tasks.
⇒ Message passing in systems with distributed memory

Message passing
Every processor has its own local memory, which can be accessed directly only by its own CPU. We have to distribute data from root to all processors over the network.

Message passing
A synchronous send operation will complete only after acknowledgment that the message was safely received by the receiving process. Asynchronous send operations may complete even though the receiving process has not actually received the message.

Point to Point Communication
      MPI_SEND (buf,count,datatype,dest,tag,comm,ierr)
The basic blocking send operation returns only after the application buffer in the sending task is free for reuse. Note that this routine may be implemented differently on different systems. The MPI standard permits the use of a system buffer but does not require it. Some implementations may actually use a synchronous send (blocking longer, until the destination process has started to receive the message) to implement the basic blocking send.

Using a system buffer (figure)

Point to Point Communication
      MPI_SEND (buf,count,datatype,dest,tag,comm,ierr)
- Buffer: address space which references the data that is to be sent or received, i.e. the name of the variable that is to be sent/received
- Count: number of data elements of the particular type to be sent
- Data type: MPI data type (see the table of MPI data types)
- Destination: the process where the message should be delivered (rank of the receiving process)
- Tag: arbitrary non-negative integer assigned by the programmer to uniquely identify a message. Send and receive operations should match message tags.
- Communicator: the predefined communicator MPI_COMM_WORLD is usually used

Message passing - MPI data types
MPI data type           Fortran data type
MPI_INTEGER             INTEGER
MPI_REAL                REAL
MPI_DOUBLE_PRECISION    DOUBLE PRECISION
MPI_COMPLEX             COMPLEX
MPI_LOGICAL             LOGICAL
MPI_CHARACTER           CHARACTER(1)
MPI_BYTE                8 binary digits
MPI_PACKED              data packed/unpacked with MPI_Pack/MPI_Unpack

Point to Point Communication
      MPI_RECV (buf,count,datatype,source,tag,comm,status,ierr)
- Source: the originating process of the message (rank of the sending process). This may be set to the wild card MPI_ANY_SOURCE to receive a message from any task.
- Status: for a receive operation, indicates the source of the message and the tag of the message. In Fortran, it is an integer array of size MPI_STATUS_SIZE.
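
The two calls can be combined into a minimal point-to-point exchange. The following sketch is not part of nbody_par.f; the buffer contents and the tag value 17 are arbitrary choices for illustration. Root sends the same three numbers to every other rank, and each receiver reads the sender's rank back from the status array.

```fortran
      PROGRAM sendrecv_demo
c     Sketch: root sends three numbers to every other process.
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INTEGER ierr, msize, mrank, tag, i
      INTEGER status(MPI_STATUS_SIZE)
      REAL*8 buf(3)
      CALL mpi_init(ierr)
      CALL mpi_comm_size(MPI_COMM_WORLD, msize, ierr)
      CALL mpi_comm_rank(MPI_COMM_WORLD, mrank, ierr)
c     arbitrary tag; send and receive must use the same value
      tag = 17
      IF (mrank .EQ. 0) THEN
         buf(1) = 1.0d0
         buf(2) = 2.0d0
         buf(3) = 3.0d0
         DO i = 1, msize-1
            CALL MPI_SEND(buf, 3, MPI_DOUBLE_PRECISION, i, tag,
     +                    MPI_COMM_WORLD, ierr)
         ENDDO
      ELSE
         CALL MPI_RECV(buf, 3, MPI_DOUBLE_PRECISION, 0, tag,
     +                 MPI_COMM_WORLD, status, ierr)
         WRITE(*,*) 'rank', mrank, ' received', buf,
     +              ' from', status(MPI_SOURCE)
      ENDIF
      CALL mpi_finalize(ierr)
      END
```

Sending the same data to everybody in a loop like this is exactly the pattern that a broadcast replaces with a single collective call.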

nbody_par.f
Distribute data from root to all processors:
      ...
      CALL MPI_Bcast(Np,1,MPI_INTEGER,mroot,MPI_COMM_WORLD, ierr)
      CALL MPI_Bcast(dt,1,MPI_DOUBLE_PRECISION,
     +               mroot,mpi_comm_world, ierr)
      nsend = 10*Nmax
      CALL MPI_Bcast(Coords,nsend,MPI_DOUBLE_PRECISION,
     +               mroot,mpi_comm_world, ierr)
      CALL MPI_Bcast(Np_on_rank,maxrank,
     +               MPI_INTEGER,mroot,MPI_COMM_WORLD, ierr)
      ...
where we have defined in the original serial program
      PARAMETER (Nmax =50000)    ! maximum number of particles
      REAL*8 Coords
      COMMON /MAINDATA/Coords(10,Nmax)

MPI_Bcast
      MPI_BCAST (buffer,count,datatype,root,comm,ierr)
Broadcasts (sends) a message from the process with rank root to all other processes in the group.
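
A self-contained sketch of the call may help to see where it has to sit in the program. The array ipar and its two values are hypothetical stand-ins for parameters that root would read from disk; they are not names from the lecture code.

```fortran
      PROGRAM bcast_demo
c     Sketch: only root knows the run parameters at first.
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INTEGER ierr, mrank, mroot
      INTEGER ipar(2)
      CALL mpi_init(ierr)
      CALL mpi_comm_rank(MPI_COMM_WORLD, mrank, ierr)
      mroot = 0
      ipar(1) = 0
      ipar(2) = 0
      IF (mrank .EQ. mroot) THEN
c        hypothetical values standing in for data read from disk
         ipar(1) = 200
         ipar(2) = 1000
      ENDIF
c     the call sits outside the IF: every process must make it
      CALL MPI_BCAST(ipar, 2, MPI_INTEGER, mroot,
     +               MPI_COMM_WORLD, ierr)
      WRITE(*,*) 'rank', mrank, ': Np =', ipar(1),
     +           ' nsteps =', ipar(2)
      CALL mpi_finalize(ierr)
      END
```

The same statement acts as a send on root and as a receive on all other ranks, which is why it must not be placed inside the root-only branch.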

nbody_par.f
c
      Do i=1,nsteps                 ! main loop
         Call GetAccelerations_NP
         Call MoveParticles
         time = time + dt
         istep= istep+ 1
c        distribute particles
         CALL Send_Receive()
Distribute the new positions and velocities to all processors after each time step. Each processor has to send data to, and receive data from, all other processors.

MPI_ALLGATHER
Collects data from all tasks and distributes them to all tasks in a group. Each task in the group, in effect, performs a one-to-all broadcasting operation within the group.

MPI_ALLGATHER
      MPI_ALLGATHER (sendbuf,sendcount,sendtype,recvbuf,
                     recvcount,recvtype,comm,ierr)
- sendbuf: starting address of the send buffer (Fortran variable)
- sendcount: number of data elements in the send buffer (integer)
- sendtype: MPI data type
- recvbuf: address of the receive buffer (Fortran variable)
- recvcount: number of elements received from any process (integer)
- recvtype: MPI data type (= sendtype)

MPI_ALLGATHERV
MPI_ALLGATHERV extends the functionality of MPI_ALLGATHER by allowing a varying count of data to be sent from each process.
      MPI_ALLGATHERV (sendbuf,sendcount,sendtype,recvbuf,
                      recvcounts,displs,recvtype,comm,ierr)
- recvcounts: integer array of length group size (msize) containing the number of elements that are received from each process
- displs: integer array of length group size (msize). Entry i specifies the displacement (relative to recvbuf) at which to place the incoming data from process i.

subroutine Send_Receive()
c
c     distribute particles
      SUBROUTINE Send_Receive()
c
      INCLUDE 'nbody_par.h'
      INTEGER sendcount,recvcount(msize), rdispl(msize)
      REAL*8  send(np),receive(np)
      Do i = 1, msize
         rdispl(i) = Np_on_rank(i)
         recvcount(i) = Np_on_rank(i+1) - Np_on_rank(i)
      ENDDO
      istart = Np_on_rank(mrank+1)+1
      iend = Np_on_rank(mrank+2)
      sendcount = Np_on_rank(mrank+2) - Np_on_rank(mrank+1)

Note: In our example all processes already know how many particles are handled by the different processors (the information is stored in the array Np_on_rank). Thus each processor can calculate the amount of data which it receives and the corresponding displacement. In a more general case this information must be distributed before calling MPI_ALLGATHERV.
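
A stripped-down example of this idea, independent of the N-body code, is sketched below. The rule that rank r contributes r+1 values is an arbitrary assumption chosen so that every process can work out all counts and displacements locally, as in the note above; the array bounds maxp and maxn are hypothetical.

```fortran
      PROGRAM allgatherv_demo
c     Sketch: rank r contributes r+1 copies of the number r.
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INTEGER maxp, maxn
      PARAMETER (maxp = 64)
      PARAMETER (maxn = maxp*(maxp+1)/2)
      INTEGER ierr, msize, mrank, i, sendcount
      INTEGER recvcounts(maxp), displs(maxp)
      REAL*8 send(maxp), receive(maxn)
      CALL mpi_init(ierr)
      CALL mpi_comm_size(MPI_COMM_WORLD, msize, ierr)
      CALL mpi_comm_rank(MPI_COMM_WORLD, mrank, ierr)
      IF (msize .GT. maxp) THEN
         WRITE(*,*) 'increase maxp'
         CALL mpi_abort(MPI_COMM_WORLD, 1, ierr)
      ENDIF
      sendcount = mrank + 1
      DO i = 1, sendcount
         send(i) = DBLE(mrank)
      ENDDO
c     every rank computes all counts and displacements locally
      displs(1) = 0
      DO i = 1, msize
         recvcounts(i) = i
         IF (i .GT. 1) displs(i) = displs(i-1) + recvcounts(i-1)
      ENDDO
      CALL MPI_ALLGATHERV(send, sendcount, MPI_DOUBLE_PRECISION,
     +     receive, recvcounts, displs, MPI_DOUBLE_PRECISION,
     +     MPI_COMM_WORLD, ierr)
      IF (mrank .EQ. 0) THEN
         WRITE(*,*) (receive(i), i = 1, msize*(msize+1)/2)
      ENDIF
      CALL mpi_finalize(ierr)
      END
```

Displacements are counted in elements from the start of the receive buffer and begin at zero, so they are the running sum of the counts of all lower ranks.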

      DO k = 1, 10
         DO i = 1,sendcount
            ii = istart -1 + i
            send(i) = Coords(k,ii)
         ENDDO
         CALL mpi_allgatherv(send, sendcount, MPI_REAL8,
     +        receive, recvcount, rdispl, MPI_REAL8,
     +        MPI_COMM_WORLD,ierr)
         CALL MPI_BARRIER(MPI_COMM_WORLD, ierrbar)
         DO i = 1,Np
            Coords(k,i) = receive(i)
         ENDDO
      ENDDO
      RETURN
      End

MPI_BARRIER
      CALL MPI_BARRIER(MPI_COMM_WORLD, ierrbar)
Creates a barrier synchronization in a group. It blocks the calling process until all group members have called it; i.e. the call returns at any process only after all group members have entered the call.

nbody_par.f
      Do i=1,nsteps                 ! main loop
         ...
         if(mod(istep,1000).eq.0) then
            Open(ifile,file=log_file,position='append')
            ...
            close(ifile)
         endif
         ...
It is useful to write information (for example timing) for each processor into separate log files constructed like:
      WRITE(a6,'(I6)') mrank
      log_file = 'DATA/timing_'//a6(5:6)//'.log'
      ifile = 100+mrank

Scaling behavior on octopus
Computation (black) and communication (gray) times. This is only an example!
200 particles; times were shown for 1, 2, 4, 8 and 16 CPUs (the number of steps and the timings are not preserved in this transcript).
With 16 CPUs, 200/16 = 12.5 particles per CPU: the distribution of tasks is too simple.
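
Computation and communication times like these can be measured with MPI_WTIME. The sketch below only shows the bookkeeping and is not the timing code of nbody_par.f: the summation loop and the barrier are placeholders for the real computation and communication steps.

```fortran
      PROGRAM timing_demo
c     Sketch: separate timers for computation and communication.
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INTEGER ierr, mrank, i
      REAL*8 t0, t1, t2, s
      CALL mpi_init(ierr)
      CALL mpi_comm_rank(MPI_COMM_WORLD, mrank, ierr)
      t0 = MPI_WTIME()
c     placeholder for the computation (accelerations, moving)
      s = 0.0d0
      DO i = 1, 10000000
         s = s + 1.0d0/DBLE(i)
      ENDDO
      t1 = MPI_WTIME()
c     placeholder for the communication step
      CALL MPI_BARRIER(MPI_COMM_WORLD, ierr)
      t2 = MPI_WTIME()
      WRITE(*,*) 'rank', mrank, ' compute', t1-t0,
     +           ' s, communicate', t2-t1, ' s, sum', s
      CALL mpi_finalize(ierr)
      END
```

Writing these two numbers per rank makes it easy to see whether a poor speedup comes from communication or from idle processes waiting for a more heavily loaded one.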

Scaling behavior on octopus
2000 particles, time measured for the integration steps. The slide tabulated processors, time, speedup, efficiency and particles per CPU (the values are not preserved in this transcript).
$$\text{speedup} = \frac{\text{sequential execution time}}{\text{parallel execution time}} \qquad (12)$$
$$\text{efficiency} = \frac{\text{sequential execution time}}{\text{processors used} \times \text{parallel execution time}} \qquad (13)$$

Test it!
Parallelization and testing of the direct integration N-body code will be your homework during the first week. A serial version of the code is available at:

Performance analysis
Speedup $\psi(n, p)$ for a problem of size $n$ on $p$ processors. We have three categories of operations:
- computations that must be performed sequentially: $\sigma(n)$
- computations that can be performed in parallel: $\varphi(n)$
- parallel overhead (communication operations, redundant computations, load balance): $\kappa(n, p)$
Then the speedup $\psi(n, p)$ is
$$\psi(n, p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p + \kappa(n, p)} \qquad (14)$$
and the efficiency $0 \le \epsilon(n, p) \le 1$ is
$$\epsilon(n, p) \le \frac{\sigma(n) + \varphi(n)}{p\,\sigma(n) + \varphi(n) + p\,\kappa(n, p)} \qquad (15)$$

Amdahl's law
Let us neglect the overhead $\kappa(n, p)$ and define the inherently sequential portion
$$f = \frac{\sigma(n)}{\sigma(n) + \varphi(n)} \qquad (16)$$
of the computation. Then the speedup on a parallel computer with $p$ processors is (Amdahl's law)
$$\psi \le \frac{1}{f + (1 - f)/p} \qquad (17)$$
This is particularly interesting for estimating the maximum speedup as $p \to \infty$.
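
As a worked illustration of equation (17), take a sequential portion of $f = 0.05$ (an assumed value, not one from the lecture). On $p = 16$ processors, and in the limit of very many processors, the bound becomes

$$\psi \le \frac{1}{0.05 + 0.95/16} = \frac{1}{0.109375} \approx 9.14, \qquad \lim_{p \to \infty} \frac{1}{f + (1 - f)/p} = \frac{1}{f} = 20.$$

So even with only 5% sequential work, 16 processors can give a speedup of at most about 9, and no number of processors can give more than 20.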

Amdahl's law (figure; label: parallel fraction = $1 - f$)

ART MPI
Basic concept: To run the simulation using $N_{MPI}$ MPI processes we divide the box into $N_{MPI}$ sub-boxes in such a way that all sub-boxes need approximately the same amount of computational time for one integration step. Each MPI process uses $N_{OMP}$ CPUs within OpenMP, thus $N_{CPU} = N_{MPI} \times N_{OMP}$. After each basic integration step the box is divided again into sub-boxes according to the best forecast of the load balance.
Input/output: parallel reading/writing by $N_{MPI}$ processors on $N_{MPI}$ files. The files contain 9 variables for each primary particle (3 coordinates, 3 velocities, mass, individual time step, particle id).
Finding the sub-boxes is
- easy for the initial conditions, where matter is distributed almost homogeneously
- almost impossible after structures have developed
- even more complicated for multi-mass realizations in the original box

An artist's view of the ART MPI simulation box (figure)

ART MPI
Example of the load balance in the WMAP run ($80\,h^{-1}$ Mpc box size, 64 MPI processes, 512 CPUs, done on COLUMBIA) (figure)

ART MPI
- each sub-box is surrounded by a thin shell with primary particles $m_p$
- more shells contain particles with increasing mass $m_2 > m_1 > m_p$
- the rest of the box is filled with the most massive particles $m_b > m_2$

ART MPI
- the periodicity of the box is taken into account
- each sub-box runs one integration step of the multi-mass version of ART (tidal fields are represented by the more massive particles)
- after each integration step new sub-boxes are determined

ART MPI
Main tasks for parallelization:
- determine on each node which particles have to be sent to which nodes (Fortran)
- construct the corresponding massive particles from the primary ones (Fortran)
- each node has to inform all others about the particles it wishes to send (MPI_ALLGATHER)
- all nodes send their particles to all nodes (MPI_ALLTOALLV)
Advantage: less communication, and communication only after each basic integration step

ART MPI
Example (node numbers refer to the figure; some labels are not preserved in this transcript):
- 5 sends only massive box particles $m_b$ to ...
- ... sends only massive box particles $m_b$ to 4
- 10 sends primary particles $m_p$ to 11, as well as massive ones ($m_1$, $m_2$, $m_b$)
- 5 sends primary particles $m_p$ to 4, as well as massive ones ($m_1$, $m_2$, $m_b$)

ART MPI - the main program
c     ART_MPI_Main.f
      ...
      CALL mpi_init(ierr)
      CALL mpi_comm_size(mpi_comm_world, mpisize, ierr )
      CALL mpi_comm_rank(mpi_comm_world, irank, ierr )
      IF(mpisize.NE. n_nodes) THEN
         write(*,*) 'mpisize.ne. n_nodes',mpisize,n_nodes
         call mpi_abort(mpi_comm_world,ierr1,ierr2)
         STOP
      ENDIF
      ...

      CALL Read_ART_MPI_Inp ()
C$OMP PARALLEL DO DEFAULT(SHARED)
C$OMP+PRIVATE ( ic1)
      do ic1 = 1, mcell
         var(1,ic1) = -1.0
         var(2,ic1) = -1.0
         ref(ic1) = zero
         pot(ic1) = zero
      enddo
      ...
      call Read_Control()
      call Read_Particles()
      ...

c     ... main loop over mstep (read from input) integration steps
      DO ijkl = 1, mstep
         ...
         CALL Send_Small ()
         CALL Send_Large ()
c        integrate one time step
         ...
c
         If(aexpn.lt.0.6)Then
            call LoadBalance2
         Else
            call LoadBalance1
         EndIf
c        redistribution of primary particles
         call Redistribute_Primaries()
c        write output, if necessary
         call Save_Check ()
         ...
      ENDDO
  999 Continue
      CALL Save(0)
      CALL mpi_finalize(ierr)
      END

ART MPI - send small particles
c
      SUBROUTINE Send_Small()
c
c     purpose: gathers and sends small particles
c
c     input:  in0(3) - coordinates of sending node
c
c     output: sends particles, sets n_refin = number of particles
      ...
      node = irank + 1
      CALL Node_to_IJK(node,iN0)
      ...
      Do kn =1,n_divz               ! Loop over other nodes
      Do jn =1,n_divy
      Do in =1,n_divx
c
c     ... find boundaries of two nodes in 3D
         CALL BoundNode(iN0,iN1,Nbound)
c     ... find primary particles, which node in0 will send
c         for node in1
         CALL Find_Small(iN0,Nbound,np_node,nn,ncount)
         ...
      EndDo
      EndDo
      EndDo
      CALL Send_Receive()
      ...
      RETURN
      End
An analogous routine exists for sending large particles.

ART MPI - send small particles
c
      SUBROUTINE Send_Receive()
c
c     purpose: sends and receives data for particles
      INTEGER sendcount(n_nodes), recvcount(n_nodes)
      INTEGER sendcount_all(n_nodes*n_nodes)
      ...
c     send integers with the lengths of all arrays which will be
c     sent (sendcount) and received (recvcount) to/from all nodes
      CALL mpi_allgather(sendcount, n_nodes, MPI_INTEGER,
     +     sendcount_all, n_nodes, MPI_INTEGER,
     +     MPI_COMM_WORLD, ierr)
      rdispl_new = 0
      DO i = 1, n_nodes
         rdispl(i) = rdispl_new
         recvcount(i) = sendcount_all(irank+1+n_nodes*(i-1))
         rdispl_new = rdispl(i) + recvcount(i)
      ENDDO
      CALL mpi_alltoallv(x_se, sendcount, sdispl, MPI_REAL8,
     +     x_re, recvcount, rdispl, MPI_REAL8,
     +     MPI_COMM_WORLD,ierr)
      CALL mpi_alltoallv(y_se, sendcount, sdispl, MPI_REAL8,
     +     y_re, recvcount, rdispl, MPI_REAL8,
     +     MPI_COMM_WORLD,ierr)
      CALL mpi_alltoallv(z_se, sendcount, sdispl, MPI_REAL8,
     +     z_re, recvcount, rdispl, MPI_REAL8,
     +     MPI_COMM_WORLD,ierr)
      CALL mpi_alltoallv(vx_se, sendcount, sdispl, MPI_REAL8,
     +     vx_re, recvcount, rdispl, MPI_REAL8,
     +     MPI_COMM_WORLD,ierr)
      CALL mpi_alltoallv(vy_se, sendcount, sdispl, MPI_REAL8,
     +     vy_re, recvcount, rdispl, MPI_REAL8,
     +     MPI_COMM_WORLD,ierr)
      CALL mpi_alltoallv(vz_se, sendcount, sdispl, MPI_REAL8,
     +     vz_re, recvcount, rdispl, MPI_REAL8,
     +     MPI_COMM_WORLD,ierr)
      CALL mpi_alltoallv(pt_se, sendcount, sdispl, MPI_REAL,
     +     pt_re, recvcount, rdispl, MPI_REAL,
     +     MPI_COMM_WORLD,ierr)
      CALL mpi_alltoallv(wpar_se, sendcount, sdispl, MPI_REAL,
     +     wpar_re, recvcount, rdispl, MPI_REAL,
     +     MPI_COMM_WORLD,ierr)
      CALL mpi_alltoallv(ip_se, sendcount, sdispl, MPI_INTEGER,
     +     ip_re, recvcount, rdispl, MPI_INTEGER,
     +     MPI_COMM_WORLD,ierr)
      RETURN
      END

MPI_ALLGATHER
In the direct integration example the variable-count form MPI_ALLGATHERV was used to distribute particles. Here we use MPI_ALLGATHER to distribute the information about how many particles each node is going to send to the other nodes (sendcount(n_nodes)), so that all nodes know how many particles arrive from the others (sendcount_all(n_nodes*n_nodes)). With this information each processor can calculate where it has to put the arriving recvcount particles, i.e. rdispl.

MPI_ALLTOALL
Each task in a group performs a scatter operation, sending a distinct message to all the tasks in the group in order by index.

MPI_ALLTOALL
      MPI_ALLTOALL (sendbuf,sendcount,sendtype,recvbuf,
                    recvcnt,recvtype,comm,ierr)
- sendbuf: starting address of the send buffer (Fortran variable)
- sendcount: number of data elements in the send buffer (integer)
- sendtype: MPI data type
- recvbuf: address of the receive buffer (Fortran variable)
- recvcnt: number of elements received from any process (integer)
- recvtype: MPI data type (= sendtype)

MPI_ALLTOALLV
MPI_ALLTOALLV adds flexibility to MPI_ALLTOALL in that the location of data for the send is specified by sdispls and the location of data on the receive side is specified by rdispls.
      MPI_ALLTOALLV (sendbuf,sendcounts,sdispls,sendtype,recvbuf,
                     recvcnts,rdispls,recvtype,comm,ierr)
- sendcounts: now the number of data elements to send to each processor (integer array of length msize)
- recvcnts: now the number of data elements which can be received from each processor (integer array of length msize)
- sdispls: new; integer array of length msize specifying the displacement relative to sendbuf from which the data destined for process j has to be taken
- rdispls: new; integer array of length msize specifying the displacement relative to recvbuf at which to place the incoming data from process i

After running MPI ART ⇒ analyze the data
- an MPI version of the BDM halo finder exists (Arman Khalatyan)
- an MPI version of the minimum spanning tree and friends-of-friends halo finder exists (Victor Turchaninov)

Bugs leading to a deadlock
- A single process calls a collective function. Example: only root calls MPI_Bcast. Prevention: do not put collective communications inside conditionally executed parts of the code.
- Two or more processes are trying to exchange data, but all call a blocking receive function (MPI_Recv) before sending. Prevention: you could use MPI_Sendrecv.
- A process tries to receive data from a process that will never send it. Prevention: use collective communications whenever possible; if using point-to-point communication, use simple communication patterns.
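
For the second case, a ring exchange is a typical situation in which every process both sends and receives. The sketch below is an illustration that is not taken from the lecture codes: each rank passes its own rank number to the right neighbour and receives from the left neighbour in a single MPI_SENDRECV call, so the order of send and receive is left to the library.

```fortran
      PROGRAM ring
c     Sketch: every rank sends to the right and receives from
c     the left in one combined call.
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INTEGER ierr, msize, mrank, left, right, tag
      INTEGER status(MPI_STATUS_SIZE)
      INTEGER sendval, recvval
      CALL mpi_init(ierr)
      CALL mpi_comm_size(MPI_COMM_WORLD, msize, ierr)
      CALL mpi_comm_rank(MPI_COMM_WORLD, mrank, ierr)
c     neighbours on a periodic ring
      right = MOD(mrank + 1, msize)
      left  = MOD(mrank - 1 + msize, msize)
      tag = 1
      sendval = mrank
      CALL MPI_SENDRECV(sendval, 1, MPI_INTEGER, right, tag,
     +                  recvval, 1, MPI_INTEGER, left,  tag,
     +                  MPI_COMM_WORLD, status, ierr)
      WRITE(*,*) 'rank', mrank, ' received', recvval
      CALL mpi_finalize(ierr)
      END
```

If each rank instead called MPI_RECV first and MPI_SEND afterwards, all of them would wait for a message that nobody has sent yet.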

Web pages about MPI
- Writing Message-Passing Parallel Programs with MPI
- SP Parallel Programming Workshop
- The Message Passing Interface (MPI) standard
- MPI: A Message-Passing Interface Standard
- MPI-2: Extensions to the Message-Passing Interface

Books about MPI
- Michael J. Quinn, Parallel Programming in C with MPI and OpenMP, McGraw-Hill, 2004
- Peter S. Pacheco, Parallel Programming with MPI, Morgan Kaufmann Publishers, 1997
- MPI: The Complete Reference (The MIT Press)
More informationL15: Putting it together: N-body (Ch. 6)!
Outline L15: Putting it together: N-body (Ch. 6)! October 30, 2012! Review MPI Communication - Blocking - Non-Blocking - One-Sided - Point-to-Point vs. Collective Chapter 6 shows two algorithms (N-body
More informationIntroduction to MPI, the Message Passing Library
Chapter 3, p. 1/57 Basics of Basic Messages -To-? Introduction to, the Message Passing Library School of Engineering Sciences Computations for Large-Scale Problems I Chapter 3, p. 2/57 Outline Basics of
More informationMPI 5. CSCI 4850/5850 High-Performance Computing Spring 2018
MPI 5 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning Objectives
More informationParallel Programming Using Basic MPI. Presented by Timothy H. Kaiser, Ph.D. San Diego Supercomputer Center
05 Parallel Programming Using Basic MPI Presented by Timothy H. Kaiser, Ph.D. San Diego Supercomputer Center Talk Overview Background on MPI Documentation Hello world in MPI Basic communications Simple
More informationMPI Collective communication
MPI Collective communication CPS343 Parallel and High Performance Computing Spring 2018 CPS343 (Parallel and HPC) MPI Collective communication Spring 2018 1 / 43 Outline 1 MPI Collective communication
More informationCS4961 Parallel Programming. Lecture 18: Introduction to Message Passing 11/3/10. Final Project Purpose: Mary Hall November 2, 2010.
Parallel Programming Lecture 18: Introduction to Message Passing Mary Hall November 2, 2010 Final Project Purpose: - A chance to dig in deeper into a parallel programming model and explore concepts. -
More informationMessage Passing Interface
MPSoC Architectures MPI Alberto Bosio, Associate Professor UM Microelectronic Departement bosio@lirmm.fr Message Passing Interface API for distributed-memory programming parallel code that runs across
More informationIntroduction to MPI. May 20, Daniel J. Bodony Department of Aerospace Engineering University of Illinois at Urbana-Champaign
Introduction to MPI May 20, 2013 Daniel J. Bodony Department of Aerospace Engineering University of Illinois at Urbana-Champaign Top500.org PERFORMANCE DEVELOPMENT 1 Eflop/s 162 Pflop/s PROJECTED 100 Pflop/s
More informationIntroduction to MPI. Ricardo Fonseca. https://sites.google.com/view/rafonseca2017/
Introduction to MPI Ricardo Fonseca https://sites.google.com/view/rafonseca2017/ Outline Distributed Memory Programming (MPI) Message Passing Model Initializing and terminating programs Point to point
More informationAgenda. MPI Application Example. Praktikum: Verteiltes Rechnen und Parallelprogrammierung Introduction to MPI. 1) Recap: MPI. 2) 2.
Praktikum: Verteiltes Rechnen und Parallelprogrammierung Introduction to MPI Agenda 1) Recap: MPI 2) 2. Übungszettel 3) Projektpräferenzen? 4) Nächste Woche: 3. Übungszettel, Projektauswahl, Konzepte 5)
More informationMPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016
MPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016 Message passing vs. Shared memory Client Client Client Client send(msg) recv(msg) send(msg) recv(msg) MSG MSG MSG IPC Shared
More informationIntroduction to MPI Programming Part 2
Introduction to MPI Programming Part 2 Outline Collective communication Derived data types Collective Communication Collective communications involves all processes in a communicator One to all, all to
More informationCornell Theory Center. Discussion: MPI Collective Communication I. Table of Contents. 1. Introduction
1 of 18 11/1/2006 3:59 PM Cornell Theory Center Discussion: MPI Collective Communication I This is the in-depth discussion layer of a two-part module. For an explanation of the layers and how to navigate
More informationWeek 3: MPI. Day 02 :: Message passing, point-to-point and collective communications
Week 3: MPI Day 02 :: Message passing, point-to-point and collective communications Message passing What is MPI? A message-passing interface standard MPI-1.0: 1993 MPI-1.1: 1995 MPI-2.0: 1997 (backward-compatible
More informationFirst day. Basics of parallel programming. RIKEN CCS HPC Summer School Hiroya Matsuba, RIKEN CCS
First day Basics of parallel programming RIKEN CCS HPC Summer School Hiroya Matsuba, RIKEN CCS Today s schedule: Basics of parallel programming 7/22 AM: Lecture Goals Understand the design of typical parallel
More informationBasic MPI Communications. Basic MPI Communications (cont d)
Basic MPI Communications MPI provides two non-blocking routines: MPI_Isend(buf,cnt,type,dst,tag,comm,reqHandle) buf: source of data to be sent cnt: number of data elements to be sent type: type of each
More informationOutline. Communication modes MPI Message Passing Interface Standard
MPI THOAI NAM Outline Communication modes MPI Message Passing Interface Standard TERMs (1) Blocking If return from the procedure indicates the user is allowed to reuse resources specified in the call Non-blocking
More informationMessage-Passing and MPI Programming
Message-Passing and MPI Programming 5.1 Introduction More on Datatypes and Collectives N.M. Maclaren nmm1@cam.ac.uk July 2010 There are a few important facilities we have not covered yet; they are less
More informationMPI Tutorial. Shao-Ching Huang. High Performance Computing Group UCLA Institute for Digital Research and Education
MPI Tutorial Shao-Ching Huang High Performance Computing Group UCLA Institute for Digital Research and Education Center for Vision, Cognition, Learning and Art, UCLA July 15 22, 2013 A few words before
More informationClaudio Chiaruttini Dipartimento di Matematica e Informatica Centro Interdipartimentale per le Scienze Computazionali (CISC) Università di Trieste
Claudio Chiaruttini Dipartimento di Matematica e Informatica Centro Interdipartimentale per le Scienze Computazionali (CISC) Università di Trieste http://www.dmi.units.it/~chiarutt/didattica/parallela
More informationParallel Programming
Parallel Programming for Multicore and Cluster Systems von Thomas Rauber, Gudula Rünger 1. Auflage Parallel Programming Rauber / Rünger schnell und portofrei erhältlich bei beck-shop.de DIE FACHBUCHHANDLUNG
More informationMPI Lab. How to split a problem across multiple processors Broadcasting input to other nodes Using MPI_Reduce to accumulate partial sums
MPI Lab Parallelization (Calculating π in parallel) How to split a problem across multiple processors Broadcasting input to other nodes Using MPI_Reduce to accumulate partial sums Sharing Data Across Processors
More informationNon-Blocking Communications
Non-Blocking Communications Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us
More informationPaul Burton April 2015 An Introduction to MPI Programming
Paul Burton April 2015 Topics Introduction Initialising MPI & basic concepts Compiling and running a parallel program on the Cray Practical : Hello World MPI program Synchronisation Practical Data types
More informationMessage Passing Interface: Basic Course
Overview of DM- HPC2N, UmeåUniversity, 901 87, Sweden. April 23, 2015 Table of contents Overview of DM- 1 Overview of DM- Parallelism Importance Partitioning Data Distributed Memory Working on Abisko 2
More informationCS 470 Spring Mike Lam, Professor. Distributed Programming & MPI
CS 470 Spring 2018 Mike Lam, Professor Distributed Programming & MPI MPI paradigm Single program, multiple data (SPMD) One program, multiple processes (ranks) Processes communicate via messages An MPI
More informationMPI Optimisation. Advanced Parallel Programming. David Henty, Iain Bethune, Dan Holmes EPCC, University of Edinburgh
MPI Optimisation Advanced Parallel Programming David Henty, Iain Bethune, Dan Holmes EPCC, University of Edinburgh Overview Can divide overheads up into four main categories: Lack of parallelism Load imbalance
More informationCommunication Characteristics in the NAS Parallel Benchmarks
Communication Characteristics in the NAS Parallel Benchmarks Ahmad Faraj Xin Yuan Department of Computer Science, Florida State University, Tallahassee, FL 32306 {faraj, xyuan}@cs.fsu.edu Abstract In this
More informationMPI Tutorial. Shao-Ching Huang. IDRE High Performance Computing Workshop
MPI Tutorial Shao-Ching Huang IDRE High Performance Computing Workshop 2013-02-13 Distributed Memory Each CPU has its own (local) memory This needs to be fast for parallel scalability (e.g. Infiniband,
More informationHolland Computing Center Kickstart MPI Intro
Holland Computing Center Kickstart 2016 MPI Intro Message Passing Interface (MPI) MPI is a specification for message passing library that is standardized by MPI Forum Multiple vendor-specific implementations:
More informationMessage-Passing and MPI Programming
Message-Passing and MPI Programming More on Collectives N.M. Maclaren Computing Service nmm1@cam.ac.uk ext. 34761 July 2010 5.1 Introduction There are two important facilities we have not covered yet;
More informationMPI. (message passing, MIMD)
MPI (message passing, MIMD) What is MPI? a message-passing library specification extension of C/C++ (and Fortran) message passing for distributed memory parallel programming Features of MPI Point-to-point
More informationPractical stuff! ü OpenMP
Practical stuff! REALITY: Ways of actually get stuff done in HPC: Ø Message Passing (send, receive, broadcast,...) Ø Shared memory (load, store, lock, unlock) ü MPI Ø Transparent (compiler works magic)
More informationNon-Blocking Communications
Non-Blocking Communications Deadlock 1 5 2 3 4 Communicator 0 2 Completion The mode of a communication determines when its constituent operations complete. - i.e. synchronous / asynchronous The form of
More informationL19: Putting it together: N-body (Ch. 6)!
Administrative L19: Putting it together: N-body (Ch. 6)! November 22, 2011! Project sign off due today, about a third of you are done (will accept it tomorrow, otherwise 5% loss on project grade) Next
More informationThe Message Passing Interface (MPI) TMA4280 Introduction to Supercomputing
The Message Passing Interface (MPI) TMA4280 Introduction to Supercomputing NTNU, IMF January 16. 2017 1 Parallelism Decompose the execution into several tasks according to the work to be done: Function/Task
More informationL14 Supercomputing - Part 2
Geophysical Computing L14-1 L14 Supercomputing - Part 2 1. MPI Code Structure Writing parallel code can be done in either C or Fortran. The Message Passing Interface (MPI) is just a set of subroutines
More informationIntroduction to MPI part II. Fabio AFFINITO
Introduction to MPI part II Fabio AFFINITO (f.affinito@cineca.it) Collective communications Communications involving a group of processes. They are called by all the ranks involved in a communicator (or
More informationCollective Communication in MPI and Advanced Features
Collective Communication in MPI and Advanced Features Pacheco s book. Chapter 3 T. Yang, CS240A. Part of slides from the text book, CS267 K. Yelick from UC Berkeley and B. Gropp, ANL Outline Collective
More informationMessage passing. Week 3: MPI. Day 02 :: Message passing, point-to-point and collective communications. What is MPI?
Week 3: MPI Day 02 :: Message passing, point-to-point and collective communications Message passing What is MPI? A message-passing interface standard MPI-1.0: 1993 MPI-1.1: 1995 MPI-2.0: 1997 (backward-compatible
More informationCS 470 Spring Mike Lam, Professor. Distributed Programming & MPI
CS 470 Spring 2017 Mike Lam, Professor Distributed Programming & MPI MPI paradigm Single program, multiple data (SPMD) One program, multiple processes (ranks) Processes communicate via messages An MPI
More informationParallel programming MPI
Parallel programming MPI Distributed memory Each unit has its own memory space If a unit needs data in some other memory space, explicit communication (often through network) is required Point-to-point
More informationcharacter :: buffer(100) integer :: position real :: a, b integer :: n position = 0 call MPI_PACK(a, 1, MPI_REAL, buffer, 100, & position, MPI_COMM_WO
MPI_PACK and MPI_UNPACK Each communication incurs a latency penalty so it is best to group communications together Requires data to be contiguous in memory with no gaps between variables This is true for
More informationMessage Passing Programming. Designing MPI Applications
Message Passing Programming Designing MPI Applications Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us
More informationECE 574 Cluster Computing Lecture 13
ECE 574 Cluster Computing Lecture 13 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 21 March 2017 Announcements HW#5 Finally Graded Had right idea, but often result not an *exact*
More informationCSE 160 Lecture 23. Matrix Multiplication Continued Managing communicators Gather and Scatter (Collectives)
CS 160 Lecture 23 Matrix Multiplication Continued Managing communicators Gather and Scatter (Collectives) Today s lecture All to all communication Application to Parallel Sorting Blocking for cache 2013
More informationMPI Message Passing Interface. Source:
MPI Message Passing Interface Source: http://www.netlib.org/utk/papers/mpi-book/mpi-book.html Message Passing Principles Explicit communication and synchronization Programming complexity is high But widely
More informationCSE 613: Parallel Programming. Lecture 21 ( The Message Passing Interface )
CSE 613: Parallel Programming Lecture 21 ( The Message Passing Interface ) Jesmin Jahan Tithi Department of Computer Science SUNY Stony Brook Fall 2013 ( Slides from Rezaul A. Chowdhury ) Principles of
More informationHigh Performance Computing
High Performance Computing Course Notes 2009-2010 2010 Message Passing Programming II 1 Communications Point-to-point communications: involving exact two processes, one sender and one receiver For example,
More informationPractical Introduction to Message-Passing Interface (MPI)
1 Practical Introduction to Message-Passing Interface (MPI) October 1st, 2015 By: Pier-Luc St-Onge Partners and Sponsors 2 Setup for the workshop 1. Get a user ID and password paper (provided in class):
More informationParallel Programming Using MPI
Parallel Programming Using MPI Short Course on HPC 15th February 2019 Aditya Krishna Swamy adityaks@iisc.ac.in SERC, Indian Institute of Science When Parallel Computing Helps? Want to speed up your calculation
More informationIntroduction to parallel computing concepts and technics
Introduction to parallel computing concepts and technics Paschalis Korosoglou (support@grid.auth.gr) User and Application Support Unit Scientific Computing Center @ AUTH Overview of Parallel computing
More informationCS 426. Building and Running a Parallel Application
CS 426 Building and Running a Parallel Application 1 Task/Channel Model Design Efficient Parallel Programs (or Algorithms) Mainly for distributed memory systems (e.g. Clusters) Break Parallel Computations
More informationDistributed Memory Programming with MPI
Distributed Memory Programming with MPI Moreno Marzolla Dip. di Informatica Scienza e Ingegneria (DISI) Università di Bologna moreno.marzolla@unibo.it Algoritmi Avanzati--modulo 2 2 Credits Peter Pacheco,
More informationIntroduzione al Message Passing Interface (MPI) Andrea Clematis IMATI CNR
Introduzione al Message Passing Interface (MPI) Andrea Clematis IMATI CNR clematis@ge.imati.cnr.it Ack. & riferimenti An Introduction to MPI Parallel Programming with the Message Passing InterfaceWilliam
More informationPeter Pacheco. Chapter 3. Distributed Memory Programming with MPI. Copyright 2010, Elsevier Inc. All rights Reserved
An Introduction to Parallel Programming Peter Pacheco Chapter 3 Distributed Memory Programming with MPI 1 Roadmap Writing your first MPI program. Using the common MPI functions. The Trapezoidal Rule in
More informationDistributed Memory Programming with MPI. Copyright 2010, Elsevier Inc. All rights Reserved
An Introduction to Parallel Programming Peter Pacheco Chapter 3 Distributed Memory Programming with MPI 1 Roadmap Writing your first MPI program. Using the common MPI functions. The Trapezoidal Rule in
More informationTopics. Lecture 6. Point-to-point Communication. Point-to-point Communication. Broadcast. Basic Point-to-point communication. MPI Programming (III)
Topics Lecture 6 MPI Programming (III) Point-to-point communication Basic point-to-point communication Non-blocking point-to-point communication Four modes of blocking communication Manager-Worker Programming
More informationNUMERICAL PARALLEL COMPUTING
Lecture 5, March 23, 2012: The Message Passing Interface http://people.inf.ethz.ch/iyves/pnc12/ Peter Arbenz, Andreas Adelmann Computer Science Dept, ETH Zürich E-mail: arbenz@inf.ethz.ch Paul Scherrer
More informationMessage Passing Interface
Message Passing Interface DPHPC15 TA: Salvatore Di Girolamo DSM (Distributed Shared Memory) Message Passing MPI (Message Passing Interface) A message passing specification implemented
More informationMore about MPI programming. More about MPI programming p. 1
More about MPI programming More about MPI programming p. 1 Some recaps (1) One way of categorizing parallel computers is by looking at the memory configuration: In shared-memory systems, the CPUs share
More informationIntroduction to MPI. Ekpe Okorafor. School of Parallel Programming & Parallel Architecture for HPC ICTP October, 2014
Introduction to MPI Ekpe Okorafor School of Parallel Programming & Parallel Architecture for HPC ICTP October, 2014 Topics Introduction MPI Model and Basic Calls MPI Communication Summary 2 Topics Introduction
More informationData parallelism. [ any app performing the *same* operation across a data stream ]
Data parallelism [ any app performing the *same* operation across a data stream ] Contrast stretching: Version Cores Time (secs) Speedup while (step < NumSteps &&!converged) { step++; diffs = 0; foreach
More informationExercises: April 11. Hermann Härtig, TU Dresden, Distributed OS, Load Balancing
Exercises: April 11 1 PARTITIONING IN MPI COMMUNICATION AND NOISE AS HPC BOTTLENECK LOAD BALANCING DISTRIBUTED OPERATING SYSTEMS, SCALABILITY, SS 2017 Hermann Härtig THIS LECTURE Partitioning: bulk synchronous
More informationPractical Introduction to Message-Passing Interface (MPI)
1 Outline of the workshop 2 Practical Introduction to Message-Passing Interface (MPI) Bart Oldeman, Calcul Québec McGill HPC Bart.Oldeman@mcgill.ca Theoretical / practical introduction Parallelizing your
More informationIntroduction to MPI. SHARCNET MPI Lecture Series: Part I of II. Paul Preney, OCT, M.Sc., B.Ed., B.Sc.
Introduction to MPI SHARCNET MPI Lecture Series: Part I of II Paul Preney, OCT, M.Sc., B.Ed., B.Sc. preney@sharcnet.ca School of Computer Science University of Windsor Windsor, Ontario, Canada Copyright
More informationStandard MPI - Message Passing Interface
c Ewa Szynkiewicz, 2007 1 Standard MPI - Message Passing Interface The message-passing paradigm is one of the oldest and most widely used approaches for programming parallel machines, especially those
More informationMessage Passing Interface. most of the slides taken from Hanjun Kim
Message Passing Interface most of the slides taken from Hanjun Kim Message Passing Pros Scalable, Flexible Cons Someone says it s more difficult than DSM MPI (Message Passing Interface) A standard message
More informationCS 6230: High-Performance Computing and Parallelization Introduction to MPI
CS 6230: High-Performance Computing and Parallelization Introduction to MPI Dr. Mike Kirby School of Computing and Scientific Computing and Imaging Institute University of Utah Salt Lake City, UT, USA
More informationLecture 7: More about MPI programming. Lecture 7: More about MPI programming p. 1
Lecture 7: More about MPI programming Lecture 7: More about MPI programming p. 1 Some recaps (1) One way of categorizing parallel computers is by looking at the memory configuration: In shared-memory systems
More informationScientific Computing
Lecture on Scientific Computing Dr. Kersten Schmidt Lecture 21 Technische Universität Berlin Institut für Mathematik Wintersemester 2014/2015 Syllabus Linear Regression, Fast Fourier transform Modelling
More informationA short overview of parallel paradigms. Fabio Affinito, SCAI
A short overview of parallel paradigms Fabio Affinito, SCAI Why parallel? In principle, if you have more than one computing processing unit you can exploit that to: -Decrease the time to solution - Increase
More informationIntroduction to Parallel Programming with MPI
Introduction to Parallel Programming with MPI PICASso Tutorial October 25-26, 2006 Stéphane Ethier (ethier@pppl.gov) Computational Plasma Physics Group Princeton Plasma Physics Lab Why Parallel Computing?
More informationCINES MPI. Johanne Charpentier & Gabriel Hautreux
Training @ CINES MPI Johanne Charpentier & Gabriel Hautreux charpentier@cines.fr hautreux@cines.fr Clusters Architecture OpenMP MPI Hybrid MPI+OpenMP MPI Message Passing Interface 1. Introduction 2. MPI
More informationProgramming with MPI
Programming with MPI p. 1/?? Programming with MPI More on Datatypes and Collectives Nick Maclaren nmm1@cam.ac.uk May 2008 Programming with MPI p. 2/?? Less Basic Collective Use A few important facilities
More informationOptimization of MPI Applications Rolf Rabenseifner
Optimization of MPI Applications Rolf Rabenseifner University of Stuttgart High-Performance Computing-Center Stuttgart (HLRS) www.hlrs.de Optimization of MPI Applications Slide 1 Optimization and Standardization
More informationDecomposing onto different processors
N-Body II: MPI Decomposing onto different processors Direct summation (N 2 ) - each particle needs to know about all other particles No locality possible Inherently a difficult problem to parallelize in
More informationAdvanced Parallel Programming
Advanced Parallel Programming Networks and All-to-All communication David Henty, Joachim Hein EPCC The University of Edinburgh Overview of this Lecture All-to-All communications MPI_Alltoall MPI_Alltoallv
More informationLecture 9: MPI continued
Lecture 9: MPI continued David Bindel 27 Sep 2011 Logistics Matrix multiply is done! Still have to run. Small HW 2 will be up before lecture on Thursday, due next Tuesday. Project 2 will be posted next
More informationMPI MESSAGE PASSING INTERFACE
MPI MESSAGE PASSING INTERFACE David COLIGNON, ULiège CÉCI - Consortium des Équipements de Calcul Intensif http://www.ceci-hpc.be Outline Introduction From serial source code to parallel execution MPI functions
More informationa. Assuming a perfect balance of FMUL and FADD instructions and no pipeline stalls, what would be the FLOPS rate of the FPU?
CPS 540 Fall 204 Shirley Moore, Instructor Test November 9, 204 Answers Please show all your work.. Draw a sketch of the extended von Neumann architecture for a 4-core multicore processor with three levels
More information