Introduction to parallel computing

1 Introduction to parallel computing

2 What is parallel computing? Serial computing: a single processing unit (core) is used for solving a problem, and a single task is performed at once Parallel computing: multiple cores are used for solving a problem, the problem is split into smaller subtasks, and multiple subtasks are performed simultaneously [Figure: serial computing, problem -> core -> result; parallel computing, problem -> cores c1, c2, c3, ..., cn -> result]

3 Solve problems faster Why parallel computing? CPU clock frequencies are no longer increasing Speed-up is obtained by using multiple cores Parallel programming is required for utilizing multiple cores

4 Solve bigger problems Why parallel computing? Parallel computing may allow application to use more memory Apply old models to new length and time scales Grand challenges New science

5 Solve problems better More precise models Why parallel computing? Algorithms that are more precise but also computationally heavier New science

6 Types of parallel computers Shared memory: all cores can access the whole memory Distributed memory: all the cores have their own memory, and communication is needed in order to access the memory of other cores Current supercomputers combine the distributed memory and shared memory approaches [Figure: a shared-memory machine (cores c1...cn attached to one memory), a distributed-memory machine (each core with its own memory), and a hybrid machine (groups of cores, e.g. cores 1-4 and 5-8, each sharing a node-local memory)]

7 Current trends in parallel computers Petaflop/s (10^15 operations/s) systems Commodity Linux clusters occupying a great share of the Top500 list Top spots taken by tightly-packed non-commodity systems Power consumption: both a driver and a limiting factor Novel technologies (accelerators) offer the promise of a quantum leap Data avalanche

8 Parallel programming models Threads (pthreads, OpenMP) Can be used only in shared memory computers Limited parallel scalability Simpler / less explicit programming Message passing Can be used both in distributed and shared memory computers Programming model allows for good parallel scalability Programming is quite explicit

9 Parallel programming models Hybrid programming Threads inside a node, message passing between nodes Can enable scaling to extreme core counts (> 10000) PGAS (partitioned global address space) languages (UPC, CAF, Chapel, X10) Hides a lot of explicit considerations of parallelism Still under development

10 Data parallelism Data is distributed to processor cores Each core performs (nearly) identical tasks with different data Example: summing the elements of a 2D array [Figure: cores 1-4 each compute the partial sum Σ over their own block of the array]

11 Task parallelism Different cores perform different tasks with the same or different data Example: signal processing, four filters as separate tasks Core 2 obtains a data segment after core 1 has processed it; core 1 starts to process a new segment... [Figure: a stream of data passing through four filters, one filter per core (cores 1-4)]

12 Parallel computing concepts Strong parallel scaling Constant problem size Execution time decreases in proportion to the increase in the number of cores [Plot: speed-up vs. number of cores, ideal scaling vs. real scaling]

13 Parallel computing concepts Weak parallel scaling Increasing problem size Execution time remains constant when the number of cores increases in proportion to the problem size [Plot: problem size / time vs. number of cores, ideal scaling vs. real scaling]

14 Parallel computing concepts Parallel programs often contain sequential parts Amdahl's law gives the maximum speed-up in the presence of non-parallelizable parts: S(N) = 1 / ((1 - F) + F/N), where F is the parallel fraction and N the number of cores For example, with F = 0.94 the speed-up can never exceed 1 / (1 - F) ≈ 16.7, no matter how many cores are used [Plot: speed-up vs. number of cores for parallel fractions such as F = 1.0 and F = 0.94]

15 More parallel computing concepts Synchronization Coordination of processes for maintaining correct runtime order and for keeping data coherent Granularity Amount of synchronization needed between subtasks Fine grained: lots of synchronization Coarse grained: synchronization less frequent Embarrassingly parallel: synchronization is needed rarely or never

16 Load balance More parallel computing concepts Distribution of workload to different cores Parallel overhead Additional operations which are not present in serial calculation Synchronization, redundant computations, communications

17 Summary Parallel programming is needed when solving large computational problems Different programming models and computer architectures, for example Data / task parallel Shared / distributed memory Achieving good scalability requires that there are no serial parts in the program

18 Getting started with MPI

19 Message-passing interface MPI is an application programming interface (API) for communication between separate processes The most widely used approach for distributed parallel computing MPI programs are portable and scalable MPI is flexible and comprehensive Large (over 300 procedures) Concise (often only 6 procedures are needed) MPI standardization by MPI Forum

20 Execution model A parallel program is launched as a set of independent, identical processes The same program code and instructions Can reside in different nodes or even in different computers The way to launch a parallel program is implementation dependent mpirun, mpiexec, srun, aprun, poe,...

21 MPI ranks MPI runtime assigns each process a rank Identification of the processes Ranks start from 0 and extend to N-1 Processes can perform different tasks and handle different data based on their rank ... if (rank == 0) { ... } if (rank == 1) { ... } ...

22 Data model All variables and data structures are local to the process Processes can exchange data by sending and receiving messages [Figure: process 1 (rank 0) with local variables a = 1.0, b = 2.0 and process 2 (rank 1) with local variables a = -1.0, b = -2.0, exchanging MPI messages]

23 MPI communicator A communicator is an object connecting a group of processes Initially, there is always a communicator MPI_COMM_WORLD which contains all the processes Most MPI functions require a communicator as an argument Users can define their own communicators

24 Routines of the MPI library Information about the communicator number of processes rank of the process Communication between processes sending and receiving messages between two processes sending and receiving messages between several processes Synchronization between processes Advanced features

25 Programming MPI The MPI standard defines interfaces to the C and Fortran programming languages There are unofficial bindings to Python, Perl and Java C call convention: rc = MPI_Xxxx(parameter,...) some arguments have to be passed as pointers Fortran call convention: CALL MPI_XXXX(parameter,..., rc) return code in the last argument

26 First five MPI commands Set up the MPI environment: MPI_Init() Information about the communicator: MPI_Comm_size(comm, size) and MPI_Comm_rank(comm, rank)
comm - communicator
size - number of processes in the communicator
rank - rank of this process

27 First five MPI commands (continued) Synchronize processes: MPI_Barrier(comm) Finalize the MPI environment: MPI_Finalize()

28 Writing an MPI program Include the MPI header files C: #include <mpi.h> Fortran: INCLUDE 'mpif.h' Call MPI_Init Write the actual program Call MPI_Finalize before exiting from the main program
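
Putting these steps together, a minimal C program might look like the following sketch (the file name hello_mpi.c used below is just an illustrative choice):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    /* Set up the MPI environment */
    MPI_Init(&argc, &argv);

    /* Query the communicator: how many processes, and which one am I? */
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    printf("Hello from rank %d of %d\n", rank, size);

    /* Finalize the MPI environment before exiting */
    MPI_Finalize();
    return 0;
}

Launched with, for example, mpirun -np 4 ./hello_mpi, this prints one line per process, in an arbitrary order.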

29 Summary In MPI, a set of independent processes is launched Processes are identified by ranks Data is always local to the process Processes can exchange data by sending and receiving messages MPI library contains functions for Communication and synchronization between processes Communicator manipulation

30 Point-to-point communication

31 Introduction MPI processes are independent; they communicate to coordinate work Point-to-point communication: messages are sent between two processes Collective communication: involves a number of processes at the same time

32 MPI point-to-point operations One process sends a message to another process that receives it Sends and receives in a program should match: one receive per send

33 MPI point-to-point operations Each message (envelope) contains The actual data that is to be sent The datatype of each element of data. The number of elements the data consists of An identification number for the message (tag) The ranks of the source and destination process

34 Presenting syntax Operations presented in pseudocode, C and Fortran bindings presented in extra material slides. INPUT arguments in red OUTPUT arguments in blue Note! Extra error parameter for Fortran Slide with extra material included in handouts

35 Send operation MPI_Send(buf, count, datatype, dest, tag, comm)
buf - the data that is sent
count - number of elements in buffer
datatype - type of each element in buf (see later slides)
dest - the rank of the receiver
tag - an integer identifying the message
comm - a communicator
error - error value; in C/C++ it's the return value of the function, and in Fortran an additional output parameter

36 Receive operation MPI_Recv(buf, count, datatype, source, tag, comm, status)
buf - buffer for storing the received data
count - number of elements in buffer, not the number of elements that are actually received
datatype - type of each element in buf
source - sender of the message
tag - number identifying the message
comm - communicator
status - information on the received message
error - as for the send operation

37 MPI datatypes MPI has a number of predefined datatypes to represent data Each C or Fortran datatype has a corresponding MPI datatype C examples: MPI_INT for int and MPI_DOUBLE for double Fortran example: MPI_INTEGER for integer One can also define custom datatypes
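
As an illustration of the send and receive operations with a predefined datatype, a C sketch (placed between MPI_Init and MPI_Finalize, and assuming the program is run with at least two processes) in which rank 0 sends 100 integers to rank 1 could look like this:

int rank, data[100];
MPI_Status status;

MPI_Comm_rank(MPI_COMM_WORLD, &rank);

if (rank == 0) {
    /* Fill the buffer and send it to rank 1 with tag 42 */
    for (int i = 0; i < 100; i++) data[i] = i;
    MPI_Send(data, 100, MPI_INT, 1, 42, MPI_COMM_WORLD);
} else if (rank == 1) {
    /* Matching receive: same count, datatype, tag and communicator */
    MPI_Recv(data, 100, MPI_INT, 0, 42, MPI_COMM_WORLD, &status);
}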

38 Case study: parallel sum Array originally on process #0 (P0) Parallel algorithm: Scatter - half of the array is sent to process 1; Compute - P0 and P1 independently sum their segments; Reduction - the partial sum on P1 is sent to P0 and P0 sums the partial sums [Figure: memory layout of the array on P0 and P1]

39 Case study: parallel sum - Step 1.1: receive operation in scatter. P1 posts a receive to receive half of the array from P0 [Figure: memory contents and communication timeline on P0 and P1]

40 Case study: parallel sum - Step 1.2: send operation in scatter. P0 posts a send to send the lower part of the array to P1

41 Case study: parallel sum - Step 2: compute the sum in parallel. P0 and P1 compute their partial sums and store them locally

42 Case study: parallel sum - Step 3.1: receive operation in reduction. P0 posts a receive to receive the partial sum

43 Case study: parallel sum - Step 3.2: send operation in reduction. P1 posts a send with its partial sum

44 Case study: parallel sum - Step 4: compute the final answer. P0 sums the partial sums
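
Put together, the four steps above can be written as a short C sketch with plain sends and receives (placed between MPI_Init and MPI_Finalize; the array length N = 16 and the variable names are illustrative choices, and exactly two processes are assumed):

#define N 16

double array[N], partial = 0.0, total;
int rank, i;
MPI_Status status;

MPI_Comm_rank(MPI_COMM_WORLD, &rank);

if (rank == 0) {
    /* ... fill array with the data to be summed ... */
    /* Step 1: scatter - send the lower half of the array to P1 */
    MPI_Send(array, N/2, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    /* Step 2: sum the remaining upper half locally */
    for (i = N/2; i < N; i++) partial += array[i];
    /* Steps 3-4: receive P1's partial sum and combine it with the local one */
    MPI_Recv(&total, 1, MPI_DOUBLE, 1, 1, MPI_COMM_WORLD, &status);
    total += partial;
} else if (rank == 1) {
    /* Step 1: receive half of the array from P0 */
    MPI_Recv(array, N/2, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
    /* Step 2: sum the received segment */
    for (i = 0; i < N/2; i++) partial += array[i];
    /* Step 3: send the partial sum back to P0 */
    MPI_Send(&partial, 1, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD);
}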

45 MORE ABOUT POINT-TO-POINT COMMUNICATION

46 Special parameter values MPI_Send(buf, count, datatype, dest, tag, comm)
dest: MPI_PROC_NULL - null destination, no operation takes place
comm: MPI_COMM_WORLD - includes all processes
error: MPI_SUCCESS - operation successful

47 Special parameter values MPI_Recv(buf, count, datatype, source, tag, comm, status)
source: MPI_PROC_NULL - no sender, no operation takes place
source: MPI_ANY_SOURCE - receive from any sender
tag: MPI_ANY_TAG - receive messages with any tag
comm: MPI_COMM_WORLD - includes all processes
status: MPI_STATUS_IGNORE - do not store any status data
error: MPI_SUCCESS - operation successful

48 Status parameter The status parameter in MPI_Recv contains information on how the receive succeeded Number and datatype of received elements Tag of the received message Rank of the sender In C the status parameter is a struct, in Fortran it is an integer array

49 Status parameter Received elements: use the function MPI_Get_count(status, datatype, count) Tag of the received message: C status.MPI_TAG, Fortran status(MPI_TAG) Rank of the sender: C status.MPI_SOURCE, Fortran status(MPI_SOURCE)
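
For example, a receive in C that accepts a message from any sender with any tag, and then inspects the status, could look like this sketch (a buffer of 100 integers is an arbitrary choice):

int buf[100], count;
MPI_Status status;

MPI_Recv(buf, 100, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
         MPI_COMM_WORLD, &status);

/* Who sent the message, with which tag, and how many elements arrived? */
printf("source = %d, tag = %d\n", status.MPI_SOURCE, status.MPI_TAG);
MPI_Get_count(&status, MPI_INT, &count);
printf("received %d elements\n", count);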

50 Blocking routines & deadlocks Blocking routines: completion depends on other processes Risk for deadlocks: the program is stuck forever MPI_Send exits once the send buffer can be safely read and written to MPI_Recv exits once it has received the message in the receive buffer

51 Point-to-point communication patterns Pairwise exchange [Figure: processes 0-3 exchanging data in pairs] Pipe, a ring of processes exchanging data [Figure: processes 0-3 passing data along a ring]

52 Combined send & receive MPI_Sendrecv(sendbuf, sendcount, sendtype, dest, sendtag, recvbuf, recvcount, recvtype, source, recvtag, comm, status) Parameters as for MPI_Send and MPI_Recv combined Sends one message and receives another one with one single command Reduces the risk of deadlocks Destination rank and source rank can be the same or different
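
A common use of MPI_Sendrecv is a shift along a chain or ring of processes; a C sketch (placed between MPI_Init and MPI_Finalize) in which every rank sends its own rank number to the right-hand neighbour and receives from the left-hand one could look like this:

int rank, size, left, right, sendval, recvval;
MPI_Status status;

MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);

right = (rank + 1) % size;          /* destination of the send */
left  = (rank - 1 + size) % size;   /* source of the receive */
sendval = rank;

/* Send to the right neighbour and receive from the left one in a single call */
MPI_Sendrecv(&sendval, 1, MPI_INT, right, 0,
             &recvval, 1, MPI_INT, left, 0,
             MPI_COMM_WORLD, &status);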

53 Case study 2: domain decomposition Computation inside each domain can be carried out independently, hence in parallel A ghost layer at the boundary represents the values of the elements owned by the neighbouring process [Figure: serial domain vs. the same domain decomposed into three parallel subdomains P0, P1, P2]

54 CS2: One iteration step Have to carefully schedule the order of sends and receives in order to avoid deadlocks [Timeline: P0 does Send, Recv, Compute; P1 does Recv, Send, Recv, Send, Compute; P2 does Send, Recv, Compute]

55 CS2: MPI_Sendrecv MPI_Sendrecv sends and receives with one command No risk of deadlocks [Timeline: P0 does Send, Recv, Compute; P1 does Sendrecv, Sendrecv, Compute; P2 does Recv, Send, Compute]

56 Summary Point-to-point communication Messages are sent between two processes We discussed send and receive operations enabling any parallel application MPI_Send & MPI_Recv MPI_Sendrecv Status parameter Special argument values

57 Non-blocking communication

58 Non-blocking communication Non-blocking sends and receives MPI_Isend & MPI_Irecv return immediately and send/receive in the background Enables some computing concurrently with communication Avoids many common deadlock situations

59 Nonblocking communication Have to finalize send/receive operations MPI_Wait, MPI_Waitall, ... wait for the communication started with MPI_Isend or MPI_Irecv to finish (blocking) MPI_Test, ... tests if the communication has finished (non-blocking) You can mix non-blocking and blocking routines! e.g., a message sent with MPI_Isend can be received with MPI_Recv

60 Typical usage pattern MPI_Irecv(ghost_data) MPI_Isend(border_data) Compute(ghost_independent_data) MPI_Waitall(receives) Compute(border_data) MPI_Waitall(sends)
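
Expressed in C, the pattern could look roughly like the following sketch; the buffer names, the message length n and the neighbour ranks nbr_left and nbr_right, as well as the compute_* routines, are illustrative placeholders:

MPI_Request reqs[4];

/* Post the receives for the ghost layers first */
MPI_Irecv(ghost_left,  n, MPI_DOUBLE, nbr_left,  0, MPI_COMM_WORLD, &reqs[0]);
MPI_Irecv(ghost_right, n, MPI_DOUBLE, nbr_right, 1, MPI_COMM_WORLD, &reqs[1]);

/* Post the sends of the border data */
MPI_Isend(border_left,  n, MPI_DOUBLE, nbr_left,  1, MPI_COMM_WORLD, &reqs[2]);
MPI_Isend(border_right, n, MPI_DOUBLE, nbr_right, 0, MPI_COMM_WORLD, &reqs[3]);

/* Work that does not need the ghost data overlaps with the communication */
compute_interior();

/* Wait for the receives, then update the region that needs the ghost data */
MPI_Waitall(2, &reqs[0], MPI_STATUSES_IGNORE);
compute_border();

/* Make sure the sends have completed before reusing the send buffers */
MPI_Waitall(2, &reqs[2], MPI_STATUSES_IGNORE);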

61 Non-blocking send MPI_Isend(buf, count, datatype, dest, tag, comm, request) Parameters similar to MPI_Send, but with an additional request parameter
buf - send buffer; it shall not be written to until one has checked that the operation is over
request - a handle that is used when checking if the operation has finished

62 Order of sends Sends done in the specified order even for non-blocking routines Beware of badly ordered sends!

63 Non-blocking receive MPI_Irecv(buf, count, datatype, source, tag, comm, request) Parameters similar to MPI_Recv, but there is no status parameter
buf - receive buffer; guaranteed to contain the data only after one has checked that the operation is over
request - a handle that is used when checking if the operation has finished

64 Wait for a non-blocking operation MPI_Wait(request, status)
request - handle of the non-blocking communication
status - status of the completed communication, see MPI_Recv
A call to MPI_Wait returns when the operation identified by request is complete

65 Wait for non-blocking operations MPI_Waitall(count, requests, status)
count - number of requests
requests - array of requests
status - array of statuses for the operations that are waited for
A call to MPI_Waitall returns when all operations identified by the array of requests are complete

66 CS2: Non-blocking Isend & Irecv Better load balance Overlapping of communication & computation [Timeline: each process posts its Irecvs and Isends, computes on the interior (ghost-independent) data, waits for the receives, then computes on the border data]

67 Additional completion operations other useful routines: MPI_Waitany MPI_Waitsome MPI_Test MPI_Testall MPI_Testany MPI_Testsome MPI_Probe

68 Wait for non-blocking operations MPI_Waitany(count, requests, index, status)
count - number of requests
requests - array of requests
index - index of the request that completed
status - status for the completed operation
A call to MPI_Waitany returns when one operation identified by the array of requests is complete

69 Wait for non-blocking operations MPI_Waitsome(count, requests, done, index, status)
count - number of requests
requests - array of requests
done - number of completed requests
index - array of indexes of the completed requests
status - array of statuses of the completed requests
A call to MPI_Waitsome returns when one or more operations identified by the array of requests are complete

70 Non-blocking test for non-blocking operations MPI_Test(request, flag, status)
request - request handle
flag - true if the operation has completed
status - status for the completed operation
A call to MPI_Test is non-blocking It allows one to schedule alternative activities while periodically checking for completion
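
For example, a polling loop in C might look like this sketch, where request was returned by an earlier MPI_Isend or MPI_Irecv and do_some_other_work() stands for any useful activity (an illustrative placeholder):

int flag = 0;
MPI_Status status;

while (!flag) {
    /* Non-blocking check: has the communication finished? */
    MPI_Test(&request, &flag, &status);
    if (!flag) {
        do_some_other_work();   /* overlap useful work with the communication */
    }
}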

71 Summary Non-blocking communication is usually the smarter way to do point-to-point communication in MPI Non-blocking communication realization MPI_Isend MPI_Irecv MPI_Wait(all)

72 Collective operations

73 Outline Introduction to collective communication One-to-many collective operations Many-to-one collective operations Many-to-many collective operations Non-blocking collective operations

74 Introduction Collective communication transmits data among all processes in a process group These routines must be called by all the processes in the group Collective communication includes data movement, collective computation and synchronization Example: MPI_Barrier makes each task hold until all tasks have called it C: int MPI_Barrier(comm) Fortran: MPI_BARRIER(comm, rc)

75 Introduction Collective communication normally outperforms point-to-point communication Code becomes more compact and easier to read. Example: communicating a vector a consisting of 1M float elements from task 0 to all other tasks
With point-to-point operations:
if (my_id == 0) then
  do i = 1, ntasks-1
    call mpi_send(a, 1000000, MPI_REAL, i, tag, MPI_COMM_WORLD, rc)
  end do
else
  call mpi_recv(a, 1000000, MPI_REAL, 0, tag, MPI_COMM_WORLD, status, rc)
end if
With a collective operation:
call mpi_bcast(a, 1000000, MPI_REAL, 0, MPI_COMM_WORLD, rc)

76 Introduction Amount of sent and received data must match Non-blocking routines are available in the MPI 3 standard Older libraries do not support this feature No tag arguments Order of execution must coincide across processes

77 Broadcasting Send the same data from one process to all the others [Figure: BCAST - the buffer A on P0 is replicated to P0, P1, P2 and P3] This buffer may contain any contiguous chunk of memory (any datatype, any number of elements)

78 Broadcasting With MPI_Bcast, the task root sends a buffer of data to all other tasks MPI_Bcast(buffer, count, datatype, root, comm)
buffer - data to be distributed
count - number of entries in buffer
datatype - data type of buffer
root - rank of broadcast root
comm - communicator

79 Scattering Send an equal amount of data from one process to the others [Figure: SCATTER - segments A, B, C, D on P0 are distributed so that P0 keeps A, P1 gets B, P2 gets C and P3 gets D] Segments A, B, ... may contain multiple elements

80 Scattering MPI_Scatter: task root sends an equal share of data (sendbuf) to all other processes MPI_Scatter(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, root, comm)
sendbuf - send buffer (data to be scattered)
sendcount - number of elements sent to each process
sendtype - data type of send buffer elements
recvbuf - receive buffer
recvcount - number of elements in receive buffer
recvtype - data type of receive buffer elements
root - rank of sending process
comm - communicator

81 One-to-all example
Broadcast:
if (my_id==0) then
  do i = 1, 16
    a(i) = i
  end do
end if
call mpi_bcast(a, 16, MPI_INTEGER, 0, MPI_COMM_WORLD, rc)
if (my_id==3) print *, a(:)
Scatter:
if (my_id==0) then
  do i = 1, 16
    a(i) = i
  end do
end if
call mpi_scatter(a, 4, MPI_INTEGER, aloc, 4, MPI_INTEGER, 0, MPI_COMM_WORLD, rc)
if (my_id==3) print *, aloc(:)
Assume 4 MPI tasks. What would the (full) programs print?

82 Varying-sized scatter Like MPI_Scatter, but messages can have different sizes and displacements MPI_Scatterv(sendbuf, sendcounts, displs, sendtype, recvbuf, recvcount, recvtype, root, comm)
sendbuf - send buffer
sendcounts - array (of length ntasks) specifying the number of elements to send to each processor
displs - array (of length ntasks); entry i specifies the displacement (relative to sendbuf) from which to send the data to process i
sendtype - data type of send buffer elements
recvbuf - receive buffer
recvcount - number of elements in receive buffer
recvtype - data type of receive buffer elements
root - rank of sending process
comm - communicator

83 Scatterv example
if (my_id==0) then
  do i = 1, 10
    a(i) = i
  end do
  sendcnts = (/ 1, 2, 3, 4 /)
  displs = (/ 0, 1, 3, 6 /)
end if
call mpi_scatterv(a, sendcnts, displs, MPI_INTEGER, aloc, 4, MPI_INTEGER, 0, MPI_COMM_WORLD, rc)
Assume 4 MPI tasks. What are the values in aloc in the last task (#3)?

84 Gathering Collect data from all the processes to one process [Figure: GATHER - segments A, B, C, D held by P0, P1, P2, P3 are collected into a single buffer on P0] Segments A, B, ... may contain multiple elements

85 Gathering MPI_Gather: collect an equal share of data (in sendbuf) from all processes to root MPI_Gather(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, root, comm)
sendbuf - send buffer (data to be gathered)
sendcount - number of elements pulled from each process
sendtype - data type of send buffer elements
recvbuf - receive buffer
recvcount - number of elements in any single receive
recvtype - data type of receive buffer elements
root - rank of receiving process
comm - communicator
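
A small C sketch (placed between MPI_Init and MPI_Finalize) in which every process contributes its own rank and rank 0 collects the values into an array could look like this; the fixed maximum of 128 processes is an arbitrary assumption:

int rank, size, myvalue;
int all[128];   /* receive buffer, only significant on the root; assumes size <= 128 */

MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
myvalue = rank;

/* Every task sends one int; the root receives one int from each task */
MPI_Gather(&myvalue, 1, MPI_INT, all, 1, MPI_INT, 0, MPI_COMM_WORLD);

/* On rank 0, all[i] now holds the value contributed by process i */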

86 Reduce operation Applies an operation over a set of processes and places the result in a single process [Figure: REDUCE (SUM) - each process Pi holds values Ai, Bi, Ci, Di; after the reduction P0 holds the sums ΣAi, ΣBi, ΣCi, ΣDi]

87 Reduce operation Applies a reduction operation op to sendbuf over the set of tasks and places the result in recvbuf on root MPI_Reduce(sendbuf, recvbuf, count, datatype, op, root, comm)
sendbuf - send buffer
recvbuf - receive buffer
count - number of elements in send buffer
datatype - data type of elements of send buffer
op - operation
root - rank of root process
comm - communicator
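
For instance, summing one value per process onto rank 0 could be written in C as the following sketch (assuming rank has been obtained with MPI_Comm_rank):

double local, global;

local = rank + 1.0;   /* some per-process value */

/* Sum the local values of all processes; the result is placed only on rank 0 */
MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

if (rank == 0) {
    printf("total = %f\n", global);
}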

88 Global reduce operation MPI_Allreduce combines values from all processes and distributes the result back to all processes Compare: MPI_Reduce + MPI_Bcast MPI_Allreduce(sendbuf, recvbuf, count, datatype, op, comm)
sendbuf - starting address of send buffer
recvbuf - starting address of receive buffer
count - number of elements in send buffer
datatype - data type of elements in send buffer
op - operation
comm - communicator
[Figure: REDUCE (SUM) - each process Pi holds Ai, Bi, Ci, Di; afterwards every process holds the sums ΣAi, ΣBi, ΣCi, ΣDi]

89 Allreduce example: parallel dot product
real :: a(1024), aloc(128)
if (my_id==0) then
  call random_number(a)
end if
call mpi_scatter(a, 128, MPI_REAL, aloc, 128, MPI_REAL, 0, MPI_COMM_WORLD, rc)
rloc = dot_product(aloc, aloc)
call mpi_allreduce(rloc, r, 1, MPI_REAL, MPI_SUM, MPI_COMM_WORLD, rc)
Example run (> mpirun -np 8 a.out): each of the 8 tasks prints its id, its local partial result and the same global result

90 All-to-one plus one-to-all MPI_Allgather gathers data from each task and distributes the resulting data to each task Compare: MPI_Gather + MPI_Bcast MPI_Allgather(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, comm)
sendbuf - send buffer
sendcount - number of elements in send buffer
sendtype - data type of send buffer elements
recvbuf - receive buffer
recvcount - number of elements received from any process
recvtype - data type of receive buffer
comm - communicator
[Figure: ALLGATHER - segments A, B, C, D held by P0-P3 are gathered so that every process ends up with A, B, C, D]

91 From each to every Send a distinct message from each task to every task [Figure: ALL2ALL - process Pi initially holds Ai, Bi, Ci, Di; afterwards P0 holds A0, A1, A2, A3, P1 holds B0, B1, B2, B3, and so on] A transpose-like operation

92 From each to every MPI_Alltoall sends a distinct message from each task to every task Compare: each process performing a scatter MPI_Alltoall(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, comm)
sendbuf - send buffer
sendcount - number of elements to send to each process
sendtype - data type of send buffer elements
recvbuf - receive buffer
recvcount - number of elements received from any process
recvtype - data type of receive buffer elements
comm - communicator

93 All-to-all example
if (my_id==0) then
  do i = 1, 16
    a(i) = i
  end do
end if
call mpi_bcast(a, 16, MPI_INTEGER, 0, MPI_COMM_WORLD, rc)
call mpi_alltoall(a, 4, MPI_INTEGER, aloc, 4, MPI_INTEGER, MPI_COMM_WORLD, rc)
Assume 4 MPI tasks. What will be the values of aloc in process #0?
A. 1, 2, 3, 4
B. 1, ..., 16
C. 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4

94 Non-blocking collective operations The MPI 3 standard added support for non-blocking collective operations Naming is similar to the p2p routines (e.g. MPI_Ibcast) MPI_Cancel and MPI_Request_free are not supported All synchronization calls (MPI_Wait, etc.) are supported Cannot be mixed with blocking collectives All processes of a communicator have to use the same routines in the same order

95 Common mistakes with collectives Using a collective operation within one branch of an if-test on the rank: IF (my_id == 0) CALL MPI_BCAST(... All processes, both the root (the sender or the gatherer) and the rest (receivers or senders), must call the collective routine! Assuming that all processes making a collective call would complete at the same time Using the input buffer as the output buffer: CALL MPI_ALLREDUCE(a, a, n, MPI_REAL, MPI_SUM,...

96 Summary Collective communications involve all the processes within a communicator All processes must call them Collective operations make code more transparent and compact Collective routines allow optimizations by the MPI library Performance consideration: Alltoall is an expensive operation, avoid it when possible

97 User-defined communicators

98 Communicators The communicator determines the "communication universe" The source and destination of a message are identified by the process rank within the communicator So far: MPI_COMM_WORLD Processes can be divided into subcommunicators Task-level parallelism with process groups performing separate tasks Parallel I/O

99 Communicators Communicators are dynamic A task can belong simultaneously to several communicators In each of them it has a unique ID, however Communication is normally within the communicator

100 Grouping processes in communicators [Figure: the processes of MPI_COMM_WORLD divided into subcommunicators, e.g. Comm 1 and Comm 2]

101 Creating a communicator MPI_Comm_split creates new communicators based on 'colors' and 'keys' MPI_Comm_split(comm, color, key, newcomm)
comm - communicator handle
color - control of subset assignment; processes with the same color belong to the same new communicator
key - control of rank assignment
newcomm - new communicator handle
If color = MPI_UNDEFINED, a process does not belong to any of the new communicators

102 Creating a communicator
if (myid%2 == 0) {
  color = 1;
} else {
  color = 2;
}
MPI_Comm_split(MPI_COMM_WORLD, color, myid, &subcomm);
MPI_Comm_rank(subcomm, &mysubid);
printf("I am rank %d in MPI_COMM_WORLD, but %d in Comm %d.\n", myid, mysubid, color);
Example output with 8 processes:
I am rank 2 in MPI_COMM_WORLD, but 1 in Comm 1.
I am rank 7 in MPI_COMM_WORLD, but 3 in Comm 2.
I am rank 0 in MPI_COMM_WORLD, but 0 in Comm 1.
I am rank 4 in MPI_COMM_WORLD, but 2 in Comm 1.
I am rank 6 in MPI_COMM_WORLD, but 3 in Comm 1.
I am rank 3 in MPI_COMM_WORLD, but 1 in Comm 2.
I am rank 5 in MPI_COMM_WORLD, but 2 in Comm 2.
I am rank 1 in MPI_COMM_WORLD, but 0 in Comm 2.

103 Communicator manipulation
MPI_Comm_size - returns the number of processes in the communicator's group
MPI_Comm_rank - returns the rank of the calling process in the communicator's group
MPI_Comm_compare - compares two communicators
MPI_Comm_dup - duplicates a communicator
MPI_Comm_free - marks a communicator for deallocation

104 PROCESS TOPOLOGIES

105 Process topologies MPI process topologies allow for a simple referencing scheme of processes Cartesian and graph topologies are supported A process topology defines a new communicator MPI topologies are virtual: there is no relation to the physical structure of the computer The data mapping is "more natural" only to the programmer Usually no performance benefits, but the code becomes more compact and readable

106 Creating a communication topology New communicator with the processes ordered in a Cartesian grid MPI_Cart_create(oldcomm, ndims, dims, periods, reorder, newcomm)
oldcomm - communicator
ndims - dimension of the Cartesian topology
dims - integer array (size ndims) that defines the number of processes in each dimension
periods - array that defines the periodicity of each dimension
reorder - is MPI allowed to renumber the ranks
newcomm - new Cartesian communicator

107 Ranks and coordinates Translate a rank to coordinates MPI_Cart_coords(comm, rank, maxdim, coords)
comm - Cartesian communicator
rank - rank to convert
maxdim - dimension of coords
coords - coordinates in the Cartesian topology that correspond to rank

108 Ranks and coordinates Translate a set of coordinates to a rank MPI_Cart_rank(comm, coords, rank)
comm - Cartesian communicator
coords - array of coordinates
rank - the rank corresponding to coords

109 Creating a communication topology
dims(1) = 4
dims(2) = 4
period = (/ .true., .true. /)
call mpi_cart_create(mpi_comm_world, 2, dims, period, .true., comm2d, rc)
call mpi_comm_rank(comm2d, my_id, rc)
call mpi_cart_coords(comm2d, my_id, 2, coords, rc)
[Figure: the resulting 4x4 process grid, ranks 0-15 with their coordinates, e.g. rank 0 = (0,0), rank 5 = (1,1), rank 15 = (3,3)]

110 Communication in a topology Counting sources/destinations on the grid MPI_Cart_shift(comm, direction, displ, source, dest)
comm - Cartesian communicator
direction - shift direction (e.g. 0 or 1 in 2D)
displ - shift displacement (1 for the next cell etc., < 0 for source from the "down"/"right" directions)
source - rank of source process
dest - rank of destination process
Note that both source and dest are output parameters. The coordinates of the calling task are implicit input. With a non-periodic grid, source or dest can land outside of the grid; then MPI_PROC_NULL is returned.

111 Halo exchange
dims(1) = 4
dims(2) = 4
period = (/ .true., .true. /)
call mpi_cart_create(mpi_comm_world, 2, dims, period, .true., comm2d, rc)
call mpi_cart_shift(comm2d, 0, 1, nbr_up, nbr_down, rc)
call mpi_cart_shift(comm2d, 1, 1, nbr_left, nbr_right, rc)
...
call mpi_sendrecv(hor_send, msglen, mpi_double_precision, nbr_left, &
     tag_left, hor_recv, msglen, mpi_double_precision, nbr_right, &
     tag_left, comm2d, mpi_status_ignore, rc)
...
call mpi_sendrecv(vert_send, msglen, mpi_double_precision, nbr_up, &
     tag_up, vert_recv, msglen, mpi_double_precision, nbr_down, &
     tag_up, comm2d, mpi_status_ignore, rc)
...
[Figure: the 4x4 process grid with ranks and coordinates]

112 Neighborhood collectives on process topologies MPI 3.0 has routines for exchanging data with the nearest neighbors With Cartesian topologies, only nearest neighbor communication (corresponding to MPI_Cart_shift with displ=1) is supported These routines simplify the neighbour data exchange especially when using graph topologies
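
As a sketch of such a routine (assuming a 2D Cartesian communicator comm2d like the one created earlier, so that each process has 2*ndims = 4 neighbours, ordered dimension by dimension with the negative direction first), exchanging one double with every nearest neighbour can be done with MPI_Neighbor_allgather from MPI 3.0:

double sendval = 1.0;   /* some local quantity to share with the neighbours */
double recvbuf[4];      /* one entry per neighbour in a 2D Cartesian grid */

/* Each process sends the same value to all its Cartesian neighbours
   and receives one value from each of them */
MPI_Neighbor_allgather(&sendval, 1, MPI_DOUBLE,
                       recvbuf, 1, MPI_DOUBLE, comm2d);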

113 MPI summary [Concept map: communication divided into point-to-point communication (Send & Recv, Sendrecv) and collective communication (one-to-all, all-to-one and all-to-all collectives); user-defined communicators; topologies]

114 Web resources List of MPI functions with detailed descriptions Good online MPI tutorials The MPI standard MPI implementations: MPICH and Open MPI


More information

Parallel Programming Using MPI

Parallel Programming Using MPI Parallel Programming Using MPI Short Course on HPC 15th February 2019 Aditya Krishna Swamy adityaks@iisc.ac.in SERC, Indian Institute of Science When Parallel Computing Helps? Want to speed up your calculation

More information

Introduction to MPI, the Message Passing Library

Introduction to MPI, the Message Passing Library Chapter 3, p. 1/57 Basics of Basic Messages -To-? Introduction to, the Message Passing Library School of Engineering Sciences Computations for Large-Scale Problems I Chapter 3, p. 2/57 Outline Basics of

More information

Distributed Memory Programming with MPI

Distributed Memory Programming with MPI Distributed Memory Programming with MPI Moreno Marzolla Dip. di Informatica Scienza e Ingegneria (DISI) Università di Bologna moreno.marzolla@unibo.it 2 Credits Peter Pacheco, Dept. of Computer Science,

More information

A few words about MPI (Message Passing Interface) T. Edwald 10 June 2008

A few words about MPI (Message Passing Interface) T. Edwald 10 June 2008 A few words about MPI (Message Passing Interface) T. Edwald 10 June 2008 1 Overview Introduction and very short historical review MPI - as simple as it comes Communications Process Topologies (I have no

More information

Practical stuff! ü OpenMP

Practical stuff! ü OpenMP Practical stuff! REALITY: Ways of actually get stuff done in HPC: Ø Message Passing (send, receive, broadcast,...) Ø Shared memory (load, store, lock, unlock) ü MPI Ø Transparent (compiler works magic)

More information

Programming Scalable Systems with MPI. Clemens Grelck, University of Amsterdam

Programming Scalable Systems with MPI. Clemens Grelck, University of Amsterdam Clemens Grelck University of Amsterdam UvA / SurfSARA High Performance Computing and Big Data Course June 2014 Parallel Programming with Compiler Directives: OpenMP Message Passing Gentle Introduction

More information

Chip Multiprocessors COMP Lecture 9 - OpenMP & MPI

Chip Multiprocessors COMP Lecture 9 - OpenMP & MPI Chip Multiprocessors COMP35112 Lecture 9 - OpenMP & MPI Graham Riley 14 February 2018 1 Today s Lecture Dividing work to be done in parallel between threads in Java (as you are doing in the labs) is rather

More information

MPI Optimisation. Advanced Parallel Programming. David Henty, Iain Bethune, Dan Holmes EPCC, University of Edinburgh

MPI Optimisation. Advanced Parallel Programming. David Henty, Iain Bethune, Dan Holmes EPCC, University of Edinburgh MPI Optimisation Advanced Parallel Programming David Henty, Iain Bethune, Dan Holmes EPCC, University of Edinburgh Overview Can divide overheads up into four main categories: Lack of parallelism Load imbalance

More information

CS 179: GPU Programming. Lecture 14: Inter-process Communication

CS 179: GPU Programming. Lecture 14: Inter-process Communication CS 179: GPU Programming Lecture 14: Inter-process Communication The Problem What if we want to use GPUs across a distributed system? GPU cluster, CSIRO Distributed System A collection of computers Each

More information

High Performance Computing Course Notes Message Passing Programming I

High Performance Computing Course Notes Message Passing Programming I High Performance Computing Course Notes 2008-2009 2009 Message Passing Programming I Message Passing Programming Message Passing is the most widely used parallel programming model Message passing works

More information

Lecture 4 Introduction to MPI

Lecture 4 Introduction to MPI CS075 1896 Lecture 4 Introduction to MPI Jeremy Wei Center for HPC, SJTU Mar 13th, 2017 1920 1987 2006 Recap of the last lecture (OpenMP) OpenMP is a standardized pragma-based intra-node parallel programming

More information

In the simplest sense, parallel computing is the simultaneous use of multiple computing resources to solve a problem.

In the simplest sense, parallel computing is the simultaneous use of multiple computing resources to solve a problem. 1. Introduction to Parallel Processing In the simplest sense, parallel computing is the simultaneous use of multiple computing resources to solve a problem. a) Types of machines and computation. A conventional

More information

ECE 574 Cluster Computing Lecture 13

ECE 574 Cluster Computing Lecture 13 ECE 574 Cluster Computing Lecture 13 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 21 March 2017 Announcements HW#5 Finally Graded Had right idea, but often result not an *exact*

More information

Peter Pacheco. Chapter 3. Distributed Memory Programming with MPI. Copyright 2010, Elsevier Inc. All rights Reserved

Peter Pacheco. Chapter 3. Distributed Memory Programming with MPI. Copyright 2010, Elsevier Inc. All rights Reserved An Introduction to Parallel Programming Peter Pacheco Chapter 3 Distributed Memory Programming with MPI 1 Roadmap Writing your first MPI program. Using the common MPI functions. The Trapezoidal Rule in

More information

CS4961 Parallel Programming. Lecture 16: Introduction to Message Passing 11/3/11. Administrative. Mary Hall November 3, 2011.

CS4961 Parallel Programming. Lecture 16: Introduction to Message Passing 11/3/11. Administrative. Mary Hall November 3, 2011. CS4961 Parallel Programming Lecture 16: Introduction to Message Passing Administrative Next programming assignment due on Monday, Nov. 7 at midnight Need to define teams and have initial conversation with

More information

The MPI Message-passing Standard Practical use and implementation (V) SPD Course 6/03/2017 Massimo Coppola

The MPI Message-passing Standard Practical use and implementation (V) SPD Course 6/03/2017 Massimo Coppola The MPI Message-passing Standard Practical use and implementation (V) SPD Course 6/03/2017 Massimo Coppola Intracommunicators COLLECTIVE COMMUNICATIONS SPD - MPI Standard Use and Implementation (5) 2 Collectives

More information