Introduction to the Message Passing Interface (MPI) Andrea Clematis IMATI CNR


1 Introduction to the Message Passing Interface (MPI) Andrea Clematis IMATI CNR

2 Acknowledgements & references An Introduction to MPI: Parallel Programming with the Message Passing Interface, William Gropp, Ewing Lusk, Argonne National Laboratory. Online examples are available at ftp://ftp.mcs.anl.gov/mpi/mpiexmpl.tar.gz; the archive contains source code and run scripts that allow you to evaluate your own MPI implementation.

3 Outline Collective communication Considerations on array processing Computing pi Heat equation Types of point-to-point communication Derived data types Groups and communicators Virtual topologies

4 Collective communication All or None: Collective communication must involve all processes in the scope of a communicator. All processes are, by default, members of the communicator MPI_COMM_WORLD. It is the programmer's responsibility to ensure that all processes within a communicator participate in any collective operation.

5 Collective communication Types of Collective Operations: Synchronization - processes wait until all members of the group have reached the synchronization point. Data Movement - broadcast, scatter/gather, all to all. Collective Computation (reductions) - one member of the group collects data from the other members and performs an operation (min, max, add, multiply, etc.) on that data.

6 Collective communication Programming Considerations and Restrictions: Collective operations are blocking. Collective communication routines do not take message tag arguments. Collective operations within subsets of processes are accomplished by first partitioning the processes into new groups and then attaching the new groups to new communicators. Collective operations can be used with MPI predefined datatypes as well as with MPI derived datatypes, provided the type signatures match as described below.

7 Collective communication MPI_Barrier Creates a barrier synchronization in a group. Each task, when reaching the MPI_Barrier call, blocks until all tasks in the group reach the same MPI_Barrier call. MPI_Barrier (comm) MPI_BARRIER (comm,ierr) C synopsis #include <mpi.h> int MPI_Barrier(MPI_Comm comm); Parameters: comm is a communicator (handle) (IN) IERROR is the FORTRAN return code. It is always the last argument.
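A minimal sketch of a common MPI_Barrier use (my example, not from the original slides): separating timing phases, since MPI_Wtime is a per-process clock.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);          /* every process starts together */
    double t0 = MPI_Wtime();
    /* ... some computation or communication phase ... */
    MPI_Barrier(MPI_COMM_WORLD);          /* every process has finished */
    if (rank == 0)
        printf("phase took %f s\n", MPI_Wtime() - t0);

    MPI_Finalize();
    return 0;
}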

8 Collective communication MPI_Bcast Broadcasts (sends) a message from the process with rank "root" to all other processes in the group. MPI_Bcast (&buffer,count,datatype,root,comm) MPI_BCAST (buffer,count,datatype,root,comm,ierr) C synopsis #include <mpi.h> int MPI_Bcast(void* buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm); Parameters buffer is the starting address of the buffer (choice) (INOUT) count is the number of elements in the buffer (integer) (IN) datatype is the datatype of the buffer elements (handle) (IN) root is the rank of the root task (integer) (IN) comm is the communicator (handle) (IN) IERROR is the FORTRAN return code. It is always the last argument.
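A minimal MPI_Bcast sketch (my example, not from the slides; the value 100 is arbitrary): the root reads or computes a parameter, and every process, root included, calls MPI_Bcast with the same root argument.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, n = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        n = 100;                             /* only the root knows the value */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("rank %d has n = %d\n", rank, n); /* every rank now prints 100 */

    MPI_Finalize();
    return 0;
}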

9 Collective communication [figure not reproduced in this transcription]

10 MPI_Scatter Collective communication Distributes distinct messages from a single source task to each task in the group. MPI_Scatter (&sendbuf,sendcnt,sendtype,&recvbuf,... recvcnt,recvtype,root,comm) MPI_SCATTER (sendbuf,sendcnt,sendtype,recvbuf,... recvcnt,recvtype,root,comm,ierr) C synopsis #include <mpi.h> int MPI_Scatter(void* sendbuf, int sendcount, MPI_Datatype sendtype, void* recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm);

11 MPI_Scatter Collective communication MPI_SCATTER distributes individual messages from root to each task in comm. This subroutine is the inverse operation to MPI_GATHER. The type signature associated with sendcount, sendtype at the root must be equal to the type signature associated with recvcount, recvtype at all tasks. (Type maps can be different.) This means the amount of data sent must be equal to the amount of data received, pairwise between each task and the root. Distinct type maps between sender and receiver are allowed. The following is information regarding MPI_SCATTER arguments and tasks: On the task root, all arguments to the function are significant. On other tasks, only the arguments recvbuf, recvcount, recvtype, root, and comm are significant. The argument root must be the same on all tasks. A call where the specification of counts and types causes any location on the root to be read more than once is erroneous.
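A minimal scatter/gather round trip (my sketch, not from the slides): the root distributes equal chunks, each process works on its chunk, and MPI_Gather collects the results. It assumes the element count divides evenly among the processes.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int chunk = 4;                  /* elements per process */
    double *sendbuf = NULL;
    if (rank == 0) {                      /* send buffer significant only at root */
        sendbuf = malloc(size * chunk * sizeof(double));
        for (int i = 0; i < size * chunk; i++) sendbuf[i] = i;
    }
    double recvbuf[4];                    /* chunk elements */
    MPI_Scatter(sendbuf, chunk, MPI_DOUBLE,
                recvbuf, chunk, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    for (int i = 0; i < chunk; i++) recvbuf[i] *= 2.0;   /* local work */
    MPI_Gather(recvbuf, chunk, MPI_DOUBLE,
               sendbuf, chunk, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    if (rank == 0) {
        printf("last element = %g\n", sendbuf[size * chunk - 1]);
        free(sendbuf);
    }
    MPI_Finalize();
    return 0;
}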

12 Collective communication [figure not reproduced in this transcription]

13 int MPI_Scatter(void* sendbuf, int sendcount, MPI_Datatype sendtype, void* recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm);

14 15 [slides not reproduced in this transcription]

16 Collective communication MPI_Gather Gathers distinct messages from each task in the group to a single destination task. This routine is the reverse operation of MPI_Scatter. MPI_Gather (&sendbuf,sendcnt,sendtype,&recvbuf,... recvcount,recvtype,root,comm) MPI_GATHER (sendbuf,sendcnt,sendtype,recvbuf,... recvcount,recvtype,root,comm,ierr) C synopsis #include <mpi.h> int MPI_Gather(void* sendbuf, int sendcount, MPI_Datatype sendtype, void* recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm);

17 Collective communication The type signature of sendcount, sendtype on task i must be equal to the type signature of recvcount, recvtype at the root. This means the amount of data sent must be equal to the amount of data received, pairwise between each task and the root. Distinct type maps between sender and receiver are allowed. The following is information regarding MPI_GATHER arguments and tasks: On the task root, all arguments to the function are significant. On other tasks, only the arguments sendbuf, sendcount, sendtype, root, and comm are significant. The argument root must be the same on all tasks. Note that the argument recvcount at the root indicates the number of items it receives from each task, not the total number of items received. A call where the specification of counts and types causes any location on the root to be written more than once is erroneous.

18 Collective communication Parameters:
sendbuf is the starting address of the send buffer (choice) (IN)
sendcount is the number of elements in the send buffer (integer) (IN)
sendtype is the datatype of the send buffer elements (handle) (IN)
recvbuf is the address of the receive buffer (choice, significant only at root) (OUT)
recvcount is the number of elements for any single receive (integer, significant only at root) (IN)
recvtype is the datatype of the receive buffer elements (handle, significant only at root) (IN)
root is the rank of the receiving task (integer) (IN)
comm is the communicator (handle) (IN)
IERROR is the FORTRAN return code. It is always the last argument.

19 Collective communication [figure not reproduced in this transcription]

20 MPI_Gather (&sendbuf,sendcnt,sendtype,&recvbuf,... recvcount,recvtype,root,comm)

21 [slide not reproduced in this transcription]

22 MPI_Scatterv Scatters a buffer in parts to all tasks in a group. Synopsis #include "mpi.h" int MPI_Scatterv ( void *sendbuf, int *sendcounts, int *displs, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm )
Input Parameters:
sendbuf address of send buffer (choice, significant only at root)
sendcounts integer array (of length group size) specifying the number of elements to send to each processor
displs integer array (of length group size); entry i specifies the displacement (relative to sendbuf) from which to take the outgoing data to process i
sendtype data type of send buffer elements (handle)
recvcount number of elements in receive buffer (integer)
recvtype data type of receive buffer elements (handle)
root rank of sending process (integer)
comm communicator (handle)
Output Parameter:
recvbuf address of receive buffer (choice)
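A hedged MPI_Scatterv sketch (my example, not from the slides): process i receives i+1 elements, so sendcounts and displs differ per rank; the fixed receive buffer assumes at most 64 processes.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int *sendcounts = malloc(size * sizeof(int));
    int *displs     = malloc(size * sizeof(int));
    int total = 0;
    for (int i = 0; i < size; i++) {
        sendcounts[i] = i + 1;            /* rank i gets i+1 elements */
        displs[i]     = total;            /* chunk i starts here in sendbuf */
        total        += sendcounts[i];
    }
    int *sendbuf = NULL;
    if (rank == 0) {
        sendbuf = malloc(total * sizeof(int));
        for (int i = 0; i < total; i++) sendbuf[i] = i;
    }
    int recvbuf[64];                      /* assumes at most 64 processes */
    MPI_Scatterv(sendbuf, sendcounts, displs, MPI_INT,
                 recvbuf, sendcounts[rank], MPI_INT, 0, MPI_COMM_WORLD);
    printf("rank %d got %d elements, first = %d\n",
           rank, sendcounts[rank], recvbuf[0]);

    free(sendcounts); free(displs);
    if (rank == 0) free(sendbuf);
    MPI_Finalize();
    return 0;
}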

23 MPI_Reduce Collective communication Applies a reduction operation to all tasks in the group and places the result in one task. MPI_Reduce (&sendbuf,&recvbuf,count,datatype,op,root,comm) MPI_REDUCE (sendbuf,recvbuf,count,datatype,op,root,comm,ierr) C synopsis #include <mpi.h> int MPI_Reduce(void* sendbuf, void* recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm);

24 Collective communication This subroutine applies a reduction operation to the vector sendbuf over the set of tasks specified by comm and places the result in recvbuf on root. The input buffer and the output buffer have the same number of elements with the same type. The arguments sendbuf, count, and datatype define the send or input buffer. The arguments recvbuf, count and datatype define the output buffer. MPI_REDUCE is called by all group members using the same arguments for count, datatype, op, and root. If a sequence of elements is provided to a task, the reduction operation is executed element-wise on each entry of the sequence. Here's an example. If the operation is MPI_MAX and the send buffer contains two elements that are floating point numbers (count = 2 and datatype = MPI_FLOAT), recvbuf(1) = global max(sendbuf(1)) and recvbuf(2) = global max(sendbuf(2)). Users can define their own operations or use the predefined operations provided by MPI. User-defined operations can be overloaded to operate on several datatypes, either basic or derived. The argument datatype of MPI_REDUCE must be compatible with op.

25 Collective communication Parameters:
sendbuf is the address of the send buffer (choice) (IN)
recvbuf is the address of the receive buffer (choice, significant only at root) (OUT)
count is the number of elements in the send buffer (integer) (IN)
datatype is the datatype of elements of the send buffer (handle) (IN)
op is the reduction operation (handle) (IN)
root is the rank of the root task (integer) (IN)
comm is the communicator (handle) (IN)
IERROR is the FORTRAN return code. It is always the last argument.

26 Collective communication The predefined MPI reduction operations are MPI_MAX, MPI_MIN, MPI_SUM, MPI_PROD, MPI_LAND, MPI_BAND, MPI_LOR, MPI_BOR, MPI_LXOR, MPI_BXOR, MPI_MAXLOC, and MPI_MINLOC (the table on the original slide is not reproduced here). Users can also define their own reduction functions by using the MPI_Op_create routine.
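A minimal sketch matching the MPI_MAX, count = 2, MPI_FLOAT example described above (my code, not from the slides): the reduction is applied element-wise across all ranks.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    float sendbuf[2] = { (float)rank, (float)(10 - rank) };
    float recvbuf[2];
    MPI_Reduce(sendbuf, recvbuf, 2, MPI_FLOAT, MPI_MAX, 0, MPI_COMM_WORLD);
    if (rank == 0)    /* recvbuf[i] = max over all ranks of sendbuf[i] */
        printf("global max: %g %g\n", recvbuf[0], recvbuf[1]);

    MPI_Finalize();
    return 0;
}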

27 MPI_Scatter Collective communication Parameters:
sendbuf is the address of the send buffer (choice, significant only at root) (IN)
sendcount is the number of elements to be sent to each task (integer, significant only at root) (IN)
sendtype is the datatype of the send buffer elements (handle, significant only at root) (IN)
recvbuf is the address of the receive buffer (choice) (OUT)
recvcount is the number of elements in the receive buffer (integer) (IN)
recvtype is the datatype of the receive buffer elements (handle) (IN)
root is the rank of the sending task (integer) (IN)
comm is the communicator (handle) (IN)
IERROR is the FORTRAN return code. It is always the last argument.

28 Collective communication [figure not reproduced in this transcription]

29 Outline Collective communication Considerations on array processing Computing pi Heat equation Types of point-to-point communication Derived data types Groups and communicators Virtual topologies

30 Array Processing This example demonstrates calculations on 2-dimensional array elements, with the computation on each array element being independent of the other array elements. The serial program calculates one element at a time in sequential order. Serial code could be of the form:
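The slide's listing (probably Fortran) was not reproduced; a C sketch of the kind of serial loop meant here, with fcn standing in for the per-element work, might be:

#include <stdio.h>
#define N 8
static double fcn(int i, int j) { return i + 0.1 * j; }  /* placeholder work */

int main(void) {
    static double a[N][N];
    for (int j = 0; j < N; j++)           /* one element at a time, */
        for (int i = 0; i < N; i++)       /* in sequential order */
            a[i][j] = fcn(i, j);
    printf("a[2][3] = %g\n", a[2][3]);
    return 0;
}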

31 Array Processing The calculation of elements is independent of one another, leading to an embarrassingly parallel situation. The problem should be computationally intensive.

32 Array Processing Parallel Solution 1 Array elements are distributed so that each processor owns a portion of the array (a subarray). Independent calculation of array elements ensures there is no need for communication between tasks. The distribution scheme is chosen by other criteria, e.g. unit stride (stride of 1) through the subarrays, which maximizes cache/memory usage. Since it is desirable to have unit stride through the subarrays, the choice of a distribution scheme depends on the programming language (block or cyclic distributions; the diagram of the options is not reproduced here).

33 Array Processing Parallel Solution 1 After the array is distributed, each task executes the portion of the loop corresponding to the data it owns. For example, with Fortran block distribution (listing not reproduced): notice that only the outer loop variables are different from the serial solution. What would this look like in C?
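A hedged answer to the question above: C stores arrays row-major, so a block distribution with unit stride through the subarray assigns contiguous rows to each task (in column-major Fortran it would be columns). A minimal sketch, reusing the fcn placeholder and assuming the row count divides evenly:

#include <mpi.h>
#include <stdio.h>
#define N 8
static double fcn(int i, int j) { return i + 0.1 * j; }  /* placeholder work */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, ntasks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &ntasks);

    double a[N][N];
    int rows = N / ntasks;                /* assumes N divides evenly */
    int ilo = rank * rows, ihi = ilo + rows;
    for (int i = ilo; i < ihi; i++)       /* only the outer loop bounds */
        for (int j = 0; j < N; j++)       /* differ from the serial code */
            a[i][j] = fcn(i, j);

    printf("rank %d computed rows %d..%d (a[ilo][0] = %g)\n",
           rank, ilo, ihi - 1, a[ilo][0]);
    MPI_Finalize();
    return 0;
}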

34 One Possible Solution: Implement as an SPMD model. The master process initializes the array, sends info to the worker processes, and receives results. Each worker process receives info, performs its share of the computation, and sends the results to the master. Using the Fortran storage scheme, perform a block distribution of the array. Pseudo code solution (in the original slide, red highlights the changes for parallelism):

35 Implementation in C

36 40 [C implementation listing not reproduced in this transcription]
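Since the listing on slides 36 40 was not reproduced, here is a hedged C sketch of the SPMD master/worker scheme of slide 34 (my reconstruction, not the original code; it assumes at least 2 processes and that N divides evenly among the workers):

#include <mpi.h>
#include <stdio.h>
#define N 8
static double fcn(int i, int j) { return i + 0.1 * j; }  /* placeholder work */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, ntasks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &ntasks);
    int rows = N / (ntasks - 1);          /* rows per worker; rank 0 is master */

    if (rank == 0) {                      /* ---- master ---- */
        double a[N][N];
        for (int w = 1; w < ntasks; w++) {   /* tell each worker its first row */
            int first = (w - 1) * rows;
            MPI_Send(&first, 1, MPI_INT, w, 0, MPI_COMM_WORLD);
        }
        for (int w = 1; w < ntasks; w++)     /* collect the computed blocks */
            MPI_Recv(&a[(w - 1) * rows][0], rows * N, MPI_DOUBLE,
                     w, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("a[2][3] = %g\n", a[2][3]);
    } else {                              /* ---- worker ---- */
        int first;
        double block[N][N];               /* scratch space for my rows */
        MPI_Recv(&first, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < N; j++)
                block[i][j] = fcn(first + i, j);
        MPI_Send(&block[0][0], rows * N, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}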

41 Array Processing Parallel Solution 2: Pool of Tasks The previous array solution demonstrated static load balancing: each task has a fixed amount of work to do, and there may be significant idle time for faster or more lightly loaded processors; the slowest task determines the overall performance. Static load balancing is not usually a major concern if all tasks are performing the same amount of work on identical machines. If you have a load balance problem (some tasks work faster than others), you may benefit from using a "pool of tasks" scheme.

42 Pool of Tasks Scheme: Two types of processes are employed.
Master process: holds the pool of tasks for worker processes to do; sends a worker a task when requested; collects results from the workers.
Worker process: repeatedly gets a task from the master process, performs the computation, and sends the results to the master.
Worker processes do not know before runtime which portion of the array they will handle or how many tasks they will perform. Dynamic load balancing occurs at run time: the faster tasks will get more work to do.

43 Pseudo code solution (in the original slide, red highlights the changes for parallelism):
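The highlighted pseudo code was not reproduced; a hedged C sketch of the pool-of-tasks protocol follows (my reconstruction, not the original: tag 0 carries requests and assignments, tag 1 finished rows, tag 2 the stop signal; run with at least 2 processes).

#include <mpi.h>
#include <stdio.h>
#define N 8
static double fcn(int i, int j) { return i + 0.1 * j; }  /* placeholder work */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, ntasks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &ntasks);

    if (rank == 0) {                      /* ---- master: holds the task pool ---- */
        double a[N][N], row[N + 1];       /* row[0] carries the row index */
        int next = 0, done = 0, stopped = 0;
        while (done < N || stopped < ntasks - 1) {
            MPI_Status st;
            MPI_Recv(row, N + 1, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == 1) {        /* a finished row came back */
                int i = (int)row[0];
                for (int j = 0; j < N; j++) a[i][j] = row[j + 1];
                done++;
            }
            if (next < N) {               /* hand out the next task... */
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, 0, MPI_COMM_WORLD);
                next++;
            } else {                      /* ...or tell this worker to stop */
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, 2, MPI_COMM_WORLD);
                stopped++;
            }
        }
        printf("a[2][3] = %g\n", a[2][3]);
    } else {                              /* ---- worker: ask, compute, repeat ---- */
        double row[N + 1];
        row[0] = -1;                      /* first message is just a request */
        MPI_Send(row, N + 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
        for (;;) {
            MPI_Status st;
            int i;
            MPI_Recv(&i, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == 2) break;   /* no more work */
            row[0] = i;
            for (int j = 0; j < N; j++) row[j + 1] = fcn(i, j);
            MPI_Send(row, N + 1, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD);
        }
    }
    MPI_Finalize();
    return 0;
}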

44 Discussion In the above pool of tasks example, each task calculated an individual array element as a job, so the computation to communication ratio is finely granular. Finely granular solutions incur more communication overhead in order to reduce task idle time. A better solution might be to distribute more work with each job; the "right" amount of work is problem dependent.

45 Outline Collective communication Considerations on array processing Computing pi Heat equation Types of point-to-point communication Derived data types Groups and communicators Virtual topologies

46 PI Calculation: an embarrassingly parallel algorithm

47 [slide not reproduced in this transcription]

48 From sequential to parallel

49 From sequential to parallel

50 51 [slides not reproduced in this transcription]
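The listings on slides 48 51 were not reproduced. A hedged reconstruction in the style of the classic Gropp & Lusk pi example (each process integrates 4/(1+x^2) over a cyclic subset of the intervals, and MPI_Reduce sums the partial results on rank 0):

#include <mpi.h>
#include <stdio.h>
#include <math.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size, n = 100000;           /* number of intervals */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);  /* root could read n at run time */
    double h = 1.0 / n, sum = 0.0;
    for (int i = rank + 1; i <= n; i += size) {    /* cyclic distribution of intervals */
        double x = h * (i - 0.5);                  /* midpoint rule */
        sum += 4.0 / (1.0 + x * x);
    }
    double mypi = h * sum, pi = 0.0;
    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("pi is approximately %.16f, error %.2e\n",
               pi, fabs(pi - 3.141592653589793));

    MPI_Finalize();
    return 0;
}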

52 Outline Collective communication Considerations on array processing Computing pi Heat equation Types of point-to-point communication Derived data types Groups and communicators Virtual topologies

53 Simple Heat Equation Most problems in parallel computing require communication among the tasks. A number of common problems require communication with "neighbor" tasks. The heat equation describes the temperature change over time, given initial temperature distribution and boundary conditions. A finite differencing scheme is employed to solve the heat equation numerically on a square region.

54 Simple Heat Equation The initial temperature is zero on the boundaries and high in the middle. The boundary temperature is held at zero. For the fully explicit problem, a time stepping algorithm is used. The elements of a 2-dimensional array represent the temperature at points on the square. The calculation of an element is dependent upon neighbor element values.

55 Simple Heat Equation A serial program would contain code like:
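The serial listing was not reproduced; a C sketch of the kind of explicit update loop meant here (the grid size, coefficients, and initial condition are arbitrary choices of mine): each interior point is updated from its four neighbors while the boundary stays at zero.

#include <stdio.h>
#define NX 10
#define NY 10

int main(void) {
    static double u1[NX][NY], u2[NX][NY]; /* zero everywhere... */
    double cx = 0.1, cy = 0.1;
    u1[NX / 2][NY / 2] = 100.0;           /* ...except hot in the middle */

    for (int step = 0; step < 100; step++) {
        for (int i = 1; i < NX - 1; i++)
            for (int j = 1; j < NY - 1; j++)      /* boundaries stay at 0 */
                u2[i][j] = u1[i][j]
                         + cx * (u1[i + 1][j] + u1[i - 1][j] - 2.0 * u1[i][j])
                         + cy * (u1[i][j + 1] + u1[i][j - 1] - 2.0 * u1[i][j]);
        for (int i = 1; i < NX - 1; i++)          /* copy back for the next step */
            for (int j = 1; j < NY - 1; j++)
                u1[i][j] = u2[i][j];
    }
    printf("u(center) = %g\n", u1[NX / 2][NY / 2]);
    return 0;
}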

56 Simple Heat Equation Parallel Solution 1 Implement as an SPMD model. The entire array is partitioned and distributed as subarrays to all tasks; each task owns a portion of the total array. Determine the data dependencies: interior elements belonging to a task are independent of other tasks, while border elements are dependent upon a neighbor task's data, necessitating communication.

57 Simple Heat Equation Parallel Solution 1 The master process sends initial info to the workers, checks for convergence, and collects results. Each worker process calculates its part of the solution, communicating as necessary with neighbor processes. Pseudo code solution (in the original slide, red highlights the changes for parallelism):

58 Simple Heat Equation Parallel Solution 2: Overlapping Communication and Computation In the previous solution, it was assumed that blocking communications were used by the worker tasks. Blocking communications wait for the communication to complete before continuing to the next program instruction. In the previous solution, neighbor tasks communicated border data, then each process updated its portion of the array. Computing times can often be reduced by using non-blocking communication, which allows work to be performed while communication is in progress. Each task could update the interior of its part of the solution array while the communication of border data is occurring, and update its border after communication has completed. Pseudo code for the second solution (in the original slide, red highlights the changes for non-blocking communications):

59 [pseudo code slide not reproduced in this transcription]
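The pseudo code on slide 59 was not reproduced; a hedged C sketch of the overlap idea (one Jacobi-style sweep with a 1-D row decomposition; my reconstruction): post the ghost-row exchanges, update the interior while they are in flight, then wait and update the border rows.

#include <mpi.h>
#include <stdio.h>
#define NLOC 4                            /* local rows, plus 2 ghost rows */
#define NY   8

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int up   = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int down = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    double u[NLOC + 2][NY], unew[NLOC + 2][NY];
    for (int i = 0; i < NLOC + 2; i++)
        for (int j = 0; j < NY; j++) u[i][j] = rank;  /* arbitrary test data */

    MPI_Request req[4];                   /* exchange ghost rows with neighbors */
    MPI_Irecv(u[0],        NY, MPI_DOUBLE, up,   0, MPI_COMM_WORLD, &req[0]);
    MPI_Irecv(u[NLOC + 1], NY, MPI_DOUBLE, down, 0, MPI_COMM_WORLD, &req[1]);
    MPI_Isend(u[1],        NY, MPI_DOUBLE, up,   0, MPI_COMM_WORLD, &req[2]);
    MPI_Isend(u[NLOC],     NY, MPI_DOUBLE, down, 0, MPI_COMM_WORLD, &req[3]);

    for (int i = 2; i < NLOC; i++)        /* interior rows: no ghost data needed */
        for (int j = 1; j < NY - 1; j++)
            unew[i][j] = 0.25 * (u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1]);

    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);

    for (int i = 1; i <= NLOC; i += NLOC - 1)     /* border rows 1 and NLOC */
        for (int j = 1; j < NY - 1; j++)
            unew[i][j] = 0.25 * (u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1]);

    printf("rank %d: unew[1][1] = %g\n", rank, unew[1][1]);
    MPI_Finalize();
    return 0;
}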

60 Outline Collective communication Considerations on array processing Computing pi Heat equation Types of point-to-point communication Derived data types Groups and communicators Virtual topologies

61 Types of Point-to-Point Operations: There are different types of send and receive routines, used for different purposes. For example:
Synchronous send
Blocking send / blocking receive
Non-blocking send / non-blocking receive
Buffered send
Combined send/receive
"Ready" send
Any type of send routine can be paired with any type of receive routine. MPI also provides several routines associated with send-receive operations, such as those used to wait for a message's arrival or to probe whether a message has arrived. A small example of the combined send/receive appears below.
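A minimal sketch of the combined send/receive mentioned in the list above (my example, not from the slides): a ring shift in a single deadlock-free call.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size, recvd;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int dest = (rank + 1) % size;         /* send to the right neighbor */
    int src  = (rank + size - 1) % size;  /* receive from the left neighbor */

    MPI_Sendrecv(&rank,  1, MPI_INT, dest, 0,
                 &recvd, 1, MPI_INT, src,  0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("rank %d received %d\n", rank, recvd);

    MPI_Finalize();
    return 0;
}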

62 Data Movement in point to point communication

63 Data Movement in point to point communication

64 Buffering In a perfect world, every send operation would be perfectly synchronized with its matching receive. This is rarely the case: the MPI implementation must somehow be able to store data when the two tasks are out of sync. Consider the following two cases: A send operation occurs 5 seconds before the receive is ready - where is the message while the receive is pending? Multiple sends arrive at the same receiving task, which can only accept one send at a time - what happens to the messages that are "backing up"? The MPI implementation (not the MPI standard) decides what happens to data in these cases. Typically, a system buffer area is reserved to hold data in transit (illustration not reproduced in this transcription).

65 Buffering System buffer space is:
Opaque to the programmer and managed entirely by the MPI library
A finite resource that can be easy to exhaust
Often mysterious and not well documented
Able to exist on the sending side, the receiving side, or both
Something that may improve program performance, because it allows send-receive operations to be asynchronous
User managed address space (i.e. your program variables) is called the application buffer. MPI also provides for a user managed send buffer.

66 Blocking vs. Non-blocking Most of the MPI point-to-point routines can be used in either blocking or non-blocking mode. Blocking: A blocking send routine will only "return" after it is safe to modify the application buffer (your send data) for reuse. Safe means that modifications will not affect the data intended for the receive task. Safe does not imply that the data was actually received - it may very well be sitting in a system buffer. A blocking send can be synchronous, which means there is handshaking with the receive task to confirm a safe send. A blocking send can be asynchronous if a system buffer is used to hold the data for eventual delivery to the receiver. A blocking receive only "returns" after the data has arrived and is ready for use by the program.

67 Blocking vs. Non-blocking Non-blocking: Non-blocking send and receive routines behave similarly - they return almost immediately. They do not wait for any communication events to complete, such as message copying from user memory to system buffer space or the actual arrival of the message. Non-blocking operations simply "request" that the MPI library perform the operation when it is able; the user cannot predict when that will happen. It is unsafe to modify the application buffer (your variable space) until you know for a fact that the requested non-blocking operation was actually performed by the library. There are "wait" routines used to do this. Non-blocking communications are primarily used to overlap computation with communication and exploit possible performance gains.
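A minimal sketch of the wait pattern just described (my example, not from the slides; run with at least 2 processes): the send buffer may be reused only after MPI_Wait reports completion.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, x = 42, y = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Request req;

    if (rank == 0) {
        MPI_Isend(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        /* ... computation that does NOT touch x can overlap here ... */
        MPI_Wait(&req, MPI_STATUS_IGNORE);    /* now x may be modified again */
    } else if (rank == 1) {
        MPI_Irecv(&y, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);    /* now y is valid */
        printf("got %d\n", y);
    }
    MPI_Finalize();
    return 0;
}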

68 Order and Fairness Order: MPI guarantees that messages will not overtake each other. If a sender sends two messages (Message 1 and Message 2) in succession to the same destination, and both match the same receive, the receive operation will receive Message 1 before Message 2. If a receiver posts two receives (Receive 1 and Receive 2), in succession, and both are looking for the same message, Receive 1 will receive the message before Receive 2. Order rules do not apply if there are multiple threads participating in the communication operations. Fairness: MPI does not guarantee fairness - it's up to the programmer to prevent "operation starvation". Example: task 0 sends a message to task 2. However, task 1 sends a competing message that matches task 2's receive. Only one of the sends will complete.

69 Outline Collective communication Considerations on array processing Computing pi Heat equation Types of point-to-point communication Derived data types Groups and communicators Virtual topologies

70 Derived Data Type Routines MPI_Type_contiguous The simplest constructor. Produces a new datatype by making count copies of an existing datatype. MPI_Type_contiguous (count,oldtype,&newtype) MPI_TYPE_CONTIGUOUS (count,oldtype,newtype,ierr) C synopsis #include <mpi.h> int MPI_Type_contiguous(int count, MPI_Datatype oldtype, MPI_Datatype *newtype);

71 [slide not reproduced in this transcription]

72 MPI_Type_commit Commits new datatype to the system. Required for all user constructed (derived) datatypes. Makes a datatype ready for use in communication. MPI_Type_commit (&datatype) MPI_TYPE_COMMIT (datatype,ierr) C synopsis #include <mpi.h> int MPI_Type_commit(MPI_Datatype *datatype);
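A hedged reconstruction of the program whose output appears on the next slide, modeled on the common textbook example (a 4x4 float matrix on rank 0, a contiguous "row" type of 4 floats, one row sent to each of 4 tasks; run with exactly 4 processes):

#include <mpi.h>
#include <stdio.h>
#define SIZE 4

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, numtasks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);

    float a[SIZE][SIZE] = {{ 1,  2,  3,  4}, { 5,  6,  7,  8},
                           { 9, 10, 11, 12}, {13, 14, 15, 16}};
    float b[SIZE];
    MPI_Datatype rowtype;
    MPI_Type_contiguous(SIZE, MPI_FLOAT, &rowtype);   /* 4 copies of MPI_FLOAT */
    MPI_Type_commit(&rowtype);

    if (numtasks == SIZE) {
        if (rank == 0)
            for (int i = 0; i < numtasks; i++)        /* one row per task */
                MPI_Send(&a[i][0], 1, rowtype, i, 0, MPI_COMM_WORLD);
        MPI_Recv(b, SIZE, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank= %d b= %.1f %.1f %.1f %.1f\n", rank, b[0], b[1], b[2], b[3]);
    }
    MPI_Type_free(&rowtype);
    MPI_Finalize();
    return 0;
}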

73 Sample program output (the numeric values were lost in this transcription): rank= 0 b= rank= 1 b= rank= 2 b= rank= 3 b=

74 MPI_Type_vector Similar to contiguous, but allows for regular gaps (stride) in the displacements. MPI_Type_vector (count,blocklength,stride,oldtype,&newtype) MPI_TYPE_VECTOR (count,blocklength,stride,oldtype,newtype,ierr) C synopsis #include <mpi.h> int MPI_Type_vector(int count, int blocklength, int stride, MPI_Datatype oldtype, MPI_Datatype *newtype);

75 76 [slides not reproduced in this transcription]
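Since slides 75 76 were not reproduced, here is a hedged companion example (mine, not the original): MPI_Type_vector describing one column of a 4x4 float matrix (count 4, blocklength 1, stride 4), with one column sent to each of 4 tasks; run with exactly 4 processes.

#include <mpi.h>
#include <stdio.h>
#define SIZE 4

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, numtasks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);

    float a[SIZE][SIZE] = {{ 1,  2,  3,  4}, { 5,  6,  7,  8},
                           { 9, 10, 11, 12}, {13, 14, 15, 16}};
    float b[SIZE];
    MPI_Datatype columntype;
    MPI_Type_vector(SIZE, 1, SIZE, MPI_FLOAT, &columntype);
    MPI_Type_commit(&columntype);

    if (numtasks == SIZE) {
        if (rank == 0)
            for (int i = 0; i < numtasks; i++)   /* column i starts at a[0][i] */
                MPI_Send(&a[0][i], 1, columntype, i, 0, MPI_COMM_WORLD);
        MPI_Recv(b, SIZE, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank= %d b= %.1f %.1f %.1f %.1f\n", rank, b[0], b[1], b[2], b[3]);
    }
    MPI_Type_free(&columntype);
    MPI_Finalize();
    return 0;
}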

77 Outline Collective communication Considerations on array processing Computing pi Heat equation Types of point-to-point communication Derived data types Groups and communicators Virtual topologies

78 Group and Communicator Management Routines Groups vs. Communicators: A group is an ordered set of processes. Each process in a group is associated with a unique integer rank. Rank values start at zero and go to N-1, where N is the number of processes in the group. In MPI, a group is represented within system memory as an object. It is accessible to the programmer only by a "handle". A group is always associated with a communicator object. A communicator encompasses a group of processes that may communicate with each other. All MPI messages must specify a communicator. In the simplest sense, the communicator is an extra "tag" that must be included with MPI calls. Like groups, communicators are represented within system memory as objects and are accessible to the programmer only by "handles". For example, the handle for the communicator that comprises all tasks is MPI_COMM_WORLD. From the programmer's perspective, a group and a communicator are one. The group routines are primarily used to specify which processes should be used to construct a communicator.

79 Group and Communicator Management Routines Primary Purposes of Group and Communicator Objects: 1. Allow you to organize tasks, based upon function, into task groups. 2. Enable collective communication operations across a subset of related tasks. 3. Provide a basis for implementing user defined virtual topologies. 4. Provide for safe communications.

80 Group and Communicator Management Routines Programming Considerations and Restrictions: Groups/communicators are dynamic - they can be created and destroyed during program execution. Processes may be in more than one group/communicator. They will have a unique rank within each group/communicator. MPI provides over 40 routines related to groups, communicators, and virtual topologies.

81 Typical usage:
1. Extract the handle of the global group from MPI_COMM_WORLD using MPI_Comm_group
2. Form a new group as a subset of the global group using MPI_Group_incl
3. Create a new communicator for the new group using MPI_Comm_create
4. Determine the new rank in the new communicator using MPI_Comm_rank
5. Conduct communications using any MPI message passing routine
6. When finished, free up the new communicator and group (optional) using MPI_Comm_free and MPI_Group_free

82 83 [example listing not reproduced in this transcription]
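The listing on slides 82 83 was not reproduced; a hedged sketch of the six usage steps above, modeled on the common textbook example (8 processes split into two groups of 4, each group doing its own all-reduce; run with exactly 8 processes):

#include <mpi.h>
#include <stdio.h>
#define NPROCS 8

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size != NPROCS) {                 /* this sketch assumes 8 processes */
        if (rank == 0) printf("run with %d processes\n", NPROCS);
        MPI_Finalize();
        return 1;
    }
    int ranks1[4] = {0, 1, 2, 3}, ranks2[4] = {4, 5, 6, 7};

    MPI_Group world_group, new_group;
    MPI_Comm_group(MPI_COMM_WORLD, &world_group);            /* step 1 */
    if (rank < NPROCS / 2)                                   /* step 2 */
        MPI_Group_incl(world_group, 4, ranks1, &new_group);
    else
        MPI_Group_incl(world_group, 4, ranks2, &new_group);

    MPI_Comm new_comm;
    MPI_Comm_create(MPI_COMM_WORLD, new_group, &new_comm);   /* step 3 */

    int new_rank, sum;
    MPI_Comm_rank(new_comm, &new_rank);                      /* step 4 */
    MPI_Allreduce(&rank, &sum, 1, MPI_INT, MPI_SUM, new_comm);  /* step 5 */
    printf("rank= %d new rank= %d group sum= %d\n", rank, new_rank, sum);

    MPI_Comm_free(&new_comm);                                /* step 6 */
    MPI_Group_free(&new_group);
    MPI_Finalize();
    return 0;
}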

84 Outline Collective communication Considerations on array processing Computing pi Heat equation Types of point-to-point communication Derived data types Groups and communicators Virtual topologies

85 Virtual Topologies What Are They? In terms of MPI, a virtual topology describes a mapping/ordering of MPI processes into a geometric "shape". The two main types of topologies supported by MPI are Cartesian (grid) and Graph. MPI topologies are virtual - there may be no relation between the physical structure of the parallel machine and the process topology. Virtual topologies are built upon MPI communicators and groups. Must be "programmed" by the application developer

86 Virtual Topologies Why Use Them? Convenience Virtual topologies may be useful for applications with specific communication patterns - patterns that match an MPI topology structure. For example, a Cartesian topology might prove convenient for an application that requires 4-way nearest neighbor communications for grid based data. Communication Efficiency Some hardware architectures may impose penalties for communications between successively distant "nodes". A particular implementation may optimize process mapping based upon the physical characteristics of a given parallel machine. The mapping of processes into an MPI virtual topology is dependent upon the MPI implementation, and may be totally ignored.

87 Virtual Topologies Example: A simplified mapping of processes into a Cartesian virtual topology appears in the diagram (not reproduced in this transcription).

88 int MPI_Cart_create( MPI_Comm comm_old, int ndims, int *dims, int *periods, int reorder, MPI_Comm *comm_cart); Makes a new communicator to which topology information has been attached. Parameters:
comm_old [in] input communicator (handle)
ndims [in] number of dimensions of the Cartesian grid (integer)
dims [in] integer array of size ndims specifying the number of processes in each dimension
periods [in] logical array of size ndims specifying whether the grid is periodic (true) or not (false) in each dimension
reorder [in] ranking may be reordered (true) or not (false) (logical)
comm_cart [out] communicator with the new Cartesian topology (handle)

89 int MPI_Cart_coords( MPI_Comm comm, int rank, int maxdims, int *coords); Determines the coordinates of a process in a Cartesian topology, given its rank in the group. Parameters:
comm [in] communicator with Cartesian structure (handle)
rank [in] rank of a process within the group of comm (integer)
maxdims [in] length of the vector coords in the calling program (integer)
coords [out] integer array (of size ndims) containing the Cartesian coordinates of the specified process

90 int MPI_Cart_shift(MPI_Comm comm, int direction, int displ, int *source, int *dest) Loosely speaking, MPI_Cart_shift is used to find two "nearby" neighbors of the calling process along a specific direction of an N-dimensional cartesian topology. This direction is specified by the input argument, direction, to MPI_Cart_shift. The two neighbors are called source and destination ranks and the proximity of these two neighbors to the calling process is determined by the input parameter displ. If displ = 1, the neighbors are the two adjoining processes along the specified direction and the source is the process with the lower rank number while the destination rank is the process with the higher rank. On the other hand, if displ = -1, the reverse is true.

91 Create a 4 x 4 Cartesian topology from 16 processes and have each process exchange its rank with its four neighbors.
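A possible solution sketch (my code, not the original; run with exactly 16 processes): MPI_Cart_create builds the grid, MPI_Cart_shift finds the four neighbors (MPI_PROC_NULL at the edges, since the grid is not periodic), and non-blocking point-to-point calls do the exchange.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int dims[2] = {4, 4}, periods[2] = {0, 0}, coords[2];
    MPI_Comm cart;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &cart);

    int crank;
    MPI_Comm_rank(cart, &crank);          /* ranks may have been reordered */
    MPI_Cart_coords(cart, crank, 2, coords);

    int up, down, left, right;
    MPI_Cart_shift(cart, 0, 1, &up, &down);      /* neighbors along dimension 0 */
    MPI_Cart_shift(cart, 1, 1, &left, &right);   /* neighbors along dimension 1 */

    int nbrs[4] = {up, down, left, right};
    int got[4] = {-1, -1, -1, -1};        /* stays -1 where there is no neighbor */
    MPI_Request req[8];
    for (int i = 0; i < 4; i++) {         /* edge processes talk to MPI_PROC_NULL */
        MPI_Irecv(&got[i], 1, MPI_INT, nbrs[i], 0, cart, &req[i]);
        MPI_Isend(&crank,  1, MPI_INT, nbrs[i], 0, cart, &req[i + 4]);
    }
    MPI_Waitall(8, req, MPI_STATUSES_IGNORE);
    printf("rank %d coords (%d,%d) neighbors: up=%d down=%d left=%d right=%d\n",
           crank, coords[0], coords[1], got[0], got[1], got[2], got[3]);

    MPI_Finalize();
    return 0;
}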
