Parallel Programming with MPI MARCH 14, 2018


1 Parallel Programming with MPI SARDAR USMAN & EMAD ALAMOUDI SUPERVISOR: PROF. RASHID MEHMOOD MARCH 14, 2018

2 Sources The presentation is compiled using the following sources.

3 Outline Parallel programming overview Brief overview of MPI General Program Structure Basic MPI Functions Point-to-point Communication Communication modes Blocking and non-blocking communication MPI Collective Communication MPI Groups and Communicators

4 Parallel Computing

5 Why Parallel Computing Save Time and Money

6 Why Parallel Computing Solve large/more complex problems, e.g. web search engines/databases processing millions of transactions per second. Provide concurrency Take advantage of non-local resources Make better use of underlying parallel hardware Parallelism is the future of computing, and the race is already on for exascale computing.

7 Parallel Programming Models There are several parallel programming models in common use: Shared Memory (without threads) Threads Distributed Memory / Message Passing Hybrid Single Program Multiple Data (SPMD) Multiple Program Multiple Data (MPMD) Parallel programming models exist as an abstraction above hardware and memory architectures.

8 Two Memory Models Shared memory: all processors share the same address space OpenMP: directive-based programming PGAS languages (UPC, Titanium, X10) Distributed memory: every processor has its own address space MPI: Message Passing Interface

9 Shared Memory Model On A Distributed Memory Machine Kendall Square Research (KSR) ALLCACHE approach. Machine memory was physically distributed across networked machines, but appeared to the user as a single shared memory global address space. Generically, this approach is referred to as "virtual shared memory".

10 Distributed Memory Model On A Shared Memory Machine Message Passing Interface (MPI) on SGI Origin The SGI Origin 2000 employed the CC-NUMA type of shared memory architecture, where every task has direct access to global address space spread across all machines. However, the ability to send and receive messages using MPI, as is commonly done over a network of distributed memory machines, was implemented and commonly used.

11 Parallel Programming Models This model demonstrates the following characteristics: A set of tasks that use their own local memory during computation. Multiple tasks can reside on the same physical machine and/or across an arbitrary number of machines. Tasks exchange data through communications by sending and receiving messages. Data transfer usually requires cooperative operations to be performed by each process. For example, a send operation must have a matching receive operation.

12 Message Passing Interface (MPI) MPI is a specification for a message passing library, standardized by the MPI Forum Multiple vendor-specific implementations: MPICH, OpenMPI, Intel MPI MPI implementations are used for programming systems with distributed memory Each process has a different address space Processes need to communicate with each other Can also be used for shared memory and hybrid architectures MPI specifications have been defined for C, C++ and Fortran programs The goal of MPI is to establish a portable, efficient and flexible standard for writing message passing programs

13 Message Passing Interface (MPI) Reasons for using MPI: Standardization Portability Functionality Availability Multiple vendor-specific implementations: MPICH, OpenMPI, Intel MPI

14 Applications (Science and Engineering) MPI is widely used in large scale parallel applications in science and engineering: Atmosphere, Earth, Environment Physics - applied, nuclear, particle, condensed matter, high pressure, fusion, photonics Bioscience, Biotechnology, Genetics Chemistry, Molecular Sciences Geology, Seismology Mechanical Engineering - from prosthetics to spacecraft Electrical Engineering, Circuit Design, Microelectronics Computer Science, Mathematics

15 Important considerations while using MPI All parallelism is explicit: the programmer is responsible for correctly identifying parallelism and implementing parallel algorithms using MPI constructs

16 MPI Application Structure

17 MPI Basics mpirun starts the required number of processes. All processes started by mpirun are organized in a process group (communicator) called MPI_COMM_WORLD Every process has a unique identifier (rank) between 0 and n-1.

18 Compiling and Running MPI applications MPI is a library Applications can be written in C, C++ or Fortran and appropriate calls to MPI can be added where required Compilation: Regular applications: gcc test.c -o test MPI applications: mpicc test.c -o test Execution: Regular applications: ./test MPI applications (running with 16 processes): mpiexec -n 16 ./test

19 MPI Basics MPI_Init (&argc,&argv) : Initializes the MPI execution environment. MPI_Comm_size (comm,&size) : Returns the total number of MPI processes in the specified communicator MPI_Comm_rank (comm,&rank) : Returns the rank of the calling MPI process within the specified communicator MPI_Finalize () : Terminates the MPI execution environment

20 Simple MPI Program Identifying Processes
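The slide's source code is not reproduced in the transcription; a minimal sketch of such a program, using only the four calls above, might look like this:

/* hello_mpi.c - sketch of a simple "identifying processes" program */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);                 /* initialize the MPI environment */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes      */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* rank of the calling process    */

    printf("Hello from process %d of %d\n", rank, size);

    MPI_Finalize();                         /* shut down the MPI environment  */
    return 0;
}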

21 Data Communication Data communication in MPI is like an email exchange One process sends a copy of the data to another process (or a group of processes), and the other process receives it Communication requires the following information: Sender has to know: Whom to send the data to (receiver's process rank) What kind of data to send (100 integers or 200 characters, etc) A user-defined tag for the message (think of it as an email subject; allows the receiver to understand what type of data is being received) Receiver might have to know: Who is sending the data (OK if the receiver does not know; in this case sender rank will be MPI_ANY_SOURCE, meaning anyone can send) What kind of data is being received (partial information is OK: I might receive up to 1000 integers) What the user-defined tag of the message is (OK if the receiver does not know; in this case tag will be MPI_ANY_TAG)

22 Ranks for communication When sending data, the sender has to specify the destination process rank Tells where the message should go The receiver has to specify the source process rank Tells where the message will come from MPI_ANY_SOURCE is a special wild-card source that can be used by the receiver to match any source

23 Point-to-Point Communication Communication between two processes Source process sends message to destination process Communication takes place within a communicator Destination process is identified by its rank in the communicator (figure: a source process sends a message to a destination process identified by its rank)

24 Definitions Completion means that memory locations used in the message transfer can be safely accessed send: variable sent can be reused after completion receive: variable received can now be used MPI communication modes differ in what conditions on the receiving end are needed for completion Communication modes can be blocking or non-blocking Blocking: return from function call implies completion Non-blocking: routine returns immediately

25 Simple Communication in MPI

26 Sending data MPI_Send(&S_buf, count, MPI_Datatype, dest, tag, MPI_COMM_WORLD) &S_buf: data to send count: number of elements to send MPI_Datatype: data type of each element dest: rank of the process the data needs to be sent to tag: user-defined unique message identifier MPI_COMM_WORLD: process group containing all processes started by mpirun

27 Receiving data MPI_Recv(&R_buf, count, MPI_Datatype, source, tag, MPI_COMM_WORLD, &status) &R_buf: buffer in which to receive the data count: number of elements to receive MPI_Datatype: data type of each element source: rank of the process the data is sent from tag: user-defined unique message identifier MPI_COMM_WORLD: process group containing all processes started by mpirun &status: status information
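As a sketch of how these two calls fit together (not the slide's actual code; the buffer and tag names are illustrative, and the program assumes at least two processes), rank 0 could send four integers to rank 1:

/* send_recv.c - rank 0 sends an array of integers to rank 1 */
#include <mpi.h>
#include <stdio.h>

#define TAG 99

int main(int argc, char *argv[])
{
    int rank, data[4] = {1, 2, 3, 4};
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* destination rank 1, user-defined tag, default communicator */
        MPI_Send(data, 4, MPI_INT, 1, TAG, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* source rank 0; status holds the actual source, tag and error code */
        MPI_Recv(data, 4, MPI_INT, 0, TAG, MPI_COMM_WORLD, &status);
        printf("Rank 1 received %d %d %d %d\n", data[0], data[1], data[2], data[3]);
    }

    MPI_Finalize();
    return 0;
}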

28 Array decomposition and addition The master task first initializes an array and then distributes an equal portion of that array to the other tasks. The other tasks perform an addition operation on each array element As each of the non-master tasks finishes, it sends its updated portion of the array to the master. Finally, the master task displays selected parts of the final array and the global sum of all array elements

29 Array Decomposition and Addition (figure: the array is split across processes 0-3, each process adds up its portion, and the partial results combine into a global sum of 36)

30 Array Decomposition and Addition
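The transcription omits the slide's code; a simplified sketch under stated assumptions (the array length divides evenly among the processes, and the workers return partial sums instead of whole sub-arrays) could be:

/* array_sum.c - master/worker array decomposition and summation sketch */
#include <mpi.h>
#include <stdio.h>

#define N 16   /* illustrative array length; assumed divisible by the process count */

int main(int argc, char *argv[])
{
    int rank, size, i;
    double data[N], chunk[N], partial = 0.0, total = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int chunk_size = N / size;

    if (rank == 0) {
        for (i = 0; i < N; i++) data[i] = i + 1.0;          /* master initializes the array */
        for (i = 1; i < size; i++)                          /* send one portion to each worker */
            MPI_Send(&data[i * chunk_size], chunk_size, MPI_DOUBLE, i, 0, MPI_COMM_WORLD);
        for (i = 0; i < chunk_size; i++) partial += data[i];/* master processes its own portion */
        total = partial;
        for (i = 1; i < size; i++) {                        /* collect the partial sums */
            MPI_Recv(&partial, 1, MPI_DOUBLE, i, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            total += partial;
        }
        printf("Global sum = %f\n", total);
    } else {
        MPI_Recv(chunk, chunk_size, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        for (i = 0; i < chunk_size; i++) partial += chunk[i];
        MPI_Send(&partial, 1, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}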

31 Serial Vs Parallel (chart: execution time of the serial version vs. the parallel version)

32 Communication Modes

33 Buffering In a perfect world, every send operation would be perfectly synchronized with its matching receive. The MPI implementation must be able to deal with storing data when the two tasks are out of sync. A send operation occurs 5 seconds before the receive is ready - where is the message while the receive is pending? Multiple sends arrive at the same receiving task which can only accept one send at a time - what happens to the messages that are "backing up"?

34 Buffering

35 System buffer space Opaque to the programmer and managed entirely by the MPI library A finite resource that can be easy to exhaust Able to exist on the sending side, the receiving side, or both. Something that may improve program performance because it allows send - receive operations to be asynchronous. MPI also provides for a user managed send buffer.

36 Blocking and Non-blocking Send and receive can be blocking or non-blocking A blocking send can be used with a non-blocking receive, and vice versa Non-blocking sends can use any mode: synchronous, buffered, standard, or ready. Non-blocking send and receive routines return almost immediately. Non-blocking operations simply "request" the MPI library to perform the operation when it is able. It is unsafe to modify the application buffer (your variable space) until you know for a fact the requested non-blocking operation was actually performed by the library. There are "wait" routines used to do this.

37 Non-blocking Non-blocking communications are primarily used to overlap computation with communication and exploit possible performance gains. Characteristics of non-blocking communications: No possibility of deadlocks Decrease in synchronization overhead Extra computation and code to test and wait for completion Must not access buffer before completion

38 Avoiding Race conditions The following code is not safe: int i=123; MPI_Request myrequest; MPI_Isend(&i, 1, MPI_INT, 1, MY_LITTLE_TAG, MPI_COMM_WORLD, &myrequest); i=234; The MPI library may read the value of i after we have changed it to 234, so 234 may be sent instead of 123. This is a race condition, which can be very difficult to debug.

39 Avoiding Race conditions int i=123; MPI_Request myrequest; MPI_Isend(&i, 1, MPI_INT, 1, MY_LITTLE_TAG, MPI_COMM_WORLD, &myrequest); // do some calculations here // Before we re-use variable i, we need to wait until the asynchronous function call is complete MPI_Status mystatus; MPI_Wait(&myrequest, &mystatus); i=234;

40 Non-blocking Communication Functions

41 Non-blocking Communication Simple hello world program that uses non-blocking send/receive routines. Request: since non-blocking operations may return before the requested system buffer space is obtained, the system issues a unique "request number" (request handle).
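The hello-world style code itself is not in the transcription; a small sketch of a non-blocking exchange between two processes (only ranks 0 and 1 take part; variable names are illustrative) might look like this:

/* nonblocking.c - Irecv/Isend followed by MPI_Waitall */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, sendval, recvval = -1;
    MPI_Request reqs[2];
    MPI_Status stats[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank < 2 && size >= 2) {          /* only ranks 0 and 1 exchange data */
        int other = 1 - rank;
        sendval = rank * 100;

        MPI_Irecv(&recvval, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(&sendval, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &reqs[1]);

        /* ...useful computation could overlap with the transfer here... */

        MPI_Waitall(2, reqs, stats);      /* do not touch sendval/recvval before this */
        printf("Rank %d received %d\n", rank, recvval);
    }

    MPI_Finalize();
    return 0;
}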

42 Blocking A blocking send routine will only "return" after it is safe to modify the application buffer (your send data) for reuse. The message might be copied directly into the matching receive buffer, or it might be copied into a temporary system buffer. A blocking send can be synchronous which means there is handshaking occurring with the receive task to confirm a safe send. A blocking send can be asynchronous if a system buffer is used to hold the data for eventual delivery to the receive. A blocking receive only "returns" after the data has arrived and is ready for use by the program.

43 Standard and Buffered Send Standard send Completes once message has been sent May or may not imply that message arrived Don't make any assumptions (implementation dependent) Buffered send Data to be sent is copied to a user-specified buffer Higher system overhead of copying data to and from buffer Lower synchronization overhead for sender

44 Ready and Synchronous Send Ready send Ready to receive notification must be posted; otherwise it exits with an error Should not be used unless user is certain that corresponding receive is posted before the send Lower synchronization overhead for sender as compared to synchronous send Synchronous send Use if you need to know that the message has been received Sending and receiving process synchronize regardless of who is faster. Thus, processor idle time is possible Large synchronization overhead Safest communication method

45 Blocking Communication Functions

46 MPI_Bsend This is a simple program that tests MPI_Bsend.
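The test program is not reproduced in the transcription; a sketch of how MPI_Bsend is typically used together with a user-managed send buffer (MPI_Buffer_attach / MPI_Buffer_detach, as mentioned on the system-buffer slide) could look like this:

/* bsend.c - buffered send sketch: MPI_Bsend returns after copying into the attached buffer */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int rank, value = 42, bufsize;
    char *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Pack_size(1, MPI_INT, MPI_COMM_WORLD, &bufsize);
        bufsize += MPI_BSEND_OVERHEAD;        /* extra space required per buffered message */
        buf = (char *)malloc(bufsize);
        MPI_Buffer_attach(buf, bufsize);      /* user-managed send buffer */

        MPI_Bsend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);

        MPI_Buffer_detach(&buf, &bufsize);    /* blocks until buffered messages are delivered */
        free(buf);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Received %d via buffered send\n", value);
    }

    MPI_Finalize();
    return 0;
}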

47 Blocking Buffered Communication

48 Non-blocking Non-buffered Communication

49 For a Communication to Succeed Sender must specify a valid destination rank Receiver must specify a valid source rank The communicator must be the same Tags must match Receiver's buffer must be large enough User-specified buffer should be large enough (buffered send only) Receive posted before send (ready send only)

50 Completion Tests Waiting and testing for completion Wait: function does not return until the communication has completed Test: function returns TRUE or FALSE depending on whether or not the communication has completed int MPI_Wait(MPI_Request *request, MPI_Status *status) int MPI_Test(MPI_Request *request, int *flag, MPI_Status *status)

51 Testing Multiple Communications Test or wait for completion of one (and only one) message: MPI_Waitany & MPI_Testany Test or wait for completion of all messages: MPI_Waitall & MPI_Testall Test or wait for completion of as many messages as possible: MPI_Waitsome & MPI_Testsome

52 Wildcarding Receiver can wildcard To receive from any source specify MPI_ANY_SOURCE as rank of source To receive with any tag specify MPI_ANY_TAG as tag Actual source and tag are returned in the receiver's status parameter

53 Receive Information Information about the received data is returned from MPI_Recv (or MPI_Irecv) in status Information includes: Source: status.MPI_SOURCE Tag: status.MPI_TAG Error: status.MPI_ERROR Count: the message received may not fill the receive buffer. Use the following function to find the number of elements actually received: int MPI_Get_count(MPI_Status *status, MPI_Datatype datatype, int *count) Message order preservation: messages do not overtake each other. Messages are received in the order sent.

54 Timers double MPI_Wtime(void) Time is measured in seconds Time to perform a task is measured by consulting the timer before and after
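A minimal usage sketch (illustrative only, not from the slides):

/* wtime.c - timing a section of code with MPI_Wtime */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    double t0, t1, x = 0.0;

    MPI_Init(&argc, &argv);

    t0 = MPI_Wtime();                                  /* timestamp before the work */
    for (long i = 0; i < 10000000L; i++) x += 1e-7;    /* work being timed */
    t1 = MPI_Wtime();                                  /* timestamp after the work  */

    printf("Elapsed: %f s (result %f)\n", t1 - t0, x);

    MPI_Finalize();
    return 0;
}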

55 Deadlocks A deadlock occurs when two or more processes each wait for the other to release resources Deadlocks are possible in blocking communication Example: two processes initiate a blocking send to each other without posting a receive (Process 0: MPI_Send to P1, then MPI_Recv from P1; Process 1: MPI_Send to P0, then MPI_Recv from P0)

56 Avoiding Deadlocks Different ordering of send and receive: one processor posts the send while the other posts the receive Use non-blocking functions: post non-blocking receives early and test for completion Use buffered mode: use buffered sends so that execution continues after copying to the user-specified buffer
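As an illustration of the first remedy (reordering), here is a sketch in which rank 0 sends first while rank 1 receives first, so neither blocks forever; MPI_Sendrecv is another common way to express the same exchange. This is not the slides' code.

/* no_deadlock.c - deadlock-free exchange by ordering sends and receives */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, out, in = -1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    out = rank;

    if (rank == 0) {                 /* rank 0: send, then receive */
        MPI_Send(&out, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(&in,  1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else if (rank == 1) {          /* rank 1: receive, then send */
        MPI_Recv(&in,  1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(&out, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }
    if (rank < 2) printf("Rank %d got %d\n", rank, in);

    MPI_Finalize();
    return 0;
}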

57 Matrix Multiplication In this code, the master task distributes a matrix to numtasks-1 worker tasks. Each worker task performs the multiplication on its chunk of the matrices and sends the result back to the master.

58 Matrix Multiplication

59 Serial vs. Parallel (chart: execution time of the serial version vs. the parallel version)

60 MPI Collective Communication

61 MPI Collective Communication All processes in the group have to participate in the same operation. Process group is defined by the communicator. For each communicator, one can have one collective operation ongoing at a time. Eases programming Enables low-level optimizations and adaptations to the hardware infrastructure.

62 Characteristics of Collective Communication Collective communication will not interfere with point-to-point communication All processes must call the collective function Substitute for a sequence of point-to-point function calls Synchronization not guaranteed (except for barrier) No tags are needed

63 Types of Collective Communication Synchronization: barrier Data exchange: broadcast, gather, scatter, all-gather, and all-to-all exchange Variable-size-location versions of the above Global reduction (collective operations): sum, minimum, maximum, etc.

64 Synchronization COLLECTIVE COMMUNICATION

65 Barrier Synchronization

66 Barrier Synchronization Red light for each processor: turns green when all processors have arrived A process calling it will be blocked until all processes in the group (communicator) have called it MPI_Barrier(MPI_Comm comm) comm: communicator whose processes need to be synchronized

67 Data exchange COLLECTIVE COMMUNICATION

68 MPI_Bcast The process with the rank root distributes the data stored in buffer to all other processes in the communicator comm. Data in buffer is identical on all other processes after the broadcast.

69 Traditional Send and Receive

70 Collective communication using MPI_Bcast

71 Broadcast One-to-all communication: same data sent from root process to all others in communicator All processes must call the function specifying the same root and communicator MPI_Bcast (&buf, count, datatype, root, comm) buf: starting address of buffer (sending and receiving) count: number of elements to be sent/received datatype: MPI datatype of elements root: rank of sending process comm: MPI communicator of processors involved

72 MPI_Bcast (figure: before MPI_Bcast only the root process holds the value 5; after MPI_Bcast every process holds the value 5)

73 MPI_Bcast A simple example that synchronizes before and after sending the data, then calculates the time taken.
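The example code is not included in the transcription; a sketch of the pattern the slide describes (barrier, broadcast, barrier, then report the elapsed time; the message size is illustrative) might be:

/* bcast_timing.c - time a broadcast bracketed by barriers */
#include <mpi.h>
#include <stdio.h>

#define COUNT 1000000

int main(int argc, char *argv[])
{
    int rank;
    static double data[COUNT];
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);        /* make sure every rank starts together */
    t0 = MPI_Wtime();
    MPI_Bcast(data, COUNT, MPI_DOUBLE, 0, MPI_COMM_WORLD);   /* root 0 sends to all */
    MPI_Barrier(MPI_COMM_WORLD);        /* wait until all ranks have the data */
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("Broadcast of %d doubles took %f s\n", COUNT, t1 - t0);

    MPI_Finalize();
    return 0;
}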

74 MPI_Scatter The process with the rank root distributes the data stored in sendbuf to all other processes in the communicator comm. Every process gets a different segment of the original data at the root process.

75 Scatter Example: partitioning an array equally among the processes MPI_Scatter(&sbuf, scount, stype, &rbuf, rcount, rtype, root, comm) sbuf and rbuf: starting address of send and receive buffers scount and rcount: number of elements sent and received to/from each process stype and rtype: MPI datatype of sent/received data root: rank of sending process comm: MPI communicator of processors involved

76 MPI_Scatter (figure: before MPI_Scatter the root process holds the whole array; after MPI_Scatter each of the four processes holds a different segment of it)

77 MPI_Scatter A simple program that distributes a table across processes so that each one receives a row.

78 MPI_Gather Reverse operation of MPI_Scatter. The root process gathers the data stored in the send buffer of every process in the communicator comm into its receive buffer.

79 MPI_Gather MPI_Gather(&sbuf, scount, stype, &rbuf, rcount, rtype, root, comm) sbuf and rbuf: starting address of send and receive buffers scount and rcount: number of elements sent and received to/from each process stype and rtype: MPI datatype of sent/received data root: rank of receiving (gathering) process comm: MPI communicator of processors involved

80 MPI_Gather (figure: before MPI_Gather each of the four processes holds its own segment; after MPI_Gather the root process holds all segments in rank order)

81 MPI_Gather A simple program that collects integer values from different processes. The master (process 0) is the one that ends up with all the values.
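A sketch of such a gather (not the slide's code; here each rank simply contributes its own rank number) could be:

/* gather.c - every process sends one integer to the root, which gathers them in rank order */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int rank, size, *all = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0)
        all = (int *)malloc(size * sizeof(int));   /* receive buffer only needed at the root */

    MPI_Gather(&rank, 1, MPI_INT, all, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int i = 0; i < size; i++)
            printf("all[%d] = %d\n", i, all[i]);
        free(all);
    }

    MPI_Finalize();
    return 0;
}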

82 All-Gather and All-to-All (1) All-gather All processes, rather than just the root, gather data from the group All-to-all MPI_Alltoall is an extension of MPI_Allgather to the case where each process sends distinct data to each of the receivers. All processes receive data from all processes in rank order No root process specified

83 MPI AllGather Example MPI_Allgather

84 MPI Alltoall Example MPI_Alltoall

85 All-Gather and All-to-All (2) MPI_Allgather(&sbuf, scount, stype, &rbuf, rcount, rtype, comm) MPI_Alltoall(&sbuf, scount, stype, &rbuf, rcount, rtype, comm) scount: number of elements sent to each process; for all-to-all communication, size of sbuf should be scount*p (p = # of processes) rcount: number of elements received from any process; size of rbuf should be rcount*p (p = # of processes)

86 MPI_Allgather A program that computes the average of an array of elements in parallel using MPI_Scatter and MPI_Allgather
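The program itself is not reproduced in the transcription; a sketch of the scatter-then-allgather averaging pattern the slide describes (array sizes and variable names are illustrative) might be:

/* average_allgather.c - scatter data, compute local sums, allgather the partial sums */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int rank, size, i, n_per_proc = 4;
    float *data = NULL, *part, *partial_sums, local_sum = 0.0f, total = 0.0f;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    part = (float *)malloc(n_per_proc * sizeof(float));
    partial_sums = (float *)malloc(size * sizeof(float));
    if (rank == 0) {                                   /* the root creates the full array */
        data = (float *)malloc(size * n_per_proc * sizeof(float));
        for (i = 0; i < size * n_per_proc; i++) data[i] = (float)i;
    }

    /* each rank receives its own segment of the array */
    MPI_Scatter(data, n_per_proc, MPI_FLOAT, part, n_per_proc, MPI_FLOAT, 0, MPI_COMM_WORLD);

    for (i = 0; i < n_per_proc; i++) local_sum += part[i];

    /* every rank gathers all partial sums; no root is needed */
    MPI_Allgather(&local_sum, 1, MPI_FLOAT, partial_sums, 1, MPI_FLOAT, MPI_COMM_WORLD);

    for (i = 0; i < size; i++) total += partial_sums[i];
    printf("Rank %d: average = %f\n", rank, total / (size * n_per_proc));

    if (rank == 0) free(data);
    free(part); free(partial_sums);
    MPI_Finalize();
    return 0;
}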

87 Variable-Size-Location Collective Functions Allows varying size and relative locations of messages in buffer Examples: MPI_Scatterv, MPI_Gatherv, MPI_Allgatherv, MPI_Alltoallv Advantages: More flexibility in writing code More compact code Disadvantage: may be less efficient than fixed size/location functions

88 Scatterv and Gatherv MPI_Scatterv(&sbuf, &scount, &displs, stype, &rbuf, rcount, rtype, root, comm) MPI_Gatherv(&sbuf, scount, stype, &rbuf, &rcount, &displs, rtype, root, comm) &scount and &rcount: integer array containing number of elements sent/received to/from each process &displs: integer array specifying the displacements relative to start of buffer at which to send/place data to corresponding process

89 MPI_Gatherv (figure: each process contributes a different number of elements; the count and displs arrays tell the root how many elements come from each process and where they are placed in its receive buffer)

90 MPI_Scatterv (figure: the root holds one buffer; the count and displs arrays specify how many elements go to each process and at which offset in the root's buffer they start)

91 MPI_Scatterv A program that distributes data to several processes, where each process receives a different amount of data.
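A sketch of a variable-size scatter (not the slide's code; here rank i simply receives i+1 elements) could look like this:

/* scatterv.c - variable-size scatter: rank i receives i+1 integers */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int rank, size, i, total = 0;
    int *sendbuf = NULL, *counts, *displs, *recvbuf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    counts = (int *)malloc(size * sizeof(int));
    displs = (int *)malloc(size * sizeof(int));
    for (i = 0; i < size; i++) {
        counts[i] = i + 1;          /* rank i gets i+1 elements */
        displs[i] = total;          /* offset of rank i's data in sendbuf */
        total += counts[i];
    }

    if (rank == 0) {                /* only the root fills the send buffer */
        sendbuf = (int *)malloc(total * sizeof(int));
        for (i = 0; i < total; i++) sendbuf[i] = i;
    }
    recvbuf = (int *)malloc(counts[rank] * sizeof(int));

    MPI_Scatterv(sendbuf, counts, displs, MPI_INT,
                 recvbuf, counts[rank], MPI_INT, 0, MPI_COMM_WORLD);

    printf("Rank %d received %d elements, first = %d\n", rank, counts[rank], recvbuf[0]);

    if (rank == 0) free(sendbuf);
    free(counts); free(displs); free(recvbuf);
    MPI_Finalize();
    return 0;
}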

92 Global reduction COLLECTIVE COMMUNICATION

93 Global Reduction Operations (1) Used to compute a result involving data distributed over a group of processes Result placed in specified process or all processes Examples: Global sum or product Global maximum or minimum

94 Global Reduction Operations (2) MPI_Reduce returns results to a single process (root) MPI_Allreduce returns results to all processes in the group MPI_Reduce_scatter scatters a vector, which results from a reduce operation, across all processes

95 Global Reduction Operations (3) MPI_Reduce(&sbuf, &rbuf, count, stype, op, root, comm) MPI_Allreduce(&sbuf, &rbuf, count, stype, op, comm) MPI_Reduce_scatter(&sbuf, &rbuf, &rcounts, stype, op, comm) sbuf: address of send buffer rbuf: address of receive buffer rcounts: integer array that has counts of elements received from each process op: reduce operation, which may be MPI predefined or user-defined (by using MPI_Op_create)

96 Predefined Reduction Operations MPI_MAX (maximum), MPI_MIN (minimum), MPI_SUM (sum), MPI_PROD (product), MPI_LAND (logical AND), MPI_BAND (bitwise AND), MPI_LOR (logical OR), MPI_BOR (bitwise OR), MPI_LXOR (logical exclusive OR), MPI_BXOR (bitwise exclusive OR), MPI_MAXLOC (maximum and location), MPI_MINLOC (minimum and location)

97 MPI Reduce

98 MPI Reduce Example

99 MPI_Op

100 MPI_Reduce Program that computes the average of an array of elements in parallel using MPI_Reduce.
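A sketch of the reduce-based averaging pattern (the local data here is made up for illustration) might be:

/* average_reduce.c - each rank sums its local values, MPI_Reduce adds the sums at the root */
#include <mpi.h>
#include <stdio.h>

#define N_LOCAL 100

int main(int argc, char *argv[])
{
    int rank, size, i;
    double local[N_LOCAL], local_sum = 0.0, global_sum = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (i = 0; i < N_LOCAL; i++) local[i] = rank + 1.0;   /* stand-in for real data */
    for (i = 0; i < N_LOCAL; i++) local_sum += local[i];

    /* combine all local sums with MPI_SUM; only the root gets the result */
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Average = %f\n", global_sum / (size * N_LOCAL));

    MPI_Finalize();
    return 0;
}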

101 MPI AllReduce

102 MPI AllReduce Example

103 MPI_Allreduce Program that computes the standard deviation of an array of elements in parallel using MPI_Allreduce.

104 MPI Reduce_scatter

105 MPI_Reduce_scatter A program that calculates the sum of a vector, then distributes the local results to the processes using MPI_Reduce_scatter.

106 MPI Scan

107 MPI_Scan We have a histogram distributed across nodes in exp_pdf_i, and we calculate the cumulative frequency histogram (exp_cdf_i) across all nodes.

108 Minloc and Maxloc Designed to compute a global minimum/maximum and an index associated with the extreme value The index is the processor rank that held the extreme value If more than one extreme exists, the index returned is for the first Designed to work on operands that consist of a value and index pair. MPI defines such special data types: MPI_FLOAT_INT, MPI_DOUBLE_INT, MPI_LONG_INT, MPI_2INT, MPI_SHORT_INT, MPI_LONG_DOUBLE_INT

109 MPI_Minloc A program that locates a minimum value and its location in an array of integers.
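A sketch of how MPI_MINLOC is used with the MPI_2INT pair type (not the slide's code; the local values are made up):

/* minloc.c - find the global minimum value and the rank that owns it */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank;
    struct { int value; int rank; } local, global;   /* layout matches MPI_2INT */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    local.value = 100 - rank;    /* stand-in for each rank's local minimum */
    local.rank  = rank;          /* the "location" part of the value/index pair */

    /* MPI_MINLOC keeps the smallest value and the index (rank) that carried it */
    MPI_Reduce(&local, &global, 1, MPI_2INT, MPI_MINLOC, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Global minimum %d is on rank %d\n", global.value, global.rank);

    MPI_Finalize();
    return 0;
}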

110 MPI Groups and Communicators

111 MPI Groups and Communicators A group is an ordered set of processes Each process in a group is associated with a unique integer rank between 0 and P-1, with P the number of processes in the group A communicator encompasses a group of processes that may communicate with each other Communicators can be created for specific groups Processes may be in more than one group/communicator Groups/communicators are dynamic and can be set up and removed at any time From the programmer's perspective, a group and a communicator are the same

112 MPI Groups and Communicators (figure: the processes of MPI_COMM_WORLD are divided into two groups, each with its own communicator, and communication takes place within each sub-communicator)

113 MPI Group Operations MPI_Comm_group Returns the group associated with a communicator MPI_Group_union Creates a group by combining two groups MPI_Group_intersection Creates a group from the intersection of two groups MPI_Group_difference Creates a group from the difference between two groups MPI_Group_incl Creates a group from listed members of an existing group MPI_Group_excl Creates a group excluding listed members of an existing group MPI_Group_free Marks a group for deallocation

114 MPI Group Operations (figure: Venn diagram illustrating the union and intersection of two groups)

115 MPI Communicator Operations MPI_Comm_size Returns number of processes in communicator's group MPI_Comm_rank Returns rank of calling process in communicator's group MPI_Comm_compare Compares two communicators MPI_Comm_dup Duplicates a communicator MPI_Comm_create Creates a new communicator for a group MPI_Comm_split Splits a communicator into multiple, non-overlapping communicators MPI_Comm_free Marks a communicator for deallocation

116 MPI_Comm_Split

117 MPI_Comm_split Example using MPI_Comm_split to divide a communicator into subcommunicators
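The example is not reproduced in the transcription; a sketch that splits MPI_COMM_WORLD into even-rank and odd-rank sub-communicators (a common illustration of MPI_Comm_split) might be:

/* comm_split.c - split the world communicator by the parity of the rank */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int world_rank, world_size, sub_rank, sub_size, color;
    MPI_Comm sub_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    color = world_rank % 2;   /* ranks with the same color end up in the same communicator */
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &sub_comm);

    MPI_Comm_rank(sub_comm, &sub_rank);
    MPI_Comm_size(sub_comm, &sub_size);
    printf("World rank %d/%d -> color %d, sub rank %d/%d\n",
           world_rank, world_size, color, sub_rank, sub_size);

    MPI_Comm_free(&sub_comm);
    MPI_Finalize();
    return 0;
}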

118 MPI_Group Example using MPI_Comm_group to divide a communicator into subcommunicators

119 Thanks

120 Contact Prof. Rashid Mehmood Director of Research, Training, and Consultancy HPC Center, King Abdul Aziz University, Jeddah, Saudi Arabia


More information

Distributed Memory Programming with MPI. Copyright 2010, Elsevier Inc. All rights Reserved

Distributed Memory Programming with MPI. Copyright 2010, Elsevier Inc. All rights Reserved An Introduction to Parallel Programming Peter Pacheco Chapter 3 Distributed Memory Programming with MPI 1 Roadmap Writing your first MPI program. Using the common MPI functions. The Trapezoidal Rule in

More information

Parallel Computing. PD Dr. rer. nat. habil. Ralf-Peter Mundani. Computation in Engineering / BGU Scientific Computing in Computer Science / INF

Parallel Computing. PD Dr. rer. nat. habil. Ralf-Peter Mundani. Computation in Engineering / BGU Scientific Computing in Computer Science / INF Parallel Computing PD Dr. rer. nat. habil. Ralf-Peter Mundani Computation in Engineering / BGU Scientific Computing in Computer Science / INF Winter Term 2018/19 Part 5: Programming Memory-Coupled Systems

More information

Intermediate MPI. M. D. Jones, Ph.D. Center for Computational Research University at Buffalo State University of New York

Intermediate MPI. M. D. Jones, Ph.D. Center for Computational Research University at Buffalo State University of New York Intermediate MPI M. D. Jones, Ph.D. Center for Computational Research University at Buffalo State University of New York High Performance Computing I, 2008 M. D. Jones, Ph.D. (CCR/UB) Intermediate MPI

More information

ECE 574 Cluster Computing Lecture 13

ECE 574 Cluster Computing Lecture 13 ECE 574 Cluster Computing Lecture 13 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 21 March 2017 Announcements HW#5 Finally Graded Had right idea, but often result not an *exact*

More information

MPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016

MPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016 MPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016 Message passing vs. Shared memory Client Client Client Client send(msg) recv(msg) send(msg) recv(msg) MSG MSG MSG IPC Shared

More information

MPI Message Passing Interface

MPI Message Passing Interface MPI Message Passing Interface Portable Parallel Programs Parallel Computing A problem is broken down into tasks, performed by separate workers or processes Processes interact by exchanging information

More information

Message-Passing Computing

Message-Passing Computing Chapter 2 Slide 41þþ Message-Passing Computing Slide 42þþ Basics of Message-Passing Programming using userlevel message passing libraries Two primary mechanisms needed: 1. A method of creating separate

More information

Chapter 3. Distributed Memory Programming with MPI

Chapter 3. Distributed Memory Programming with MPI An Introduction to Parallel Programming Peter Pacheco Chapter 3 Distributed Memory Programming with MPI 1 Roadmap n Writing your first MPI program. n Using the common MPI functions. n The Trapezoidal Rule

More information

High performance computing. Message Passing Interface

High performance computing. Message Passing Interface High performance computing Message Passing Interface send-receive paradigm sending the message: send (target, id, data) receiving the message: receive (source, id, data) Versatility of the model High efficiency

More information

Message-Passing and MPI Programming

Message-Passing and MPI Programming Message-Passing and MPI Programming 2.1 Transfer Procedures Datatypes and Collectives N.M. Maclaren Computing Service nmm1@cam.ac.uk ext. 34761 July 2010 These are the procedures that actually transfer

More information

Claudio Chiaruttini Dipartimento di Matematica e Informatica Centro Interdipartimentale per le Scienze Computazionali (CISC) Università di Trieste

Claudio Chiaruttini Dipartimento di Matematica e Informatica Centro Interdipartimentale per le Scienze Computazionali (CISC) Università di Trieste Claudio Chiaruttini Dipartimento di Matematica e Informatica Centro Interdipartimentale per le Scienze Computazionali (CISC) Università di Trieste http://www.dmi.units.it/~chiarutt/didattica/parallela

More information