Parallel Programming with MPI. Jianfeng Yang, Internet and Information Technology Lab, Wuhan University


1 Parallel Programming with MPI Jianfeng Yang, Internet and Information Technology Lab, Wuhan University

2 Agenda Part Ⅰ: Seeking Parallelism/Concurrency Part Ⅱ: Parallel Algorithm Design Part Ⅲ: Message-Passing Programming 2

3 Part Ⅰ Seeking Parallelism/Concurrency

4 Outline 1 Introduction 2 Seeking Parallelism 4

5 1 Introduction(1/6) Well done is quickly done. (Caesar Augustus) Fast, fast, fast is not fast enough. How to get higher performance? Parallel computing. 5

6 1 Introduction(2/6) What is parallel computing? is the use of a parallel computer to reduce the time needed to solve a single computational problem. is now considered a standard way for computational scientists and engineers to solve problems in areas as diverse as galactic evolution, climate modeling, aircraft design, molecular dynamics and economic analysis. 6

7 Parallel Computing A problem is broken down into tasks, performed by separate workers or processes. Processes interact by exchanging information. What do we basically need? The ability to start the tasks, and a way for them to communicate. 7

8 1 Introduction(3/6) What's a parallel computer? It is a multi-processor computer system supporting parallel programming. Multi-computer: a parallel computer constructed out of multiple computers and an interconnection network; the processors on different computers interact by passing messages to each other. Centralized multiprocessor (SMP: symmetrical multiprocessor): a more highly integrated system in which all CPUs share access to a single global memory; the shared memory supports communication and synchronization among processors. 8

9 1 Introduction(4/6) Multi-core platform Integrates two, four, or more cores in one processor; each core has its own registers and Level 1 cache, while all cores share the Level 2 cache, which supports communication and synchronization among cores. All cores share access to a global memory. 9

10 1 Introduction(5/6) What's parallel programming? It is programming in a language that allows you to explicitly indicate how different portions of the computation may be executed in parallel/concurrently by different processors/cores. Do I really need parallel programming? YES, because: although a lot of research has been invested and many experimental parallelizing compilers have been developed, there is still no commercial system thus far. The alternative is to write your own parallel programs. 10

11 1 Introduction(6/6) Why should I program using MPI and OpenMP? MPI (Message Passing Interface) is a standard specification for message-passing libraries; it is available on virtually every parallel computer system, and it is free. If you develop programs using MPI, you will be able to reuse them when you get access to a newer, faster parallel computer. On a multi-core platform or SMP, the cores/CPUs have a shared memory space. While MPI is a perfectly satisfactory way for cores/processors to communicate with each other, OpenMP is a better way for cores within a single processor/SMP to interact. A hybrid MPI/OpenMP program can achieve even higher performance. 11

12 2 Seeking Parallel(1/7) In order to take advantage of multi-core/multiple processors, programmers must be able to identify operations that may be performed in parallel. Several ways: Data Dependence Graphs Data Parallelism Functional Parallelism Pipelining 12

13 2 Seeking Parallel(2/7) Data Dependence Graphs A directed graph. Each vertex represents a task to be completed. An edge from vertex u to vertex v means: task u must be completed before task v begins; task v is dependent on task u. If there is no path from u to v, then the tasks are independent and may be performed in parallel. 13

14 2 Seeking Parallel(3/7) Data Dependence Graphs 14

15 2 Seeking Parallel(4/7) Data Parallelism Independent tasks applying the same operation to different elements of a data set. e.g. 15

16 2 Seeking Parallel(5/7) Functional Parallelism Independent tasks applying different operations to different data elements of a data set. 16

17 2 Seeking Parallel(6/7) Pipelining A data dependence graph forming a simple path/chain admits no parallelism if only a single problem instance must be processed. If multiple problem instances are to be processed, and the computation can be divided into several stages with the same time consumption, then pipelining can provide parallelism. E.g., an assembly line. 17

18 2 Seeking Parallel(7/7) Pipelining p[0] = a[0]; p[1] = p[0]+a[1]; p[2] = p[1]+a[2]; p[3] = p[2]+a[3]; 18
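Anticipating the MPI routines introduced in Part Ⅲ, here is a minimal sketch of this pipelined partial-sum chain, assuming each process holds one element (the values are illustrative) and forwards the running sum to its successor:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size, a, p;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    a = rank + 1;                     /* illustrative data: a[i] = i + 1 */
    if (rank == 0) {
        p = a;                        /* p[0] = a[0] */
    } else {
        /* receive p[rank-1] from the previous stage, then add a[rank] */
        MPI_Recv(&p, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        p = p + a;
    }
    if (rank < size - 1)              /* forward the partial sum to the next stage */
        MPI_Send(&p, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
    printf("process %d: p = %d\n", rank, p);
    MPI_Finalize();
    return 0;
}

With a single problem instance the stages run one after another; the pipeline pays off when many instances stream through the chain, as in the assembly-line analogy above.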

19 For Example: Landscape maintenance; preparing for dinner; data clustering. 19

20 Homework Given a task that can be divided into m subtasks, each requiring one unit of time, how much time is needed for an m-stage pipeline to process n tasks? Consider the data dependence graph in the figure below: identify all sources of data parallelism; identify all sources of functional parallelism. [Figure: data dependence graph with vertices I, A, A, A, B, C, D, A, A, A, O] 20

21 Part Ⅱ Parallel Algorithm Design

22 Outline 1. Introduction 2. The Task/Channel Model 3. Foster's Design Methodology 22

23 1.Introduction Foster, Ian. Designing and Building Parallel Programs: Concepts and Tools for Parallel Software Engineering. Reading, MA: Addison-Wesley, 1995. Describes the Task/Channel Model and a few simple problems. 23

24 2.The Task/Channel Model The model represents a parallel computation as a set of tasks that may interact with each other by sending messages through channels. Task: a program, its local memory, and a collection of I/O ports. Local memory: instructions and private data. 24

25 2.The Task/Channel Model Channel: via channels, a task can send local data to other tasks through output ports and receive data values from other tasks through input ports. A channel is a message queue: it connects one task's output port with another task's input port. Data values appear at the input port in the same order in which they were placed in the output port at the other end of the channel. Receiving data can block: synchronous. Sending data never blocks: asynchronous. Access to local memory is faster than nonlocal data access. 25

26 3.Foster's Design Methodology Four-step process: Partitioning, Communication, Agglomeration, Mapping. [Figure: the problem passes through Partitioning, Communication, Agglomeration, and Mapping.] 26

27 3.Foster's Design Methodology Partitioning Is the process of dividing the computation and the data into pieces; more, smaller pieces are better. How? By a data-centric approach (domain decomposition) or a function-centric approach (functional decomposition). Domain Decomposition: first, divide the data into pieces; then, determine how to associate computations with the data. Focus on the largest and/or most frequently accessed data structure in the program. Functional Decomposition: see the following slides. 27

28 3.Foster's Design Methodology Domain Decomposition [Figure: 1-D, 2-D, and 3-D decompositions of a data set into primitive tasks; the 3-D decomposition yields more primitive tasks and is better.] 28

29 3.Foster's Design Methodology Functional Decomposition Yields collections of tasks that achieve parallelism through pipelining. E.g., a system supporting interactive image-guided surgery. 29

30 3.Foster's Design Methodology The quality of the partition (evaluation): There are at least an order of magnitude more primitive tasks than processors in the target parallel computer; otherwise, later design options may be too constrained. Redundant computations and redundant data structure storage are minimized; otherwise, the design may not work well when the size of the problem increases. Primitive tasks are roughly the same size; otherwise, it may be hard to balance work among the processors/cores. The number of tasks is an increasing function of the problem size; otherwise, it may be impossible to use more processors/cores to solve larger problems. 30

31 3.Foster's Design Methodology Communication After identifying the primitive tasks, the type of communication among those primitive tasks should be determined. Two kinds of communication: local and global. 31

32 3.Foster's Design Methodology Communication Local: a task needs values from a small number of other tasks in order to perform a computation; a channel is created from the tasks supplying the data to the task consuming the data. Global: a significant number of the primitive tasks must contribute data in order to perform a computation. E.g., computing the sum of the values held by the primitive processes. 32

33 3.Foster's Design Methodology Communication Evaluate the communication structure of the designed parallel algorithm: the communication operations are balanced among the tasks; each task communicates with only a small number of neighbors; tasks can perform their communications in parallel/concurrently; tasks can perform their computations in parallel/concurrently. 33

34 3.Foster's Design Methodology Agglomeration Why do we need agglomeration? If the number of tasks exceeds the number of processors/cores by several orders of magnitude, simply creating these tasks would be a source of significant overhead. So, combine primitive tasks into larger tasks and map them onto physical processors/cores to reduce the amount of parallel overhead. What's agglomeration? It is the process of grouping tasks into larger tasks in order to improve performance or simplify programming. When developing MPI programs, ONE task per core/processor is usually best. 34

35 3.Foster's Design Methodology Agglomeration Goal 1: lower communication overhead. Eliminate communication among tasks; increase the locality of parallelism; combine groups of sending and receiving tasks. 35

36 3.Foster's Design Methodology Agglomeration Goal 2: maintain the scalability of the parallel design. Ensure that we have not combined so many tasks that we will be unable to port our program at some point in the future to a computer with more processors/cores. E.g., a 3-D matrix operation of size 8*128*258. 36

37 3.Foster's Design Methodology Agglomeration Goal 3: reduce software engineering costs. Make greater use of the existing sequential code, reducing both time and expense. 37

38 3.Foster's Design Methodology Agglomeration evaluation: The agglomeration has increased the locality of the parallel algorithm. Replicated computations take less time than the communications they replace. The amount of replicated data is small enough to allow the algorithm to scale. Agglomerated tasks have similar computational and communication costs. The number of tasks is an increasing function of the problem size. The number of tasks is as small as possible, yet at least as great as the number of cores/processors in the target computer. The trade-off between the chosen agglomeration and the cost of modifications to existing sequential code is reasonable. 38

39 3.Foster's Design Methodology Mapping Is the process of assigning tasks to processors/cores, with two goals: increasing processor utilization and minimizing inter-processor communication. 39

40 Part Ⅲ Message-Passing Programming

41 Preface 41


43 [Figure: processes 0, 1, and 2 each Load and Process data; results are then Gathered and Stored.] 43

44 Hello World!
#include <stdio.h>
#include <mpi.h>
int main(int argc, char *argv[]) {
    int size, rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("Hello world from process %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
Output with four processes:
Hello world from process 0 of 4
Hello world from process 1 of 4
Hello world from process 2 of 4
Hello world from process 3 of 4
44
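As a usage note (a sketch assuming an MPICH-style installation whose wrapper compiler and launcher are named mpicc and mpirun; the source file name and process count are illustrative), such a program is typically built and launched with:

mpicc hello.c -o hello
mpirun -np 4 ./hello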

45 Outline Introduction The Message-Passing Model The Message-Passing Interface (MPI) Communication Mode Circuit satisfiability Point-to-point Communication Collective Communication Benchmarking parallel performance 45

46 Introduction MPI: Message Passing Interface Is a library, not a parallel language (C & MPI, Fortran & MPI). Is a standard, not an implementation; actual implementations include MPICH, Intel MPI, MSMPI, and LAM/MPI. Is a message-passing model. 46

47 Introduction The history of MPI: Draft: 1992; MPI-1: 1994; MPI-2: 1997. 47

48 Introduction MPICH: unix.mcs.anl.gov/mpi/mpich1/download.html; unix.mcs.anl.gov/mpi/mpich2/index.htm#download Main features: open source; tracks the MPI standard; supports MPMD (Multiple Program Multiple Data) and heterogeneous clusters; supports C/C++, Fortran 77 and Fortran 90; supports Unix and Windows NT platforms; supports multi-core, SMP, cluster, and large-scale parallel computer systems. 48

49 Introduction Intel MPI Conforms to the MPI-2 standard. Latest version: 3.1. DAPL (Direct Access Programming Library) support. 49

50 Introduction - Intel MPI The Intel MPI Library supports multiple hardware fabrics. 50

51 Introduction - Intel MPI Features: a multi-fabric message passing library; implements the Message Passing Interface, v2 (MPI-2) specification; provides a standard library across Intel platforms that focuses on making applications perform best on IA-based clusters, enables adoption of MPI-2 functions as customer needs dictate, and delivers best-in-class performance for enterprise, divisional, departmental and workgroup high performance computing. 51

52 Introduction - Intel MPI Why the Intel MPI Library? High-performance MPI-2 implementation; Linux and Windows CCS support; interconnect independence; smart fabric selection; easy installation; free runtime environment; close integration with Intel and 3rd-party development tools; Internet-based licensing and technical support. 52

53 Introduction - Intel MPI Standards based: built on Argonne National Laboratory's MPICH-2 implementation. Integration: can be easily integrated with Platform LSF 6.1 and higher; Altair PBS Pro* 7.1 and higher; OpenPBS* 2.3; Torque* and higher; Parallelnavi* NQS* for Linux V2.0L10 and higher; Parallelnavi for Linux Advanced Edition V1.0L10A and higher; NetBatch* 6.x and higher. 53

54 Introduction - Intel MPI System requirements (host and target systems hardware): IA-32, Intel 64, or IA-64 architecture using Intel Pentium 4, Intel Xeon, or Intel Itanium processor family and compatible platforms; 1 GB of RAM (4 GB recommended); minimum 100 MB of free hard disk space (10 GB recommended). 54

55 Introduction - Intel MPI Operating system requirements: Microsoft Windows* Compute Cluster Server 2003 (Intel 64 architecture only); Red Hat Enterprise Linux* 3.0, 4.0, or 5.0; SUSE* Linux Enterprise Server 9 or 10; SUSE Linux 9.0 through 10.0 (all except Intel 64 architecture, which starts at 9.1); HaanSoft Linux 2006 Server*; Miracle Linux* 4.0; Red Flag* DC Server 5.0; Asianux* Linux 2.0; Fedora Core 4, 5, or 6 (IA-32 and Intel 64 architectures only); TurboLinux* 10 (IA-32 and Intel 64 architecture); Mandriva/Mandrake* 10.1 (IA-32 architecture only); SGI* ProPack 4.0 (IA-64 architecture only) or 5.0 (IA-64 and Intel 64 architectures). 55

56 The Message-Passing Model [Figure: eight processor/memory pairs connected by an interconnection network.] 56

57 The Message-Passing Model A task in the task/channel model becomes a process in the Message-Passing Model. The number of processes is specified by the user, is specified when the program begins, and is constant throughout the execution of the program. Each process has a unique ID number. [Figure: processor/memory pairs connected by an interconnection network.] 57

58 The Message-Passing Model Goals of the Message-Passing Model: processes communicate with each other; processes synchronize with each other. 58

59 The Message-Passing Interface (MPI) Advantages: runs well on a wide variety of MPMD architectures; easier to debug; thread safe. 59

60 What is in MPI Point-to-point message passing Collective communication Support for process groups Support for communication contexts Support for application topologies Environmental inquiry routines Profiling interface 60

61 Introduction to Groups & Communicator Process model and groups Communication scope Communicators 61

62 Process model and groups The fundamental computational unit is the process. Each process has an independent thread of control and a separate address space. MPI processes execute in MIMD style, but: there is no mechanism for loading code onto processors or assigning processes to processors, and no mechanism for creating or destroying processes. MPI supports dynamic process groups: process groups can be created and destroyed; membership is static; groups may overlap. There is no explicit support for multithreading, but MPI is designed to be thread-safe. 62

63 Communication scope In MPI, a process is specified by: a group, and a rank relative to the group. A message label is specified by: a message context, and a message tag relative to the context. Groups are used to partition process space. Contexts are used to partition "message label space". Groups and contexts are bound together to form a communicator object. Contexts are not visible at the application level. A communicator defines the scope of a communication operation. 63

64 Communicators Communicators are used to create independent "message universes". Communicators are used to disambiguate message selection when an application calls a library routine that performs message passing. Nondeterminacy may arise if processes enter the library routine asynchronously, or if processes enter the library routine synchronously but there are outstanding communication operations. A communicator binds together groups and contexts, defines the scope of a communication operation, and is represented by an opaque object. 64

65 A communicator handle defines which processes a particular command will apply to All MPI communication calls take a communicator handle as a parameter, which is effectively the context in which the communication will take place MPI_INIT defines a communicator called MPI_COMM_WORLD for each process that calls it 65

66 Every communicator contains a group, which is a list of processes. The processes are ordered and numbered consecutively from 0; the number of each process is known as its rank. The rank identifies each process within the communicator. The group of MPI_COMM_WORLD is the set of all MPI processes. 66
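As a small illustration of how a communicator, its group, and ranks relate, here is a hedged sketch (not taken from the slides) that extracts the group behind MPI_COMM_WORLD and queries the rank and size through both the communicator and the group; all calls are standard MPI:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int comm_rank, group_rank, group_size;
    MPI_Group world_group;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &comm_rank);     /* rank within the communicator */
    MPI_Comm_group(MPI_COMM_WORLD, &world_group);  /* the group bound to the communicator */
    MPI_Group_rank(world_group, &group_rank);      /* same rank, queried via the group */
    MPI_Group_size(world_group, &group_size);
    printf("rank %d (group rank %d) of %d processes\n",
           comm_rank, group_rank, group_size);
    MPI_Group_free(&world_group);
    MPI_Finalize();
    return 0;
}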

67 Skeleton MPI Program
#include <mpi.h>
int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    /* main part of the program */
    MPI_Finalize();
    return 0;
}
67

68 Circuit satisfiability What combinations of input values will make the circuit output the value 1? [Figure: combinational circuit with 16 inputs labeled a through p.] 68

69 Circuit satisfiability Analysis: 16 inputs, a-p, each taking a value of 0 or 1; 2^16 = 65536 combinations. Design a parallel algorithm. Partitioning: functional decomposition; no channels between tasks; tasks are independent; well suited to parallelism. (Design steps: Partitioning, Communication, Agglomeration, Mapping.) 69

70 Circuit satisfiability Communication: Tasks are independent; 70

71 Circuit satisfiability Agglomeration and Mapping Fixed number of tasks; the time for each task to complete is variable. WHY? How do we balance the computation load? Map tasks to processes in a cyclic fashion. (Design steps: Partitioning, Communication, Agglomeration, Mapping.) 71

72 Circuit satisfiability Each process will examine a combination of inputs in turn. 72

73 Circuit satisfiability
#define EXTRACT_BIT(n,i) ((n&(1<<i))?1:0)
void check_circuit(int id, int z) {
    int v[16];
    int i;
    for (i = 0; i < 16; i++) v[i] = EXTRACT_BIT(z,i);
    if ((v[0] || v[1]) && (!v[1] || !v[3]) && (v[2] || v[3])
        && (!v[3] || !v[4]) && (v[4] || !v[5]) && (v[5] || !v[6])
        && (v[5] || v[6]) && (v[6] || !v[15]) && (v[7] || !v[8])
        && (!v[7] || !v[13]) && (v[8] || v[9]) && (v[9] || v[11])
        && (v[10] || v[11]) && (v[12] || v[13]) && (v[13] || !v[14])
        && (v[14] || v[15])) {
        printf("%d) %d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d\n", id,
               v[0], v[1], v[2], v[3], v[4], v[5], v[6], v[7],
               v[8], v[9], v[10], v[11], v[12], v[13], v[14], v[15]);
        fflush(stdout);
    }
}
73
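A sketch of a possible main() driver for check_circuit, assuming the cyclic mapping described on the preceding slides; this driver is illustrative rather than the original program from the slides:

#include <stdio.h>
#include <mpi.h>

void check_circuit(int id, int z);   /* defined as above */

int main(int argc, char *argv[]) {
    int id, p, i;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    /* cyclic allocation: process id checks combinations id, id+p, id+2p, ... */
    for (i = id; i < 65536; i += p)
        check_circuit(id, i);
    printf("Process %d is done\n", id);
    fflush(stdout);
    MPI_Finalize();
    return 0;
}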

74 Point-to-point Communication Overview Blocking Behaviors Non-Blocking Behaviors 74

75 overview A message is sent from a sender to a receiver There are several variations on how the sending of a message can interact with the program 75

76 Synchronous does not complete until the message has been received A FAX or registered mail 76

77 Asynchronous completes as soon as the message is on the way. A post card or e-mail. 77

78 communication modes The mode is selected with the send routine: synchronous mode ("safest"); ready mode (lowest system overhead); buffered mode (decouples sender from receiver); standard mode (compromise). Calls are also blocking or non-blocking: blocking stops the program until the message buffer is safe to use; non-blocking separates communication from computation. 78

79 Blocking Behavior int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm) buf is the beginning of the buffer containing the data to be sent. For Fortran, this is often the name of an array in your program. For C, it is an address. count is the number of elements to be sent (not bytes) datatype is the type of data dest is the rank of the process which is the destination for the message tag is an arbitrary number which can be used to distinguish among messages comm is the communicator 79

80 Temporary Knowledge Message Msg body: buf, count, datatype. Msg envelope: dest, tag, comm. Tag: why? The tag lets the receiver distinguish among messages arriving from the same source. 80


82 When using standard-mode send It is up to MPI to decide whether outgoing messages will be buffered. Completes once the message has been sent, which may or may not imply that the message has arrived at its destination. Can be started whether or not a matching receive has been posted; it may complete before a matching receive is posted. Has non-local completion semantics, since successful completion of the send operation may depend on the occurrence of a matching receive. 82

83 Blocking Standard Send 83

84 MPI_Recv int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status) buf is the beginning of the buffer where the incoming data are to be stored. For Fortran, this is often the name of an array in your program. For C, it is an address. count is the number of elements (not bytes) in your receive buffer. datatype is the type of data. source is the rank of the process from which data will be accepted (this can be a wildcard, by specifying the parameter MPI_ANY_SOURCE). tag is an arbitrary number which can be used to distinguish among messages (this can be a wildcard, by specifying the parameter MPI_ANY_TAG). comm is the communicator. status is an array or structure of information that is returned. For example, if you specify a wildcard for source or tag, status will tell you the actual rank or tag for the message received. 84


87 Blocking Synchronous Send 87

88 Cont. can be started whether or not a matching receive was posted will complete successfully only if a matching receive is posted, and the receive operation has started to receive the message sent by the synchronous send. provides synchronous communication semantics: a communication does not complete at either end before both processes rendezvous at the communication. has non-local completion semantics. 88

89 Blocking Ready Send 89

90 Completes immediately. May be started only if the matching receive has already been posted. Has the same semantics as a standard-mode send. Saves on overhead by avoiding handshaking and buffering. 90

91 Blocking Buffered Send 91

92 Can be started whether or not a matching receive has been posted. It may complete before a matching receive is posted. Has local completion semantics: its completion does not depend on the occurrence of a matching receive. In order to complete the operation, it may be necessary to buffer the outgoing message locally. For that purpose, buffer space is provided by the application. 92

93 Non-Blocking Behavior MPI_Isend(buf, count, dtype, dest, tag, comm, request) MPI_Wait(request, status): request matches the request returned by Isend or Irecv; status returns a status equivalent to the status from Recv when complete. For a send, MPI_Wait blocks until the message is buffered or sent, so the message variable is free to reuse; for a receive, it blocks until the message has been received and is ready. 93

94 Non-blocking Synchronous Send int MPI_Issend(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm, MPI_Request *request) IN = provided by programmer, OUT = set by routine. buf: starting address of message buffer (IN); count: number of elements in message (IN); datatype: type of elements in message (IN); dest: rank of destination task in communicator comm (IN); tag: message tag (IN); comm: communicator (IN); request: identifies a communication event (OUT). 94

95 Non-blocking Ready Send int MPI_Irsend(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm, MPI_Request *request) 95

96 Non-blocking Buffered Send int MPI_Ibsend(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm, MPI_Request *request) 96

97 Non-blocking Standard Send int MPI_Isend(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm, MPI_Request *request) 97

98 Non-blocking Receive IN = provided by programmer, OUT = set by routine. buf: starting address of message buffer (OUT - buffer contents written); count: number of elements in message (IN); datatype: type of elements in message (IN); source: rank of source task in communicator comm (IN); tag: message tag (IN); comm: communicator (IN); request: identifies a communication event (OUT). 98

99 int MPI_Irecv (void* buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Request *request) 99

100 request: identifies a communication event (INOUT); status: status of communication event (OUT); count: number of communication events (IN); index: index in array of requests of completed event (OUT); incount: number of communication events (IN); outcount: number of completed events (OUT). 100

101 int MPI_Wait (MPI_Request *request, MPI_Status *status) int MPI_Waitall (int count, MPI_Request *array_of_requests, MPI_Status *array_of_statuses) int MPI_Waitany (int count, MPI_Request *array_of_requests, int *index, MPI_Status *status) int MPI_Waitsome (int incount, MPI_Request *array_of_requests, int *outcount, int* array_of_indices, MPI_Status *array_of_statuses) 101
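A short sketch of the non-blocking pattern these routines support, assuming exactly two processes that exchange one integer each; posting the receives and sends first and waiting afterwards lets communication overlap computation and avoids the deadlock that two blocking sends could cause:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, other, sendval, recvval;
    MPI_Request reqs[2];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank < 2) {                   /* only ranks 0 and 1 take part in the exchange */
        other = 1 - rank;
        sendval = rank * 100;         /* illustrative payload */
        MPI_Irecv(&recvval, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(&sendval, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &reqs[1]);
        /* computation could overlap with communication here */
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
        printf("process %d received %d\n", rank, recvval);
    }
    MPI_Finalize();
    return 0;
}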

102 Communication Mode / Blocking Routine / Non-Blocking Routine
Synchronous: MPI_SSEND / MPI_ISSEND
Ready: MPI_RSEND / MPI_IRSEND
Buffered: MPI_BSEND / MPI_IBSEND
Standard: MPI_SEND / MPI_ISEND
Receive (all modes): MPI_RECV / MPI_IRECV
102

103 Mode / Advantages / Disadvantages
Synchronous: safest, and therefore most portable; SEND/RECV order not critical; amount of buffer space irrelevant. / Can incur substantial synchronization overhead.
Ready: lowest total overhead; SEND/RECV handshake not required. / RECV must precede SEND.
Buffered: decouples SEND from RECV; no sync overhead on SEND; order of SEND/RECV irrelevant; programmer can control size of buffer space. / Additional system overhead incurred by copy to buffer.
Standard: good for many cases. / Your program may not be suitable.
103

104 MPI Quick Start MPI_Init MPI_BCast MPI_Wtime MPI_Comm_rank MPI_Scatter MPI_Wtick MPI_Comm_size MPI_Gather MPI_Barrier MPI_Send MPI_Reduce MPI_Recv MPI_Finalize MPI_Xxxxx 104

105 MPI Routines MPI_Init To initialize the MPI execution environment. argc: pointer to the number of arguments; argv: pointer to the argument vector. The first MPI function call; allows the system to do any setup needed to handle further calls to the MPI library. Defines a communicator called MPI_COMM_WORLD for each process that calls it. MPI_Init must be called before any other MPI function. Exception: MPI_Initialized, which checks whether MPI has been initialized and may be called before MPI_Init. 105

106 MPI Routines MPI_Comm_rank To determine a process's ID number. Returns the process's ID (its rank). Communicator (MPI_Comm): MPI_COMM_WORLD includes all processes created when MPI is initialized. 106

107 MPI Routines MPI_Comm_size To find the number of processes -- size 107

108 MPI Routines MPI_Send The source process send the data in buffer to destination process. buf count The starting address of the data to be transmitted. The number of data items. datatype The type of data items.(all of the data items must be in the same type) dest tag comm The rank of the process to receive the data. An integer label for the message, allowing messages serving different purpose to be identified. Indicates the communicator in which this message is being sent. 108

109 MPI Routines MPI_Send Blocks until the message buffer is once again available. MPI provides constants for C data types (e.g., MPI_INT, MPI_DOUBLE, MPI_CHAR). 109

110 MPI Routines MPI_Recv buf count The starting address where the received data is to be stored. The maximum number of data items the receiving process is willing to receive. datatype The type of data items source tag comm status The rank of the process sending this message. The desired tag value for the message Indicates the communicator in which this message is being passed. MPI data structure. Return the status. 110

111 MPI Routines MPI_Recv Receives a message from the source process. The data type and tag of the message received must agree with the data type and tag defined in the MPI_Recv function. The count of data items received must be no more than the count defined in this function; otherwise, an overflow error condition occurs. If the count equals zero, the message is empty. Blocks until the message has been received, or until an error condition causes the function to return. 111

112 MPI Routines MPI_Recv status->MPI_SOURCE: the rank of the process sending the message. status->MPI_TAG: the message's tag value. status->MPI_ERROR: the error condition. int MPI_Abort(MPI_Comm comm, int errorcode) 112

113 MPI Routines MPI_Finalize Allows the system to free up resources, such as memory, that have been allocated to MPI. Without MPI_Finalize, the result of the program is undefined. 113

114 summary 114

115 Collective communication A communication operation in which a group of processes works together to distribute or gather a set of one or more values. 115

116 Collective communication MPI_Bcast A root process broadcasts one or more data items of the same type to all other processes in a communicator. 116

117 Collective communication MPI_Bcast
int MPI_Bcast(
    void* buffer,          // address of the first broadcast element
    int count,             // number of elements to be broadcast
    MPI_Datatype datatype, // type of the elements to be broadcast
    int root,              // ID of the process doing the broadcast
    MPI_Comm comm)         // communicator
117
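A minimal usage sketch (illustrative, not from the slides): the root process determines a value, and MPI_Bcast makes it available to every process in the communicator.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, n = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        n = 1000;                          /* illustrative: root determines the value */
    /* after the broadcast every process holds the same n */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("process %d sees n = %d\n", rank, n);
    MPI_Finalize();
    return 0;
}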

118 Collective communication MPI_Scatter The root process sends a different part of a data buffer to each of the other processes. 118

119 Collective communication MPI_Scatter 119
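Since the slide's figure is not reproduced in the transcription, here is a hedged sketch of MPI_Scatter in use, assuming the root distributes CHUNK consecutive integers to each process (CHUNK and the buffer sizes are illustrative):

#include <stdio.h>
#include <mpi.h>

#define CHUNK 4                               /* illustrative: elements per process */

int main(int argc, char *argv[]) {
    int rank, size, i;
    int sendbuf[64];                          /* large enough for up to 16 processes */
    int recvbuf[CHUNK];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank == 0)                            /* only the root's send buffer matters */
        for (i = 0; i < size * CHUNK; i++)
            sendbuf[i] = i;
    /* each process receives its own CHUNK consecutive elements */
    MPI_Scatter(sendbuf, CHUNK, MPI_INT, recvbuf, CHUNK, MPI_INT, 0, MPI_COMM_WORLD);
    printf("process %d got elements starting at %d\n", rank, recvbuf[0]);
    MPI_Finalize();
    return 0;
}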

120 Collective communication MPI_Gather Each process sends the data in its buffer to the root process.

121 Collective communication MPI_Gather 121
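Correspondingly, a hedged MPI_Gather sketch (illustrative, not from the slides): every process contributes one integer, and the root receives them in rank order.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size, i, squared;
    int results[64];                      /* room for up to 64 processes; used only at root */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    squared = rank * rank;                /* illustrative per-process contribution */
    /* root collects one int from every process, in rank order */
    MPI_Gather(&squared, 1, MPI_INT, results, 1, MPI_INT, 0, MPI_COMM_WORLD);
    if (rank == 0)
        for (i = 0; i < size; i++)
            printf("rank %d contributed %d\n", i, results[i]);
    MPI_Finalize();
    return 0;
}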

122 Collective communication MPI_Reduce After a process has completed its share of the work, it is ready to participate in the reduction operation. MPI_Reduce performs one or more reduction operations on values submitted by all the processes in a communicator. 122

123 Collective communication MPI_Reduce 123
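The standard prototype is int MPI_Reduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm). A hedged usage sketch, in which each process contributes one partial value and rank 0 receives the global sum (the values are illustrative):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size;
    double local_sum, global_sum;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    local_sum = (double)(rank + 1);       /* illustrative partial result: 1 + 2 + ... + size */
    /* combine all partial sums with MPI_SUM; only rank 0 receives the result */
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("global sum = %f (expected %f)\n", global_sum, size * (size + 1) / 2.0);
    MPI_Finalize();
    return 0;
}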

124 Collective communication MPI_Reduce MPI's built-in reduction operators:
MPI_BAND - bitwise and
MPI_BOR - bitwise or
MPI_BXOR - bitwise exclusive or
MPI_LAND - logical and
MPI_LOR - logical or
MPI_LXOR - logical exclusive or
MPI_MAX - maximum
MPI_MAXLOC - maximum and location of maximum
MPI_MIN - minimum
MPI_MINLOC - minimum and location of minimum
MPI_PROD - product
MPI_SUM - sum
124

125 summary 125


129 Benchmarking parallel performance Measure the performance of a parallel application. How? By measuring the number of seconds that elapse from the time we initiate execution until the program terminates. double MPI_Wtime(void): returns the number of seconds that have elapsed since some point of time in the past. double MPI_Wtick(void): returns the precision of the result returned by MPI_Wtime. 129

130 Benchmarking parallel performance MPI_Barrier int MPI_Barrier(MPI_Comm comm) comm: indicates in which communicator the processes participate in the barrier synchronization. MPI_Barrier blocks until every process in the communicator has called it; it is typically used to synchronize processes before a timed section. 130
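A common benchmarking pattern built from these routines, shown as a hedged sketch: synchronize with a barrier, read MPI_Wtime before and after the timed work, and report the elapsed time from one process.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank;
    double start, elapsed;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Barrier(MPI_COMM_WORLD);          /* make sure everyone starts together */
    start = MPI_Wtime();
    /* ... the parallel work being benchmarked goes here ... */
    MPI_Barrier(MPI_COMM_WORLD);          /* wait until the slowest process finishes */
    elapsed = MPI_Wtime() - start;
    if (rank == 0)
        printf("elapsed time: %f s (clock resolution %g s)\n", elapsed, MPI_Wtick());
    MPI_Finalize();
    return 0;
}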

131 For example Send and receive operation 131
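The slide's listing is not reproduced in the transcription; as a substitute, a minimal sketch of a blocking send/receive exchange, assuming process 0 sends one integer to process 1 (the tag and payload are illustrative):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, value;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        value = 42;                                          /* illustrative payload */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);  /* to rank 1, tag 0 */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("process 1 received %d from process %d\n", value, status.MPI_SOURCE);
    }
    MPI_Finalize();
    return 0;
}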

132 For example Compute pi
f(x) = 4 / (1 + x^2)
∫[0,1] 1/(1 + x^2) dx = arctan(x) |[0,1] = arctan(1) - arctan(0) = arctan(1) = π/4
∫[0,1] f(x) dx = π
132

133 For example
π ≈ (1/N) Σ_{i=1..N} f((i - 0.5)/N)
133

134 For example Compute pi 134
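The slide's code is likewise not reproduced; the following is a hedged reconstruction of the usual MPI pi program based on the midpoint-rule formula above (N and the cyclic distribution of intervals are illustrative choices):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size, i, N = 1000000;          /* illustrative number of intervals */
    double h, x, local_sum = 0.0, pi = 0.0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Bcast(&N, 1, MPI_INT, 0, MPI_COMM_WORLD);    /* root could read N at run time */
    h = 1.0 / N;
    /* cyclic distribution of the N midpoints over the processes */
    for (i = rank; i < N; i += size) {
        x = (i + 0.5) * h;
        local_sum += 4.0 / (1.0 + x * x);
    }
    local_sum *= h;
    MPI_Reduce(&local_sum, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("pi is approximately %.16f\n", pi);
    MPI_Finalize();
    return 0;
}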

135 For example Matrix Multiplication
MPI_Scatter(&iaA[0][0], N, MPI_INT, &iaA[iRank][0], N, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Bcast(&iaB[0][0], N*N, MPI_INT, 0, MPI_COMM_WORLD);
for (i = 0; i < N; i++) {
    temp = 0;
    for (j = 0; j < N; j++) {
        temp = temp + iaA[iRank][j] * iaB[j][i];
    }
    iaC[iRank][i] = temp;
}
MPI_Gather(&iaC[iRank][0], N, MPI_INT, &iaC[0][0], N, MPI_INT, 0, MPI_COMM_WORLD);
135


137 C_{i,j} = Σ_{k=0..l-1} a_{i,k} * b_{k,j}, where A is an n x l matrix and B is an l x m matrix. 137


142 Summary MPI is a library. The six foundational functions of MPI. Collective communication. The MPI communication model. 142

143 Thanks! Feel free to contact me with any questions or suggestions. And welcome to Wuhan University!


Introduction to MPI. Ricardo Fonseca. https://sites.google.com/view/rafonseca2017/

Introduction to MPI. Ricardo Fonseca. https://sites.google.com/view/rafonseca2017/ Introduction to MPI Ricardo Fonseca https://sites.google.com/view/rafonseca2017/ Outline Distributed Memory Programming (MPI) Message Passing Model Initializing and terminating programs Point to point

More information

Cluster Computing MPI. Industrial Standard Message Passing

Cluster Computing MPI. Industrial Standard Message Passing MPI Industrial Standard Message Passing MPI Features Industrial Standard Highly portable Widely available SPMD programming model Synchronous execution MPI Outer scope int MPI_Init( int *argc, char ** argv)

More information

MPI and comparison of models Lecture 23, cs262a. Ion Stoica & Ali Ghodsi UC Berkeley April 16, 2018

MPI and comparison of models Lecture 23, cs262a. Ion Stoica & Ali Ghodsi UC Berkeley April 16, 2018 MPI and comparison of models Lecture 23, cs262a Ion Stoica & Ali Ghodsi UC Berkeley April 16, 2018 MPI MPI - Message Passing Interface Library standard defined by a committee of vendors, implementers,

More information

Introduction to Parallel Programming

Introduction to Parallel Programming University of Nizhni Novgorod Faculty of Computational Mathematics & Cybernetics Section 4. Part 1. Introduction to Parallel Programming Parallel Programming with MPI Gergel V.P., Professor, D.Sc., Software

More information

Distributed Memory Programming with Message-Passing

Distributed Memory Programming with Message-Passing Distributed Memory Programming with Message-Passing Pacheco s book Chapter 3 T. Yang, CS240A Part of slides from the text book and B. Gropp Outline An overview of MPI programming Six MPI functions and

More information

The Message Passing Interface (MPI): Parallelism on Multiple (Possibly Heterogeneous) CPUs

The Message Passing Interface (MPI): Parallelism on Multiple (Possibly Heterogeneous) CPUs 1 The Message Passing Interface (MPI): Parallelism on Multiple (Possibly Heterogeneous) CPUs http://mpi-forum.org https://www.open-mpi.org/ Mike Bailey mjb@cs.oregonstate.edu Oregon State University mpi.pptx

More information

Parallel Computing. Distributed memory model MPI. Leopold Grinberg T. J. Watson IBM Research Center, USA. Instructor: Leopold Grinberg

Parallel Computing. Distributed memory model MPI. Leopold Grinberg T. J. Watson IBM Research Center, USA. Instructor: Leopold Grinberg Parallel Computing Distributed memory model MPI Leopold Grinberg T. J. Watson IBM Research Center, USA Why do we need to compute in parallel large problem size - memory constraints computation on a single

More information

Department of Informatics V. HPC-Lab. Session 4: MPI, CG M. Bader, A. Breuer. Alex Breuer

Department of Informatics V. HPC-Lab. Session 4: MPI, CG M. Bader, A. Breuer. Alex Breuer HPC-Lab Session 4: MPI, CG M. Bader, A. Breuer Meetings Date Schedule 10/13/14 Kickoff 10/20/14 Q&A 10/27/14 Presentation 1 11/03/14 H. Bast, Intel 11/10/14 Presentation 2 12/01/14 Presentation 3 12/08/14

More information

Framework of an MPI Program

Framework of an MPI Program MPI Charles Bacon Framework of an MPI Program Initialize the MPI environment MPI_Init( ) Run computation / message passing Finalize the MPI environment MPI_Finalize() Hello World fragment #include

More information

What s in this talk? Quick Introduction. Programming in Parallel

What s in this talk? Quick Introduction. Programming in Parallel What s in this talk? Parallel programming methodologies - why MPI? Where can I use MPI? MPI in action Getting MPI to work at Warwick Examples MPI: Parallel Programming for Extreme Machines Si Hammond,

More information

Basic MPI Communications. Basic MPI Communications (cont d)

Basic MPI Communications. Basic MPI Communications (cont d) Basic MPI Communications MPI provides two non-blocking routines: MPI_Isend(buf,cnt,type,dst,tag,comm,reqHandle) buf: source of data to be sent cnt: number of data elements to be sent type: type of each

More information

Practical Introduction to Message-Passing Interface (MPI)

Practical Introduction to Message-Passing Interface (MPI) 1 Practical Introduction to Message-Passing Interface (MPI) October 1st, 2015 By: Pier-Luc St-Onge Partners and Sponsors 2 Setup for the workshop 1. Get a user ID and password paper (provided in class):

More information

CS4961 Parallel Programming. Lecture 19: Message Passing, cont. 11/5/10. Programming Assignment #3: Simple CUDA Due Thursday, November 18, 11:59 PM

CS4961 Parallel Programming. Lecture 19: Message Passing, cont. 11/5/10. Programming Assignment #3: Simple CUDA Due Thursday, November 18, 11:59 PM Parallel Programming Lecture 19: Message Passing, cont. Mary Hall November 4, 2010 Programming Assignment #3: Simple CUDA Due Thursday, November 18, 11:59 PM Today we will cover Successive Over Relaxation.

More information

int sum;... sum = sum + c?

int sum;... sum = sum + c? int sum;... sum = sum + c? Version Cores Time (secs) Speedup manycore Message Passing Interface mpiexec int main( ) { int ; char ; } MPI_Init( ); MPI_Comm_size(, &N); MPI_Comm_rank(, &R); gethostname(

More information

Parallel Computing and the MPI environment

Parallel Computing and the MPI environment Parallel Computing and the MPI environment Claudio Chiaruttini Dipartimento di Matematica e Informatica Centro Interdipartimentale per le Scienze Computazionali (CISC) Università di Trieste http://www.dmi.units.it/~chiarutt/didattica/parallela

More information

COSC 6374 Parallel Computation. Message Passing Interface (MPI ) I Introduction. Distributed memory machines

COSC 6374 Parallel Computation. Message Passing Interface (MPI ) I Introduction. Distributed memory machines Network card Network card 1 COSC 6374 Parallel Computation Message Passing Interface (MPI ) I Introduction Edgar Gabriel Fall 015 Distributed memory machines Each compute node represents an independent

More information

Programming Using the Message Passing Paradigm

Programming Using the Message Passing Paradigm Programming Using the Message Passing Paradigm Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar To accompany the text ``Introduction to Parallel Computing'', Addison Wesley, 2003. Topic Overview

More information

More about MPI programming. More about MPI programming p. 1

More about MPI programming. More about MPI programming p. 1 More about MPI programming More about MPI programming p. 1 Some recaps (1) One way of categorizing parallel computers is by looking at the memory configuration: In shared-memory systems, the CPUs share

More information

CSE 160 Lecture 15. Message Passing

CSE 160 Lecture 15. Message Passing CSE 160 Lecture 15 Message Passing Announcements 2013 Scott B. Baden / CSE 160 / Fall 2013 2 Message passing Today s lecture The Message Passing Interface - MPI A first MPI Application The Trapezoidal

More information

CSE. Parallel Algorithms on a cluster of PCs. Ian Bush. Daresbury Laboratory (With thanks to Lorna Smith and Mark Bull at EPCC)

CSE. Parallel Algorithms on a cluster of PCs. Ian Bush. Daresbury Laboratory (With thanks to Lorna Smith and Mark Bull at EPCC) Parallel Algorithms on a cluster of PCs Ian Bush Daresbury Laboratory I.J.Bush@dl.ac.uk (With thanks to Lorna Smith and Mark Bull at EPCC) Overview This lecture will cover General Message passing concepts

More information