The Message Passing Model


Introduction to MPI

The Message Passing Model Applications that do not share a global address space need a message passing framework. An application passes messages among processes in order to perform a task. Almost any parallel application can be expressed with the message passing model. Four classes of operations: environment management, data movement/communication, collective computation/communication, and synchronization.

General MPI Program Structure Header file: include "mpi.h" (C/C++) or include 'mpif.h' (Fortran). Initialize the MPI environment with MPI_Init(..) and terminate it with MPI_Finalize().

#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
    MPI_Init( &argc, &argv );
    printf( "Hello, world!\n" );
    MPI_Finalize();
    return 0;
}

Environment Management Routines A group of routines used for interrogating and setting the MPI execution environment. MPI_Init initializes the MPI execution environment; this function must be called in every MPI program. MPI_Finalize terminates the MPI execution environment; this function should be the last MPI routine called in every MPI program - no other MPI routines may be called after it.

Environment Management Routines MPI_Get_processor_name Returns the processor name, along with the length of the name. The buffer for "name" must be at least MPI_MAX_PROCESSOR_NAME characters in size. What is returned in "name" is implementation dependent - it may not be the same as the output of the "hostname" or "host" shell commands. MPI_Get_processor_name (&name, &resultlength) MPI_Wtime Returns the elapsed wall-clock time in seconds (double precision) on the calling processor. MPI_Wtime ()
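A minimal sketch of how these two routines are typically combined: report the host a rank runs on and time a region of work. The variable names (name, len, t0, t1) and the omitted timed work are illustrative, not from the slides.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    char name[MPI_MAX_PROCESSOR_NAME];
    int len;
    double t0, t1;

    MPI_Init(&argc, &argv);

    MPI_Get_processor_name(name, &len);   /* fills name, sets len to its length */

    t0 = MPI_Wtime();                     /* wall-clock timestamp in seconds */
    /* ... work to be timed ... */
    t1 = MPI_Wtime();

    printf("Running on %s, elapsed %f s\n", name, t1 - t0);

    MPI_Finalize();
    return 0;
}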

Communication Communicator: all MPI communication occurs within a group of processes. Rank: each process in the group has a unique identifier. Size: the number of processes in a group or communicator. The default, pre-defined communicator is MPI_COMM_WORLD, which is the group of all processes.

Environment / Communication MPI_Comm_size Returns the total number of MPI processes in the specified communicator, such as MPI_COMM_WORLD. If the communicator is MPI_COMM_WORLD, then it represents the number of MPI tasks available to your application. MPI_Comm_size (comm,&size) MPI_Comm_rank Returns the rank of the calling MPI process within the specified communicator. Initially, each process will be assigned a unique integer rank between 0 and number of tasks - 1 within the communicator MPI_COMM_WORLD. This rank is often referred to as a task ID. If a process becomes associated with other communicators, it will have a unique rank within each of these as well. MPI_Comm_rank (comm,&rank)

MPI HelloWorld Example

#include <mpi.h>
#include <iostream>

int main(int argc, char **argv)
{
    int rank;
    int size;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    std::cout << "Hello, I'm process " << rank << " of " << size << std::endl;
    MPI_Finalize();
    return 0;
}

MPI_Init(int *argc, char ***argv); it may also be called as MPI_Init(NULL, NULL) if the command-line arguments are not needed.

Hello, I'm process 0 of 3
Hello, I'm process 2 of 3
Hello, I'm process 1 of 3

Not necessarily sorted!

Communication Point-to-point communication: transfers a message from one process to another process. It involves an explicit send and an explicit receive, which is called two-sided communication. Message: data + (source + destination + communicator). Almost all of the MPI commands are built around point-to-point operations.

MPI Send and Receive The foundation of communication is built upon send and receive operations among processes. Almost every single function in MPI can be implemented with basic send and receive calls.
1. Process A decides a message needs to be sent to process B.
2. Process A then packs up all of its necessary data into a buffer for process B.
3. These buffers are often referred to as envelopes, since the data is being packed into a single message before transmission.
4. After the data is packed into a buffer, the communication device (which is often a network) is responsible for routing the message to the proper location.
5. The location identifier is the rank of the process.

MPI Send and Receive
6. Send and Recv have to occur in pairs and are blocking functions.
7. Even though the message is routed to B, process B still has to acknowledge that it wants to receive A's data. Once it does this, the data has been transmitted. Process A is notified that the data has been transmitted and may go back to work (blocking).
8. Sometimes A might have to send many different types of messages to B. To spare B from having to go through extra measures to differentiate all these messages,
9. MPI allows senders and receivers to also specify message IDs with the message (known as tags).
10. When process B only requests a message with a certain tag number, messages with different tags will be buffered by the network until B is ready for them.

Blocking Send & Receive

Non-Blocking Send & Receive

More MPI Concepts Blocking: a blocking send or receive routine does not return until the operation is complete.
-- Blocking sends ensure that it is safe to overwrite the sent data.
-- Blocking receives ensure that the data has arrived and is ready for use.
Non-blocking: a non-blocking send or receive routine returns immediately, with no information about completion.
-- The user should test for success or failure of the communication, typically with MPI_Wait() or MPI_Test().
-- In between, the process is free to handle other tasks.
-- It is less likely to produce deadlocking code.
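A short sketch of the non-blocking pattern described above, assuming two processes (rank 0 sends, rank 1 receives); the variable names and the value 42 are illustrative. The request must be completed with MPI_Wait (or checked with MPI_Test) before the buffer is reused.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, value = 42;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Isend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        /* ... do other useful work here while the send is in flight ... */
        MPI_Wait(&req, MPI_STATUS_IGNORE);   /* now it is safe to reuse 'value' */
    } else if (rank == 1) {
        MPI_Irecv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
        /* ... other work ... */
        MPI_Wait(&req, MPI_STATUS_IGNORE);   /* data is guaranteed to have arrived */
        printf("Rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}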

MPI Send and Receive MPI_Send(&data, count, MPI_INT, 1, tag, comm);
&data - address of the data
count - number of elements
MPI_INT - data type
1 - destination (rank)
tag - message identifier (int)
comm - communicator
The send parses memory as contiguous data, based on the starting address, element size, and count. MPI_Recv(void* data, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm communicator, MPI_Status* status);
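A minimal blocking send/receive pair matching the signatures above, assuming the program is launched with at least two processes; the array contents and the tag value 7 are illustrative. The tag must match on both sides.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, data[4] = {1, 2, 3, 4};
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Send(data, 4, MPI_INT, 1, 7, MPI_COMM_WORLD);          /* to rank 1, tag 7 */
    } else if (rank == 1) {
        MPI_Recv(data, 4, MPI_INT, 0, 7, MPI_COMM_WORLD, &status); /* from rank 0, tag 7 */
        printf("Rank 1 got %d %d %d %d\n", data[0], data[1], data[2], data[3]);
    }

    MPI_Finalize();
    return 0;
}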

MPI Send and Receive

MPI Datatypes MPI predefines its primitive data types. Primitive data types are contiguous. MPI also provides facilities for you to define your own data structures based upon sequences of the MPI primitive data types. Such user-defined structures are called derived data types.

MPI datatype              C equivalent
MPI_SHORT                 short int
MPI_INT                   int
MPI_LONG                  long int
MPI_LONG_LONG             long long int
MPI_UNSIGNED_CHAR         unsigned char
MPI_UNSIGNED_SHORT        unsigned short int
MPI_UNSIGNED              unsigned int
MPI_UNSIGNED_LONG         unsigned long int
MPI_UNSIGNED_LONG_LONG    unsigned long long int
MPI_FLOAT                 float
MPI_DOUBLE                double
MPI_LONG_DOUBLE           long double
MPI_BYTE                  untyped byte (no single C equivalent)
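As an illustration of a derived data type (my own sketch, not from the slides), the code below builds a contiguous type of three MPI_DOUBLEs with MPI_Type_contiguous, commits it, and sends one element of it between two processes. The name vec3 and the array contents are hypothetical.

#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank;
    double triple[3] = {1.0, 2.0, 3.0};
    MPI_Datatype vec3;                          /* user-defined (derived) type */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Type_contiguous(3, MPI_DOUBLE, &vec3);  /* three consecutive doubles */
    MPI_Type_commit(&vec3);                     /* must commit before use */

    if (rank == 0)
        MPI_Send(triple, 1, vec3, 1, 0, MPI_COMM_WORLD);   /* one 'vec3' element */
    else if (rank == 1)
        MPI_Recv(triple, 1, vec3, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    MPI_Type_free(&vec3);
    MPI_Finalize();
    return 0;
}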

Compute Pi by Numerical Integration N processes (0, 1, ..., N-1). Master process: process 0. Divide the computational task into N portions; each processor computes its own (partial) sum. At the end, the master (processor 0) collects all the partial sums and forms the total sum. Basic set of MPI functions used: MPI_Init, MPI_Finalize, MPI_Send, MPI_Recv, MPI_Comm_size, MPI_Comm_rank.

MPI_Init(&argc, &argv);                         // Initialize
MPI_Comm_size(MPI_COMM_WORLD, &num_procs);      // Get # processors
MPI_Comm_rank(MPI_COMM_WORLD, &myid);
N = number of intervals used to do the integration;
...
w = 1.0/(double) N;
mypi = 0.0;                                     // My partial sum (on each MPI process)
Compute my part of the partial sum based on 1. myid and 2. num_procs
if ( I am the master of the group ) {
    for ( i = 1; i < num_procs; i++ ) {
        receive the partial sum from MPI process i;
        add that partial sum to my own partial sum;
    }
    print the final total;
} else {
    send my partial sum to the master of the MPI group;
}
MPI_Finalize();

Compute Pi by Numerical Integration - C code

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int N;                        // Number of intervals
    double w, x;                  // Width and x point
    int i, myid, num_procs;
    double mypi, others_pi;

    MPI_Init(&argc, &argv);       // Initialize
    // Get # processors
    MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    N = atoi(argv[1]);
    w = 1.0/(double) N;
    mypi = 0.0;                   // Each MPI process has its own copy of every variable

Compute Pi by Numerical Integration - C code

    /* --------------------------------------------------------------------
       Every MPI process computes a partial sum for the integral
       -------------------------------------------------------------------- */
    for (i = myid; i < N; i = i + num_procs)
    {
        x = w*(i + 0.5);
        mypi = mypi + w*f(x);
    }

P = total number of processes, N is the total number of rectangles.
Process 0 computes the sum of f(w*(0.5)), f(w*(P+0.5)), f(w*(2P+0.5)), ...
Process 1 computes the sum of f(w*(1.5)), f(w*(P+1.5)), f(w*(2P+1.5)), ...
Process 2 computes the sum of f(w*(2.5)), f(w*(P+2.5)), f(w*(2P+2.5)), ...
Process 3 computes the sum of f(w*(3.5)), f(w*(P+3.5)), f(w*(2P+3.5)), ...
Process 4 computes the sum of f(w*(4.5)), f(w*(P+4.5)), f(w*(2P+4.5)), ...

Compute Pi by Numerical Integration - C code

    if ( myid == 0 )    // Now put the sum together...
    {
        // Proc 0 collects; the others send their data to proc 0
        for (i = 1; i < num_procs; i++)
        {
            MPI_Recv(&others_pi, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            mypi += others_pi;
        }
        printf("Pi = %.16f\n\n", mypi);    // Output...
    }
    else
    {
        // The other processes send their partial sum to process 0
        MPI_Send(&mypi, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}

Collective Communication Communications that involve all processes in a group.
One to all: Broadcast, Scatter (personalized)
All to one: Gather
All to all: Allgather, Alltoall (personalized)
Personalized means each process gets different data.

Collective Communication In a collective operation, processes must reach the same point in the program code in order for the communication to begin. The call to the collective function is blocking.

Collective Communication Broadcast: Root Process sends the same piece of data to all Processes in a communicator group. Scatter: Takes an array of elements and distributes the elements in the order of process rank.
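A small sketch of both patterns, assuming exactly four processes: rank 0 broadcasts a single parameter and then scatters one element of a 4-element array to each rank. The names n, table, and mine are illustrative.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, n = 0, mine = 0;
    int table[4] = {10, 20, 30, 40};     /* only meaningful on the root */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) n = 100;

    /* every rank receives the same value of n from root 0 */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* each rank receives one element of table[], in rank order (assumes 4 processes) */
    MPI_Scatter(table, 1, MPI_INT, &mine, 1, MPI_INT, 0, MPI_COMM_WORLD);

    printf("Rank %d: n = %d, my element = %d\n", rank, n, mine);

    MPI_Finalize();
    return 0;
}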

Collective Communication Gather: Takes elements from many processes and gathers them to one single process. This routine is highly useful to many parallel algorithms, such as parallel sorting and searching.
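A sketch of the gather pattern: each rank contributes its own rank number, and root 0 collects the values into an array sized for the communicator. The buffer name all is illustrative.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int rank, size, *all = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0)
        all = malloc(size * sizeof(int));   /* receive buffer needed only on the root */

    /* every rank sends one int; root 0 receives them ordered by rank */
    MPI_Gather(&rank, 1, MPI_INT, all, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int i = 0; i < size; i++)
            printf("slot %d holds %d\n", i, all[i]);
        free(all);
    }

    MPI_Finalize();
    return 0;
}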

Collective Computation MPI_Reduce (&local_sum, &global_sum, 1, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD) Reduce: Takes an array of input elements on each process and returns an array of output elements to the root process. The output elements contain the reduced result. Reduction Operation: Max, Min, Sum, Product, Logical and Bitwise Operations.

Collective Computation

Allgather: Just like MPI_Gather, the elements from each process are gathered in order of their rank, except this time the elements are gathered to all processes. All to All: An extension of MPI_Allgather: the j-th block from process i is received by process j and stored in the i-th block. Useful in applications like matrix transposes or FFTs.

Collectives: Use If one process reads data from disc or the command line, it can use a broadcast or a gather to get the information to other processes. Likewise, at the end of a program run, a gather or reduction can be used to collect summary information about the program run. However, a more common scenario is that the result of a collective is needed on all processes. Consider the computation of the standard deviation sigma = sqrt( (1/N) * sum_i (x_i - mu)^2 ), where mu = (1/N) * sum_i x_i. Assume that every processor stores just one x_i value. You can compute mu by doing a reduction followed by a broadcast, but it is better to use a so-called allreduce operation, which does the reduction and leaves the result on all processors.
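A sketch of the allreduce approach under the slide's assumption that each process stores one value x (here filled with the rank as a stand-in): a first MPI_Allreduce gives every rank the sum needed for the mean, and a second gives every rank the sum of squared deviations for the standard deviation. Variable names are illustrative.

#include <mpi.h>
#include <math.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;
    double x, mean_sum, dev, dev_sum, sigma;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    x = (double) rank;   /* stand-in for the single value stored on this process */

    /* sum of all x values, available on every rank (no separate broadcast needed) */
    MPI_Allreduce(&x, &mean_sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    double mu = mean_sum / size;

    /* sum of squared deviations, again left on every rank */
    dev = (x - mu) * (x - mu);
    MPI_Allreduce(&dev, &dev_sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    sigma = sqrt(dev_sum / size);

    if (rank == 0)
        printf("mu = %f, sigma = %f\n", mu, sigma);

    MPI_Finalize();
    return 0;
}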

Synchronization MPI_Barrier(MPI_Comm comm) Provides the ability to block the calling process until all processes in the communicator have reached this routine.

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, nprocs;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Barrier(MPI_COMM_WORLD);
    printf("Hello, world. I am %d of %d\n", rank, nprocs);
    fflush(stdout);
    MPI_Finalize();
    return 0;
}

Compute Pi by Numerical Integration - C code

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Get_processor_name(processor_name, &namelen);
    fprintf(stderr, "Process# %d with name %s on %d processors\n",
            myid, processor_name, numprocs);

    if (myid == 0)
    {
        scanf("%d", &n);
        printf("Number of intervals: %d (0 quits)\n", n);
        startwtime = MPI_Wtime();
    }
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

Compute Pi by Numerical Integration - C code

    h = 1.0 / (double) n;
    sum = 0.0;
    for (i = myid + 1; i <= n; i += numprocs)
    {
        x = h * ((double)i - 0.5);
        sum += f(x);
    }
    mypi = h * sum;

    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (myid == 0)
    {
        printf("pi is approximately %.16f\n", pi);
        endwtime = MPI_Wtime();
        printf("wall clock time = %f\n", endwtime - startwtime);
    }
    MPI_Finalize();

Programming Environment A programming environment (PrgEnv) is a set of related software components such as compilers, scientific software libraries, implementations of parallel programming paradigms, batch job schedulers, and other third-party tools, all of which cooperate with each other. Current environments on Cray: PrgEnv-cray, PrgEnv-gnu and PrgEnv-intel.

Implementations of MPI Examples of different implementations:
MPICH - developed by Argonne National Labs (free)
MPI/LAM - developed by Indiana, OSC, Notre Dame (free)
MPI/Pro - commercial product
Apple's X Grid
OpenMPI
The Cray XC40 provides an implementation of the MPI-3.0 standard via the Cray Message Passing Toolkit (MPT), which is based on the MPICH 3 library and optimised for the Cray Aries interconnect. All programming environments (PrgEnv-cray, PrgEnv-gnu and PrgEnv-intel) can utilize the MPI library implemented by Cray.

Compiling an MPI program Depends upon the implementation of MPI. Some standard implementations: MPICH / OpenMPI.
Language    Wrapper compiler name
C           mpicc
C++         mpicxx or mpic++
Fortran     mpifort (for v1.7 and above); mpif77 and mpif90 (for older versions)
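For example, with an MPICH or OpenMPI installation the wrapper compilers are typically invoked like this (the source file names are illustrative):

mpicc  -o hello hello_mpi.c      # C
mpicxx -o hello hello_mpi.cpp    # C++
mpifort -o hello hello_mpi.f90   # Fortran (mpif90/mpif77 on older versions)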

Running an MPI program Execution modes:
Interactive mode: mpirun -np <number of processes> <name_of_executable>
Batch mode: using a job script (details on the SERC webpage)

#!/bin/csh
#PBS -N jobname
#PBS -l nodes=1:ppn=16
#PBS -l walltime=1:00:00
#PBS -e /path_of_executable/error.log
cd /path_of_executable
NPROCS=`wc -l < $PBS_NODEFILE`
HOSTS=`cat $PBS_NODEFILE | uniq | tr '\n' "," | sed 's/,$//'`
mpirun -np $NPROCS --host $HOSTS /name_of_executable

Error Handling Most MPI routines include a return/error code parameter. However, according to the MPI standard, the default behavior of an MPI call is to abort if there is an error. You will probably not be able to capture a return/error code other than MPI_SUCCESS (zero). The standard does provide a means to override this default error handler. Consult the error handling section of the relevant MPI Standard documentation located at http://www.mpi-forum.org/docs/. The types of errors displayed to the user are implementation dependent.
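A sketch of overriding the default handler, assuming MPI_ERRORS_RETURN is attached to MPI_COMM_WORLD so that calls return an error code the program can inspect; MPI_Error_string converts the code to text. The deliberately invalid send is only an illustration.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rc, len;
    char msg[MPI_MAX_ERROR_STRING];

    MPI_Init(&argc, &argv);

    /* make MPI calls on this communicator return error codes instead of aborting */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    /* deliberately invalid call: negative count */
    rc = MPI_Send(NULL, -1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    if (rc != MPI_SUCCESS) {
        MPI_Error_string(rc, msg, &len);
        fprintf(stderr, "MPI_Send failed: %s\n", msg);
    }

    MPI_Finalize();
    return 0;
}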

Thank You