Introduction to MPI-2 (Message-Passing Interface)

What are the major new features in MPI-2? Parallel I/O, Remote Memory Operations, Dynamic Process Management, and Support for Multithreading.

Parallel I/O Includes basic operations similar to the standard UNIX open, close, seek, read, and write operations. But the power comes from advanced features such as noncontiguous access in both memory and file, collective I/O operations, use of explicit offsets to avoid separate seeks, both individual and shared file pointers, nonblocking I/O, portable and customized data representations, and hints for the implementation and file system.

Remote Memory Operations The API provides elements of the shared-memory model in an MPI environment. These are known as MPI one-sided or remote memory operations. The design is based on the idea of remote memory access windows: portions of each process's address space that it explicitly exposes to remote memory operations by other processes, as defined by an MPI communicator. The one-sided put, get, and accumulate operations can store into, load from, and update, respectively, the windows exposed by other processes. All remote memory operations are nonblocking, and synchronization operations are necessary to ensure their completion.

Dynamic Process Management The ability of an MPI process to participate in the creation of new MPI processes or to establish communication with MPI processes that have been started separately. The process-creation operations are collective, and the resulting sets of processes are represented as an intercommunicator. Spawning creates new sets of processes; connecting establishes communication with pre-existing MPI programs.

Support for Multithreading MPI-1 was designed to be thread-safe. In MPI-2, threads are recognized as a potential part of the environment. Users can inquire what level of thread safety is provided. If multiple levels of thread safety are supported, users can choose the level that meets the application's needs while still providing the highest level of performance.

Support for Multithreading (contd)

int MPI_Init_thread(int *argc, char ***argv, int required, int *provided);
int MPI_Query_thread(int *provided);
int MPI_Is_thread_main(int *flag);

MPI_THREAD_SINGLE - Only one thread will execute.
MPI_THREAD_FUNNELED - The process may be multi-threaded, but only the main thread will make MPI calls (all MPI calls are funneled to the main thread).
MPI_THREAD_SERIALIZED - The process may be multi-threaded, and multiple threads may make MPI calls, but only one at a time: MPI calls are not made concurrently from two distinct threads (all MPI calls are serialized).
MPI_THREAD_MULTIPLE - Multiple threads may call MPI, with no restrictions.
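For illustration, here is a minimal sketch (not one of the lab examples) of requesting full thread support with MPI_Init_thread and checking what level the implementation actually grants; the printed fallback message is an assumption:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int provided;

    /* Ask for full thread support; the implementation reports what it grants. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    if (provided < MPI_THREAD_MULTIPLE) {
        /* Hypothetical policy: the application could fall back to funneling
           all MPI calls through the main thread. */
        printf("Requested MPI_THREAD_MULTIPLE, got level %d\n", provided);
    }

    MPI_Finalize();
    return 0;
}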

Parallel I/O There are three basic approaches:
1. All MPI processes send the data to be written to process 0, which then writes it to a file using standard library calls. This is the simplest approach, but it is also the least scalable.
2. Each MPI process writes data to its own local file using standard library calls. After the application finishes, all the separate files have to somehow be combined. This is more scalable but can also be complex.
3. All MPI processes share a single file while still retaining the advantages of parallelism. The processes use MPI I/O calls instead of standard library calls.

Parallel I/O: Example 1

/* lab/mpi/parallel-io/io1.c */
/* example of sequential write into a common file */
#include <stdio.h>
#include <mpi.h>
#define BUFSIZE 1024*1024

int main(int argc, char *argv[])
{
    int i, myrank, numprocs, buf[BUFSIZE];
    MPI_Status status;
    FILE *myfile;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    for (i = 0; i < BUFSIZE; i++)
        buf[i] = myrank * BUFSIZE + i;
    if (myrank != 0)
        MPI_Send(buf, BUFSIZE, MPI_INT, 0, 99, MPI_COMM_WORLD);
    else {
        myfile = fopen("testfile", "w");
        fwrite(buf, sizeof(int), BUFSIZE, myfile);
        for (i = 1; i < numprocs; i++) {
            MPI_Recv(buf, BUFSIZE, MPI_INT, i, 99, MPI_COMM_WORLD, &status);
            fwrite(buf, sizeof(int), BUFSIZE, myfile);
        }
        fclose(myfile);
    }
    MPI_Finalize();
    return 0;
}

Parallel I/O: Example 2

/* lab/mpi/parallel-io/io2.c: parallel MPI write into separate files */
/* appropriate header files */
#define BUFSIZE 1024*1024

int main(int argc, char *argv[])
{
    int i, myrank, buf[BUFSIZE];
    char filename[128];
    MPI_File myfile;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    for (i = 0; i < BUFSIZE; i++)
        buf[i] = myrank * BUFSIZE + i;
    sprintf(filename, "testfile.%d", myrank);
    MPI_File_open(MPI_COMM_SELF, filename,
                  MPI_MODE_WRONLY | MPI_MODE_CREATE,
                  MPI_INFO_NULL, &myfile);
    MPI_File_write(myfile, buf, BUFSIZE, MPI_INT, MPI_STATUS_IGNORE);
    MPI_File_close(&myfile);
    MPI_Finalize();
    return 0;
}

Parallel I/O: Example 3

/* lab/mpi/parallel-io/io3.c: parallel MPI write into a single file */
/* appropriate header files */
#define BUFSIZE 1024*1024

int main(int argc, char *argv[])
{
    int i, myrank, buf[BUFSIZE];
    char filename[128];
    MPI_File thefile;
    MPI_Offset offset;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    sprintf(filename, "testfile");
    for (i = 0; i < BUFSIZE; i++)
        buf[i] = myrank * BUFSIZE + i;
    MPI_File_open(MPI_COMM_WORLD, filename,
                  MPI_MODE_WRONLY | MPI_MODE_CREATE,
                  MPI_INFO_NULL, &thefile);
    MPI_File_set_view(thefile, 0, MPI_INT, MPI_INT, "native", MPI_INFO_NULL);
    offset = (MPI_Offset) myrank * BUFSIZE;  /* in units of MPI_INT, per the view */
    MPI_File_write_at(thefile, offset, buf, BUFSIZE, MPI_INT, MPI_STATUS_IGNORE);
    MPI_File_close(&thefile);
    MPI_Finalize();
    return 0;
}

Summary of basic MPI I/O Functions

int MPI_File_open(MPI_Comm comm, char *filename, int amode, MPI_Info info, MPI_File *fh);
int MPI_File_set_view(MPI_File fh, MPI_Offset offset, MPI_Datatype etype, MPI_Datatype filetype, char *datarep, MPI_Info info);
int MPI_File_write(MPI_File fh, void *buf, int count, MPI_Datatype datatype, MPI_Status *status);
int MPI_File_read(MPI_File fh, void *buf, int count, MPI_Datatype datatype, MPI_Status *status);
int MPI_File_get_size(MPI_File fh, MPI_Offset *size);
int MPI_File_close(MPI_File *fh);
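As a sketch of how the read-side functions above fit together (this is not one of the lab examples), the following hypothetical program reopens the "testfile" written in Example 3 and has each process read back its own block; the use of malloc and the printed size report are assumptions:

/* Hypothetical read-back of "testfile" from Example 3. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#define BUFSIZE 1024*1024

int main(int argc, char *argv[])
{
    int myrank, *buf;
    MPI_File thefile;
    MPI_Offset filesize;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    buf = (int *) malloc(BUFSIZE * sizeof(int));

    MPI_File_open(MPI_COMM_WORLD, "testfile", MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &thefile);
    MPI_File_get_size(thefile, &filesize);            /* size in bytes */
    if (myrank == 0)
        printf("file size is %lld bytes\n", (long long) filesize);

    /* Each process views the file starting at its own block of ints. */
    MPI_File_set_view(thefile, (MPI_Offset) myrank * BUFSIZE * sizeof(int),
                      MPI_INT, MPI_INT, "native", MPI_INFO_NULL);
    MPI_File_read(thefile, buf, BUFSIZE, MPI_INT, MPI_STATUS_IGNORE);

    MPI_File_close(&thefile);
    free(buf);
    MPI_Finalize();
    return 0;
}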

More on Parallel I/O MPI_File_seek allows multiple processes to position themselves at a specific byte offset in a file before reading or writing. MPI_File_read_at and MPI_File_write_at combine read/write with seek in one call. The shared file pointer is shared amongst all processes in the same communicator. Functions such as MPI_File_write_shared write data and update the shared pointer for all processes. This is handy for writing to a common log file from multiple processes.
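A minimal sketch of the shared-file-pointer idea (not one of the lab examples; the file name "app.log" and the log message are assumptions):

/* Each process appends one line to a common log file through the
   shared file pointer; the interleaving of lines across processes
   is not deterministic. */
#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int myrank;
    char line[64];
    MPI_File logfile;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    MPI_File_open(MPI_COMM_WORLD, "app.log",
                  MPI_MODE_WRONLY | MPI_MODE_CREATE,
                  MPI_INFO_NULL, &logfile);
    sprintf(line, "process %d reached checkpoint\n", myrank);
    MPI_File_write_shared(logfile, line, (int) strlen(line), MPI_CHAR,
                          MPI_STATUS_IGNORE);
    MPI_File_close(&logfile);
    MPI_Finalize();
    return 0;
}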

Remote Memory Access MPI does not provide a real shared-memory model. However, the remote memory operations of MPI provide much of the flexibility of shared memory. Data movement can be initiated entirely by one process (a one-sided operation). The synchronization needed to ensure that the data movement is complete is decoupled from the initiation of the operation. Each process can designate portions of its address space as available for other processes to read and write; such a portion is known as a window. A window object consists of a collection of windows, one per process, each comprising the local memory that process exposes to the others in a collective window-creation operation. A collection of processes may have several window objects.

Remote Memory Functions Window objects are represented by variables of type MPI_Win in C. Window objects are made up of variables of a single datatype, so we need one window for each type of variable. MPI_Win_create is a collective operation, so all processes need to call it even though only one contributes memory to the window. The communicator used specifies which processes will have access to the window.

MPI_Win nwin;
/* on process 0: */
MPI_Win_create(&n, sizeof(int), 1, MPI_INFO_NULL, MPI_COMM_WORLD, &nwin);
/* on other processes: */
MPI_Win_create(MPI_BOTTOM, 0, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &nwin);

The first argument is the base address, the second the length in bytes, and the third the displacement unit used to specify offsets into the memory in windows. The fourth argument is an MPI_Info argument, which can be used to optimize the performance of remote memory operations. The next argument is the communicator, and the last argument is the window object that is returned.

More Remote Memory Functions Any ordinary variable can be shared via the remote memory operations get, put, and accumulate. Special memory can also be allocated for this purpose via the MPI_Alloc_mem function. Before other processes can access remote memory, we need to synchronize. MPI provides three synchronization mechanisms. The simplest is the fence operation, which starts an RMA access epoch. The MPI call used is MPI_Win_fence. The function MPI_Win_fence takes two arguments: the first is an assertion argument permitting certain optimizations, and the second is the window the fence operation is being performed on. A value of 0 is always valid for the first argument.

MPI_Win_fence(0, nwin);
MPI_Get(&n, 1, MPI_INT, 0, 0, 1, MPI_INT, nwin);
MPI_Win_fence(0, nwin);

The arguments to MPI_Get are the receive (origin) address, count, and datatype; the rank of the remote (target) process; the displacement into the target's memory window; the target count and datatype; and the window object.

MPI Remote Memory Operations

int MPI_Win_create(void *base, MPI_Aint size, int disp_unit, MPI_Info info, MPI_Comm comm, MPI_Win *win);
int MPI_Win_fence(int assert, MPI_Win win);
int MPI_Get(void *origin_addr, int origin_count, MPI_Datatype origin_datatype, int target_rank, MPI_Aint target_disp, int target_count, MPI_Datatype target_datatype, MPI_Win win);
int MPI_Put(void *origin_addr, int origin_count, MPI_Datatype origin_datatype, int target_rank, MPI_Aint target_disp, int target_count, MPI_Datatype target_datatype, MPI_Win win);
int MPI_Accumulate(void *origin_addr, int origin_count, MPI_Datatype origin_datatype, int target_rank, MPI_Aint target_disp, int target_count, MPI_Datatype target_datatype, MPI_Op op, MPI_Win win);
int MPI_Win_free(MPI_Win *win);
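MPI_Put is not used in the lab example below; as a rough sketch (the variable names are assumptions), it mirrors MPI_Get with the data flowing in the opposite direction, inside the same fence-based epoch:

/* Process 1 writes its rank into an int exposed by process 0. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int myrank, value = 0;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    /* Only process 0 exposes memory; the call is still collective. */
    if (myrank == 0)
        MPI_Win_create(&value, sizeof(int), 1, MPI_INFO_NULL,
                       MPI_COMM_WORLD, &win);
    else
        MPI_Win_create(MPI_BOTTOM, 0, 1, MPI_INFO_NULL,
                       MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    if (myrank == 1)
        MPI_Put(&myrank, 1, MPI_INT, 0, 0, 1, MPI_INT, win);
    MPI_Win_fence(0, win);    /* put is complete after the closing fence */

    if (myrank == 0)
        printf("value on process 0 is now %d\n", value);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}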

Remote Memory Access Example

/* lab/mpi/remote-memory/cpi-rma.c */
/* appropriate header files */
int main(int argc, char *argv[])
{
    int n, myid, numprocs, i;
    double PI25DT = 3.141592653589793238462643;
    double mypi, pi, h, sum, x;
    MPI_Win nwin, piwin;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    if (myid == 0) {
        MPI_Win_create(&n, sizeof(int), 1, MPI_INFO_NULL,
                       MPI_COMM_WORLD, &nwin);
        MPI_Win_create(&pi, sizeof(double), 1, MPI_INFO_NULL,
                       MPI_COMM_WORLD, &piwin);
    }
    else {
        MPI_Win_create(MPI_BOTTOM, 0, 1, MPI_INFO_NULL,
                       MPI_COMM_WORLD, &nwin);
        MPI_Win_create(MPI_BOTTOM, 0, 1, MPI_INFO_NULL,
                       MPI_COMM_WORLD, &piwin);
    }

Remote Memory Access Example (contd.)

    while (1) {
        if (myid == 0) {
            printf("Enter the number of intervals: (0 quits) ");
            scanf("%d", &n);
            pi = 0.0;
        }
        MPI_Win_fence(0, nwin);
        if (myid != 0)
            MPI_Get(&n, 1, MPI_INT, 0, 0, 1, MPI_INT, nwin);
        MPI_Win_fence(0, nwin);
        if (n == 0)
            break;
        else {
            h = 1.0 / (double) n;
            sum = 0.0;
            for (i = myid + 1; i <= n; i += numprocs) {
                x = h * ((double) i - 0.5);
                sum += (4.0 / (1.0 + x * x));
            }
            mypi = h * sum;
            MPI_Win_fence(0, piwin);
            MPI_Accumulate(&mypi, 1, MPI_DOUBLE, 0, 0, 1, MPI_DOUBLE,
                           MPI_SUM, piwin);
            MPI_Win_fence(0, piwin);
            if (myid == 0)
                printf("pi is approximately %.16f, Error is %.16f\n",
                       pi, fabs(pi - PI25DT));
        }
    }
    MPI_Win_free(&nwin);
    MPI_Win_free(&piwin);
    MPI_Finalize();
    return 0;
}

Dynamic Process Management MPI_Comm_spawn is a collective operation over the spawning processes (the parents) and the child processes (which participate via MPI_Init). It returns an intercommunicator in which, from the point of view of the parents, the local group contains the parents and the remote group contains the children. The function MPI_Comm_get_parent, called from the children, returns an intercommunicator in which the local group contains the children and the remote group contains the parents.

Dynamic Process Management Functions

int MPI_Comm_spawn(char *command, char *argv[], int maxprocs, MPI_Info info, int root, MPI_Comm comm, MPI_Comm *intercomm, int array_of_errcodes[]);
int MPI_Comm_get_parent(MPI_Comm *parent);
int MPI_Intercomm_merge(MPI_Comm intercomm, int high, MPI_Comm *newintracomm);

See example: lab/mpi/spawn-ex1/
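The spawn example itself lives in lab/mpi/spawn-ex1/; separately, as a hedged sketch (the executable name "worker" and the number of children are assumptions), the parent side of a spawn might look like this, with each child calling MPI_Comm_get_parent to reach the parents:

/* Parent side: spawn 4 copies of a hypothetical "worker" executable. */
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm children;

    MPI_Init(&argc, &argv);
    MPI_Comm_spawn("worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                   0, MPI_COMM_WORLD, &children, MPI_ERRCODES_IGNORE);

    /* In "worker", MPI_Comm_get_parent(&parent) returns the matching
       intercommunicator; parent is MPI_COMM_NULL if the process was
       not started by a spawn. */

    MPI_Finalize();
    return 0;
}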