Introduction to MPI-2 (Message-Passing Interface)


2 What are the major new features in MPI-2?

- Parallel I/O
- Remote Memory Operations
- Dynamic Process Management
- Support for Multithreading

3 Parallel I/O

MPI-2 parallel I/O includes basic operations similar to the standard UNIX open, close, seek, read, and write operations. The power comes from the advanced features:

- noncontiguous access in both memory and file
- collective I/O operations
- explicit offsets, which avoid separate seek calls
- both individual and shared file pointers
- nonblocking I/O
- portable and customized data representations
- hints for the implementation and the file system

4 Remote Memory Operations

The API provides elements of the shared-memory model in an MPI environment. These are known as MPI one-sided or remote memory operations.

The design is based on the idea of remote memory access windows: portions of each process's address space that it explicitly exposes to remote memory operations by the other processes in an MPI communicator.

The one-sided put, get, and accumulate operations can store into, load from, and update, respectively, the windows exposed by other processes.

All remote memory operations are nonblocking, and synchronization operations are necessary to ensure their completion.

5 Dynamic Process Management

Dynamic process management is the ability of an MPI process to participate in the creation of new MPI processes, or to establish communication with MPI processes that have been started separately.

- The process-management operations are collective.
- The resulting sets of processes are represented by an intercommunicator.
- Spawning creates new sets of processes, which are tied to the spawning processes through an intercommunicator.
- Connecting establishes communication with pre-existing MPI programs.

6 Support for Multithreading

MPI-1 was designed to be thread-safe. In MPI-2, threads are recognized as a potential part of the environment.

Users can inquire what level of thread safety is supported. If multiple levels of thread safety are supported, users can choose the level that meets the application's needs while still providing the highest level of performance.

7 Support for Multithreading (contd.)

    int MPI_Init_thread(int *argc, char ***argv, int required, int *provided);
    int MPI_Query_thread(int *provided);
    int MPI_Is_thread_main(int *flag);

The levels of thread support are:

- MPI_THREAD_SINGLE: Only one thread will execute.
- MPI_THREAD_FUNNELED: The process may be multithreaded, but only the main thread will make MPI calls (all MPI calls are funneled to the main thread).
- MPI_THREAD_SERIALIZED: The process may be multithreaded, and multiple threads may make MPI calls, but only one at a time: MPI calls are not made concurrently from two distinct threads (all MPI calls are serialized).
- MPI_THREAD_MULTIPLE: Multiple threads may call MPI, with no restrictions.

8 Parallel I/O

There are three common ways for an MPI program to write its data:

1. All MPI processes send the data to be written to process 0, which then writes the data to a file using standard library calls. This is the simplest approach, but it is also the least scalable.
2. Each MPI process writes data to its own local file using standard library calls. After the application finishes, all the separate files have to be combined somehow. This is more scalable but can also be complex.
3. All MPI processes share a single file while still retaining the advantages of parallelism. The processes use MPI I/O calls instead of standard library calls.

9 Parallel I/O: Example 1

    /* lab/mpi/parallel-io/io1.c */
    /* example of sequential write into a common file */
    #include <stdio.h>
    #include <mpi.h>

    #define BUFSIZE (1024*1024)

    int main(int argc, char *argv[])
    {
        int i, myrank, numprocs;
        static int buf[BUFSIZE];  /* static: an array this large may not fit on the stack */
        MPI_Status status;
        FILE *myfile;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
        MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
        for (i = 0; i < BUFSIZE; i++)
            buf[i] = myrank * BUFSIZE + i;
        if (myrank != 0) {
            /* every other process sends its block to process 0 */
            MPI_Send(buf, BUFSIZE, MPI_INT, 0, 99, MPI_COMM_WORLD);
        } else {
            /* process 0 writes its own block, then each received block in rank order */
            myfile = fopen("testfile", "w");
            fwrite(buf, sizeof(int), BUFSIZE, myfile);
            for (i = 1; i < numprocs; i++) {
                MPI_Recv(buf, BUFSIZE, MPI_INT, i, 99, MPI_COMM_WORLD, &status);
                fwrite(buf, sizeof(int), BUFSIZE, myfile);
            }
            fclose(myfile);
        }
        MPI_Finalize();
        return 0;
    }

10 Parallel I/O: Example 2

    /* lab/mpi/parallel-io/io2.c: parallel MPI write into separate files */
    #include <stdio.h>
    #include <mpi.h>

    #define BUFSIZE (1024*1024)

    int main(int argc, char *argv[])
    {
        int i, myrank;
        static int buf[BUFSIZE];  /* static: an array this large may not fit on the stack */
        char filename[128];
        MPI_File myfile;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
        for (i = 0; i < BUFSIZE; i++)
            buf[i] = myrank * BUFSIZE + i;
        /* one file per process, named after its rank */
        snprintf(filename, sizeof(filename), "testfile.%d", myrank);
        /* MPI_COMM_SELF: each process opens its own file independently */
        MPI_File_open(MPI_COMM_SELF, filename,
                      MPI_MODE_WRONLY | MPI_MODE_CREATE,
                      MPI_INFO_NULL, &myfile);
        MPI_File_write(myfile, buf, BUFSIZE, MPI_INT, MPI_STATUS_IGNORE);
        MPI_File_close(&myfile);
        MPI_Finalize();
        return 0;
    }

11 Parallel I/O: Example 3

    /* lab/mpi/parallel-io/io3.c: parallel MPI write into a single file */
    #include <stdio.h>
    #include <mpi.h>

    #define BUFSIZE (1024*1024)

    int main(int argc, char *argv[])
    {
        int i, myrank;
        static int buf[BUFSIZE];  /* static: an array this large may not fit on the stack */
        char filename[128];
        MPI_File thefile;
        MPI_Offset offset;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
        snprintf(filename, sizeof(filename), "testfile");
        for (i = 0; i < BUFSIZE; i++)
            buf[i] = myrank * BUFSIZE + i;
        /* MPI_COMM_WORLD: all processes open the same file collectively */
        MPI_File_open(MPI_COMM_WORLD, filename,
                      MPI_MODE_WRONLY | MPI_MODE_CREATE,
                      MPI_INFO_NULL, &thefile);
        /* etype MPI_INT: offsets below are counted in ints, not bytes */
        MPI_File_set_view(thefile, 0, MPI_INT, MPI_INT, "native", MPI_INFO_NULL);
        offset = (MPI_Offset)myrank * BUFSIZE;
        MPI_File_write_at(thefile, offset, buf, BUFSIZE, MPI_INT, MPI_STATUS_IGNORE);
        MPI_File_close(&thefile);
        MPI_Finalize();
        return 0;
    }
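The offset passed to MPI_File_write_at in Example 3 is counted in units of the view's etype (MPI_INT here), not in bytes. The arithmetic can be sketched in plain C without MPI, assuming every rank writes one contiguous block of the same size. The function names below are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: start of `rank`'s block, counted in elements
 * (etype units), when every rank owns `count` contiguous elements. */
static long long block_offset_elems(int rank, long long count)
{
    assert(rank >= 0 && count >= 0);
    /* widen before multiplying so large rank counts do not overflow int */
    return (long long)rank * count;
}

/* Hypothetical helper: the same position in bytes, which is what a
 * byte-oriented view would need. */
static long long block_offset_bytes(int rank, long long count, size_t elem_size)
{
    return block_offset_elems(rank, count) * (long long)elem_size;
}
```

With 1024*1024 ints per rank, rank 3 starts at element 3145728, which is byte 12582912 when an int is 4 bytes. Widening to a 64-bit type before the multiplication matters once the product no longer fits in an int.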

12 Summary of basic MPI I/O Functions

    int MPI_File_open(MPI_Comm comm, char *filename, int amode, MPI_Info info, MPI_File *fh);
    int MPI_File_set_view(MPI_File fh, MPI_Offset disp, MPI_Datatype etype, MPI_Datatype filetype, char *datarep, MPI_Info info);
    int MPI_File_write(MPI_File fh, void *buf, int count, MPI_Datatype datatype, MPI_Status *status);
    int MPI_File_read(MPI_File fh, void *buf, int count, MPI_Datatype datatype, MPI_Status *status);
    int MPI_File_get_size(MPI_File fh, MPI_Offset *size);
    int MPI_File_close(MPI_File *fh);

13 More on Parallel I/O

- MPI_File_seek lets each process position its own file pointer at a specific offset in the file (a byte offset under the default file view) before reading or writing.
- MPI_File_read_at and MPI_File_write_at combine the seek with the read or write in one call.
- The shared file pointer is shared among all processes in the communicator that opened the file. Functions such as MPI_File_write_shared write data and update the shared pointer for all processes. This is handy for writing to a common log file from multiple processes.

14 Remote Memory Access

MPI does not provide a real shared-memory model. However, the remote memory operations of MPI provide much of the flexibility of shared memory.

- Data movement can be initiated entirely by one process (a one-sided operation).
- The synchronization needed to ensure that the data movement is complete is decoupled from the initiation of the operation.
- Each process can designate portions of its address space as available for other processes to read and write. Such a portion is known as a window.
- A window object consists of multiple windows, each of which consists of all the local memory areas exposed to the other processes by a collective window-creation operation.
- A collection of processes may have several window objects.

15 Remote Memory Functions

Window objects are represented by variables of type MPI_Win in C. The examples here treat each window object as holding variables of a single datatype, so they use one window object for each type of variable.

MPI_Win_create is a collective operation, so all processes need to call it even when only one of them contributes memory to the window. The communicator specifies which processes will have access to the window.

    MPI_Win nwin;

    /* on process 0: */
    MPI_Win_create(&n, sizeof(int), 1, MPI_INFO_NULL, MPI_COMM_WORLD, &nwin);

    /* on other processes: */
    MPI_Win_create(MPI_BOTTOM, 0, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &nwin);

The first argument is the address of the exposed memory and the second is its length in bytes. The third argument is the displacement unit, which scales the offsets used to address memory in the window. The fourth argument is an MPI_Info argument, which can be used to optimize the performance of remote memory operations. The fifth argument is a communicator, and the last argument is the window object that is returned.
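The role of the displacement unit can be sketched in plain C. In the MPI standard, the address touched by a remote memory operation is the window base plus the target displacement multiplied by the displacement unit. The helper name rma_target_address is hypothetical and only mirrors that rule; it is not an MPI call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring the RMA addressing rule:
 *   target address = window base + disp_unit * target_disp */
static void *rma_target_address(void *window_base, int disp_unit, long target_disp)
{
    assert(disp_unit > 0 && target_disp >= 0);
    return (char *)window_base + (size_t)disp_unit * (size_t)target_disp;
}
```

With a displacement unit of sizeof(int), a displacement of 2 addresses the third int in the window. With a displacement unit of 1, as in the calls above, the same element is reached with a displacement of 2 * sizeof(int) bytes.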

16 More Remote Memory Functions

Any ordinary variable can be shared via the remote memory operations get, put, and accumulate. Special memory can also be allocated for this purpose with the MPI_Alloc_mem function.

Before other processes can access remote memory, we need to synchronize. MPI provides three synchronization mechanisms. The simplest is the fence operation, MPI_Win_fence: each call completes the remote memory operations issued on the window since the previous fence and starts a new RMA access epoch.

MPI_Win_fence takes two arguments: the first is an assertion argument permitting certain optimizations, and the second is the window the fence operation is performed on. A value of 0 is always valid for the first argument.

    MPI_Win_fence(0, nwin);
    MPI_Get(&n, 1, MPI_INT, 0, 0, 1, MPI_INT, nwin);
    MPI_Win_fence(0, nwin);

The arguments of MPI_Get are, in order: the receive address, count, and datatype on the calling process; the rank of the remote process; the displacement into its memory window; the count and datatype on the remote side; and the window object.

17 MPI Remote Memory Operations

    int MPI_Win_create(void *base, MPI_Aint size, int disp_unit, MPI_Info info, MPI_Comm comm, MPI_Win *win);
    int MPI_Win_fence(int assert, MPI_Win win);
    int MPI_Get(void *origin_addr, int origin_count, MPI_Datatype origin_datatype, int target_rank, MPI_Aint target_disp, int target_count, MPI_Datatype target_datatype, MPI_Win win);
    int MPI_Put(void *origin_addr, int origin_count, MPI_Datatype origin_datatype, int target_rank, MPI_Aint target_disp, int target_count, MPI_Datatype target_datatype, MPI_Win win);
    int MPI_Accumulate(void *origin_addr, int origin_count, MPI_Datatype origin_datatype, int target_rank, MPI_Aint target_disp, int target_count, MPI_Datatype target_datatype, MPI_Op op, MPI_Win win);
    int MPI_Win_free(MPI_Win *win);

18 Remote Memory Access Example

    /* lab/mpi/remote-memory/cpi-rma.c */
    #include <stdio.h>
    #include <math.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int n, myid, numprocs, i;
        double PI25DT = 3.141592653589793238462643;  /* pi to 25 digits */
        double mypi, pi, h, sum, x;
        MPI_Win nwin, piwin;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);
        if (myid == 0) {
            /* process 0 exposes n and pi to the other processes */
            MPI_Win_create(&n, sizeof(int), 1, MPI_INFO_NULL, MPI_COMM_WORLD, &nwin);
            MPI_Win_create(&pi, sizeof(double), 1, MPI_INFO_NULL, MPI_COMM_WORLD, &piwin);
        } else {
            /* the other processes take part in the collective call but expose no memory */
            MPI_Win_create(MPI_BOTTOM, 0, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &nwin);
            MPI_Win_create(MPI_BOTTOM, 0, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &piwin);
        }
        /* main continues on the next slide */

19 Remote Memory Access Example (contd.)

        while (1) {
            if (myid == 0) {
                printf("Enter the number of intervals: (0 quits) ");
                scanf("%d", &n);
                pi = 0.0;
            }
            /* every process other than 0 fetches n from process 0's window */
            MPI_Win_fence(0, nwin);
            if (myid != 0)
                MPI_Get(&n, 1, MPI_INT, 0, 0, 1, MPI_INT, nwin);
            MPI_Win_fence(0, nwin);
            if (n == 0)
                break;
            else {
                /* midpoint rule; intervals are dealt out cyclically by rank */
                h = 1.0 / (double) n;
                sum = 0.0;
                for (i = myid + 1; i <= n; i += numprocs) {
                    x = h * ((double)i - 0.5);
                    sum += (4.0 / (1.0 + x*x));
                }
                mypi = h * sum;
                /* add each partial sum into pi on process 0 */
                MPI_Win_fence(0, piwin);
                MPI_Accumulate(&mypi, 1, MPI_DOUBLE, 0, 0, 1, MPI_DOUBLE, MPI_SUM, piwin);
                MPI_Win_fence(0, piwin);
                if (myid == 0)
                    printf("pi is approximately %.16f, Error is %.16f\n",
                           pi, fabs(pi - PI25DT));
            }
        }
        MPI_Win_free(&nwin);
        MPI_Win_free(&piwin);
        MPI_Finalize();
        return 0;
    }
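The numerical part of the example can be checked without MPI. The sketch below, in plain C, computes the partial sum that one rank would contribute and then adds the partial sums serially, standing in for the MPI_Accumulate with MPI_SUM into process 0's window. The function names partial_pi and accumulate_all are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: the midpoint-rule partial sum that `rank` out of
 * `nprocs` contributes for n intervals (cyclic distribution of intervals). */
static double partial_pi(int rank, int nprocs, int n)
{
    assert(n > 0 && nprocs > 0 && rank >= 0 && rank < nprocs);
    double h = 1.0 / (double)n;
    double sum = 0.0;
    for (int i = rank + 1; i <= n; i += nprocs) {
        double x = h * ((double)i - 0.5);
        sum += 4.0 / (1.0 + x * x);
    }
    return h * sum;
}

/* Serial stand-in for accumulating every rank's partial sum with MPI_SUM. */
static double accumulate_all(int nprocs, int n)
{
    double pi = 0.0;
    for (int rank = 0; rank < nprocs; rank++)
        pi += partial_pi(rank, nprocs, n);
    return pi;
}
```

Because every interval is owned by exactly one rank, the accumulated value does not depend on the number of processes, apart from floating-point rounding in the order of the additions.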

20 Dynamic Process Management

MPI_Comm_spawn is a collective operation over the spawning processes (the parents) and the child processes (in their calls to MPI_Init).

It returns an intercommunicator in which, from the point of view of the parents, the local group contains the parents and the remote group contains the children.

The function MPI_Comm_get_parent, called from the children, returns an intercommunicator in which the local group contains the children and the remote group contains the parents.

21 Dynamic Process Management Functions

    int MPI_Comm_spawn(char *command, char *argv[], int maxprocs, MPI_Info info, int root, MPI_Comm comm, MPI_Comm *intercomm, int array_of_errcodes[]);
    int MPI_Comm_get_parent(MPI_Comm *parent);
    int MPI_Intercomm_merge(MPI_Comm intercomm, int high, MPI_Comm *newintracomm);

See example: lab/mpi/spawn-ex1/


MPI-2. Introduction Dynamic Process Creation. Based on notes by Sathish Vadhiyar, Rob Thacker, and David Cronk MPI-2 Introduction Dynamic Process Creation Based on notes by Sathish Vadhiyar, Rob Thacker, and David Cronk http://www-unix.mcs.anl.gov/mpi/mpi-standard/mpi-report-2.0/mpi2-report.htm Using MPI2: Advanced

More information

MPI, Part 3. Scientific Computing Course, Part 3

MPI, Part 3. Scientific Computing Course, Part 3 MPI, Part 3 Scientific Computing Course, Part 3 Non-blocking communications Diffusion: Had to Global Domain wait for communications to compute Could not compute end points without guardcell data All work

More information

HPC Parallel Programing Multi-node Computation with MPI - I

HPC Parallel Programing Multi-node Computation with MPI - I HPC Parallel Programing Multi-node Computation with MPI - I Parallelization and Optimization Group TATA Consultancy Services, Sahyadri Park Pune, India TCS all rights reserved April 29, 2013 Copyright

More information

MPI Program Structure

MPI Program Structure MPI Program Structure Handles MPI communicator MPI_COMM_WORLD Header files MPI function format Initializing MPI Communicator size Process rank Exiting MPI 1 Handles MPI controls its own internal data structures

More information

Outline. CSC 447: Parallel Programming for Multi-Core and Cluster Systems 2

Outline. CSC 447: Parallel Programming for Multi-Core and Cluster Systems 2 CSC 447: Parallel Programming for Multi-Core and Cluster Systems Message Passing with MPI Instructor: Haidar M. Harmanani Outline Message-passing model Message Passing Interface (MPI) Coding MPI programs

More information

Parallel Programming. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Parallel Programming. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Parallel Programming Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Challenges Difficult to write parallel programs Most programmers think sequentially

More information

MPI Tutorial Part 2 Design of Parallel and High-Performance Computing Recitation Session

MPI Tutorial Part 2 Design of Parallel and High-Performance Computing Recitation Session S. DI GIROLAMO [DIGIROLS@INF.ETHZ.CH] MPI Tutorial Part 2 Design of Parallel and High-Performance Computing Recitation Session Slides credits: Pavan Balaji, Torsten Hoefler https://htor.inf.ethz.ch/teaching/mpi_tutorials/isc16/hoefler-balaji-isc16-advanced-mpi.pdf

More information

Message Passing Interface

Message Passing Interface Message Passing Interface DPHPC15 TA: Salvatore Di Girolamo DSM (Distributed Shared Memory) Message Passing MPI (Message Passing Interface) A message passing specification implemented

More information

Parallel Programming with MPI: Day 1

Parallel Programming with MPI: Day 1 Parallel Programming with MPI: Day 1 Science & Technology Support High Performance Computing Ohio Supercomputer Center 1224 Kinnear Road Columbus, OH 43212-1163 1 Table of Contents Brief History of MPI

More information

MPI introduction - exercises -

MPI introduction - exercises - MPI introduction - exercises - Paolo Ramieri, Maurizio Cremonesi May 2016 Startup notes Access the server and go on scratch partition: ssh a08tra49@login.galileo.cineca.it cd $CINECA_SCRATCH Create a job

More information

MPI Collective communication

MPI Collective communication MPI Collective communication CPS343 Parallel and High Performance Computing Spring 2018 CPS343 (Parallel and HPC) MPI Collective communication Spring 2018 1 / 43 Outline 1 MPI Collective communication

More information

MPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016

MPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016 MPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016 Message passing vs. Shared memory Client Client Client Client send(msg) recv(msg) send(msg) recv(msg) MSG MSG MSG IPC Shared

More information

Introduction to Parallel Programming with MPI

Introduction to Parallel Programming with MPI Introduction to Parallel Programming with MPI Slides are available at http://www.mcs.anl.gov/~balaji/tmp/csdms-mpi-basic.pdf Pavan Balaji Argonne National Laboratory balaji@mcs.anl.gov http://www.mcs.anl.gov/~balaji

More information

CMSC 714 Lecture 3 Message Passing with PVM and MPI

CMSC 714 Lecture 3 Message Passing with PVM and MPI Notes CMSC 714 Lecture 3 Message Passing with PVM and MPI Alan Sussman To access papers in ACM or IEEE digital library, must come from a UMD IP address Accounts handed out next week for deepthought2 cluster,

More information

CS 470 Spring Mike Lam, Professor. Distributed Programming & MPI

CS 470 Spring Mike Lam, Professor. Distributed Programming & MPI CS 470 Spring 2018 Mike Lam, Professor Distributed Programming & MPI MPI paradigm Single program, multiple data (SPMD) One program, multiple processes (ranks) Processes communicate via messages An MPI

More information

Distributed Memory Programming with Message-Passing

Distributed Memory Programming with Message-Passing Distributed Memory Programming with Message-Passing Pacheco s book Chapter 3 T. Yang, CS240A Part of slides from the text book and B. Gropp Outline An overview of MPI programming Six MPI functions and

More information

MPI Introduction. Torsten Hoefler. (some slides borrowed from Rajeev Thakur and Pavan Balaji)

MPI Introduction. Torsten Hoefler. (some slides borrowed from Rajeev Thakur and Pavan Balaji) MPI Introduction Torsten Hoefler (some slides borrowed from Rajeev Thakur and Pavan Balaji) Course Outline Thursday Morning 9.00-10.30: Intro to MPI and parallel programming blocking sends/recvs, nonblocking

More information

mpi-02.c 1/1. 15/10/26 mpi-01.c 1/1. 15/10/26

mpi-02.c 1/1. 15/10/26 mpi-01.c 1/1. 15/10/26 mpi-01.c 1/1 main ( argc, char * argv[]) rank, size; prf ("I am process %d of %d\n", rank, size); mpi-02.c 1/1 #include main ( argc, char * argv[]) rank, size, src, dest, nc; tag = 50; // tag

More information

Report S1 C. Kengo Nakajima. Programming for Parallel Computing ( ) Seminar on Advanced Computing ( )

Report S1 C. Kengo Nakajima. Programming for Parallel Computing ( ) Seminar on Advanced Computing ( ) Report S1 C Kengo Nakajima Programming for Parallel Computing (616-2057) Seminar on Advanced Computing (616-4009) Problem S1-1 Report S1 (1/2) Read local files /a1.0~a1.3, /a2.0~a2.3. Develop

More information

Report S1 C. Kengo Nakajima

Report S1 C. Kengo Nakajima Report S1 C Kengo Nakajima Technical & Scientific Computing II (4820-1028) Seminar on Computer Science II (4810-1205) Hybrid Distributed Parallel Computing (3747-111) Problem S1-1 Report S1 Read local

More information

CSE 160 Lecture 15. Message Passing

CSE 160 Lecture 15. Message Passing CSE 160 Lecture 15 Message Passing Announcements 2013 Scott B. Baden / CSE 160 / Fall 2013 2 Message passing Today s lecture The Message Passing Interface - MPI A first MPI Application The Trapezoidal

More information

Assignment 3 Key CSCI 351 PARALLEL PROGRAMMING FALL, Q1. Calculate log n, log n and log n for the following: Answer: Q2. mpi_trap_tree.

Assignment 3 Key CSCI 351 PARALLEL PROGRAMMING FALL, Q1. Calculate log n, log n and log n for the following: Answer: Q2. mpi_trap_tree. CSCI 351 PARALLEL PROGRAMMING FALL, 2015 Assignment 3 Key Q1. Calculate log n, log n and log n for the following: a. n=3 b. n=13 c. n=32 d. n=123 e. n=321 Answer: Q2. mpi_trap_tree.c The mpi_trap_time.c

More information

Parallel I/O. Steve Lantz Senior Research Associate Cornell CAC. Workshop: Parallel Computing on Ranger and Lonestar, May 16, 2012

Parallel I/O. Steve Lantz Senior Research Associate Cornell CAC. Workshop: Parallel Computing on Ranger and Lonestar, May 16, 2012 Parallel I/O Steve Lantz Senior Research Associate Cornell CAC Workshop: Parallel Computing on Ranger and Lonestar, May 16, 2012 Based on materials developed by Bill Barth at TACC Introduction: The Parallel

More information

Message Passing Interface. George Bosilca

Message Passing Interface. George Bosilca Message Passing Interface George Bosilca bosilca@icl.utk.edu Message Passing Interface Standard http://www.mpi-forum.org Current version: 3.1 All parallelism is explicit: the programmer is responsible

More information

CSE 160 Lecture 18. Message Passing

CSE 160 Lecture 18. Message Passing CSE 160 Lecture 18 Message Passing Question 4c % Serial Loop: for i = 1:n/3-1 x(2*i) = x(3*i); % Restructured for Parallelism (CORRECT) for i = 1:3:n/3-1 y(2*i) = y(3*i); for i = 2:3:n/3-1 y(2*i) = y(3*i);

More information

Parallel Computing. Lecture 17: OpenMP Last Touch

Parallel Computing. Lecture 17: OpenMP Last Touch CSCI-UA.0480-003 Parallel Computing Lecture 17: OpenMP Last Touch Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Some slides from here are adopted from: Yun (Helen) He and Chris Ding

More information

MPI MESSAGE PASSING INTERFACE

MPI MESSAGE PASSING INTERFACE MPI MESSAGE PASSING INTERFACE David COLIGNON, ULiège CÉCI - Consortium des Équipements de Calcul Intensif http://www.ceci-hpc.be Outline Introduction From serial source code to parallel execution MPI functions

More information

Programming for High Performance Computing. Programming Environment Dec 22, 2011 Osamu Tatebe

Programming for High Performance Computing. Programming Environment Dec 22, 2011 Osamu Tatebe Programming for High Performance Computing Programming Environment Dec 22, 2011 Osamu Tatebe Distributed Memory Machine CPU CPU Parallel machine that consists computers (CPU and memory) connected by a

More information

Introduction to MPI. SHARCNET MPI Lecture Series: Part I of II. Paul Preney, OCT, M.Sc., B.Ed., B.Sc.

Introduction to MPI. SHARCNET MPI Lecture Series: Part I of II. Paul Preney, OCT, M.Sc., B.Ed., B.Sc. Introduction to MPI SHARCNET MPI Lecture Series: Part I of II Paul Preney, OCT, M.Sc., B.Ed., B.Sc. preney@sharcnet.ca School of Computer Science University of Windsor Windsor, Ontario, Canada Copyright

More information

Parallel I/O for SwissTx

Parallel I/O for SwissTx 16 February 2000 Parallel I/O for SwissTx SFIO parallel I/O system implementation for Swiss-Tx. MPI-II I/O interface support for SFIO. LSP DI EPFL 1.0 Introduction 1.1 Requirements for parallel I/O 1.1.1

More information

MPI: Parallel Programming for Extreme Machines. Si Hammond, High Performance Systems Group

MPI: Parallel Programming for Extreme Machines. Si Hammond, High Performance Systems Group MPI: Parallel Programming for Extreme Machines Si Hammond, High Performance Systems Group Quick Introduction Si Hammond, (sdh@dcs.warwick.ac.uk) WPRF/PhD Research student, High Performance Systems Group,

More information

15-440: Recitation 8

15-440: Recitation 8 15-440: Recitation 8 School of Computer Science Carnegie Mellon University, Qatar Fall 2013 Date: Oct 31, 2013 I- Intended Learning Outcome (ILO): The ILO of this recitation is: Apply parallel programs

More information

mith College Computer Science CSC352 Week #7 Spring 2017 Introduction to MPI Dominique Thiébaut

mith College Computer Science CSC352 Week #7 Spring 2017 Introduction to MPI Dominique Thiébaut mith College CSC352 Week #7 Spring 2017 Introduction to MPI Dominique Thiébaut dthiebaut@smith.edu Introduction to MPI D. Thiebaut Inspiration Reference MPI by Blaise Barney, Lawrence Livermore National

More information

Distributed Memory Machines and Programming. Lecture 7

Distributed Memory Machines and Programming. Lecture 7 Distributed Memory Machines and Programming Lecture 7 James Demmel www.cs.berkeley.edu/~demmel/cs267_spr16 Slides from Kathy Yelick CS267 Lecture 7 1 Outline Distributed Memory Architectures Properties

More information

Parallel Programming in C with MPI and OpenMP

Parallel Programming in C with MPI and OpenMP Parallel Programming in C with MPI and OpenMP Michael J. Quinn Chapter 4 Message-Passing Programming Learning Objectives Understanding how MPI programs execute Familiarity with fundamental MPI functions

More information

Lecture 7: Distributed memory

Lecture 7: Distributed memory Lecture 7: Distributed memory David Bindel 15 Feb 2010 Logistics HW 1 due Wednesday: See wiki for notes on: Bottom-up strategy and debugging Matrix allocation issues Using SSE and alignment comments Timing

More information

MPI and CUDA. Filippo Spiga, HPCS, University of Cambridge.

MPI and CUDA. Filippo Spiga, HPCS, University of Cambridge. MPI and CUDA Filippo Spiga, HPCS, University of Cambridge Outline Basic principle of MPI Mixing MPI and CUDA 1 st example : parallel GPU detect 2 nd example: heat2d CUDA- aware MPI, how

More information

MPI 2. CSCI 4850/5850 High-Performance Computing Spring 2018

MPI 2. CSCI 4850/5850 High-Performance Computing Spring 2018 MPI 2 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning Objectives

More information