Parallel Numerical Algorithms

1 Parallel Numerical Algorithms [ 5 ] MPI: Message Passing Interface Parallel Numerical Algorithms / IST / UTokyo 1

2 PNA16 Lecture Plan
General Topics
  1. Architecture and Performance
  2. Dependency
  3. Locality
  4. Scheduling
MIMD / Distributed Memory
  5. MPI: Message Passing Interface
  6. Collective Communication
  7. Distributed Data Structure
MIMD / Shared Memory
  8. OpenMP
  9. Cache Performance
Special Lectures
  5/30 How to use FX10 (Prof. Ohshima)
  6/6 Dynamic Parallelism (Prof. Peri)
SIMD / Shared Memory
  10. GPU and CUDA
  11. SIMD Performance

3 Memory Models
Distributed memory
  Each processor (Proc) has its own memory; the processors are connected by a network.
Shared memory
  Uniform Memory Access (UMA): all processors access one shared memory.
  Non-Uniform Memory Access (NUMA): each processor has a memory (Mem) attached to it, and all memories are shared.

4 Message Passing
Process 0: start; send(data, 1); recv(data, 1); end
Process 1: start; recv(data, 0); send(data, 0); send(data, 2); recv(data, 2); end
Process 2: start; send(data, 3); recv(data, 3); recv(data, 1); send(data, 1); end
Process 3: start; recv(data, 2); send(data, 2); end
Note 1: A matching send and receive pair establishes a data transfer.
Note 2: The source or destination is specified in each call.

5 Local View
Each program describes the processing done by one process only.
The four processes of the previous slide are written as four separate programs, one per process.

6 SPMD
Single Program Multiple Data: one program describes all local processing.
Assume: myid represents the ID number of the process; nproc represents the number of running processes.
    if (myid % 2 == 0) {
        send(data, myid + 1);
        recv(data, myid + 1);
    } else {
        recv(data, myid - 1);
        send(data, myid - 1);
    }
    if (myid != 0 && myid != nproc - 1) {
        if (myid % 2 == 0) {
            recv(data, myid - 1);
            send(data, myid - 1);
        } else {
            send(data, myid + 1);
            recv(data, myid + 1);
        }
    }
With nproc = 4 this single program reproduces the four per-process programs of the previous slides.
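The SPMD program can be checked without a parallel machine by listing the steps each rank would perform. The following is a minimal sketch in plain C (no MPI calls); the names step_t and spmd_schedule are hypothetical, and the assertion that nproc is even is an assumption taken from the four-process figure, not something the slide states.

```c
#include <assert.h>

/* One communication step as seen by a single process. */
typedef struct {
    char op;    /* 's' = send, 'r' = recv */
    int  peer;  /* rank of the other process */
} step_t;

/* Fill steps[] with the schedule of process myid and return the
 * number of steps (2 for the end processes, 4 for interior ones).
 * Assumption: nproc is even, as in the 4-process figure. */
int spmd_schedule(int myid, int nproc, step_t steps[4])
{
    assert(nproc >= 2 && nproc % 2 == 0);
    assert(myid >= 0 && myid < nproc);

    int n = 0;
    /* Phase 1: pairs (0,1), (2,3), ... exchange data. */
    if (myid % 2 == 0) {
        steps[n++] = (step_t){'s', myid + 1};
        steps[n++] = (step_t){'r', myid + 1};
    } else {
        steps[n++] = (step_t){'r', myid - 1};
        steps[n++] = (step_t){'s', myid - 1};
    }
    /* Phase 2: pairs (1,2), (3,4), ... exchange data;
     * the first and last process have no partner here. */
    if (myid != 0 && myid != nproc - 1) {
        if (myid % 2 == 0) {
            steps[n++] = (step_t){'r', myid - 1};
            steps[n++] = (step_t){'s', myid - 1};
        } else {
            steps[n++] = (step_t){'s', myid + 1};
            steps[n++] = (step_t){'r', myid + 1};
        }
    }
    return n;
}
```

In every pair one side sends first and the other receives first, so each send has a matching receive already posted or about to be posted.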

7 Basic MPI Terms
Rank
  ID number of a process
  From 0 to (number of processes) - 1
Communicator
  Group of communicating processes
  MPI_COMM_WORLD: the set of all processes
  Communicator size: number of processes
Buffer
  Memory area that contains / stores data
  Specified by pointers

8 Short and complete MPI code
    #include <stdio.h>
    #include <mpi.h>
    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int myid, nproc;
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);
        MPI_Comm_size(MPI_COMM_WORLD, &nproc);
        int mydata = 2 * myid + 1;
        printf("i am %d, mydata = %d\n", myid, mydata);
        int recvdata;
        MPI_Status stat;
        if (myid == 1) {
            MPI_Send(&mydata, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        } else if (myid == 0) {
            MPI_Recv(&recvdata, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &stat);
            printf("sum = %d\n", mydata + recvdata);
        }
        MPI_Finalize();
        return 0;
    }

9-16 Short and complete MPI code (annotations)
Slides 9 to 16 repeat the program of slide 8 and annotate one part each:
  9. #include <stdio.h>, #include <mpi.h>: include the header files.
  10. MPI_Init(&argc, &argv): initialize in this form.
  11. MPI_Comm_rank, MPI_Comm_size: get myid and nproc.
  12. int mydata = 2 * myid + 1; printf(...): make some data and print it out.
  13. int recvdata; MPI_Status stat: data structures for the receive.
  14. MPI_Send(&mydata, 1, MPI_INT, 0, 0, MPI_COMM_WORLD): send the data (rank 1 sends to rank 0).
  15. MPI_Recv(&recvdata, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &stat): receive the data (rank 0 receives from rank 1).
  16. MPI_Finalize(): finalize; must be called.

17 MPI-C at a glance

18 MPI example: cmpsum
Note: this function is provided by MPI_Allreduce.
Butterfly algorithm
  Only applicable when the number of processes is a power of 2
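The code of the cmpsum slide did not survive in this transcript, so the following is only a sketch of the butterfly idea, not the lecture's own implementation. It simulates the ranks in one plain C array instead of calling MPI: in the stage with distance mask, rank r exchanges its partial sum with rank r ^ mask and adds the received value. The names butterfly_sum and MAX_PROCS are hypothetical.

```c
#include <assert.h>

#define MAX_PROCS 64  /* hypothetical upper bound for this simulation */

/* Simulated butterfly all-reduce (sum).
 * On entry value[r] is the contribution of rank r;
 * on exit every value[r] holds the total sum. */
void butterfly_sum(int value[], int nproc)
{
    assert(nproc > 0 && nproc <= MAX_PROCS);
    assert((nproc & (nproc - 1)) == 0);  /* power of 2 only */

    int recv[MAX_PROCS];
    for (int mask = 1; mask < nproc; mask <<= 1) {
        /* "Communication": each rank receives from partner r ^ mask. */
        for (int r = 0; r < nproc; r++)
            recv[r] = value[r ^ mask];
        /* Local computation: add the received partial sum. */
        for (int r = 0; r < nproc; r++)
            value[r] += recv[r];
    }
}
```

After log2(nproc) stages every rank holds the full sum. The partner r ^ mask exists for every rank only when nproc is a power of 2, which is why the assertion rejects other sizes.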

19 Message transfer protocols
Eager protocol: data is sent without waiting for the matching receive call.
Rendezvous protocol: data is sent after the matching receive is called.
(Figure: in the eager protocol the data passes from the user buffer on the source (src) through system buffers to the user buffer on the destination (dst); in the rendezvous protocol the buffer address is passed first and the data goes directly to the user buffer.)

20 Message transfer protocols
Eager protocol
  Data is first sent to a system buffer, then copied to the user area
  No wait for the matching receive
  No deadlock
  Fast for small messages
  System-to-user data copy: slow for large messages
Rendezvous protocol
  First the receive buffer address is sent, then the data is transferred
  Must wait for the matching receive
  Deadlock may happen
  Slow for small messages
  Direct data transfer: fast for large messages
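Because the eager protocol suits small messages and the rendezvous protocol suits large ones, MPI implementations commonly switch between them at a message-size threshold. The sketch below only illustrates that decision; choose_protocol, proto_t, and the eager_limit parameter are hypothetical names, and the actual threshold and its configuration depend on the MPI implementation.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { PROTO_EAGER, PROTO_RENDEZVOUS } proto_t;

/* Pick a transfer protocol from the message size.
 * eager_limit is an assumed, implementation-specific threshold in bytes. */
proto_t choose_protocol(size_t nbytes, size_t eager_limit)
{
    assert(eager_limit > 0);
    /* Small message: copy through the system buffer right away.
     * Large message: wait for the receiver's buffer address first. */
    return (nbytes <= eager_limit) ? PROTO_EAGER : PROTO_RENDEZVOUS;
}
```

A consequence for user code: a program whose blocking sends happen to complete under the eager protocol may deadlock once messages grow past the threshold, so correctness should not rely on buffering.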

21 MPI_Isend & MPI_Irecv
Non-blocking communication
MPI_Isend and MPI_Irecv
  Return at once (without waiting for the completion of the data transfer)
  MPI_Wait must be called to complete the operation
  Warning: after MPI_Isend and before MPI_Wait, you must not modify the buffer
  Any combination, e.g. Send-Irecv or Isend-Recv, is OK

22 BREAK

23 MPI example: stencil
Heat equation (dissipation):
  \frac{\partial u}{\partial t} = \kappa \frac{\partial^2 u}{\partial x^2}
Finite difference approximation, with u_{i,k} \approx u(i \Delta x, k \Delta t):
  \frac{u_{i,k+1} - u_{i,k}}{\Delta t} = \kappa \frac{u_{i+1,k} - 2 u_{i,k} + u_{i-1,k}}{\Delta x^2}
In code:
    r = kappa * delta_t / (delta_x * delta_x);
    u[i][k+1] = r * u[i+1][k] + (1 - 2 * r) * u[i][k] + r * u[i-1][k];
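The update formula can be tried on a single array before any MPI is involved. This is a minimal sketch under stated assumptions: it keeps only two time levels (u and unew) instead of the two-dimensional u[i][k] of the slide, treats u[0] and u[n+1] as fixed boundary values, and the name heat_step is hypothetical. The check r <= 0.5 is the usual stability bound for this explicit scheme; it is added here and does not appear on the slide.

```c
#include <assert.h>
#include <stddef.h>

/* Advance the heat equation by one time step.
 * u and unew have n + 2 elements: indices 1..n are updated,
 * indices 0 and n + 1 are boundary values copied unchanged.
 * r = kappa * delta_t / (delta_x * delta_x). */
void heat_step(const double *u, double *unew, size_t n, double r)
{
    assert(r >= 0.0 && r <= 0.5);  /* assumed stability condition */

    for (size_t i = 1; i <= n; i++)
        unew[i] = r * u[i + 1] + (1.0 - 2.0 * r) * u[i] + r * u[i - 1];

    unew[0] = u[0];
    unew[n + 1] = u[n + 1];
}
```

For example, with r = 0.25 a single spike {0, 0, 1, 0, 0} spreads to {0, 0.25, 0.5, 0.25, 0} after one step.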

24-26 MPI example: stencil
Global array indices: 0, 1, ..., b, b+1, ..., 2b, 2b+1, ..., n, n+1 (n = 3b)
Slides 24, 25, and 26 show the allocated memory for rank 0, rank 1, and rank 2 in turn.
In the figure, each rank has:
  Compute: 3 elements
  Allocation: 5 elements
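The figure's numbers (3 computed elements, 5 allocated elements) fit a block distribution in which each rank also stores one extra element on each side of its block. The sketch below spells out that reading; the index ranges, the names block_t, local_block, and global_to_local, and the requirement that n is divisible by nproc are assumptions, since the slides only show the picture.

```c
#include <assert.h>

typedef struct {
    int first;  /* first global index computed by this rank */
    int last;   /* last global index computed by this rank */
    int alloc;  /* elements allocated, including one extra on each side */
} block_t;

/* Block of the global interior 1..n owned by the given rank.
 * Assumption: n is a multiple of nproc (n = 3b with 3 ranks on the slide). */
block_t local_block(int rank, int nproc, int n)
{
    assert(nproc > 0 && n % nproc == 0);
    assert(rank >= 0 && rank < nproc);

    int b = n / nproc;
    block_t blk;
    blk.first = rank * b + 1;
    blk.last  = (rank + 1) * b;
    blk.alloc = b + 2;
    return blk;
}

/* Local array index of global index g; local index 0 is the extra
 * element to the left of the block. */
int global_to_local(block_t blk, int g)
{
    assert(g >= blk.first - 1 && g <= blk.last + 1);
    return g - (blk.first - 1);
}
```

With n = 9 and three ranks, rank 1 computes global indices 4 to 6 and allocates 5 elements covering global indices 3 to 7; the two outer elements hold copies of values owned by ranks 0 and 2.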

27 Shadow / Halo
Extra array region for incoming messages

28 Order of messages
Messages are non-overtaking
  But there are some reports that overtaking happens on FX10
  Solvable by forcing the matching with tags
No fairness is guaranteed



More information

Lecture 6: Parallel Matrix Algorithms (part 3)

Lecture 6: Parallel Matrix Algorithms (part 3) Lecture 6: Parallel Matrix Algorithms (part 3) 1 A Simple Parallel Dense Matrix-Matrix Multiplication Let A = [a ij ] n n and B = [b ij ] n n be n n matrices. Compute C = AB Computational complexity of

More information

CS 351 Week The C Programming Language, Dennis M Ritchie, Kernighan, Brain.W

CS 351 Week The C Programming Language, Dennis M Ritchie, Kernighan, Brain.W CS 351 Week 6 Reading: 1. The C Programming Language, Dennis M Ritchie, Kernighan, Brain.W Objectives: 1. An Introduction to Message Passing Model 2. To learn about Message Passing Libraries Concepts:

More information

The Message Passing Interface (MPI): Parallelism on Multiple (Possibly Heterogeneous) CPUs

The Message Passing Interface (MPI): Parallelism on Multiple (Possibly Heterogeneous) CPUs 1 The Message Passing Interface (MPI): Parallelism on Multiple (Possibly Heterogeneous) s http://mpi-forum.org https://www.open-mpi.org/ Mike Bailey mjb@cs.oregonstate.edu Oregon State University mpi.pptx

More information

CS 426. Building and Running a Parallel Application

CS 426. Building and Running a Parallel Application CS 426 Building and Running a Parallel Application 1 Task/Channel Model Design Efficient Parallel Programs (or Algorithms) Mainly for distributed memory systems (e.g. Clusters) Break Parallel Computations

More information

First day. Basics of parallel programming. RIKEN CCS HPC Summer School Hiroya Matsuba, RIKEN CCS

First day. Basics of parallel programming. RIKEN CCS HPC Summer School Hiroya Matsuba, RIKEN CCS First day Basics of parallel programming RIKEN CCS HPC Summer School Hiroya Matsuba, RIKEN CCS Today s schedule: Basics of parallel programming 7/22 AM: Lecture Goals Understand the design of typical parallel

More information

Anomalies. The following issues might make the performance of a parallel program look different than it its:

Anomalies. The following issues might make the performance of a parallel program look different than it its: Anomalies The following issues might make the performance of a parallel program look different than it its: When running a program in parallel on many processors, each processor has its own cache, so the

More information

More about MPI programming. More about MPI programming p. 1

More about MPI programming. More about MPI programming p. 1 More about MPI programming More about MPI programming p. 1 Some recaps (1) One way of categorizing parallel computers is by looking at the memory configuration: In shared-memory systems, the CPUs share

More information

Introduction to MPI. May 20, Daniel J. Bodony Department of Aerospace Engineering University of Illinois at Urbana-Champaign

Introduction to MPI. May 20, Daniel J. Bodony Department of Aerospace Engineering University of Illinois at Urbana-Champaign Introduction to MPI May 20, 2013 Daniel J. Bodony Department of Aerospace Engineering University of Illinois at Urbana-Champaign Top500.org PERFORMANCE DEVELOPMENT 1 Eflop/s 162 Pflop/s PROJECTED 100 Pflop/s

More information

CS4961 Parallel Programming. Lecture 18: Introduction to Message Passing 11/3/10. Final Project Purpose: Mary Hall November 2, 2010.

CS4961 Parallel Programming. Lecture 18: Introduction to Message Passing 11/3/10. Final Project Purpose: Mary Hall November 2, 2010. Parallel Programming Lecture 18: Introduction to Message Passing Mary Hall November 2, 2010 Final Project Purpose: - A chance to dig in deeper into a parallel programming model and explore concepts. -

More information

Lecture 6: Message Passing Interface

Lecture 6: Message Passing Interface Lecture 6: Message Passing Interface Introduction The basics of MPI Some simple problems More advanced functions of MPI A few more examples CA463D Lecture Notes (Martin Crane 2013) 50 When is Parallel

More information

Outline. Overview Theoretical background Parallel computing systems Parallel programming models MPI/OpenMP examples

Outline. Overview Theoretical background Parallel computing systems Parallel programming models MPI/OpenMP examples Outline Overview Theoretical background Parallel computing systems Parallel programming models MPI/OpenMP examples OVERVIEW y What is Parallel Computing? Parallel computing: use of multiple processors

More information

Introduction to the Message Passing Interface (MPI)

Introduction to the Message Passing Interface (MPI) Introduction to the Message Passing Interface (MPI) CPS343 Parallel and High Performance Computing Spring 2018 CPS343 (Parallel and HPC) Introduction to the Message Passing Interface (MPI) Spring 2018

More information

MPI 5. CSCI 4850/5850 High-Performance Computing Spring 2018

MPI 5. CSCI 4850/5850 High-Performance Computing Spring 2018 MPI 5 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning Objectives

More information

ME964 High Performance Computing for Engineering Applications

ME964 High Performance Computing for Engineering Applications ME964 High Performance Computing for Engineering Applications Parallel Computing with MPI Building/Debugging MPI Executables MPI Send/Receive Collective Communications with MPI April 10, 2012 Dan Negrut,

More information

Programming Scalable Systems with MPI. UvA / SURFsara High Performance Computing and Big Data. Clemens Grelck, University of Amsterdam

Programming Scalable Systems with MPI. UvA / SURFsara High Performance Computing and Big Data. Clemens Grelck, University of Amsterdam Clemens Grelck University of Amsterdam UvA / SURFsara High Performance Computing and Big Data Message Passing as a Programming Paradigm Gentle Introduction to MPI Point-to-point Communication Message Passing

More information

Outline. Communication modes MPI Message Passing Interface Standard

Outline. Communication modes MPI Message Passing Interface Standard MPI THOAI NAM Outline Communication modes MPI Message Passing Interface Standard TERMs (1) Blocking If return from the procedure indicates the user is allowed to reuse resources specified in the call Non-blocking

More information

Faculty of Electrical and Computer Engineering Department of Electrical and Computer Engineering Program: Computer Engineering

Faculty of Electrical and Computer Engineering Department of Electrical and Computer Engineering Program: Computer Engineering Faculty of Electrical and Computer Engineering Department of Electrical and Computer Engineering Program: Computer Engineering Course Number EE 8218 011 Section Number 01 Course Title Parallel Computing

More information

6.189 IAP Lecture 5. Parallel Programming Concepts. Dr. Rodric Rabbah, IBM IAP 2007 MIT

6.189 IAP Lecture 5. Parallel Programming Concepts. Dr. Rodric Rabbah, IBM IAP 2007 MIT 6.189 IAP 2007 Lecture 5 Parallel Programming Concepts 1 6.189 IAP 2007 MIT Recap Two primary patterns of multicore architecture design Shared memory Ex: Intel Core 2 Duo/Quad One copy of data shared among

More information

Introduction to MPI-2 (Message-Passing Interface)

Introduction to MPI-2 (Message-Passing Interface) Introduction to MPI-2 (Message-Passing Interface) What are the major new features in MPI-2? Parallel I/O Remote Memory Operations Dynamic Process Management Support for Multithreading Parallel I/O Includes

More information

MMPI: Asynchronous Message Management for the. Message-Passing Interface. Harold Carter Edwards. The University of Texas at Austin

MMPI: Asynchronous Message Management for the. Message-Passing Interface. Harold Carter Edwards. The University of Texas at Austin MMPI: Asynchronous Message Management for the Message-Passing Interface Harold Carter Edwards Texas Institute for Computational and Applied Mathematics The University of Texas at Austin Austin, Texas,

More information

Assignment 3 Key CSCI 351 PARALLEL PROGRAMMING FALL, Q1. Calculate log n, log n and log n for the following: Answer: Q2. mpi_trap_tree.

Assignment 3 Key CSCI 351 PARALLEL PROGRAMMING FALL, Q1. Calculate log n, log n and log n for the following: Answer: Q2. mpi_trap_tree. CSCI 351 PARALLEL PROGRAMMING FALL, 2015 Assignment 3 Key Q1. Calculate log n, log n and log n for the following: a. n=3 b. n=13 c. n=32 d. n=123 e. n=321 Answer: Q2. mpi_trap_tree.c The mpi_trap_time.c

More information

Parallel Numerical Algorithms

Parallel Numerical Algorithms Parallel Numerical Algorithms http://sudalab.is.s.u-tokyo.ac.jp/~reiji/pna16/ [ 4 ] Scheduling Theory Parallel Numerical Algorithms / IST / UTokyo 1 PNA16 Lecture Plan General Topics 1. Architecture and

More information

Copyright The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Chapter 9

Copyright The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Chapter 9 Chapter 9 Document Classification Document Classification Problem Search directories, subdirectories for documents (look for.html,.txt,.tex, etc.) Using a dictionary of key words, create a profile vector

More information

MPI. (message passing, MIMD)

MPI. (message passing, MIMD) MPI (message passing, MIMD) What is MPI? a message-passing library specification extension of C/C++ (and Fortran) message passing for distributed memory parallel programming Features of MPI Point-to-point

More information

MPI 3. CSCI 4850/5850 High-Performance Computing Spring 2018

MPI 3. CSCI 4850/5850 High-Performance Computing Spring 2018 MPI 3 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning Objectives

More information

Parallel Computing Paradigms

Parallel Computing Paradigms Parallel Computing Paradigms Message Passing João Luís Ferreira Sobral Departamento do Informática Universidade do Minho 31 October 2017 Communication paradigms for distributed memory Message passing is

More information

Simple examples how to run MPI program via PBS on Taurus HPC

Simple examples how to run MPI program via PBS on Taurus HPC Simple examples how to run MPI program via PBS on Taurus HPC MPI setup There's a number of MPI implementations install on the cluster. You can list them all issuing the following command: module avail/load/list/unload

More information

Performance Analysis of Parallel Applications Using LTTng & Trace Compass

Performance Analysis of Parallel Applications Using LTTng & Trace Compass Performance Analysis of Parallel Applications Using LTTng & Trace Compass Naser Ezzati DORSAL LAB Progress Report Meeting Polytechnique Montreal Dec 2017 What is MPI? Message Passing Interface (MPI) Industry-wide

More information

MPI introduction - exercises -

MPI introduction - exercises - MPI introduction - exercises - Paolo Ramieri, Maurizio Cremonesi May 2016 Startup notes Access the server and go on scratch partition: ssh a08tra49@login.galileo.cineca.it cd $CINECA_SCRATCH Create a job

More information

Parallel Computing and the MPI environment

Parallel Computing and the MPI environment Parallel Computing and the MPI environment Claudio Chiaruttini Dipartimento di Matematica e Informatica Centro Interdipartimentale per le Scienze Computazionali (CISC) Università di Trieste http://www.dmi.units.it/~chiarutt/didattica/parallela

More information

Message Passing Interface. George Bosilca

Message Passing Interface. George Bosilca Message Passing Interface George Bosilca bosilca@icl.utk.edu Message Passing Interface Standard http://www.mpi-forum.org Current version: 3.1 All parallelism is explicit: the programmer is responsible

More information

IPM Workshop on High Performance Computing (HPC08) IPM School of Physics Workshop on High Perfomance Computing/HPC08

IPM Workshop on High Performance Computing (HPC08) IPM School of Physics Workshop on High Perfomance Computing/HPC08 IPM School of Physics Workshop on High Perfomance Computing/HPC08 16-21 February 2008 MPI tutorial Luca Heltai Stefano Cozzini Democritos/INFM + SISSA 1 When

More information

Debugging on Blue Waters

Debugging on Blue Waters Debugging on Blue Waters Debugging tools and techniques for Blue Waters are described here with example sessions, output, and pointers to small test codes. For tutorial purposes, this material will work

More information

Introduction to Parallel Programming

Introduction to Parallel Programming Introduction to Parallel Programming Linda Woodard CAC 19 May 2010 Introduction to Parallel Computing on Ranger 5/18/2010 www.cac.cornell.edu 1 y What is Parallel Programming? Using more than one processor

More information