Message Passing Interface (MPI)


What the course is:
- An introduction to parallel systems and their implementation using MPI
- A presentation of all the basic functions and types you are likely to need in MPI
- A collection of examples

What the course is not:
- A complete list of all MPI functions and the context in which to use them
- A guarantee that you will never be faced with a problem requiring those functions
- An alternative to a book on MPI

Dimitri Perrin, November 2007 (Lecture 1)

How the course will work:
- Five one-hour lectures
- Slides available on the web (if possible in advance of the lecture): http://computing.dcu.ie/~dperrin/teaching.html#mpi

Course outline:
- Introduction
- When is a parallel implementation useful?
- The basics of MPI
- Some simple problems
- More advanced functions of MPI
- A few more examples

Want more?

Tutorials: a few MPI tutorials are listed at http://computing.dcu.ie/~dperrin/teaching.html#mpi

Source code: commented code for all examples will also be available at the same address.

Books:
- Using MPI: Parallel Programming with the Message-Passing Interface, Gropp, Lusk and Skjellum, ISBN 0-262-57134-X, http://www.powells.com/biblio?isbn=026257134x
- Using MPI-2: Advanced Features of the Message-Passing Interface, Gropp, Lusk and Thakur, ISBN 0-262-57133-1, http://www.powells.com/biblio?isbn=0262571331

Why/when would you use a parallel approach?
- Large problems
- Problems suitable for parallelisation, i.e. you know what speed-up to expect => you need to be able to recognise them

Three types of problems are suitable:
- Parallel Problems
- Regular and Synchronous Problems
- Irregular and/or Asynchronous Problems

Parallel problems:
- The problem can be broken down into subparts
- Each subpart is independent of the others
- No communication is required, except to split up the problem and combine the final results
- Linear speed-up can be expected
- Ex: Monte-Carlo simulations

Regular and Synchronous Problems:
- The same instruction set (regular algorithm) is applied to all data
- Synchronous communication (or close to it): each processor finishes its task at the same time
- Local (neighbour-to-neighbour) and collective (combine final results) communication
- Speed-up is based on the computation-to-communication ratio => if it is large, you can expect good speed-up for local communications and reasonable speed-up for non-local communications
- Ex: Fast Fourier transforms (synchronous); matrix-vector products, sorting (loosely synchronous)
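One way to make this ratio concrete (a sketch with notation of my own, not from the slides): suppose the work divides evenly over p processors, each processor spends a time t_comp computing and t_comm communicating, and the two do not overlap. Then

S(p) = \frac{p \, t_{\mathrm{comp}}}{t_{\mathrm{comp}} + t_{\mathrm{comm}}} = \frac{p}{1 + 1/r},
\qquad r = \frac{t_{\mathrm{comp}}}{t_{\mathrm{comm}}}

so a large computation-to-communication ratio r gives S(p) close to p (near-linear speed-up), while a small r means communication dominates and speed-up stalls.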

Irregular and/or Asynchronous Problems:
- Irregular algorithm which cannot be implemented efficiently except with message passing and high communication overhead
- Communication is usually asynchronous and requires careful coding and load balancing
- Often dynamic repartitioning of data between processors is required
- Speed-up is difficult to predict; if the problem can be split up into regular and irregular parts, this makes things easier
- Ex: melting ice problem (or any moving-boundary simulation)

A first example: the matrix-vector product c = A b

A parallel approach:
- Each element of vector c depends on vector b and only one row of matrix A
- Each element of vector c can therefore be calculated independently of the others
- Communication is only needed to split up the problem and combine the final results
=> A linear speed-up can be expected, for large matrices
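To make the idea concrete, here is a minimal sketch of this approach in C with MPI. It is not the lecture's own code: the matrix size N, the example values in A and b, and the use of the collective calls MPI_Scatter, MPI_Bcast and MPI_Gather (which the course covers later) are assumptions made for illustration, and N is assumed to be divisible by the number of processes.

/* Hedged sketch (not from the lecture): parallel matrix-vector product c = A*b.
 * Each process receives a block of rows of A and a full copy of b, computes
 * its part of c independently, and the root gathers the results. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 8   /* illustrative matrix dimension, divisible by the process count */

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int rows = N / size;                       /* rows handled by each process */
    double b[N];
    double *A = NULL, *c = NULL;
    double *local_A = malloc(rows * N * sizeof(double));
    double *local_c = malloc(rows * sizeof(double));

    if (rank == 0) {                           /* the root sets up A, b and c */
        A = malloc(N * N * sizeof(double));
        c = malloc(N * sizeof(double));
        for (int i = 0; i < N; i++) {
            b[i] = 1.0;                        /* arbitrary example data */
            for (int j = 0; j < N; j++)
                A[i * N + j] = i + j;
        }
    }

    /* Split up the problem: one block of rows per process, b to everyone */
    MPI_Scatter(A, rows * N, MPI_DOUBLE, local_A, rows * N, MPI_DOUBLE,
                0, MPI_COMM_WORLD);
    MPI_Bcast(b, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* Independent computation: each element of c needs only one row of A */
    for (int i = 0; i < rows; i++) {
        local_c[i] = 0.0;
        for (int j = 0; j < N; j++)
            local_c[i] += local_A[i * N + j] * b[j];
    }

    /* Combine the final results on the root */
    MPI_Gather(local_c, rows, MPI_DOUBLE, c, rows, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int i = 0; i < N; i++)
            printf("c[%d] = %g\n", i, c[i]);
        free(A); free(c);
    }
    free(local_A); free(local_c);
    MPI_Finalize();
    return 0;
}

With a typical MPI installation this would be compiled with mpicc and run with mpirun using any process count that divides N.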

Another example: Monte-Carlo calculation of π
- π = 3.14159... is the area of a circle of radius 1
- If points are placed at random in the unit square, the fraction of points falling within the circle quadrant approaches π/4
- The more points we have, the more accurate the value for π is

A parallel approach:
- Each point is randomly placed within the square
- The position of each point is independent of the position of the others
- We can split up the problem by letting each node randomly place a given number of points
- Communication is only needed to specify the number of points and combine the final results
=> A linear speed-up can be expected, allowing for a larger number of points and therefore a greater accuracy in the estimation of π
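A minimal sketch of this estimator in C with MPI follows. It is not the lecture's code: the number of points per node, the use of the C library rand() as the random source, and the MPI_Reduce collective (covered later in the course) are illustrative assumptions.

/* Hedged sketch (not from the lecture): Monte-Carlo estimation of pi.
 * Each node places its own points in the unit square and counts how many
 * fall inside the circle quadrant; the counts are combined on the root. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long points_per_node = 1000000;      /* points placed by each node */
    long local_hits = 0, total_hits = 0;

    srand(rank + 1);                           /* crude per-node seeding, for
                                                  illustration only */
    for (long i = 0; i < points_per_node; i++) {
        double x = (double)rand() / RAND_MAX;  /* random point in the square */
        double y = (double)rand() / RAND_MAX;
        if (x * x + y * y <= 1.0)              /* inside the circle quadrant? */
            local_hits++;
    }

    /* Combine the final results: sum the counts from all nodes on the root */
    MPI_Reduce(&local_hits, &total_hits, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        double pi = 4.0 * total_hits / (points_per_node * size);
        printf("Estimated pi = %f\n", pi);
    }

    MPI_Finalize();
    return 0;
}

Doubling the number of nodes doubles the number of points placed in the same time, which is exactly the accuracy gain the slide describes.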

Something more "real": chess software
- After each move, the chess software must find the best move within a set
- This set is large, but finite
- Each move from this set can be evaluated independently, and the set can be partitioned
- Communication is only needed to split up the problem and combine the final results
=> A linear speed-up can be expected
=> This means that, in a reasonable time, moves can be studied more thoroughly
=> This depth of evaluation is what makes the software more competitive

Some background on MPI:

Who?
- The MPI Forum (made up of industry, academia and government)
- It established a standardised Message-Passing Interface (MPI-1) in 1994

Why?
- Intended as an interface to both C and FORTRAN; C++ bindings were added in MPI-2, and some Java bindings exist but are not standard yet
- The aim was to provide a specification that can be implemented on any parallel computer or cluster, so portability of code was a major goal

Advantages of MPI:
- Portable, hence protection of software investment
- A standard, agreed by everybody
- Designed using optimal features of existing message-passing libraries
- Kitchen-sink functionality, very rich environment (129 functions)
- Implementations for F77, C and C++ are freely downloadable

Disadvantages of MPI:
- Kitchen-sink functionality, very rich environment (129 functions): hard to learn it all, but this is not necessary (a bare dozen functions are needed in most cases)
- Implementations on shared-memory machines are often quite poor and do not suit the programming model
- Has rivals in other message-passing libraries (e.g. PVM)

Preliminaries. MPI provides support for:
- Point-to-point and collective (i.e. group) communications
- Inquiry routines to query the environment (how many nodes are there, which node number am I, etc.)
- Constants and data-types

We will start with the basics: initialising MPI, and using point-to-point communication.
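As a taste of what that looks like, here is a minimal sketch in C (mine, not the lecture's): it initialises MPI, uses the inquiry routines to find the node count and node number, and passes one message from node 0 to node 1 with blocking point-to-point calls. It assumes the program is run with at least two processes.

/* Hedged sketch (not from the lecture): MPI initialisation, environment
 * inquiry, and one blocking point-to-point message. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);                    /* initialise MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);      /* which node number am I? */
    MPI_Comm_size(MPI_COMM_WORLD, &size);      /* how many nodes are there? */

    if (rank == 0) {
        int msg = 42;
        /* send one int to node 1, with message tag 0 */
        MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        printf("Node 0 of %d sent %d\n", size, msg);
    } else if (rank == 1) {
        int msg;
        MPI_Status status;
        /* receive one int from node 0, with message tag 0 */
        MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("Node 1 received %d\n", msg);
    }

    MPI_Finalize();                            /* shut down MPI */
    return 0;
}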