More MPI. Bryan Mills, PhD. Spring 2017


MPI So Far: Communicators; Blocking Point-to-Point (MPI_Send, MPI_Recv); Collective Communications (MPI_Bcast, MPI_Barrier, MPI_Reduce, MPI_Allreduce)

Non-blocking Send
int MPI_Isend(
    void*        msg_buf   /* input  */,
    int          msg_size  /* input  */,
    MPI_Datatype msg_type  /* input  */,
    int          dest      /* input  */,
    int          tag       /* input  */,
    MPI_Comm     comm      /* input  */,
    MPI_Request* request   /* output */)
Identical to MPI_Send, but with an added output argument that returns a handle to an MPI_Request object.

Non-blocking Receive
int MPI_Irecv(
    void*        msg_buf   /* output */,
    int          msg_size  /* input  */,
    MPI_Datatype msg_type  /* input  */,
    int          source    /* input  */,
    int          tag       /* input  */,
    MPI_Comm     comm      /* input  */,
    MPI_Request* request   /* output */)
Identical to MPI_Recv, but the MPI_Status argument is replaced by an MPI_Request handle; the status is retrieved later in MPI_Wait.

Blocking Wait
int MPI_Wait(
    MPI_Request* request  /* input/output */,
    MPI_Status*  status   /* output */)
The status is the same one that would have been returned by MPI_Recv; it is less useful when completing an MPI_Isend.
int MPI_Waitall(
    int         count       /* input */,
    MPI_Request requests[]  /* input/output */,
    MPI_Status  statuses[]  /* output */)

Non-blocking Test
int MPI_Test(
    MPI_Request* request  /* input/output */,
    int*         flag     /* output */,
    MPI_Status*  status   /* output */)
flag is set to 1 if the operation has completed, otherwise 0. Variants:
MPI_Testall(incount, requests, flag, statuses)
MPI_Testsome(incount, requests, outcount, indices, statuses)
MPI_Testany(incount, requests, index, flag, status)
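To show how these calls fit together, here is a minimal sketch (not from the slides) in which ranks 0 and 1 exchange one integer: both requests are started first, then completed with MPI_Waitall, so neither process blocks waiting for the other.

/* Minimal sketch: ranks 0 and 1 exchange one int without deadlocking. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char* argv[]) {
    int rank, sendval, recvval;
    MPI_Request reqs[2];
    MPI_Status  stats[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank < 2) {                     /* only ranks 0 and 1 participate */
        int partner = 1 - rank;
        sendval = rank;
        MPI_Irecv(&recvval, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(&sendval, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &reqs[1]);
        /* ... useful work could overlap with communication here ... */
        MPI_Waitall(2, reqs, stats);
        printf("rank %d received %d\n", rank, recvval);
    }

    MPI_Finalize();
    return 0;
}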

Serial implementation of vector addition

Parallel implementation of vector addition
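A minimal sketch of the per-process work in the parallel version, assuming a block distribution in which each process holds local_n components of each vector (the names local_x, local_y, local_z, local_n are illustrative, not taken from the slides):

/* Each process adds its own local_n components; no communication needed. */
void Parallel_vector_sum(double local_x[], double local_y[],
                         double local_z[], int local_n) {
    for (int i = 0; i < local_n; i++)
        local_z[i] = local_x[i] + local_y[i];
}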

Scatter MPI_Scatter can be used in a function that reads in an entire vector on process 0 but only sends the needed components to each of the other processes.

Reading and distributing a vector
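A hedged sketch of how such a read-and-distribute routine could look; the function name Read_vector and its parameters are illustrative, and it assumes n is evenly divisible by the number of processes so that local_n = n / comm_sz:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

void Read_vector(double local_a[], int local_n, int n,
                 int my_rank, MPI_Comm comm) {
    double* a = NULL;

    if (my_rank == 0) {
        a = malloc(n * sizeof(double));
        printf("Enter %d components:\n", n);
        for (int i = 0; i < n; i++)
            scanf("%lf", &a[i]);
    }
    /* Every process calls MPI_Scatter; only the root's send buffer is used. */
    MPI_Scatter(a, local_n, MPI_DOUBLE,
                local_a, local_n, MPI_DOUBLE, 0, comm);
    if (my_rank == 0)
        free(a);
}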

Gather Collect all of the components of the vector onto process 0, and then process 0 can process all of the components.

Print a distributed vector (1)

Print a distributed vector (2)
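A sketch of what such a print routine might look like (names are illustrative): MPI_Gather collects every process's block on process 0, which then prints the whole vector in order.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

void Print_vector(double local_b[], int local_n, int n,
                  int my_rank, MPI_Comm comm) {
    double* b = NULL;

    if (my_rank == 0)
        b = malloc(n * sizeof(double));
    MPI_Gather(local_b, local_n, MPI_DOUBLE,
               b, local_n, MPI_DOUBLE, 0, comm);
    if (my_rank == 0) {
        for (int i = 0; i < n; i++)
            printf("%f ", b[i]);
        printf("\n");
        free(b);
    }
}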

Allgather Concatenates the contents of each process's send_buf_p and stores the result in each process's recv_buf_p. As usual, recv_count is the amount of data being received from each process.
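For reference, the standard MPI_Allgather signature (only the layout and comments below are editorial):

int MPI_Allgather(
    const void*  send_buf_p  /* input  */,
    int          send_count  /* input  */,
    MPI_Datatype send_type   /* input  */,
    void*        recv_buf_p  /* output */,
    int          recv_count  /* input: amount received from each process */,
    MPI_Datatype recv_type   /* input  */,
    MPI_Comm     comm        /* input  */)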

Matrix-vector multiplication: the i-th component of y is the dot product of the i-th row of A with x, i.e. y[i] = A[i][0]*x[0] + A[i][1]*x[1] + ... + A[i][n-1]*x[n-1].

Matrix-vector multiplication

Multiply a matrix by a vector: serial pseudo-code

C-style two-dimensional arrays are stored in row-major order: the rows are laid out one after another in a single contiguous block of memory, so element A[i][j] lives at offset i*n + j.

Serial matrix-vector multiplication
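A sketch of the serial routine this slide refers to, assuming the matrix is stored row-major in a one-dimensional array (the name Mat_vect_mult is illustrative):

void Mat_vect_mult(double A[], double x[], double y[], int m, int n) {
    for (int i = 0; i < m; i++) {
        y[i] = 0.0;
        for (int j = 0; j < n; j++)
            y[i] += A[i*n + j] * x[j];   /* dot product of row i with x */
    }
}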

An MPI matrix-vector multiplication function (1)

An MPI matrix-vector multiplication function (2)
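A hedged sketch of the overall shape of such a function: each process holds a block of local_m rows of A and local_n components of x, gathers the full x with MPI_Allgather, and then computes its local_m components of y. The names and the assumption that m and n divide evenly among the processes are illustrative.

#include <stdlib.h>
#include <mpi.h>

void Parallel_mat_vect_mult(double local_A[], double local_x[],
                            double local_y[], int local_m,
                            int n, int local_n, MPI_Comm comm) {
    double* x = malloc(n * sizeof(double));

    /* Every process ends up with the full vector x. */
    MPI_Allgather(local_x, local_n, MPI_DOUBLE,
                  x, local_n, MPI_DOUBLE, comm);
    for (int local_i = 0; local_i < local_m; local_i++) {
        local_y[local_i] = 0.0;
        for (int j = 0; j < n; j++)
            local_y[local_i] += local_A[local_i*n + j] * x[j];
    }
    free(x);
}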

PERFORMANCE EVALUATION

Elapsed parallel time: MPI_Wtime returns the number of seconds that have elapsed since some arbitrary time in the past.
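A common timing idiom built on MPI_Wtime (a fragment, assuming my_rank and comm are already set up and <stdio.h> is included): synchronize, time the region on every process, and report the maximum elapsed time, i.e. the slowest process.

double local_start, local_finish, local_elapsed, elapsed;

MPI_Barrier(comm);                       /* start everyone together      */
local_start = MPI_Wtime();
/* ... code being timed ... */
local_finish  = MPI_Wtime();
local_elapsed = local_finish - local_start;
MPI_Reduce(&local_elapsed, &elapsed, 1, MPI_DOUBLE, MPI_MAX, 0, comm);
if (my_rank == 0)
    printf("Elapsed time = %e seconds\n", elapsed);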

Run-times of serial and parallel matrix-vector multiplication (Seconds)

Speedups of Parallel Matrix-Vector Multiplication

Efficiencies of Parallel Matrix-Vector Multiplication
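For reference, the quantities in these tables follow the usual definitions: speedup S(n, p) = T_serial(n) / T_parallel(n, p) and efficiency E(n, p) = S(n, p) / p, so perfect (linear) speedup corresponds to an efficiency of 1.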

Scalability A program is scalable if the problem size can be increased at a rate such that the efficiency doesn't decrease as the number of processes increases. Programs that can maintain a constant efficiency without increasing the problem size are sometimes said to be strongly scalable. Programs that can maintain a constant efficiency if the problem size increases at the same rate as the number of processes are sometimes said to be weakly scalable.

MPI Derived Datatypes Allow transmission of more complex data types; in C this typically means a struct.
struct Point {
    float x;
    float y;
    int color[3];
};

MPI Derived Datatypes To create such a type you need to build a custom MPI datatype:
int MPI_Type_create_struct(
    int          count,
    int          array_of_blocklengths[],
    MPI_Aint     array_of_displacements[],
    MPI_Datatype array_of_types[],
    MPI_Datatype *newtype)

MPI Derived Datatypes
struct Point { float x; float y; int color[3]; };
Count: how many blocks (members) are in the struct. Example: 3
Block lengths: how many elements are in each block. Example: {1, 1, 3}
Displacements: where each block lies relative to the base address. Example: {0, 4, 8}
Data types: the base MPI type of each block. Example: {MPI_FLOAT, MPI_FLOAT, MPI_INT}

MPI Derived Datatypes
struct Point points[n];
MPI_Datatype PointStruct;
MPI_Datatype types[3] = {MPI_FLOAT, MPI_FLOAT, MPI_INT};
int blocklens[3] = {1, 1, 3};
MPI_Aint displacements[3];
MPI_Get_address(points, displacements);
MPI_Get_address(&points[0].y, displacements + 1);
MPI_Get_address(&points[0].color, displacements + 2);
MPI_Aint base;
base = displacements[0];
displacements[0] = 0;
displacements[1] = displacements[1] - base;
displacements[2] = displacements[2] - base;
MPI_Type_create_struct(3, blocklens, displacements, types, &PointStruct);
MPI_Type_commit(&PointStruct);
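Once committed, the new type can be used like any built-in datatype and should eventually be freed; a hypothetical usage (dest and tag are placeholders, not from the slides):

MPI_Send(points, n, PointStruct, dest, tag, MPI_COMM_WORLD);
/* ... */
MPI_Type_free(&PointStruct);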

Derived Datatypes - Optimized Once you have created a data type, you can sometimes smash it together (pack it more tightly). The tools for that are Pack/Unpack and creating a resized data type:
int MPI_Type_create_resized(
    MPI_Datatype oldtype,
    MPI_Aint     lb,
    MPI_Aint     extent,
    MPI_Datatype *newtype);
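A sketch of the most common use of MPI_Type_create_resized: forcing the extent of the struct type to sizeof(struct Point) so that sending an array of Points steps correctly from one element to the next even when the compiler inserts padding (PointStructResized is an illustrative name):

MPI_Datatype PointStructResized;
MPI_Type_create_resized(PointStruct, 0, sizeof(struct Point),
                        &PointStructResized);
MPI_Type_commit(&PointStructResized);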

Graveyard

Different partitions of a 12-component vector among 3 processes

Partitioning options: Block partitioning assigns blocks of consecutive components to each process; cyclic partitioning assigns components in a round-robin fashion; block-cyclic partitioning uses a cyclic distribution of blocks of components.
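For the 12-component vector from the earlier slide (components 0-11 on processes P0, P1, P2), these options would give, for example:
Block: P0 gets 0-3, P1 gets 4-7, P2 gets 8-11.
Cyclic: P0 gets 0, 3, 6, 9; P1 gets 1, 4, 7, 10; P2 gets 2, 5, 8, 11.
Block-cyclic (with block size 2): P0 gets 0, 1, 6, 7; P1 gets 2, 3, 8, 9; P2 gets 4, 5, 10, 11.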