Leveraging C++ Meta-programming Capabilities to Simplify the Message Passing Programming Model
Simone Pellegrini, Radu Prodan and Thomas Fahringer
Distributed and Parallel Systems Group, University of Innsbruck


- MPI is a relatively old standard: MPI 1.0 was introduced in 1994.
- It was designed upon an abstraction level that is the lowest common denominator across C, C++ and Fortran, i.e. function calls, pointers, data types, etc.
- The MPI standard has evolved over 17 years solely by adding new primitives: MPI 3.0 will contain several hundred routines, yet the abstraction level stays the same.

- MPI did not keep pace with the evolution of languages over the last 17 years:
  - object-oriented programming in Fortran 2000
  - C++ templates and template meta-programming
  - lambdas and variadic templates in C++11
- The HPC community is also slowly adopting C++.

- The MPI committee deprecated the C++ bindings in MPI 2.2; they will be removed in MPI 3.0.
- Why?
  - Poor overall integration with the C++ environment (e.g. with the Standard Template Library, STL)
  - Level of abstraction far lower than typical C++ libraries
  - The C bindings are preferred even in C++ applications
  - Weakened type safety when mapping common C++ constructs to MPI
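For reference, the deprecated official C++ bindings stayed at essentially the same abstraction level as the C API; a minimal sketch of what they looked like (buffer address, count and datatype still spelled out by hand):

    // Deprecated MPI-2 C++ bindings
    int rank = MPI::COMM_WORLD.Get_rank();
    if (rank == 0) {
        int n = 2;
        MPI::COMM_WORLD.Send(&n, 1, MPI::INT, 1, 0);   // buffer, count, datatype, destination, tag
    } else if (rank == 1) {
        int n;
        MPI::COMM_WORLD.Recv(&n, 1, MPI::INT, 0, 0);   // buffer, count, datatype, source, tag
    }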

- Overview of existing MPI C++ bindings
  - Boost.MPI
- Introduction of the MPP (MPI C++) interface
  - MPI endpoints as streams
  - Integration with user and legacy data types
- Performance evaluation
  - Interface overhead
  - QUAD_MPI real-case scenario

if (rank == 0) {
    MPI_Send((const int[1]){ 2 }, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    std::array<float,2> val = { 3.14f, 2.95f };
    MPI_Send(&val.front(), val.size(), MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
} else if (rank == 1) {
    int n;
    MPI_Recv(&n, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    std::vector<float> values(n);
    MPI_Recv(&values.front(), n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}

- 60% of MPI routine parameters can be either:
  - inferred by the compiler (type and size of the sent/received data), or
  - defaulted (tag = 0, comm = MPI_COMM_WORLD, status = MPI_STATUS_IGNORE).
- Send/Recv both require a void*, which:
  - weakens type safety
  - makes transferring constant values (a.k.a. r-values) inefficient, unless C99 compound literals are used as in the first MPI_Send above
  - misses integration with C++ references.
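To illustrate the r-value point: since MPI_Send takes a buffer address, a literal cannot be passed directly, and without C99 compound literals the programmer must introduce a named temporary first:

    // MPI_Send(&2, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   // does not compile: a literal has no address
    int two = 2;                                          // a named temporary is needed instead
    MPI_Send(&two, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);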

- Overview of existing MPI C++ bindings
  - Boost.MPI
- Introduction of the MPP (MPI C++) interface
  - MPI endpoints as streams
  - Integration with user and legacy data types
- Performance evaluation
  - Estimate the interface overhead
  - QUAD_MPI real-case scenario

Boost.MPI version of the same exchange (world is the communicator object):

if ( world.rank() == 0 ) {
    world.send( 1, 1, 2 );                   // send( rank, tag, body )
    world.send( 1, 0, { 3.14f, 2.95f } );
} else if ( world.rank() == 1 ) {
    int n;
    world.recv( 0, 1, n );                   // received data is bound by reference
    std::vector<float> values(n);
    world.recv( 0, 0, values );
}

Advantages:
- Much easier to read (less syntax)
- The type and size of the transmitted data are inferred
- Includes support for C++ references

Disadvantages:
- No default parameters
- MPI endpoints are not handled as streams using the insertion (<<) and extraction (>>) C++ stream operators, so communication statements cannot be combined, e.g. cout << "Hello" << "world"
- Uses software serialization for transmitting user data types, with poor performance
- No integration with MPI_Datatypes
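For illustration, a minimal sketch of how a user-defined type is made transmittable in Boost.MPI (the particle type is a made-up example; Boost.MPI falls back to Boost.Serialization for such types, which is the source of the runtime packing overhead mentioned above):

    #include <boost/mpi.hpp>
    #include <boost/serialization/string.hpp>

    struct particle {
        double x, y, z;
        std::string label;

        // Boost.Serialization hook used by Boost.MPI to pack/unpack the object at runtime
        template <class Archive>
        void serialize(Archive& ar, const unsigned int /*version*/) {
            ar & x & y & z & label;
        }
    };

    // usage (world being a boost::mpi::communicator, as in the example above):
    // world.send(1, 0, p);   // p is serialized into a buffer before being sent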

- Overview of existing MPI C++ bindings
  - Boost.MPI
- Introduction of the MPP (MPI C++) interface
  - MPI endpoints as streams
  - Integration with user and legacy data types
- Performance evaluation
  - Estimate the interface overhead
  - QUAD_MPI real-case scenario

- An interface to MPI smoothly integrated into the C++ programming language
- Combines concepts of object-oriented programming and templates
- Key features:
  - Header-only interface
  - Reduces the number of parameters of MPI routines and infers as much as possible at compile time
  - Improved integration with user and legacy data types, with safer type checking
  - Relies on compiler optimizations to reduce overhead, e.g. function inlining and constant propagation
- Focus on point-to-point communications

MPP version of the same exchange:

using namespace mpi;
if ( comm::world.rank() == 0 ) {
    comm::world(1) << 2 << { 3.14f, 2.95f };   // comm::world is the communicator, (1) selects the endpoint for rank 1
} else if ( comm::world.rank() == 1 ) {
    int n;
    comm::world(0) >> n;
    std::vector<float> values(n);
    comm::world(0) >> values;                  // received data is bound by reference
}

- Concept of endpoint: mpi::endpoint comm::operator()(int rank);
- An endpoint can be used as an input/output stream via the insertion (<<) and extraction (>>) operators, as in the example above.
- The type and size of the transmitted data are inferred, and type checking is performed.
- The message tag is optional: by default, messages are sent with tag 0; the user can provide a different value if needed.

The future pattern is used to simplify asynchronous communications:

float real;
mpi::request<float>&& req = mpi::comm::world(mpi::any) > real;
// ... do something else ...
use( req.get() );

T& request::get() blocks until the communication completes and directly returns a reference to the received value.
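Conceptually, request::get() plays the role of the MPI_Wait in the equivalent hand-written non-blocking code; a sketch with the plain C bindings (use() stands for any consumer of the value):

    float real;
    MPI_Request req;
    MPI_Irecv(&real, 1, MPI_FLOAT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &req);
    // ... do something else ...
    MPI_Wait(&req, MPI_STATUS_IGNORE);   // corresponds to req.get()
    use(real);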

- A good MPI interface must be capable of dealing with non-primitive data types.
- Boost.MPI: a generic but inefficient approach based on software serialization.
- MPI_Datatypes: MPI's mechanism for dealing with user-defined types.
  - Very efficient, but cumbersome to use and unknown to most programmers.
  - Programmers need to manually specify (see the sketch below):
    - the starting address of the object in memory and its layout
    - the number of elements that compose the type
    - for each element, recursively, the displacement from the starting address of the containing type and its MPI_Datatype
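For instance, registering even a small struct by hand with plain MPI requires something like the following (a sketch for a hypothetical struct cell; offsetof comes from <cstddef>):

    struct cell { int id; double value; };

    MPI_Datatype cell_type;
    int          blocklens[2] = { 1, 1 };
    MPI_Aint     displs[2]    = { offsetof(cell, id), offsetof(cell, value) };
    MPI_Datatype types[2]     = { MPI_INT, MPI_DOUBLE };

    MPI_Type_create_struct(2, blocklens, displs, types, &cell_type);   // describe the memory layout
    MPI_Type_commit(&cell_type);                                       // make it usable in communications
    // ... use cell_type in MPI_Send / MPI_Recv ...
    MPI_Type_free(&cell_type);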

- We support sending user/legacy data types in MPP via the type traits design pattern.
- For any new data type, the programmer specifies via a C++ template specialization:
  - how to determine its starting memory address
  - the number of elements composing the type
  - the type of its elements

template <class T>
struct mpi_type_traits<std::vector<T>> {
    static inline const T* get_addr( const std::vector<T>& vec ) {
        return mpi_type_traits<T>::get_addr( vec.front() );
    }
    static inline size_t get_size( const std::vector<T>& vec ) {
        return vec.size();
    }
    static inline MPI_Datatype get_type( const std::vector<T>& ) {
        return mpi_type_traits<T>::get_type( T() );
    }
};

vector<int> v = { 2, 3, 5, 7, 11, 13, 17, 19 };
comm::world(1) << v;

// which the library expands, via the traits, into:
typedef mpi_type_traits<vector<int>> vect_traits;
MPI_Ssend( vect_traits::get_addr(v), vect_traits::get_size(v), vect_traits::get_type(v), ... );
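The same trait can be specialized for a user-defined type as well; a minimal sketch for a hypothetical point3d struct (assuming the mpi_type_traits interface above, and ignoring datatype cleanup for brevity):

    struct point3d { double x, y, z; };

    template <>
    struct mpi_type_traits<point3d> {
        static inline const point3d* get_addr( const point3d& p ) { return &p; }
        static inline size_t         get_size( const point3d& )   { return 1; }
        static inline MPI_Datatype   get_type( const point3d& ) {
            // three contiguous doubles; the MPI_Datatype is built once and reused
            static MPI_Datatype type = MPI_DATATYPE_NULL;
            if (type == MPI_DATATYPE_NULL) {
                MPI_Type_contiguous(3, MPI_DOUBLE, &type);
                MPI_Type_commit(&type);
            }
            return type;
        }
    };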

- Overview of existing MPI C++ bindings
  - Boost.MPI
- Introduction of our MPP (MPI C++) interface
  - Endpoints as streams
  - Integration with user and legacy data types
- Performance evaluation
  - Estimate the interface overhead
  - QUAD_MPI real-case scenario

- Three interfaces are compared:
  - standard MPI C bindings
  - Boost.MPI
  - MPP
- The underlying MPI library is the same in all cases, i.e. Open MPI 1.4.2.
- Two micro-benchmarks measure:
  - throughput
  - user data type overhead

- Simple ping-pong application using shared-memory communication (--mca btl sm flag)
- Minimizes the communication cost and emphasizes the library overhead
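(For reference, such a run would typically be launched with something like mpirun -np 2 --mca btl self,sm ./pingpong; the exact BTL flags depend on the Open MPI version.)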

- Ping-pong application
- Processes allocated on different nodes
- InfiniBand interconnect

- Integral approximation using the quadrature rule
- Rank 0 assigns a sub-range [A,B) of the problem to each of the other P-1 processes
- The P processes compute the partial results for their sub-ranges in parallel (see the sketch below)
- The result is aggregated via a reduction
- MPP reduces the input code by 30% in terms of number of characters
- MPP performs better due to the elimination of temporary variable assignments
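A minimal sketch of the per-process part of such a quadrature code, using the plain C bindings (f, a, b and n are placeholders for the integrand, the assigned sub-range and the local number of samples; this is not the original QUAD_MPI code):

    double local = 0.0;
    double h = (b - a) / n;                      // width of one quadrature interval
    for (int i = 0; i < n; ++i)
        local += f(a + (i + 0.5) * h) * h;       // midpoint quadrature rule on [a, b)

    double total = 0.0;                          // aggregated on rank 0 via a reduction
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);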

- A lightweight C++ interface to MPI:
  - higher abstraction level
  - C++ stream operators for send/receive operations
  - infers data type and size, with improved type checking
  - integrates with MPI_Datatypes instead of serialization, with improved performance
- Missing features:
  - integration of collective communications
  - simplifying the use of other cumbersome MPI features (e.g. dynamic processes, topologies)

Questions?