MPI versions. MPI History

MPI versions. MPI History
Standardization started: 1992
MPI-1 (1.0) completed: May 1994
Clarifications (1.1): June 1995
MPI-2: started 1995, finished 1997
MPI-2 book: 1999
MPICH 1.2.4: partial implementation of some MPI-2 features

MPI-2
Dynamic process creation and management
One-sided communication
Extended collective operations
Parallel I/O
Miscellany

MPICH2
An all-new implementation of MPI, designed to support research into high-performance implementations of MPI-1 and MPI-2 functionality. MPICH2 includes support for one-sided communication (illustrated below), dynamic processes, intercommunicator collective operations, and expanded MPI-IO functionality. Clusters consisting of both single-processor and SMP nodes are supported. With the exception of users requiring the communication of heterogeneous data, the developers strongly encourage everyone to consider switching to MPICH2. Researchers interested in using MPICH as a base for their research into MPI implementations should definitely use MPICH2. The current version of MPICH2 is 1.0.1, released on March 2, 2005.
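To make the one-sided model concrete, here is a minimal C sketch (not taken from MPICH2 itself, just standard MPI-2 calls) in which rank 1 writes a value into a memory window exposed by rank 0 using MPI_Win_create, MPI_Put, and MPI_Win_fence:

    #include <mpi.h>
    #include <stdio.h>

    /* Minimal MPI-2 one-sided example: rank 1 puts its rank number into
       a window exposed by rank 0. Run with at least two processes. */
    int main(int argc, char **argv)
    {
        int rank, size, buf = -1;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Every rank exposes one int as a window. */
        MPI_Win_create(&buf, sizeof(int), sizeof(int),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        MPI_Win_fence(0, win);                  /* open the access epoch */
        if (size > 1 && rank == 1)
            MPI_Put(&rank, 1, MPI_INT,          /* origin buffer */
                    0, 0, 1, MPI_INT, win);     /* target rank 0, displacement 0 */
        MPI_Win_fence(0, win);                  /* close the access epoch */

        if (rank == 0)
            printf("rank 0: window now holds %d\n", buf);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }

Note that rank 0 posts no receive; the two fences delimit the epoch and complete the transfer, which is the point of one-sided communication.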

LAM
The current version is LAM 7.1.1.

MPI-1 support: a complete implementation of the MPI 1.2 standard.

MPI-2 support: LAM/MPI includes support for large portions of the MPI-2 standard. A partial list of the most commonly used features LAM supports:
Process creation and management
One-sided communication (partial implementation)
MPI I/O (using ROMIO)
MPI-2 miscellany: mpiexec
Thread support (MPI_THREAD_SINGLE through MPI_THREAD_SERIALIZED; see the sketch below)
User functions at termination
Language interoperability
C++ bindings

LAM features

Checkpoint/restart: MPI applications running under LAM/MPI can be checkpointed to disk and restarted at a later time. LAM requires a third-party single-process checkpoint/restart toolkit for actually checkpointing and restarting a single MPI process; LAM takes care of the parallel coordination. Currently, the Berkeley Lab Checkpoint/Restart package (Linux only) is supported. The infrastructure allows new checkpoint/restart packages to be added easily.

Fast job startup: LAM/MPI uses a small, user-level daemon for process control, output forwarding, and out-of-band communication. The daemon is started at the beginning of a session using lamboot, which can use rsh/ssh, TM (OpenPBS / PBS Pro), SLURM, or BProc to remotely start the daemons. Although lamboot can take a long time on large platforms when using rsh/ssh, the start-up cost is amortized as applications are executed: mpirun does not use rsh/ssh but the already-running LAM daemons. Even for very large numbers of nodes, MPI application startup is on the order of a couple of seconds.
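The thread-support level is negotiated when MPI is initialized. A minimal sketch (standard MPI-2 calls, nothing LAM-specific) of requesting MPI_THREAD_SERIALIZED and checking what the library actually granted:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided, rank;

        /* Ask for MPI_THREAD_SERIALIZED; the library reports the level
           it actually grants in `provided`. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_SERIALIZED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0 && provided < MPI_THREAD_SERIALIZED)
            fprintf(stderr, "warning: only thread level %d is available\n",
                    provided);

        MPI_Finalize();
        return 0;
    }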

LAM features

High-performance communication: LAM/MPI provides a number of options for MPI communication with very little overhead. The TCP communication system provides near-TCP-stack bandwidth and latency, even at Gigabit Ethernet speeds. Two shared-memory communication channels are available, each using TCP for remote-node communication. LAM/MPI 7.0 and later support Myrinet networks using the GM interface; Myrinet provides significantly higher bandwidth and lower latency than TCP. LAM/MPI 7.1 also provides support for high-speed, low-latency InfiniBand networks using the Mellanox Verbs Interface (VAPI).

Run-time tuning and RPI selection: LAM/MPI has always supported a wide number of tuning parameters. Unfortunately, most could only be set at compile time, leading to painful application tuning. With LAM/MPI 7.0, almost every parameter in LAM can be altered at run-time, via environment variables or flags to mpirun, which makes tuning much simpler. The addition of LAM's System Services Interface (SSI) allows the RPI (the network transport used for MPI point-to-point communications) to be selected at run-time rather than compile-time. Rather than recompiling LAM four times to decide which transport gives the best performance for an application, all that is required is a single flag to mpirun.

SMP-aware collectives: The use of clusters of SMP machines is a growing trend in the clustering world. With LAM 7.0, many common collective operations are optimized to take advantage of the higher communication speed between processes on the same machine. When using the SMP-aware collectives, performance increases can be seen with little or no change to user applications (a minimal example follows below). Be sure to read the LAM User's Guide for important information on exploiting the full potential of the SMP-aware collectives.

Integration with PBS: PBS (either OpenPBS or PBS Pro) provides scheduling services for many of the high-performance clusters in service today. By using the PBS-specific boot mechanisms, LAM is able to provide process accounting and job cleanup to MPI applications. As an added bonus to MPI users, lamboot execution time is drastically reduced compared to rsh/ssh.

Integration with BProc: The BProc distributed process space provides a single process space for an entire cluster. It also provides a number of mechanisms for starting applications that are not available on the compute nodes of a cluster. LAM's BProc support allows booting under the BProc environment even when LAM is not installed on the compute nodes: LAM will automatically migrate the required support out to the compute nodes. MPI applications must still be available on all compute nodes (although the -s option to mpirun eliminates this requirement).
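Because the collective module is selected through SSI at run-time, the application source is identical whether the basic or the SMP-aware collectives end up servicing a call. A minimal sketch of such an unchanged collective call (plain MPI code, nothing LAM-specific):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, local, sum = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* An ordinary collective: which collective module (basic or
           SMP-aware) executes it is decided at run-time, not here. */
        local = rank;
        MPI_Allreduce(&local, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum of ranks 0..%d = %d\n", size - 1, sum);

        MPI_Finalize();
        return 0;
    }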

LAM features

Globus enabled: LAM 7.0 includes beta support for execution in the Globus Grid environment. Be sure to read the release notes in the User's Guide for important restrictions on your Globus environment.

Extensible component architecture: LAM 7.0 is the first LAM release to include the System Services Interface (SSI), providing an extensible component architecture for LAM/MPI. Currently, "drop-in" modules are supported for booting the LAM run-time environment, MPI collectives, checkpoint/restart, and the MPI transport (RPI). Selection of a component is a run-time decision, allowing users to choose the modules that provide the best performance for a specific application.

Easy application debugging: Debugging with LAM/MPI is easy. Support for parallel debuggers such as the Distributed Data Debugging Tool and the Etnus TotalView parallel debugger gives users straightforward debugging of even the most complicated MPI applications. LAM is also capable of starting standard single-process debuggers for quick debugging of a subset of processes (see the LAM FAQ). A number of tools, including XMPI, are available for communication debugging and analysis.

Interoperable MPI: LAM implements much of the Interoperable MPI (IMPI) standard, intended to allow an MPI application to execute over multiple MPI implementations. Use of IMPI allows users to obtain the best performance possible, even in a heterogeneous environment.

Open MPI
A project combining technologies and resources from several other projects (FT-MPI, LA-MPI, LAM/MPI, and PACX-MPI) in order to build the best MPI library available.
Fault Tolerant MPI (FT-MPI) is a full MPI 1.2 specification implementation that provides process-level fault tolerance at the MPI API level (see the error-handling sketch below). FT-MPI is built upon the fault-tolerant HARNESS runtime system.
LA-MPI was an implementation of the Message Passing Interface (MPI) motivated by a growing need for fault tolerance at the software level in large high-performance computing (HPC) systems.
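FT-MPI surfaces failures to the application through the MPI API instead of aborting the job. The standard hook for this style of programming is the communicator error handler; the sketch below uses only standard MPI-2 calls (it is not FT-MPI's recovery protocol, just the API-level error reporting such a design builds on):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, rc, value;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Replace the default "abort on error" behaviour so that MPI
           calls return error codes the application can react to. */
        MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

        value = rank;
        rc = MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
        if (rc != MPI_SUCCESS) {
            char msg[MPI_MAX_ERROR_STRING];
            int len;
            MPI_Error_string(rc, msg, &len);
            fprintf(stderr, "rank %d: broadcast failed: %s\n", rank, msg);
            /* a fault-tolerant application would attempt recovery here */
        }

        MPI_Finalize();
        return 0;
    }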

Open MPI

PACX-MPI (PArallel Computer eXtension) enables seamless running of MPI-conforming parallel applications on a computational Grid, such as a cluster of high-performance computers like MPPs connected through high-speed networks or even the Internet. Communication between MPI processes internal to an MPP is done with the vendor MPI, while communication with other members of the metacomputer is done via the connecting network.

Open MPI / PACX-MPI
The characteristics of such systems show two different levels of quality of communication:

Use of the optimized vendor MPI library: internal operations are handled using the vendor MPI environment on each system. This makes it possible to fully exploit the capacity of the underlying communication subsystem in a portable manner.

Use of communication daemons: on each system in the metacomputer, two daemons take care of communication between systems. This makes it possible to bundle communication and avoid having thousands of open connections between processes. In addition, it allows security issues to be handled centrally. The daemon nodes are implemented as additional, local MPI processes; therefore, no additional TCP communication between the application nodes and the daemons is necessary, which would needlessly increase the communication latency.

Open MPI features
Full MPI-2 standards conformance
Thread safety and concurrency
Dynamic process spawning (see the sketch below)
High performance on all platforms
Reliable and fast job management
Network and process fault tolerance
Support for data and network heterogeneity
Single library supports all networks
Run-time instrumentation
Many job schedulers supported

Open MPI features (ctd.)
Many OS's supported (32 and 64 bit)
Production quality software
Portable and maintainable
Tunable by installers and end-users
Extensive user and installer guides
Internationalized error messages
Component-based design, documented APIs
CPAN-like tool for component management
Active, responsive mailing list
Open source license based on the BSD license
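Dynamic process spawning, one of the MPI-2 features listed above, looks like this from the application side. The sketch assumes a separate executable named "worker" (a hypothetical name, not part of Open MPI) that the spawned children run; it only shows the shape of MPI_Comm_spawn and the resulting intercommunicator:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, errcodes[2], work_id = 42;
        MPI_Comm children;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* All parents collectively spawn two copies of the hypothetical
           "worker" executable and get back an intercommunicator that
           connects parents and children. */
        MPI_Comm_spawn("worker", MPI_ARGV_NULL, 2, MPI_INFO_NULL,
                       0, MPI_COMM_WORLD, &children, errcodes);

        /* Intercommunicator broadcast from the parents' root to the
           children (the workers obtain the intercommunicator with
           MPI_Comm_get_parent and call a matching MPI_Bcast with root 0). */
        MPI_Bcast(&work_id, 1, MPI_INT,
                  rank == 0 ? MPI_ROOT : MPI_PROC_NULL, children);

        MPI_Comm_free(&children);
        MPI_Finalize();
        return 0;
    }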