ProActive SPMD and Fault Tolerance Protocol and Benchmarks

Similar documents
Rollback-Recovery Protocols for Send-Deterministic Applications. Amina Guermouche, Thomas Ropars, Elisabeth Brunet, Marc Snir and Franck Cappello

Distributed recovery for senddeterministic. Tatiana V. Martsinkevich, Thomas Ropars, Amina Guermouche, Franck Cappello

Agenda 1. Background: OASIS, ActiveEon 2. Multi-Cores 3. Programming, Optimizing Scheduling 4. Enterprise Parallel Computing

Chapter 8 Fault Tolerance

Parallel and Distributed Systems. Programming Models. Why Parallel or Distributed Computing? What is a parallel computer?

Application-Transparent Checkpoint/Restart for MPI Programs over InfiniBand

Uncoordinated Checkpointing Without Domino Effect for Send-Deterministic MPI Applications

REMEM: REmote MEMory as Checkpointing Storage

Distributed Algorithms Failure detection and Consensus. Ludovic Henrio CNRS - projet SCALE

Behavioural Verification of Distributed Components

Adaptive Runtime Support

Implementing Efficient and Scalable Flow Control Schemes in MPI over InfiniBand

Scalable In-memory Checkpoint with Automatic Restart on Failures

HydEE: Failure Containment without Event Logging for Large Scale Send-Deterministic MPI Applications

Proactive Process-Level Live Migration in HPC Environments

Programming with Message Passing PART I: Basics. HPC Fall 2012 Prof. Robert van Engelen

Message-Passing Programming with MPI

Reducing Network Contention with Mixed Workloads on Modern Multicore Clusters

An Empirical Performance Study of Connection Oriented Time Warp Parallel Simulation

Three Models. 1. Time Order 2. Distributed Algorithms 3. Nature of Distributed Systems1. DEPT. OF Comp Sc. and Engg., IIT Delhi

Scalable Fault Tolerance Schemes using Adaptive Runtime Support

Rollback-Recovery p Σ Σ

A Specification Language for Distributed Components

Today CSCI Recovery techniques. Recovery. Recovery CAP Theorem. Instructor: Abhishek Chandra

Peer-to-Peer and Fault-tolerance: Towards Deployment-based Technical Services

Optimize your Infrastructure & Accelerate Applications

Chapter 8 Fault Tolerance

Fault-Tolerant Computer Systems ECE 60872/CS Recovery

Failure Models. Fault Tolerance. Failure Masking by Redundancy. Agreement in Faulty Systems

Uncoordinated Checkpointing Without Domino Effect for Send-Deterministic MPI Applications

Message-Passing Programming with MPI. Message-Passing Concepts

MPI and comparison of models Lecture 23, cs262a. Ion Stoica & Ali Ghodsi UC Berkeley April 16, 2018

MPICH-V Project: a Multiprotocol Automatic Fault Tolerant MPI

Rollback Overhead Reduction Methods for Time Warp Distributed Simulation

Implementation and Evaluation of a Scalable Application-level Checkpoint-Recovery Scheme for MPI Programs

Networked Systems and Services, Fall 2018 Chapter 4. Jussi Kangasharju Markku Kojo Lea Kutvonen

CS514: Intermediate Course in Computer Systems

ProActive Parallel Suite. Denis Caromel Arnaud Contes Univ. Nice ActiveEon

Michel Raynal. Distributed Algorithms for Message-Passing Systems

Kernel Level Speculative DSM

On Hierarchical, parallel and distributed components for Grid programming

Optimizing communications on clusters of multicores

Application. Collective Operations. P2P Operations. ft-globus. Unexpected Q. Posted Q. Log / Replay Module. Send Q. Log / Replay Module.

Scalable Replay with Partial-Order Dependencies for Message-Logging Fault Tolerance

Fault Tolerance. Distributed Systems. September 2002

CSE 5306 Distributed Systems. Fault Tolerance

Distributed File Systems II

c 2015 by Adam R. Smith. All rights reserved.

ProActive: an integrated platform for programming and running applications on Grids and P2P systems

CRFS: A Lightweight User-Level Filesystem for Generic Checkpoint/Restart

Lazy Agent Replication and Asynchronous Consensus for the Fault-Tolerant Mobile Agent System

Skeleton programming environments Using ProActive Calcium

Failure Tolerance. Distributed Systems Santa Clara University

Fault Tolerance in Parallel Systems. Jamie Boeheim Sarah Kay May 18, 2006

An MPI failure detector over PMPI 1

Design of High Performance Distributed Snapshot/Recovery Algorithms for Ring Networks

MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes

Unifying UPC and MPI Runtimes: Experience with MVAPICH

Introduction. Distributed Systems IT332

A Message Passing Standard for MPP and Workstations

Fault Tolerance for Highly Available Internet Services: Concept, Approaches, and Issues

HPC in Java: Experiences in Implementing the NAS Parallel Benchmarks

Event List Management In Distributed Simulation

Distributed Systems Principles and Paradigms. Chapter 01: Introduction

Distributed Systems Principles and Paradigms. Chapter 01: Introduction. Contents. Distributed System: Definition.

The Google File System

Cycle Sharing Systems

Toward An Integrated Cluster File System

WhatÕs New in the Message-Passing Toolkit

Unified Runtime for PGAS and MPI over OFED

Hypervisor-based Fault-tolerance. Where should RC be implemented? The Hypervisor as a State Machine. The Architecture. In hardware

Outline Background Jaluna-1 Presentation Jaluna-2 Presentation Overview Use Cases Architecture Features Copyright Jaluna SA. All rights reserved

CA464 Distributed Programming

Multi-Channel Clustered Web Application Servers

AN EFFICIENT ALGORITHM IN FAULT TOLERANCE FOR ELECTING COORDINATOR IN DISTRIBUTED SYSTEMS

Practical Byzantine Fault Tolerance

Checkpointing HPC Applications

Analysis of Peer-to-Peer Protocols Performance for Establishing a Decentralized Desktop Grid Middleware

OpenMP on the FDSM software distributed shared memory. Hiroya Matsuba Yutaka Ishikawa

processes based on Message Passing Interface

goals monitoring, fault tolerance, auto-recovery (thousands of low-cost machines) handle appends efficiently (no random writes & sequential reads)

Interactive Analysis of Large Distributed Systems with Scalable Topology-based Visualization

CSE 5306 Distributed Systems

Design challenges of Highperformance. MPI over InfiniBand. Presented by Karthik

Contributions to Session Aware Frameworks for Next Generation Internet Services

GFS: The Google File System. Dr. Yingwu Zhu

Fault Tolerance. Distributed Systems IT332

Distributed Systems Principles and Paradigms

SPBC: Leveraging the Characteristics of MPI HPC Applications for Scalable Checkpointing

Optimistic Preventive Replication in a Database Cluster

ProActive: an Integrated platform for programming and running applications on grids and P2P systems

Scalable and Fault Tolerant Failure Detection and Consensus

CHAPTER 4 AN INTEGRATED APPROACH OF PERFORMANCE PREDICTION ON NETWORKS OF WORKSTATIONS. Xiaodong Zhang and Yongsheng Song

Co-array Fortran Performance and Potential: an NPB Experimental Study. Department of Computer Science Rice University

Optimizing MPI Communication Within Large Multicore Nodes with Kernel Assistance

CA485 Ray Walshe Google File System

Postgres-XC PG session #3. Michael PAQUIER Paris, 2012/02/02

CprE Fault Tolerance. Dr. Yong Guan. Department of Electrical and Computer Engineering & Information Assurance Center Iowa State University

Distributed Systems. Aleardo Manacero Jr.

Kerrighed: A SSI Cluster OS Running OpenMP

Transcription:

1 ProActive SPMD and Fault Tolerance Protocol and Benchmarks Brian Amedro et al. INRIA - CNRS 1st workshop INRIA-Illinois June 10-12, 2009 Paris

2 Outline ASP Model Overview ProActive SPMD Fault Tolerance Benchmarks

3 Asynchronous Sequential Process Communication with message-passing Request / Reply No memory sharing Asynchronous request services Request queue Rendezvous Future with wait-by-necessity Confluence and determinacy Causal ordering Activity state characterization 1 2 1 2 3 3 P1 P2 Java implementation Denis Caromel and Ludovic Henrio and Bernard Serpette Asynchronous and Deterministic Objects, POPL!04

4 Asynchronous Sequential Process Communication with message-passing Request / Reply No memory sharing Asynchronous request services Request queue Rendezvous Future with wait-by-necessity Confluence and determinacy Causal ordering Activity state characterization 1 2 1 2 3 3 P1 P2 Java implementation Denis Caromel and Ludovic Henrio and Bernard Serpette Asynchronous and Deterministic Objects, POPL!04

5 Asynchronous Sequential Process Communication with message-passing Request / Reply No memory sharing Asynchronous request services Request queue Rendezvous Future with wait-by-necessity Confluence and determinacy Causal ordering Activity state characterization Java implementation 1 2 1 2 RDV 3 P1 P2 Denis Caromel and Ludovic Henrio and Bernard Serpette Asynchronous and Deterministic Objects, POPL!04

6 Asynchronous Sequential Process Communication with message-passing Request / Reply No memory sharing Asynchronous request services Request queue Rendezvous Future with wait-by-necessity Confluence and determinacy Causal ordering Activity state characterization Java implementation 1 2 1 1 2 2 1 Futures and WBN 3 P1 P2 Denis Caromel and Ludovic Henrio and Bernard Serpette Asynchronous and Deterministic Objects, POPL!04

7 Asynchronous Sequential Process Communication with message-passing Request / Reply No memory sharing Asynchronous request services Request queue Rendezvous Future with wait-by-necessity Confluence and determinacy Causal ordering Activity state characterization Java implementation 1 2 1 2 3 point-to-point FIFO order 3 1 3 1 2 Denis Caromel and Ludovic Henrio and Bernard Serpette Asynchronous and Deterministic Objects, POPL!04 3 2 causal ordering

8 Asynchronous Sequential Process Communication with message-passing Request / Reply No memory sharing Asynchronous request services Request queue Rendezvous Future with wait-by-necessity Confluence and determinacy Causal ordering Activity state characterization Java implementation 1 2 1 2 3 point-to-point FIFO order 3 1 3 1 2 Denis Caromel and Ludovic Henrio and Bernard Serpette Asynchronous and Deterministic Objects, POPL!04 3 2 causal ordering

9 Asynchronous Sequential Process Communication with message-passing Request / Reply No memory sharing Asynchronous request services Request queue Rendezvous Future with wait-by-necessity http://proactive.inria.fr Confluence and determinacy Causal ordering Activity state characterization Java implementation Denis Caromel and Ludovic Henrio and Bernard Serpette Asynchronous and Deterministic Objects, POPL!04

10 ProActive OO-SPMD Objectives Provide an MPI-like programming model Ease the porting an MPI application to ProActive Give Object Oriented to SPMD model

11 ProActive OO-SPMD Features Asynchronous collective operations Asynchronous barrier Asynchronous group communication Scattering, gathering Take into account topology Optimized algorithm

12 Fault Tolerance with ASP Objectives Transparency No piece of code dedicated to Fault Tolerance in applications Portability No assumption about underlying hardware Consistency

13 Fault Tolerance with ASP Propositions Rollback Recovery TTC + Communication Induced Checkpointing: Transparency No programmer intervention Constrained Checkpointability: Portability No Checkpoint during a service Un-consistency of recovery lines Christian Delbé, PhD Thesis, 2007 Tolérance aux pannes pour objets actifs asynchrones - protocole, modèle et expérimentations

14 Fault Tolerance with ASP Principles of the Protocol Q1 Q4 n Ci[Q1,Q2,Q4] Q2 Q3 Q1 Q2 Q4 i

15 Fault Tolerance with ASP Principles of the Protocol Orphan and In-transit Messages Promised Requests Request Reception History

16 Fault Tolerance with ASP Orphan Messages Ci n+1 Q1 i n Cj[Q0] Q0 n+1 Cj[Q1] Q1 R1 j Q1 is an Orphan Request

17 Fault Tolerance with ASP Promised Request n+1 Q1 i n Q0 n+1 Q1 R1 j Ci n+1 Q1 i n+1 Cj [Qi,j] Qi,j Q1 synchronization with the actual request arrival R1 j

18 Fault Tolerance with ASP In-transit Messages Ci n Cj n Q0 Ck n Q0 R1 Q2 i j k Q0 is an In-transit Request

19 Fault Tolerance with ASP Request Reception History Ci n+1 Cj n+1 Ck n Q0 Q1 Q0 Q1 R1 Ck n+1 i j k

20 Fault Tolerance with ASP Synthesis Orphan Request Replace with a promised request with wait-by-necessity In-transit Requests Reception of such a request is journalized into a request reception history In-transit Reply Can t happen: occur after the reception of an orphan request, so a checkpoint have been performed

21 Benchmarking Fault Tolerance NAS Parallel Benchmarks 5 kernels : EP, CG, FT, MG, IS About 10,000 LOC

22 Benchmarking Fault Tolerance NAS Benchmark : CG.A Checkpoint size (MB) 40.0 30.0 20.0 10.0 0 2 4 8 16 32 30 23 15 8 0 Time (sec) # nodes Checkpoint size (MB) T without checkpoints T standard T with checkpoints

23 Benchmarking Fault Tolerance NAS Benchmark : CG.C Checkpoint size (MB) 400 300 200 100 0 2 4 8 16 32 400 300 200 100 0 Time (sec) # nodes Checkpoint size (MB) T without checkpoints T standard T with checkpoints

24 Thank you for your attention