Fault tolerance in Grid and Grid 5000

Fault tolerance in Grid and Grid 5000. Franck Cappello, INRIA, Director of Grid 5000. fci@lri.fr

Applications requiring fault tolerance in Grid
Domains (grid applications connecting databases, supercomputers, instruments, visualization tools):
- Finance, health care, e-Science, cyber-infrastructure (EGEE, Virtual Observatory, TeraGrid, etc.)
- Natural and industrial disaster prevention and management, etc.
Key technology: Web Services (with some extensions: WSRF)

The EGEE project (Enabling Grids for E-sciencE)
- Building and maintaining a large scale computing infrastructure
- Providing support for the scientists using it
Size: 3000 users, 70 institutes, 27 countries, 148 sites, > 13000 CPUs, > 98 PB of disk
Duration: 2 years; Cost: 32 M€; Next: EGEE-II
Pilot applications:
- LHC experiments (ALICE, ATLAS, CMS, LHCb): scale, high bandwidth data transfer
- Biomedical experiments: security, ease of use, distributed databases

Job statistics in EGEE (Enabling Grids for E-sciencE)

EGEE issues and problems

Job Efficiency in EGEE

Software status in TeraGrid (1/2)
TeraGrid:
- integrated, persistent computational resource
- deployment completed in September 2004
- 40 teraflops of computing power
- nearly 2 petabytes of storage
- interconnections at 10-30 gigabits/sec (via a dedicated national network)
http://tech.teragrid.org/inca-prod/cgi-bin//primaryhtmlmap.cgi?mapfile=/var/www/tech.teragrid.org/inca/tg/html/preload.state&topkey=exec

Software status in TeraGrid (2/2)
http://tech.teragrid.org/inca-prod/cgi-bin//primaryhtmlmap.cgi?mapfile=/var/www/tech.teragrid.org/inca/tg/html/preload.state&topkey=stack_compute

Why FT in Grid is difficult (1/2)
Grids are installed, administered and controlled by humans:
- local priorities may lead to stopped or frozen jobs
- modifications and updates take time and introduce configuration inconsistencies
- upgrades and modifications may introduce errors
Heterogeneity (hardware and software, availability)
Instability (hardware and software)
+ Resources belong to different administrative domains!

Why FT in Grid is difficult (2/2)
Vertical complexity and consistency: every site runs a full stack (Application, Application Runtime, Grid Middleware (WS), Services, Operating System, Networking).
Horizontal interoperability AND consistency must hold across sites (Site1, Site2, ...).
When running applications on a dynamic and heterogeneous Grid, we may experience many software failures.

Research in Grid fault tolerance (some aspects)
Computing models (application runtimes): very little work (RPC-V; MPI: MPICH-V, MPICH-GF)
Infrastructure:
- Server fault tolerance (Grid Services, Web Services, WSRF)
- Fault detectors (few results; see Xavier's talk); a minimal sketch follows below
- High performance protocols (content distribution: BitTorrent)
- Resource discovery (DHT: Kademlia)
FT techniques:
- Self-stabilization (crashes may happen during stabilization)
- Consensus (impossibility result on asynchronous networks)
- Majority voting (decisions may apply to a majority of nodes absent during the vote)
Fault tolerance is one research topic of the CoreGRID NoE.
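To make the fault detector item above concrete, here is a minimal, hypothetical sketch of a heartbeat-based (unreliable) failure detector of the kind Grid middleware relies on; the UDP transport, timeout value and node names are illustrative assumptions, not something described in the talk.

    # Minimal heartbeat-based failure detector sketch (illustrative only).
    import socket, threading, time

    HEARTBEAT_PERIOD = 1.0   # seconds between heartbeats
    TIMEOUT = 5.0            # silence longer than this => node is suspected

    def send_heartbeats(peer_addr, node_id):
        # Periodically tell a monitoring peer that this node is alive.
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        while True:
            sock.sendto(node_id.encode(), peer_addr)
            time.sleep(HEARTBEAT_PERIOD)

    def monitor(listen_addr):
        # Suspect any node whose heartbeat has not been seen for TIMEOUT seconds.
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(listen_addr)
        sock.settimeout(1.0)
        last_seen = {}
        while True:
            try:
                data, _ = sock.recvfrom(64)
                last_seen[data.decode()] = time.time()
            except socket.timeout:
                pass
            now = time.time()
            for node, seen in last_seen.items():
                if now - seen > TIMEOUT:
                    print(f"suspecting {node} (silent for {now - seen:.1f}s)")

    # Example wiring on one machine (hypothetical):
    # threading.Thread(target=monitor, args=(("127.0.0.1", 9999),), daemon=True).start()
    # send_heartbeats(("127.0.0.1", 9999), "node-A")

Such a detector is necessarily unreliable: on an asynchronous network a slow node is indistinguishable from a crashed one, which is exactly why the consensus impossibility result listed above matters.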

Grid still raises many issues on fault tolerance, BUT also on other topics: performance, scalability, QoS, resource usage, accounting, security, etc. There is no environment or tool to test REAL Grid software at large scale.

We need Grid experimental tools. In the first half of 2003, the design and development of two Grid experimental platforms was decided, with Grid 5000 as a real-life system (alongside the Data Grid eXplorer emulation platform). (Figure: experimental tools ranked by log(cost) versus log(realism), from mathematical models and protocol proofs, through simulators (SimGrid, MicroGrid, Bricks, NS, etc.) and emulators (Data Grid eXplorer, WANinLab, Emulab, AIST SuperCluster), to live systems (Grid 5000, DAS-2, TeraGrid, PlanetLab, Naregi Testbed), with cost ranging from reasonable to major challenge.)

Grid 5000: the largest research instrument to study Grid issues. (Map: sites interconnected by RENATER, most with about 500 nodes, one with 1000.)

Grid 5000 foundations: collection of experiments to be done
Networking:
- End-host communication layer (interference with local communications)
- High performance long distance protocols (improved TCP)
- High speed network emulation
Middleware / OS:
- Scheduling / data distribution in Grid
- Fault tolerance in Grid
- Resource management
- Grid SSI OS and Grid I/O
- Desktop Grid / P2P systems
Programming:
- Component programming for the Grid (Java, Corba)
- GRID-RPC
- GRID-MPI
- Code coupling
Applications:
- Multi-parametric applications (climate modeling / functional genomics)
- Large scale experimentation of distributed applications (electromagnetism, multi-material fluid mechanics, parallel optimization algorithms, CFD, astrophysics, medical images, collaborative tools in virtual 3D environments)

Grid 5000 goal: experimenting with fault tolerance and many other topics on all layers of the Grid software stack (Application, Programming Environments, Application Runtime, Grid Middleware, Operating System, Networking), on a highly reconfigurable, controllable and monitorable experimental platform.

Confinement / isolation. (Diagram: each Grid 5000 site reaches RENATER over 2 fibers, one dedicated to Grid 5000, through MPLS (level 2) routers with 8 VLANs per site and a GPS probe; the lab keeps its normal connection to RENATER behind its own firewall/NAT; users log in by ssh to a local front-end; a site controller provides DNS, LDAP, NFS, /home, reboot, DHCP and boot services, plus Ganglia and HPC monitoring, in front of the reconfigurable nodes.)

Observation & monitoring. (Same per-site diagram, highlighting the GPS-synchronized probes and router statistics at the RENATER access and the Ganglia/HPC monitoring of the reconfigurable nodes.)

Workload/traffic & fault injection. (Same per-site diagram, with workload, traffic and fault injectors (process, communication) deployed on the reconfigurable nodes.)

Grid 5000 sites: Rennes, Lyon, Sophia, Grenoble, Orsay, Bordeaux, Toulouse.

Grid 5000 Global Observer

Grid 5000 monitoring tools: network traffic monitoring and Ganglia.
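As an illustration of the kind of data these monitoring tools expose, the following hypothetical Python sketch dumps metrics from a Ganglia gmond daemon; the host name is a placeholder, and the default TCP port (8649) and XML output format are assumptions based on standard Ganglia behavior, not something specified in the talk.

    # Illustrative sketch: read cluster metrics from a Ganglia gmond daemon.
    # gmond normally serves its whole state as XML on TCP port 8649; the host
    # name below is a placeholder, not a real Grid 5000 machine.
    import socket
    import xml.etree.ElementTree as ET

    def read_gmond_metrics(host="frontend.example.grid5000.fr", port=8649):
        with socket.create_connection((host, port), timeout=5) as sock:
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
        root = ET.fromstring(b"".join(chunks))
        for host_el in root.iter("HOST"):
            for metric in host_el.iter("METRIC"):
                print(host_el.get("NAME"), metric.get("NAME"), metric.get("VAL"))

    # Usage (hypothetical): read_gmond_metrics("some-cluster-frontend")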

Grid 5000 Reservation and reconfiguration
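To give a flavor of how a reservation plus reconfiguration is typically scripted, here is a hedged Python sketch that wraps the OAR scheduler and Kadeploy; the command names, flags and environment name (oarsub, kadeploy3, debian11-min) follow today's Grid 5000 tooling and are assumptions rather than the 2005 interface shown in the talk.

    # Hypothetical sketch: reserve nodes in deploy mode with OAR, then redeploy
    # them with Kadeploy. Command names/flags and the environment name are
    # assumptions based on current Grid'5000 documentation, not the 2005 tools.
    import subprocess

    def reserve_and_deploy(nb_nodes=4, walltime="2:00:00", env="debian11-min"):
        # OAR runs the given command once the "deploy" reservation is granted;
        # Kadeploy then reinstalls the reserved nodes with the chosen environment.
        deploy_cmd = f"kadeploy3 -e {env} -f $OAR_NODE_FILE -k"
        subprocess.run(
            ["oarsub", "-t", "deploy",
             "-l", f"nodes={nb_nodes},walltime={walltime}",
             deploy_cmd],
            check=True,
        )

    if __name__ == "__main__":
        reserve_and_deploy()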

Grid 5000 reconfiguration time. (Plot: time in seconds (0-300) to reboot 1 cluster of Grid 5000 with Kadeploy, as a function of the number of nodes (0-180), broken down into phases: first check, partition preparation, preinstall, environment transfer, last check, deployment kernel reboot, user environment deployment, user kernel reboot.)

Grid 5000 reconfiguration time. (Plot: number of nodes (0-300) in deploying/deployed state over time in seconds (0-1200) when rebooting 2 clusters (Paris + Nice) of Grid 5000 with Kadeploy; curves: deploying, deployed, deploying_site1, deployed_site1, deploying_site2, deployed_site2.)

Grid 5000 fault generator: FAIL
Objectives:
- Probabilistic and deterministic (reproducible) fault injection
- Expressiveness of scenarios
- No code modification
- Scalable
Concepts:
- A dedicated language for fault scenario specification (FAIL: FAult Injection Language)
- Fine control of the code execution (through a debugger)
Example scenario (as transcribed from the slide):
    Daemon ADV2 {
      time_g timer = 5;
      node 1 : always int rand = FAIL_RANDOM(1,10);
               timer && rand < 2 -> halt, goto 2;
      node 2 : always int rand = FAIL_RANDOM(1,10);
               timer && rand > 7 -> restart, goto 3;
      node 3 :
    }
(Diagram: the scenario is compiled into fail-exec.run executables deployed on the target nodes.)
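For readers without the FAIL toolchain, the following hypothetical Python sketch mimics the logic of the scenario above (timer-driven random halt, then random restart of a target process); the target PID and the use of SIGSTOP/SIGCONT to emulate halt/restart are illustrative assumptions, not part of FAIL.

    # Illustrative re-implementation of the FAIL scenario above:
    # every 5 seconds draw a random number in [1,10]; in state 1 a draw < 2
    # halts the target process, in state 2 a draw > 7 restarts it.
    import os, random, signal, time

    TIMER = 5  # mirrors "time_g timer = 5" in the FAIL scenario

    def inject_faults(target_pid):
        state = 1
        while state != 3:
            time.sleep(TIMER)
            rand = random.randint(1, 10)              # mirrors FAIL_RANDOM(1,10)
            if state == 1 and rand < 2:
                os.kill(target_pid, signal.SIGSTOP)   # "halt" the process
                state = 2
            elif state == 2 and rand > 7:
                os.kill(target_pid, signal.SIGCONT)   # "restart" the process
                state = 3
        print("scenario finished (state 3)")

    # Usage (hypothetical): inject_faults(pid_of_some_grid_process)

Unlike FAIL, which controls the application through a debugger and requires no code modification, this sketch only suspends and resumes an already running process.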

Grid 5000 summary:
- Grid still raises many issues about fault tolerance
- Grid 5000 will offer a large scale infrastructure to study some of these issues (operational in September 2005)
- Grid 5000 will be open to international collaborations