UNIBUS: ASPECTS OF HETEROGENEITY AND FAULT TOLERANCE IN CLOUD COMPUTING

Similar documents
Mathematics and Computer Science

REMEM: REmote MEMory as Checkpointing Storage

From Parallel Virtual Machine to Virtual Parallel Machine: The Unibus System

MPI versions. MPI History

MPI History. MPI versions MPI-2 MPICH2

Proactive Process-Level Live Migration in HPC Environments

Grid Computing Middleware. Definitions & functions Middleware components Globus glite

Structuring PLFS for Extensibility

Transparent Throughput Elas0city for IaaS Cloud Storage Using Guest- Side Block- Level Caching

First Results in a Thread-Parallel Geant4

Open Cloud Reference Architecture

Distributed Filesystem

INDIGO-DataCloud Architectural Overview

Application-Transparent Checkpoint/Restart for MPI Programs over InfiniBand

30 Nov Dec Advanced School in High Performance and GRID Computing Concepts and Applications, ICTP, Trieste, Italy

Functional Requirements for Grid Oriented Optical Networks

OpenNebula on VMware: Cloud Reference Architecture

Adaptive Cluster Computing using JavaSpaces

igrid: a Relational Information Service A novel resource & service discovery approach

Technical Comparison between several representative checkpoint/rollback solutions for MPI programs

goals monitoring, fault tolerance, auto-recovery (thousands of low-cost machines) handle appends efficiently (no random writes & sequential reads)

Checkpointing with DMTCP and MVAPICH2 for Supercomputing. Kapil Arya. Mesosphere, Inc. & Northeastern University

Chapter 3 Virtualization Model for Cloud Computing Environment

A comparison of performance between KVM and Docker instances in OpenStack

Microsoft Office SharePoint Server 2007

Aggregation of Real-Time System Monitoring Data for Analyzing Large-Scale Parallel and Distributed Computing Environments

High Performance and Cloud Computing (HPCC) for Bioinformatics

JCatascopia: Monitoring Elastically Adaptive Applications in the Cloud

Virtualization for Desktop Grid Clients

A Hands on Introduction to Docker

Grid Computing Fall 2005 Lecture 5: Grid Architecture and Globus. Gabrielle Allen

Unifying UPC and MPI Runtimes: Experience with MVAPICH

Understanding StoRM: from introduction to internals

Applications of DMTCP for Checkpointing

Approaches to Cloud Computing Fault Tolerance

Unibus: A Contrarian Approach to Grid Computing

Scale-out Storage Solution and Challenges Mahadev Gaonkar igate

Installation and Cluster Deployment Guide for VMware

API, DEVOPS & MICROSERVICES

Conduire OpenStack Vers l Edge Computing Anthony Simonet Inria, École des Mines de Nantes, France

The vsphere 6.0 Advantages Over Hyper- V

The Why and How of HPC-Cloud Hybrids with OpenStack

Deep Dive Amazon Kinesis. Ian Meyers, Principal Solution Architect - Amazon Web Services

Lightweight Streaming-based Runtime for Cloud Computing. Shrideep Pallickara. Community Grids Lab, Indiana University

Grid Architectural Models

Application Fault Tolerance Using Continuous Checkpoint/Restart

GridNEWS: A distributed Grid platform for efficient storage, annotating, indexing and searching of large audiovisual news content

xsim The Extreme-Scale Simulator

Evolution of Rack Scale Architecture Storage

Work Queue + Python. A Framework For Scalable Scientific Ensemble Applications

Monitoring and Operating a Private Cloud with System Center 2012 (70-246) Course Outline Module 1: Introduction to the Private Cloud

Cisco Application Centric Infrastructure (ACI) Simulator

Simplified CICD with Jenkins and Git on the ZeroStack Platform

Cloudian Sizing and Architecture Guidelines

The SHARED hosting plan is designed to meet the advanced hosting needs of businesses who are not yet ready to move on to a server solution.

Update on EZ-Grid. Priya Raghunath University of Houston. PI : Dr Barbara Chapman

Cooperative VM Migration for a virtualized HPC Cluster with VMM-bypass I/O devices

Storage Optimization with Oracle Database 11g

Progress Report on Transparent Checkpointing for Supercomputing

An introduction to checkpointing. for scientific applications

Distributed System. Gang Wu. Spring,2018

Overview Job Management OpenVZ Conclusions. XtreemOS. Surbhi Chitre. IRISA, Rennes, France. July 7, Surbhi Chitre XtreemOS 1 / 55

CRFS: A Lightweight User-Level Filesystem for Generic Checkpoint/Restart

GFS: The Google File System. Dr. Yingwu Zhu

Think Small to Scale Big

Building a Data-Friendly Platform for a Data- Driven Future

Chapter 5. The MapReduce Programming Model and Implementation

Euro-Par Pisa - Italy

RapidIO.org Update. Mar RapidIO.org 1

* Inter-Cloud Research: Vision

Checkpointing using DMTCP, Condor, Matlab and FReD

IBM Leading High Performance Computing and Deep Learning Technologies

Virtualization Strategies on Oracle x86. Hwanki Lee Hardware Solution Specialist, Local Product Server Sales

Fault tolerance based on the Publishsubscribe Paradigm for the BonjourGrid Middleware

Messaging Infrastructure Design

Difference Engine: Harnessing Memory Redundancy in Virtual Machines (D. Gupta et all) Presented by: Konrad Go uchowski

Configuring and Deploying a Private Cloud DURATION: Days

Evolution of the Data Center

MATE-EC2: A Middleware for Processing Data with Amazon Web Services

Towards a Resilient Information Architecture Platform for the Smart Grid: RIAPS

Ivane Javakhishvili Tbilisi State University High Energy Physics Institute HEPI TSU

Containerizing GPU Applications with Docker for Scaling to the Cloud

Spark Over RDMA: Accelerate Big Data SC Asia 2018 Ido Shamay Mellanox Technologies

FogIoT Orchestrator: an Orchestration System for IoT Applications in Fog Environment

SURVEY PAPER ON CLOUD COMPUTING

Exploring cloud storage for scien3fic research

Enhancing Checkpoint Performance with Staging IO & SSD

A Holistic View of Telco Clouds

FREE SCIENTIFIC COMPUTING

Unifying Big Data Workloads in Apache Spark

VIAF: Verification-based Integrity Assurance Framework for MapReduce. YongzhiWang, JinpengWei

Benchmarking Over The Grid Using SEE Virtual Organization.

Developing and Deploying vsphere Solutions, vservices, and ESX Agents. 17 APR 2018 vsphere Web Services SDK 6.7 vcenter Server 6.7 VMware ESXi 6.

BRINGING HOST LIFE CYCLE AND CONTENT MANAGEMENT INTO RED HAT ENTERPRISE VIRTUALIZATION. Yaniv Kaul Director, SW engineering June 2016

Distributed Optimization via ADMM

How to Deploy a VHD Virtual Test Agent Image in Azure

Using PCF Ops Manager to Deploy Hyperledger Fabric

Sky Computing on FutureGrid and Grid 5000 with Nimbus. Pierre Riteau Université de Rennes 1, IRISA INRIA Rennes Bretagne Atlantique Rennes, France

ScalaIOTrace: Scalable I/O Tracing and Analysis

A Performance Evaluation of WS-MDS in the Globus Toolkit

Transcription:

Atlanta, Georgia, April 19, 2010 in conjunction with IPDPS 2010 UNIBUS: ASPECTS OF HETEROGENEITY AND FAULT TOLERANCE IN CLOUD COMPUTING Magdalena Slawinska Jaroslaw Slawinski Vaidy Sunderam {magg, jaross, vss}@mathcs.emory.edu Emory University, Dept. of Mathematics and Computer Science Atlanta, GA, USA

Creating a problem 2 1. What do I want? Execute an MPI application 2. What do I need? Target resource: MPI cluster FT services: Checkpoint, Heartbeat 3. What do I have? Access to the Rackspace cloud User 4. Why might I want FT on cloud? To reduce costs (money, time, energy, ) Reliability 5. What is the overhead introduced by FT? 6. Can I do that? How? Rackspace cloud

Problem 3 Available resource Rackspace cloud User s requirements Execute MPI software Target resource: MPI cluster Target platform: FT-flavor User User s resources Rackspace cloud (credentials) Workstations EC2 cloud Target resource Resource transformation Manually: interaction with web page prepare the image: install required software and dependencies instantiate servers configure passwordless authentication. 1 man-hour for 16+1 nodes

Unibus: a resource orchestrator 4 Available resource Rackspace cloud User s requirements Execute MPI software Target resource: MPI cluster Target platform: FT-flavor User User s resources Rackspace cloud (credentials) Unibus EC2 cloud Target resource Workstations

Outline 5 Unibus an infrastructure framework that allows to orchestrate resources Resource access virtualization Resource provisioning Unibus FT MPI platform on demand Automatic assembly of an FT MPI-enabled platform Execution of an MPI application on the Unibuscreated FT MPI-enabled platform Discussion of the FT overhead

Unibus resource sharing model 6 Traditional Model Proposed Model Resource exposition Virtual Organization (VO) Resource provider Resource usage Determined by VO Determined by a particular resource provider Resource virtualization and aggregation Resource providers belonging to VO Software at the client side

Handling heterogeneity in Unibus 7 Resources exposed in an arbitrary manner as access points Capability Model to abstract operations available on provider s resources Mediators to implement the specifics of access points Knowledge engine to infer relevant facts Unibus Engine Capability Model Mediators Network Resources uses User access daemon library access protocols access points

Complicating a big picture Unibus access device Metaapplications 8 Services Resources exposed in an arbitrary manner as access points Capability Model to abstract operations on resources Mediators to implement the specifics of access points Knowledge engine to infer relevant fact Resource descriptors to describe resources semantically (OWL-DL) Services (standard and third parties), e.g., heartbeat, checkpoint, resource discovery, etc. Metaapplications to orchestrate execution of applications on relevant resources Unibus Engine Mediators Network Resources Capability Model Services Resource descriptors Access daemon library access protocols Access points User

9 Virtualizing access to resources Capability Model and mediators Workstations binding Implements details Ssh AP Cluster Rackspace cloud Capability Model Provides virtually homogenized access to heterogeneous resources Specifies abstract operations, grouped in interfaces Interface hierarchy not appropriate (e.g. fs:itransferable and ssh:isftp) Mediators Implement resource access point protocols

Virtualizing access to resources 10 ISsh shell exec subsystem Ssh Mediator invoke_shell exec_command get_subsystem get_subsytemsftp. compatiblewith sshd Workstation

11 Knowledge engine Knowledge Set Mediator s Developer Ssh Mediator invoke_shell exec_command get_subsystem some compatiblewith Access Point Open SshD Operating System Linux compatiblewith some hasos hasaccesspoint ISsh shell exec subsystem hasoperation Resource emily Request interface ISsh Knowledge Engine (inferring) User Resource: emily

Composite operations 12 Rs_addhosts dependson create_server Create_server is implemented by RS Mediator Rs_addhosts addhosts So RS mediator addhosts Composite operations Dynamically expand mediator s operations May result in classification of mediators and compatible resources to new interfaces ISimpleCloud addhosts deletehosts dependson Composite operation entry point Composite operation definition rs_addhosts Def Def ISimpleCloud_RS.py IRackspace create_server delete_server Rackspace Mediator create_server delete_server rs_addhosts a.k.a. addhosts Rackspace Cloud

13 Resource access unification via composite operations IEC2 run_instance Unified interface ISimpleCloud addhosts deletehosts User Eliminating need of standardization IRackspace create_server EC2 Mediator Composite operations ec2_addhosts rs_addhosts delete_server Rackspace Mediator run_instance ec2_addhosts Def Def Def Def create_server delete_server rs_addhosts EC2 cloud Different resources, yet semantically similar Rackspace Cloud

Resource provisioning Homogenizing resource heterogeneity 14 Conditioning increases resource specialization levels Soft conditioning changes resource software capabilities e.g., installing MPI enables execution of MPI apps Successive conditioning enhances resource capabilities in terms of available access points (may use soft conditioning) e.g., deploying Globus Toolkit makes the resource accessible via Grid protocols

Transforming Rackspace to FT-enabled MPI platform 15 User s credentials Soft conditioning Successive cond. Composite ops Rackspace Unibus Rackspace descriptor Metaapp User s requirements Execute software: NAS Parellel Benchmarks (NPB) Target resource: MPI cluster FT services: Heartbeat, Checkpoint User NPB logs FT MPI cluster

Rackspace Cloud to MPI cluster 16 Installing other services (FT) Deployment of MPI on new resources Creating a new group of resources (Rackspace sshenabled servers) in terms of new access points Obtaining a higher level of abstraction

17 Metaapplications Unibus access device Metaapplications Services User s requirements Execute software: NAS Parellel Benchmarks (NPB) Target resource: MPI cluster FT services: Heartbeat, Checkpoint Unibus Capability Model Engine Mediators Services User Resource descriptors Network Resources

Metaapplication 18 Metaapplication Requests IClusterMPI FT services: IHeartbeat ICheckpointRestart Specifies available resources Performs benchmarks Transfers benchmarks execution logs to the head node Requests ISftp

Rackspace testbed RAM 256MB 19 10GB 16 working nodes (WN) + 1 head node (HN) HDD Node: 4-core, 64-bit, AMD 2GHz Debian 5.0 (Lenny) dmtcp_coordinator OpenMPI v. 1.3.4 (GNU suite v. 4.3.2 (gcc, gfortran) NAS Parallel Benchmarks v.3.3, class B FT setup: dmtcp_command Heartbeat service: OpenMPI-based in case of failure, the service determines failes node(s) and raises an exception RAM 1GB 40GB HDD Private IPs dmtcp_checkpoint j h \ headnode_privateip mpirun Checkpoint/restart service: DMTCP Distributed MultiThreaded CheckPointing user-level transparent checkpointing Executes dmtcp_command every 60 secs on HN to checkpoint 81 processes (64 MPI processes, 16+1 OpenMPI supervisor processes) Moves local checkpoint files from WN to HN (in parallel) Checkpoint time 5 sec; moving checkpoints from WN -> HN less than10 sec; compressed checkpoint size c.a.1gb

20 Results: NPB, class B, Rackspace, DMTCP, OpenMPI Heartbeat 16 Worker Nodes (WN) + 1 Head Node WN: 4-core, 64-bit, AMD Opteron 2GH, 1GB RAM, 40 GB HDD Checkpoints every 60 sec, average of 8 series Checkpoints every 60 sec FT overhead 2% - 10% HB - Heartbeat

Summary 21 The Unibus infrastructure framework Virtualization of access to various resources Automatic resource provisioning Innovatively used to assemble an FT MPI execution platform on cloud resources Reduces effort to bare minimum (servers instantiation, etc) 15-20 min from 1 man-hour Observed FT overhead 2%-10% (expected at least 8%) Future work Migration and restart of MPI-based computations on two different clouds or a cloud and a local cluster Work with an MPI application