Failover procedure for Grid core services


Kai Neuffer
COD-15, Lyon
www.eu-egee.org
EGEE and gLite are registered trademarks

Overview

List of Grid core services:
- Top-level BDII
- Central LFC
- VOMS server
- WMS-LB/RB
- FTS?
- Metadata servers (AMGA, 3D, etc.)
- MyProxy

Site Grid services:
- CE
- Site BDII
- Local LFC
- MON box
- UIs/VOBOX
- Local metadata servers

Failover levels

- Central service failover without shared data dependence: BDII, WMS, ...
- Central service failover with shared data dependence: LFC, VOMS, ...
- On-site service failover (could be combined with load balancing): all Grid services and also site services (PBS, DNS, etc.)

Failover scheme: independent central services

[Diagram: BDII 1 at Site 1 and BDII 2 at Site 2, linked by service recovery.]
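For independent services such as the BDII, recovery can be as simple as a client or monitor moving on to the next instance in a priority list. The following is a minimal sketch of that idea, not the procedure used in production: the host names are hypothetical, port 2170 is assumed to be the BDII LDAP port, and a plain TCP connect stands in for a real service check.

```python
import socket
from typing import Callable, Optional, Sequence, Tuple

Endpoint = Tuple[str, int]


def tcp_reachable(endpoint: Endpoint, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection(endpoint, timeout=timeout):
            return True
    except OSError:
        return False


def pick_endpoint(
    endpoints: Sequence[Endpoint],
    is_alive: Callable[[Endpoint], bool] = tcp_reachable,
) -> Optional[Endpoint]:
    """Return the first endpoint that answers, in priority order, or None."""
    for endpoint in endpoints:
        if is_alive(endpoint):
            return endpoint
    return None


# Hypothetical instances at two sites; the port is an assumption.
BDII_ENDPOINTS = [
    ("bdii1.site1.example", 2170),
    ("bdii2.site2.example", 2170),
]
```

Calling `pick_endpoint(BDII_ENDPOINTS)` returns the primary while it answers and the secondary otherwise. The health check is injectable so that a real LDAP query could replace the TCP probe.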

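Dependent services 1

Failover scheme with service redirection (DNS alias, etc.).

[Diagram: a virtual LFC points to LFC 1 at Site 1 or LFC 2 at Site 2. Each LFC has its own DB backend (DB Backend 1, DB Backend 2), and the two backends are linked by DB synchronization.]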
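With a data-dependent service, redirecting the alias is only safe if the target's database is sufficiently up to date. The sketch below shows one possible decision rule under stated assumptions: the `Instance` record, the `replication_lag_s` figure and the 60-second threshold are all hypothetical, and actually updating the DNS alias is left out.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Instance:
    """Hypothetical status record for one service instance and its DB backend."""
    host: str
    service_up: bool
    replication_lag_s: float  # how far this backend is behind, in seconds


def choose_alias_target(
    current_host: str,
    instances: Sequence[Instance],
    max_lag_s: float = 60.0,
) -> Optional[str]:
    """Decide which host the service alias should point to.

    Keep the current target while it is up. Otherwise pick the first
    instance that is up and whose database is no more than max_lag_s
    behind. Return None if no instance qualifies.
    """
    by_host = {inst.host: inst for inst in instances}
    current = by_host.get(current_host)
    if current is not None and current.service_up:
        return current_host
    for inst in instances:
        if inst.service_up and inst.replication_lag_s <= max_lag_s:
            return inst.host
    return None
```

Returning `None` rather than a stale replica is a deliberate choice in this sketch: for a catalogue service, serving outdated data may be worse than a short outage.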

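Dependent services 2

Failover scheme with service recovery.

[Diagram: LFC 1 at Site 1 and LFC 2 at Site 2, linked by service recovery. Each LFC has its own DB backend (DB Backend 1, DB Backend 2), and the two backends are linked by DB synchronization.]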

Site service failover 1: traditional cluster

[Diagram: cluster IP failover. Node 1 and Node 2 each have a real IP (Real IP 1, Real IP 2) and can hold the virtual IP. The nodes are linked by a heartbeat and attached to shared storage.]

- Shared storage: SAN, iSCSI or DRBD with a cluster filesystem (GFS2, OCFS). Not necessary for BDII, LFC.
- The service runs on one node and is started on the other in case of a hardware failure.
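The heartbeat-driven takeover of the virtual IP can be modelled as a small state machine on the standby node. This is an illustrative sketch only: the class name, the `dead_time_s` parameter and the explicit timestamps are assumptions, and real cluster software does considerably more.

```python
from dataclasses import dataclass


@dataclass
class StandbyNode:
    """Standby side of a two-node cluster watching the active node's heartbeat."""
    dead_time_s: float              # silence longer than this means "peer is dead"
    last_heartbeat_s: float = 0.0   # time of the last heartbeat received
    holds_virtual_ip: bool = False

    def on_heartbeat(self, now_s: float) -> None:
        """Record a heartbeat received from the active node."""
        self.last_heartbeat_s = now_s

    def tick(self, now_s: float) -> bool:
        """Take over the virtual IP once the peer has been silent too long.

        Returns True only on the tick where the takeover happens.
        """
        silent_for = now_s - self.last_heartbeat_s
        if not self.holds_virtual_ip and silent_for > self.dead_time_s:
            self.holds_virtual_ip = True  # here the service would be started
            return True
        return False
```

With `dead_time_s=10` and a last heartbeat at t=5, `tick(12)` does nothing and `tick(16)` triggers the takeover. When shared storage is involved, a real cluster also has to make sure the failed node is fenced before the service is started elsewhere; that step is not modelled here.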

Site service failover with VM

[Diagram: VM cluster with HA. On Node 1 (Dom0), DomU 1 is active and DomU 2 is not active. On Node 2 (Dom0), DomU 1 is not active and DomU 2 is active. More nodes and VMs can be added.]

- Shared files: the images of DomU 1 and DomU 2 and the Xen configuration of DomU 1 and DomU 2.
- Shared storage: GFS2, CLVM on SAN, iSCSI or DRBD partitions.
- The service runs in a VM on one node and is live-migrated to another node in case of a hardware failure.
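Because every node can see the shared images and configuration, any surviving node can take over a guest. The sketch below illustrates one simple placement rule, sending each displaced VM to the least-loaded survivor. The function and the VM and node names are hypothetical, and the call to the hypervisor that actually moves or restarts the guest is omitted.

```python
from typing import Dict, Iterable


def relocate_vms(
    placement: Dict[str, str],
    nodes: Iterable[str],
    failed: str,
) -> Dict[str, str]:
    """Return a new VM-to-node placement after the node `failed` is lost.

    Each VM that ran on the failed node is assigned to the surviving node
    currently hosting the fewest VMs (ties broken by node name).
    """
    survivors = [n for n in nodes if n != failed]
    if not survivors:
        raise RuntimeError("no surviving node to host the virtual machines")

    # Count the VMs already running on each surviving node.
    load = {n: 0 for n in survivors}
    for node in placement.values():
        if node in load:
            load[node] += 1

    new_placement = dict(placement)
    for vm in sorted(placement):
        if placement[vm] == failed:
            target = min(survivors, key=lambda n: (load[n], n))
            new_placement[vm] = target
            load[target] += 1
    return new_placement
```

In the two-node layout above, losing the node that hosts DomU 1 leaves both guests on the remaining node, which is why the hardware has to be sized for the combined load.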

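Load balancing with failover

[Diagram: cluster IP failover between two load-balancing nodes, LVS 1 and LVS 2. Each has a real IP (Real IP 1, Real IP 2) and can hold the virtual IP, and the two are linked by a heartbeat. Behind them are the service nodes, node 1 and node 2, which may or may not be clustered.]

- The service is load balanced by the redundant LVS servers.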
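The scheduling side of this setup can be illustrated with a round-robin selector that skips nodes marked as down. This is a sketch of the general idea, not of how LVS is implemented or configured. The class and node names are hypothetical, and health checking is assumed to happen elsewhere and to call `mark_down` and `mark_up`.

```python
from typing import Iterable, List, Optional, Set


class RoundRobinBalancer:
    """Round-robin selection over service nodes, skipping failed ones."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self._nodes: List[str] = list(nodes)
        self._down: Set[str] = set()
        self._next = 0  # index of the node to try first on the next pick

    def mark_down(self, node: str) -> None:
        self._down.add(node)

    def mark_up(self, node: str) -> None:
        self._down.discard(node)

    def pick(self) -> Optional[str]:
        """Return the next healthy node, or None if every node is down."""
        for _ in range(len(self._nodes)):
            node = self._nodes[self._next]
            self._next = (self._next + 1) % len(self._nodes)
            if node not in self._down:
                return node
        return None
```

With two healthy nodes, requests alternate between them. After `mark_down("node2")`, every request goes to `node1` until the node is marked up again.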

Conclusions 1

- Service recovery should be implemented for all Grid services where it is possible
  - Failover is reached by installing a secondary service server
  - Not possible for all Grid services
- For some important VO services, decentralized hosting could be of interest (LFC, VOMS, ...)
  - Not dependent on a single site
  - Technically complicated
  - Higher costs (Oracle licenses, etc.)
- Site service clustering enables failover at the site
  - The service runs as on a single machine, but with failover
  - Higher costs, depending on the storage solution
  - Each Grid service has to be treated differently
  - Some Grid services cannot be clustered

Conclusions 2

- Service-independent failover with virtual machines
  - In theory, failover could be provided for all services
  - No hardware dependency on the Grid middleware OS
  - Easy maintenance of the services (live migration)
  - Loss of performance on all disk access
  - Higher hardware requirements to get the same performance
  - Higher costs, depending on the shared storage environment
- Service load balancing and failover
  - Enables load balancing with failover, depending on the service
  - Two additional clustered machines are needed
  - More complex network structure