ELFms industrialisation plans

Similar documents
HEP Grid Activities in China

ALHAD G. APTE, BARC 2nd GARUDA PARTNERS MEET ON 15th & 16th SEPT. 2006

Cisco Configuration Engine 2.0

Remote power and console management in large datacenters

Oracle for administrative, technical and Tier-0 mass storage services

Virtualisation for Oracle databases and application servers

The EU DataGrid Fabric Management

Scientific data processing at global scale The LHC Computing Grid. fabio hernandez

10 Gbit/s Challenge inside the Openlab framework

g-eclipse A Framework for Accessing Grid Infrastructures Nicholas Loulloudes Trainer, University of Cyprus (loulloudes.n_at_cs.ucy.ac.

Metal Recovery from Low Grade Ores and Wastes Plus

Provisioning of Grid Middleware for EGI in the framework of EGI InSPIRE

GRIDS INTRODUCTION TO GRID INFRASTRUCTURES. Fabrizio Gagliardi

Advanced School in High Performance and GRID Computing November Introduction to Grid computing.

Evolution of the ATLAS PanDA Workload Management System for Exascale Computational Science

Internet2 Meeting September 2005

Launching StarlingX. The Journey to Drive Compute to the Edge Pilot Project Supported by the OpenStack

Distribution system how to remotely configure Zabbix infrastructure

Windows Azure Services - At Different Levels

where the Web was born Experience of Adding New Architectures to the LCG Production Environment

Summary of the LHC Computing Review

The European DataGRID Production Testbed

The LHC Computing Grid

ISTITUTO NAZIONALE DI FISICA NUCLEARE

NeuroLOG WP1 Sharing Data & Metadata

Easy Access to Grid Infrastructures

The INFN Tier1. 1. INFN-CNAF, Italy

LCG Installation LCFGng

The CORAL Project. Dirk Düllmann for the CORAL team Open Grid Forum, Database Workshop Barcelona, 4 June 2008

A scalable storage element and its usage in HEP

D8.1 Project website

Storage Virtualization. Eric Yen Academia Sinica Grid Computing Centre (ASGC) Taiwan

Enterprise Manager: Scalable Oracle Management

Techno Expert Solutions

Andrea Sciabà CERN, Switzerland

Distributed Systems. Thoai Nam Faculty of Computer Science and Engineering HCMC University of Technology

Monitoring the ALICE Grid with MonALISA

Google Cloud Platform for Systems Operations Professionals (CPO200) Course Agenda

Grids and Security. Ian Neilson Grid Deployment Group CERN. TF-CSIRT London 27 Jan

Deploying virtualisation in a production grid

Chapter 20: Database System Architectures

Advanced Grid Technologies, Services & Systems: Research Priorities and Objectives of WP

Application of Virtualization Technologies & CernVM. Benedikt Hegner CERN

WP JRA1: Architectures for an integrated and interoperable AAI

IEPSAS-Kosice: experiences in running LCG site

WebSphere 4.0 General Introduction

OEM Provisioning An Introduction

Geographical failover for the EGEE-WLCG Grid collaboration tools. CHEP 2007 Victoria, Canada, 2-7 September. Enabling Grids for E-sciencE

<Insert Picture Here> Enterprise Data Management using Grid Technology

Future Developments in the EU DataGrid

Chapter Outline. Chapter 2 Distributed Information Systems Architecture. Layers of an information system. Design strategies.

The LCG 3D Project. Maria Girone, CERN. The 23rd Open Grid Forum - OGF23 4th June 2008, Barcelona. CERN IT Department CH-1211 Genève 23 Switzerland

Scott Lowden SAP America Technical Solution Architect

Virtualizing Oracle 11g/R2 RAC Database on Oracle VM: Methods/Tips

Systemwalker Service Quality Coordinator. Technical Guide. Windows/Solaris/Linux

Chapter 18: Database System Architectures.! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems!

Systemwalker Service Quality Coordinator. Technical Guide. Windows/Solaris/Linux

einfrastructures Concertation Event

Management of batch at CERN

Update on EZ-Grid. Priya Raghunath University of Houston. PI : Dr Barbara Chapman

About Database Adapters

CrossGrid testbed status

Openbravo Technology Platform

Fusion Registry 9 SDMX Data and Metadata Management System

An Intelligent Service Oriented Infrastructure supporting Real-time Applications

Enabling the Autonomic Data Center with a Smart Bare-Metal Server Platform

Advanced Continuous Delivery Strategies for Containerized Applications Using DC/OS

Oracle SOA Suite 11g: Build Composite Applications

20532D: Developing Microsoft Azure Solutions

ATEMPO UNITED BACKUP EDITION DATA PROTECTION

UCT Application Development Lifecycle. UCT Business Applications

Brocade Virtual Traffic Manager and Parallels Remote Application Server

Acronis Backup & Recovery 11.5

Compute Infrastructure Management: The Future. Fred van den Bosch CTO, EVP Advanced Technology VERITAS Software Corporation

Towards Network Awareness in LHC Computing

vsphere Installation and Setup Update 2 Modified on 10 JULY 2018 VMware vsphere 6.5 VMware ESXi 6.5 vcenter Server 6.5

Chapter Outline. Chapter 2 Distributed Information Systems Architecture. Distributed transactions (quick refresh) Layers of an information system

Red Hat JBoss Data Virtualization 6.3 Glossary Guide

Developing Microsoft Azure Solutions

Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Lecture 23 Database System Architectures

Grid Computing a new tool for science

BSA Sizing Guide v. 1.0

SQL Server on Linux and Containers

High Availability Guide for Distributed Systems

e-infrastructure: objectives and strategy in FP7

Developing Microsoft Azure Solutions (70-532) Syllabus

IBM Tivoli Storage Productivity Center for Disk Midrange Edition V4.1 and IBM Tivoli Monitoring for Virtual Servers V6.2

status Emmanuel Cecchet

Exadata Database Machine: 12c Administration Workshop Ed 2

Grid Computing Middleware. Definitions & functions Middleware components Globus glite

An Introduction to Cloud Computing with OpenNebula

Considerations for a grid-based Physics Analysis Facility. Dietrich Liko

ACS-WG Installable Unit Deployment Descriptor

Scaling for the Enterprise

Exadata Database Machine: 12c Administration Workshop Ed 2 Duration: 5 Days

Exadata Database Machine Administration Workshop

MONitoring Agents using a Large Integrated Services Architecture. Iosif Legrand California Institute of Technology

Introduction to GT3. Introduction to GT3. What is a Grid? A Story of Evolution. The Globus Project

Travelling securely on the Grid to the origin of the Universe

Transcription:

ELFms industrialisation plans CERN openlab workshop 13 June 2005 German Cancio CERN IT/FIO http://cern.ch/elfms ELFms industrialisation plans, 13/6/05 Outline Background What is ELFms Collaboration with industry Challenges ELFms industrialisation plans - n 2 1

Background A collaboration agreement is being worked out between CERN and an European SME Leading provider of cluster systems Aim: further development of CERN s Extremely Large Fabric management system (ELFms) in an industrial context Example of how a software toolsuite initiated by EGEE s predecessor (EDG) is developed, deployed and now being industrialised Inspiration for potential EGEE spin-offs? ELFms industrialisation plans - n 3 Fabric Management with ELFms ELFms stands for Extremely Large Fabric management system Subsystems: : configuration, installation and management of nodes : system / service monitoring : hardware / state management Node Configuration Management Node Management ELFms manages and controls most of the nodes in the CERN CC ~2400 nodes out of ~ 3100 Multiple functionality and cluster size (batch nodes, disk servers, tape servers, DB, web, ) Heterogeneous hardware (CPU, memory, HD size,..) Supported OS: RH Linux / Scientific Linux (on i386,ia64,x86_64) and Solaris 9 ELFms industrialisation plans - n 4 2

http://quattor.org ELFms industrialisation plans - n 5 Quattor Quattor takes care of the configuration, installation and management of fabric cluster and nodes A Configuration Database holds the desired state of all fabric elements Node setup (CPU, HD, memory, software RPMs/PKGs, network, system services, location, audit info ) Cluster (name and type, batch system, load balancing info ) and site-wide attributes Defined in templates arranged in hierarchies common properties set only once Autonomous management agents running on the node for Base installation Service (re-)configuration Software installation and management Quattor addresses heterogeneity Functionality Platforms/OS Hardware Quattor addresses scalability Management of O(10K) nodes with proxy infrastructure ELFms industrialisation plans - n 6 3

Architecture CLI GUI scripts Configuration server SQL SQL backend SOAP CDB XML backend HTTP XML configuration profiles SW server(s) SW Repository HTTP RPMs Node Configuration Manager NCM CompA CompB CompC ServiceA ServiceB ServiceC RPMs / PKGs SW Package Manager SPMA base OS HTTP / PXE Install server Install Manager System installer Managed Nodes ELFms industrialisation plans - n 7 http://cern.ch/lemon ELFms industrialisation plans - n 8 4

Lemon LHC Era Monitoring Performance and exception monitoring of nodes and clusters Distributed system, scalable to O(10K) nodes Provides active monitoring of software and hardware Facilitates early error detection and problem prevention Executes corrective actions Global and local correlation and fault tolerance engines Provides persistent storage of monitoring data Oracle or flat-file based back-end holding all event history ELFms industrialisation plans - n 9 Architecture Correlation Engines SOAP Repository backend Monitoring Repository SQL SOAP RRDTool / PHP apache TCP/UDP HTTP Nodes Monitoring Agent Lemon CLI Web browser Sensor Sensor Alarm/ recovery User User Workstations ELFms industrialisation plans - n 10 5

ELFms industrialisation plans - n 11 http://cern.ch/leaf ELFms industrialisation plans - n 12 6

LEAF - LHC Era Automated Fabric LEAF is a collection of workflows for high level node hardware and state management, on top of Quattor and LEMON: HMS (Hardware Management System): Track systems through all physical steps in lifecycle eg. installation, moves, vendor calls, retirement Automatically requests installs, retires etc. to technicians GUI to locate equipment physically SMS (State Management System): Automated handling (and tracking of) high-level configuration steps E.g. progressively reconfigure / reboot cluster nodes for new kernel and/or physical move LEAF implementation is CERN specific, but concepts and design should be generic ELFms industrialisation plans - n 13 Development / Deployment ELFms (Quattor/Lemon) were started in the scope of EU DataGrid. Development is now coordinated by CERN/IT in collaboration with other HEP institutes Quattor/Lemon are used in production in/outside CERN LCG T1/T2 sites, ranging from 50-800 nodes/site Complete configuration of system and LCG Grid middleware via Quattor Integration with Grid services e.g. monitoring (GridICE, MonALISA) ELFms industrialisation plans - n 14 7

Collaboration with Industry Collaboration between CERN and European SME partner Mutual interest: SME in ELFms: Profit from innovations in Quattor/Lemon architecture and design, in particular scalability and heterogeneity Extend it for remote (WAN), secure, management of sites (University institutes, company offices, etc) Make a product out of it: easy-to-use interfaces, setup wizards, advanced GUI s; integrate with own HPC technology and tools ELFms in SME: Leverage knowledge in GUI s, easy-to-use interfaces, setup wizards Remote management of sites (LCG T2 sites, online experiments) Technology Transfer: demonstrate returns to industry of research results ELFms industrialisation plans - n 15 Collaboration with Industry (II) Framework: PPARC s Industrial Programme Support Scheme (PIPSS) Grant scheme for collaboration between UK industry and researchers in HEP and related fields Duration: 2 years Manpower: 2 FTE / year 1 FTE funded by PPARC, another FTE coming from SME Work Packages: WP1. Technology transfer of existing ELFms technology WP2: Development of secure remote management functionality WP3: Capabilities for commercial product ELFms industrialisation plans - n 16 8

Collaboration with Industry (III) Exploitable results: 1. Freely-available (open source) version of Quattor/Lemon with advanced functionality 2. Commercially supported version of Quattor/Lemon, integrated with added-value proprietary extensions by SME 3. Use of Quattor/Lemon within a major european engineering company (SME customer), setting up an internal Grid encompassing clusters at multiple geographical sites Status Research Proposal (contains actual project milestones and deliverables) finished Collaboration agreement being worked out ELFms industrialisation plans - n 17 Challenges Evolution of core ELFms software How to ensure a compatible evolution of core software and commercial extensions Definition of standards and stable API s Software licensing schema, IP rights Original EDG software: BSD-like license Current ELFms continues with EDG license Licensing schema for (part of) collaboration results may be different ELFms collaborating institutes Define relationships and re-define responsibilities Cultural differences HEP vs. industry ELFms industrialisation plans - n 18 9