WLCG Lightweight Sites

Similar documents
LHCb experience running jobs in virtual machines

INDIGO PAAS TUTORIAL. ! Marica Antonacci RIA INFN-Bari

What s new in HTCondor? What s coming? HTCondor Week 2018 Madison, WI -- May 22, 2018

Clouds in High Energy Physics

Clouds at other sites T2-type computing

Singularity in CMS. Over a million containers served

Virtualization of the ATLAS Tier-2/3 environment on the HPC cluster NEMO

Advanced Continuous Delivery Strategies for Containerized Applications Using DC/OS

EGEE and Interoperation

IBM Cloud for VMware Solutions vrealize Automation 7.2 Chef Integration

Andrej Filipčič

Introducing the HTCondor-CE

Singularity tests at CC-IN2P3 for Atlas

VC3. Virtual Clusters for Community Computation. DOE NGNS PI Meeting September 27-28, 2017

CERN: LSF and HTCondor Batch Services

HTCondor Week 2015: Implementing an HTCondor service at CERN

DevOps Technologies. for Deployment

Kubernetes made easy with Docker EE. Patrick van der Bleek Sr. Solutions Engineer NEMEA

SQL Server inside a docker container. Christophe LAPORTE SQL Server MVP/MCM SQL Saturday 735 Helsinki 2018

Using Puppet to contextualize computing resources for ATLAS analysis on Google Compute Engine

On-demand provisioning of HEP compute resources on cloud sites and shared HPC centers

TRAINING AND CERTIFICATION UPDATE

VMs at a Tier-1 site. EGEE 09, Sander Klous, Nikhef

STATUS OF PLANS TO USE CONTAINERS IN THE WORLDWIDE LHC COMPUTING GRID

HOW TO STAND OUT IN DEVOPS

ALICE Grid Activities in US

Cloud & container monitoring , Lars Michelsen Check_MK Conference #4

Geant4 on Azure using Docker containers

Important DevOps Technologies (3+2+3days) for Deployment

Automated Deployment of Private Cloud (EasyCloud)

Interfacing HTCondor-CE with OpenStack: technical questions

Status of KISTI Tier2 Center for ALICE

glideinwms architecture by Igor Sfiligoi, Jeff Dost (UCSD)

Accelerate at DevOps Speed With Openshift v3. Alessandro Vozza & Samuel Terburg Red Hat

RED HAT GLUSTER TECHSESSION CONTAINER NATIVE STORAGE OPENSHIFT + RHGS. MARCEL HERGAARDEN SR. SOLUTION ARCHITECT, RED HAT BENELUX April 2017

Containerized Cloud Scheduling Environment

LENS Server Maintenance Guide JZ 2017/07/28

Deploying virtualisation in a production grid

glite Grid Services Overview

ElastiCluster Automated provisioning of computational clusters in the cloud

VMware s (Open Source) Way of Container. Dr. Udo Seidel

glideinwms: Quick Facts

[MS20533]: Implementing Microsoft Azure Infrastructure Solutions

UP! TO DOCKER PAAS. Ming

Simple custom Linux distributions with LinuxKit. Justin Cormack

WHITE PAPER. RedHat OpenShift Container Platform. Benefits: Abstract. 1.1 Introduction

EASILY DEPLOY AND SCALE KUBERNETES WITH RANCHER

arxiv: v1 [cs.dc] 7 Apr 2014

Scientific data processing at global scale The LHC Computing Grid. fabio hernandez

Red Hat Roadmap for Containers and DevOps

The Long Road from Capistrano to Kubernetes

70-532: Developing Microsoft Azure Solutions

Implementing the Twelve-Factor App Methodology for Developing Cloud- Native Applications

g-eclipse A Framework for Accessing Grid Infrastructures Nicholas Loulloudes Trainer, University of Cyprus (loulloudes.n_at_cs.ucy.ac.

RightScale 2018 State of the Cloud Report DATA TO NAVIGATE YOUR MULTI-CLOUD STRATEGY

I Tier-3 di CMS-Italia: stato e prospettive. Hassen Riahi Claudio Grandi Workshop CCR GRID 2011

Automatic Dependency Management for Scientific Applications on Clusters. Ben Tovar*, Nicholas Hazekamp, Nathaniel Kremer-Herman, Douglas Thain

OpenStack Magnum Pike and the CERN cloud. Spyros

Jenkins: A complete solution. From Continuous Integration to Continuous Delivery For HSBC

A comparison of performance between KVM and Docker instances in OpenStack

AGENDA. 13:30-14:25 Gestion des patches, du provisionning et de la configuration de RHEL avec Satellite 6.1, par Michael Lessard, Red Hat

Getting Started With Containers

Developing and Testing Java Microservices on Docker. Todd Fasullo Dir. Engineering

What s new in HTCondor? What s coming? European HTCondor Workshop June 8, 2017

[Docker] Containerization

RED HAT CLOUD STRATEGY (OPEN HYBRID CLOUD) Ahmed El-Rayess Solutions Architect

Workload management at KEK/CRC -- status and plan

A curated Domain centric shared Docker registry linked to the Galaxy toolshed

Comparison of Service Description and Composition for Complex 3-tier Cloud-based Services

Containers, Serverless and Functions in a nutshell. Eugene Fedorenko

IBM Bluemix compute capabilities IBM Corporation

Taming your heterogeneous cloud with Red Hat OpenShift Container Platform.

Flying HTCondor at 100gbps Over the Golden State

Moab Workload Manager on Cray XT3

Using DC/OS for Continuous Delivery

DevOps Course Content

Implementing Microsoft Azure Infrastructure Solutions (20533)

Build Cloud like Rackspace with OpenStack Ansible

/ Cloud Computing. Recitation 5 September 27 th, 2016

Integration of Cloud and Grid Middleware at DGRZR

YAIM: The glite configuration tool

BOSCO Architecture. Derek Weitzel University of Nebraska Lincoln

Docker and Oracle Everything You Wanted To Know

Scientific Cluster Deployment and Recovery Using puppet to simplify cluster management

A WEB-BASED SOLUTION TO VISUALIZE OPERATIONAL MONITORING LINUX CLUSTER FOR THE PROTODUNE DATA QUALITY MONITORING CLUSTER

ATLAS Tier-3 UniGe

Installation runbook for Hedvig + Cinder Driver

KUBERNETES IN A GROWN ENVIRONMENT AND INTEGRATION INTO CONTINUOUS DELIVERY

Scientific Computing on Emerging Infrastructures. using HTCondor

LGTM Enterprise System Requirements. Release , August 2018

What s Up Docker. Presented by Robert Sordillo Avada Software

AutoPyFactory: A Scalable Flexible Pilot Factory Implementation

Building/Running Distributed Systems with Apache Mesos

Setup Desktop Grids and Bridges. Tutorial. Robert Lovas, MTA SZTAKI

ISTITUTO NAZIONALE DI FISICA NUCLEARE

Application of Virtualization Technologies & CernVM. Benedikt Hegner CERN

Ingress Kubernetes Tutorial

SUSE Cloud Admin Appliance Walk Through. You may download the SUSE Cloud Admin Appliance the following ways.

CONTINUOUS DELIVERY WITH DC/OS AND JENKINS

/ Cloud Computing. Recitation 5 February 14th, 2017

Transcription:

WLCG Lightweight Sites Mayank Sharma (IT-DI-LCG) 3/7/18 Document reference 2

WLCG Sites Grid is a diverse environment (Various flavors of CE/Batch/WN/ +various preferred tools by admins for configuration/maintenance) 3/7/18 Document reference 3

WLCG Sites Site Admin : require significant insight into middleware and services for installing/configuring/maintaining grid services and infrastructure Easier for bigger and experienced sites (T1 and many T2). Not very intuitive for smaller/ newer sites. 3/7/2018 Document reference 4 4

Potential of smaller sites 400k 300k 200k SixTrack 100k 8 th BOINC Pentathlon 2017 5

Survey results! Link : http://cern.ch/go/rhv9 51 Sites respond to the questionnaire that shows potential benefits of shared repositories Strong support for Docker, Puppet, OpenStack images 3/7/2018 Document reference 6

WLCG Lightweight Sites We would like to have sites that can run with minimal oversight and operational efforts from people at the site. They run almost by themselves. Provide resources with preferred technology with less effort (configuration management, maintenance etc.) Keep things basically the same for us, but easier for admins 3/7/2018 Document reference 7

Lightweight Site Principles 1. Abstraction: to abstract the nuances of several popular CE/WN/Batch technologies as much as possible from site-admin s view. 2. Modular Design: Allow admins to use existing and popular tools for setting up their sites. 3. Simple Deployment: Packaged into containers/ VM s for easy distribution and deployment 3/7/2018 Document reference 8

Lightweight Sites Principles 4. Centrally Configurable: Instead of individually configuring components on nodes, configure everything at site level rather 5. Extendable: A community driven effort to develop implementations for various CE/Batch in parallel 3/7/2018 Document reference 9

Lightweight Site Specification Describe the components of lightweight sites. Main function of the component Configuration parameters Communication protocols to interact with each other Repository structure for modular implementations Deployment/Release processes Maintenance guidelines 3/7/2018 Document reference 10

Lightweight Sites Components 1. Site Level Configuration File 2. Repositories for containers of different CE/Batch/WN 3. Configuration Validation System 4. Central Configuration Manager 5. Networking strategy 6. File System (CVMFS)/ Caches 3/7/2018 Document reference 11

Site Level Configuration File A site-level YAML file to describe: 1. Site Infrastructure: 1. Hostnames, IP Addresses, OS/Kernel, SSH access, Disk/ Memory/ CPU/ Network information 2. Grid Components: 1. Site Components: CE/Batch/WN/Middleware etc. 2. What to use(arc, Condor, Slurm) and what versions 3. Node on which they should be configured. 4. Component specific configuration( fetched from component repositories) 3/7/2018 Document reference 12

Site Level Configuration File 3. Generic Site Info: Users, Groups, VO s, Host Certificates 4. Misc Site Info: security emails, support emails etc. 5. Background Technologies: preferred tools for container orchestration( Kubernetes, Docker Swarm)/ configuration management(puppet, Ansible) to be used for configuring the site. 3/7/2018 Document reference 13

Site Level Configuration File component-info: ce: repo: https://github.com/... type: {cream, arc, condor, other} hostnames: { ce-01.domain, ce-02.domain } wn: repo: https://github.com/... type: {pbs, condor, slurm} hostnames: { wn-01.domain, wn-02.domain } batch: repo: https://github.com/... type: {pbs, condor} 3/7/2018 Document reference 14

Component Repositories Publicly hosted repositories on GitHub that provide code for the images of CE/WN/Batch/Squid etc. meta information for configuration of images using different configuration management tools 1 repository for every component (for instance, CreamCE, CondorCE, Torque, Slurm reside in separate repositories) 3/7/2018 Document reference 15

Component Repositories Repository Structure #General Component Description component : "Cream-CE" schema: "cream-info" version : 2.7 type : cream" cream-info: wn_list: hostname_array() users_conf: user_os() groups_conf: group_os() Enforced by Configuration Validation Engine 3/7/2018 Document reference 16

Configuration Validation configuration validation engine to ensure information supplied in site configuration file: meets the configuration requirements of desired site component is realizable on the available infrastructure using available background technologies. http://cern.ch/go/cvs8 Possibility to inject custom validation rules 3/7/2018 Document reference 17

Central Configuration Manager The main module for centrally configuring everything Uses Validation Engine to check siteconfiguration file Checks status of available Site Infrastructure that needs to be orchestrated Installs and configures Grid components from the repositories 3/7/2018 Document reference 18

Central Configuration Manager Implements a Networking strategy (overlay/ dedicated) Ensures availability of the File System (CVMFS) and Caches to the containers Runs Tests like submitting jobs to check for success or failure of site configuration 3/7/2018 Document reference 19

Specification: Put it together 1 2 3 4 5 3/7/2018 Document reference 20

1 st Implementation 1 2 4 3 5 3/7/2018 Document reference 21

Implementation Status CE: Cream Batch: Torque WN: Torque client Background Technologies: Docker (containers) Docker Swarm( container orchestration) Puppet (configuration management) Infrastructure: CERN OpenStack/ Public cloud infrastructure providers (not yet final) 3/7/2018 Document reference 22

Implementation Status Central Configuration Manager: Puppet Configuration Validation Engine: Python command line utility Overall Status: Complete: Containers for CreamCE, TorqueWN (test job. ) YAIM based configuration of containers En-route (within 2 weeks): Public repositories Puppet module for central config management More documentation 3/7/2018 Document reference 23

2 nd Implementation: GSoC 2018 2 Google Summer of Code 2018 Projects Background Technologies: Docker (containers) Kubernetes (container orchestration) Ansible (configuration management) Timeline (May 2018 September 2018) 3/7/2018 Document reference 24

Supporting new components Modular design can support ARC, SLURM, Condor etc. New repository for the components Dockerfile: instructions for setting up OS, relevant packages, middleware. Entrypoint/ init script: used by container to configure itself on startup based on information available through the central configuration module. 3/7/2018 Document reference 25

Community Technical Discussion List (E-Groups): Name: WLCG-Lightweight-Sites-Dev Link: http://cern.ch/go/l9wz Google Group (Open Source Community) Name: WLCG Lightweight Sites Link: http://cern.ch/go/hz7s 3/7/2018 Document reference 26

Other Lightweight ideas Not classic grid sites. Regional HTCondor Pools. Small sites boot up containers that connect to the regional pool for workloads 1 or 2 proof of concepts exist On our roadmap after release of version 1.0.0 3/7/2018 Document reference 27

Questions 3/7/2018 Document reference 28