STATUS OF PLANS TO USE CONTAINERS IN THE WORLDWIDE LHC COMPUTING GRID

STATUS OF PLANS TO USE CONTAINERS IN THE WORLDWIDE LHC COMPUTING GRID - SWISS EXPERIENCE
Gianfranco Sciacca
AEC - Laboratory for High Energy Physics, University of Bern, Switzerland
hpc-ch Forum on Containers for HPC - 26 October 2017, Paul Scherrer Institut

Outline:
- The WLCG
- Motivation and benefits
- Container engines
- Experiments' status and plans
- Security considerations
- Summary and outlook

THE WLCG
http://wlcg-public.web.cern.ch
The Worldwide LHC Computing Grid (WLCG) project is a global collaboration of more than 170 computing centres in 42 countries, linking up national and international grid infrastructures. The mission of the WLCG project is to provide global computing resources to store, distribute and analyse the ~50 Petabytes of data expected in 2017, generated by the Large Hadron Collider (LHC) at CERN on the Franco-Swiss border. WLCG is co-ordinated by CERN. It is managed and operated by a worldwide collaboration between the experiments (ALICE, ATLAS, CMS and LHCb) and the participating computer centres.

MOTIVATION AND BENEFITS 1/3
General aims:
- Reduce operational effort at the sites
- Meet the needs of the ongoing analysis-reproducibility work in the experiments
One limitation of the current system: all jobs use the same root filesystem as the host, i.e. workloads are tied to the host OS:
- An SL6 host can only run SL6 workloads
- Software/OS versions are dictated by the LHC experiments
- Evolution is slow because of the stability needed during data taking

MOTIVATION AND BENEFITS 2/3
Containers are similar to VMs, but more flexible and with no performance loss:
- Native performance compared to full virtualisation (also on HPC systems)
- Effectively use a custom set of OS libraries and software while sharing the kernel with the host OS (similar to chroot)
- Provide independence of the execution environment from the host OS
  - Isolate experiments from site choices/upgrades
  - Isolate sites from experiment constraints
- Make it easy to create test environments
- Several different environments can be used at the same time on the same site
- Common approach to execution and software distribution for all sites (including HPC)

MOTIVATION AND BENEFITS 3/3
Isolation of the payload:
- Payload jobs cannot see other processes on the host, or even the processes of the pilot
- Payload jobs cannot see any files belonging to the pilot
Isolation mechanisms include: PID namespace, network configuration, UID/GID, filesystem and device access. A minimal sketch of the first of these is shown below.
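A minimal sketch of payload isolation via the PID namespace; the image path is a placeholder and the flag is assumed to match the Singularity 2.4-era command line:

    # On the host, the full process table is visible:
    ps -ef | wc -l                          # typically hundreds of processes
    # Inside a container started with its own PID namespace, the payload
    # sees only its own processes (image path is hypothetical):
    singularity exec --pid /images/slc6.img ps -ef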

CONTAINER ENGINES 1/2
Main products: Docker and Singularity.
The field is dominated by Docker, which is better suited for full encapsulation:
- Analogous to VMs: full software and environment stack
- Arbitrary workflow or service execution
- Docker instances can be long-lived (service deployment model) or application-oriented (execution of complex workflows)
- Provisioning model similar to VMs
Singularity is a new engine from the HPC world: very lightweight, removing the parts of Docker that are unnecessary in our context:
- OS encapsulation, but reusing as much as possible from the host OS; lower initialisation latency
- Designed for batch job execution, focusing on simplicity and minimal configuration, e.g.: singularity exec <OS image> <command> (see the sketch below)
- Can also run in user space (no SUID), with limited functionality (e.g. no bind mounting)
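A minimal sketch of this invocation pattern; the image path and payload script are hypothetical:

    # Run a payload script inside an SLC6 image straight from the command
    # line; no daemon, and no root privileges needed on the node:
    singularity exec /cvmfs/some-repo/images/slc6.img /bin/bash payload.sh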

CONTAINER ENGINES 2/2
- Docker and Singularity can use identical images; the choice between them is purely context dependent (see the sketch below)
- Singularity is better suited for simple job execution at sites, while Docker requires a more complex deployment and more privileges on a site
- Other engines are not as popular in our community: Linux Containers (LXC), Rocket (rkt), systemd
- Container clusters can be built with orchestrators: Mesos (good for long-running services), Kubernetes (ability to build small clusters), ...
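A sketch of the "identical images" point: Singularity (since version 2.3) can consume Docker-format images directly from a registry, so the same image can back both engines. The image name here is illustrative:

    # The same Docker image, executed by either engine:
    docker run --rm cern/slc6-base cat /etc/redhat-release
    singularity exec docker://cern/slc6-base cat /etc/redhat-release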

EXPERIMENTS' PLANS AND STATUS: ALICE 1/1
- ALICE is still familiarising itself with the technology
- The first step is deploying the experiment-specific services at sites (the VO-box) in a container; currently in pre-production
- Singularity is found to be a simple and good isolation method
  - Currently in the development plan; expected to be integrated into the Job Agent (pilot) code by the end of 2017
- Container setup: a centralised and simple container for jobs would be the best approach

EXPERIMENTS' PLANS AND STATUS: ATLAS 1/2
The ATLAS plan is to use both engines.
Singularity:
- Well suited to be used everywhere, making the sites' software specifics irrelevant to ATLAS
  - Jobs can be executed at every site regardless of the site OS and do not require any customisation at the site
  - Site upgrades are decoupled from ATLAS software requirements
- ATLAS can use several OS versions at the same time, matching the experiment software release requirements
  - E.g. Run-1 analysis on SLC5 or SLC6 images, Run-2 on SLC6, CC7, ...
Docker:
- Currently the best way to encapsulate more complex tasks, such as software development/testing or analysis preservation

EXPERIMENTS' PLANS AND STATUS: ATLAS 2/2
Current deployment focus:
- Singularity should be widely deployed on most of the computing resources, both pledged and opportunistic
- Containerised execution of job steps: the StageIn, RunJob and StageOut pilot steps are each executed in a separate container instance
  - Proof of concept tested; more pilot code refactoring will be needed
- Docker is not considered for now, although some proactive sites are already supporting it; to be addressed in 2018
- Several sites (~10) are already using Singularity for ATLAS production, although in a way that is not controlled by ATLAS
  - E.g. forcing the automatic execution of all ATLAS jobs in SLC6 containers (a sketch of such a site-side wrapper is shown below)
- Some HPCs use Shifter with Docker (worker nodes run as Docker containers)
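A hypothetical sketch of such a site-side wrapper: the batch system hands every job to a script that re-executes it inside an SLC6 image. The image path and bind mounts are assumptions, not an ATLAS-mandated layout:

    #!/bin/sh
    # Site batch wrapper: transparently run every job in an SLC6 container.
    IMAGE=/cvmfs/some-repo/images/x86_64-slc6.img    # placeholder path
    # Bind-mount CVMFS and the local scratch area, then exec the real job:
    exec singularity exec -B /cvmfs -B /scratch "$IMAGE" "$@"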

EXPERIMENTS' PLANS AND STATUS: CMS 1/1
The CMS plan is to use Singularity:
- Provides the isolation needed by CMS; does not do resource management (the batch system does)
- No daemons, no UID switching (glexec)
- Easy to install: the default configuration is fine, no need to edit config files
- Users gain no privileges inside the container, e.g. all setuid binaries are disabled in the container
- Will allow the OS installed on the node (and used by the pilot) to be decoupled from the one used to execute the payload
  - The pilot is in charge of instantiating the appropriate container, and can use a different container for each payload it schedules (see the sketch below)
- CMS decided to use Docker images rather than native ones, as this makes it easy to import the images into their own distribution system
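A hypothetical sketch of the per-payload container choice in the pilot; the image locations are modelled on Docker images unpacked into a CVMFS repository and are assumptions:

    # Pilot-side selection of the payload container (paths illustrative):
    case "$PAYLOAD_OS" in
      rhel6) IMAGE=/cvmfs/some-repo/images/cms-rhel6 ;;
      rhel7) IMAGE=/cvmfs/some-repo/images/cms-rhel7 ;;
    esac
    singularity exec -B /cvmfs "$IMAGE" ./payload.sh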

EXPERIMENTS' PLANS AND STATUS: LHCB 1/2
LHCb plans to use Singularity, but not as a hard requirement yet:
- LHCb works a lot with specially defined VMs and uses its own VM provisioning engine
- Working on integrating the Singularity functionality into their own pilot-based workload management framework
- Isolation: no more UID switching (glexec) in the pilot to run each payload
- Useful, but not a hard requirement, for provisioning SL6 on CentOS 7 worker nodes; Docker and VMs are regarded as possible alternatives
LHCb is now developing a generic container definition based on its VM experience (see the sketch below):
- Uses Docker and the generic CERN root image
- Overlays LHCb-specific setup scripts as needed, sourcing minimal dependencies from CernVM-FS and/or from their web servers
- Must be compatible with the current VM provisioning engine
- No special LHCb images are expected to be needed
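A hypothetical sketch of the generic-container idea: a stock CERN base image, with the LHCb environment overlaid at runtime from CernVM-FS rather than baked into the image. The image name and setup-script path are assumptions:

    # Generic CERN root image + LHCb setup sourced from CVMFS at runtime:
    docker run --rm -v /cvmfs:/cvmfs:ro cern/cc7-base \
      sh -c '. /cvmfs/lhcb.cern.ch/etc/setup.sh && ./payload.sh'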

EXPERIMENTS' PLANS AND STATUS: LHCB 2/2
Longer term: containers as a user job format?
- There is a lot of interest from LHCb users in packaging their jobs in, say, Docker images
- Allows reuse of other people's code and management of what the user has changed
- Makes analyses more reproducible and easier to recreate in the future
- Assess effort vs. benefit

SECURITY CONSIDERATIONS: STRENGTHS AND ISSUES
Benefits:
- Containers decouple provisioning from the experiments
  - The OS/libraries are independent from the experiments; no experiment libraries leak into provisioning
- Containers provide better isolation than a UID switch (glexec)
  - Worker-node processes and files are invisible/not accessible
  - cgroups can manage the resources used
Potential issues:
- Young technology: new classes of bugs in the kernel, missing support, and a fast-changing ecosystem
- Most kernel bugs can still be exploited from within containers: the ability to do emergency updates is still needed
- Singularity is still SUID: the SUID requirement could disappear in the future, but a sysctl configuration might then be needed (see the sketch below); disabling SUID also disables OverlayFS
- Singularity is an attractive technology to replace the UID switch, but would rely on kernel security updates
- No central service required: a simpler configuration means fewer failures, but at the price of no traceability to the end user; the experiments need to do the appropriate logging (some already do)
- Potential impact on the way central banning is done: a move from site-based to VO-based central banning
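A sketch of the configuration knobs referred to above, with illustrative values:

    # 1) Disabling the setuid helper in singularity.conf:
    #        allow setuid = no
    # 2) Non-SUID mode then relies on unprivileged user namespaces, which
    #    may need to be enabled via sysctl, e.g. on RHEL/CentOS 7.4+:
    sysctl -w user.max_user_namespaces=15000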

SUMMARY AND OUTLOOK
Use of containers in WLCG:
- All four LHC experiments plan to leverage container technology to some extent; for all of them it is very much work in progress
- Singularity with Docker-based images is the prevailing trend at the moment
- WLCG is looking at co-ordinating the efforts to the extent possible; a green light has been given to the wide deployment of Singularity
- The experiments should collect their experience, site-specific requirements and configuration specifics over the next few months
Main immediate benefits:
- Decoupling experiment needs from provisioning
- Isolation
Longer term:
- Reproducibility of analyses
- Containers as a user job format
Points to evaluate:
- Can the experiments use common images?
- Are unprivileged containers (running completely in user space) enough?
- Are there custom configuration requirements at sites?
- Some sites are already using containers on their own initiative
- Some security issues to keep on the radar