CernVM-FS beyond LHC computing

C Condurache, I Collier

STFC Rutherford Appleton Laboratory, Harwell Oxford, Didcot, OX11 0QX, UK

E-mail: catalin.condurache@stfc.ac.uk

Abstract. In the last three years the CernVM File System has transformed the distribution of experiment software to WLCG sites. CernVM-FS removes the need for local installation jobs and performant network file servers at sites, and it often improves performance at the same time. Now established and proven to work at scale, CernVM-FS is beginning to perform a similar role for non-LHC computing. The deployment of the CernVM-FS service at the RAL Tier-1 is presented, as well as the proposed development of a network of Stratum-0 repositories and Stratum-1 replicas modelled upon the infrastructure developed to support WLCG computing. A case study of one non-LHC Virtual Organization is also included, describing its use of the CernVM-FS Stratum-0 service, along with a web interface intended as a tool to upload software at Stratum-0 sites.

1. What is CernVM-FS?
The CernVM File System (CernVM-FS) [1] is a read-only, globally distributed file system optimized for software distribution. Originally developed for the CERN Virtual Machine project [2], it was designed to solve the problem of distributing frequently changing experiment software to virtual machines (VMs) that might not have access to traditional software servers, without having to include it within the VM images. It is built using standard technologies, being implemented as a POSIX read-only file system in user space (a FUSE module) [3]. Internally, a CernVM-FS repository is defined by its file catalog, which is an SQLite database [4]. CernVM-FS uses the HTTP protocol, which allows data replication to multiple web servers and data caching by standard web proxies such as Squid [5]. The replication source server is the CernVM-FS Stratum-0 and contains a read-writable copy of the repository. In a typical setup, repositories are replicated to a network of web servers, which form the CernVM-FS Stratum-1 service. While a CernVM-FS Stratum-0 repository server is able to serve clients directly, a large number of clients is better served by a set of Stratum-1 replica servers. Multiple servers improve reliability, reduce the load and protect the Stratum-0 master copy of the repository from direct access.

Traditionally, the experiment software is distributed using installation jobs that populate shared software areas on common distributed file systems, such as the Network File System (NFS) [6] or the Andrew File System (AFS) [7]. This entire process is error-prone and manpower intensive, as a typical software release is several gigabytes in size, consists of tens of thousands of files and has to be stored at each site supporting a specific Virtual Organization (VO). CernVM-FS removes the need for local installation jobs and conventional NFS servers at every site, and its use simplifies the environment across the Computing Grid. For any Virtual Organization, the software needs to be installed only once and is then available at any site with the CernVM-FS client installed.
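
As an illustration of the HTTP-based access described above, the short Python sketch below fetches the signed repository manifest (the .cvmfspublished file) from a replica server, optionally through a site Squid proxy. The host names are hypothetical and the snippet is not part of the CernVM-FS tooling; it is only meant to make the distribution mechanism concrete.

#!/usr/bin/env python3
"""Sketch: verify that a CernVM-FS repository is reachable over plain HTTP,
optionally through a site Squid proxy, by fetching its signed manifest
(.cvmfspublished). Host names are hypothetical examples."""
import urllib.request

STRATUM1_URL = "http://cvmfs-stratum1.example.org/cvmfs/wenmr.gridpp.ac.uk"  # hypothetical
SITE_PROXY = "http://squid.example-site.org:3128"                            # hypothetical


def fetch_manifest(repo_url, proxy=None):
    """Return the repository manifest text, via a Squid proxy if one is given."""
    handlers = [urllib.request.ProxyHandler({"http": proxy})] if proxy else []
    opener = urllib.request.build_opener(*handlers)
    with opener.open(repo_url + "/.cvmfspublished", timeout=10) as response:
        return response.read().decode("ascii", "replace")


if __name__ == "__main__":
    manifest = fetch_manifest(STRATUM1_URL, SITE_PROXY)
    # The manifest starts with key-value lines (e.g. root catalog hash and
    # repository name); the signature follows a '--' separator line.
    for line in manifest.splitlines():
        if line == "--":
            break
        print(line)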

Once the signed file catalog has been downloaded on a worker node at a site and mounted using FUSE, the metadata operations require no further network access. Together with the file-based deduplication (the CernVM-FS repository is a form of content-addressable storage), this makes CernVM-FS very efficient in terms of disk usage and network traffic. The data validity and integrity of the file system are ensured by the combination of the digitally signed file catalog and the cryptographic content hashes used to locate files within the catalog.

2. CernVM-FS History at RAL Tier-1
The Tier-1 centre at the Rutherford Appleton Laboratory (RAL) [8] provides large-scale computing facilities for UK Particle Physics experiments and is used through the GridPP [9] and European Grid Infrastructure (EGI) [10] projects for UK and wider European access. During the summer of 2010 the RAL Tier-1 was the first Worldwide LHC Computing Grid (WLCG) [11] site involved in testing CernVM-FS at scale, and it worked towards getting it accepted and deployed within WLCG. In February 2011 the first global CernVM-FS replica for LHC Virtual Organizations outside CERN became operational at the RAL Tier-1. In December 2012 a new initiative to offer CernVM-FS Stratum-0 services to non-LHC VOs was started at RAL, and the first repositories were set up for the mice [12] and na62 [13] VOs supported by the GridPP UK project. Between January and August 2013 the RAL CernVM-FS Stratum-0 service was extended to other small international VOs. As of October 2013 the initiative is entering the EGI service phase, with more VOs supported or requesting access to this software distribution service, and a separate CernVM-FS service is being offered at RAL for non-LHC VOs only. In collaboration with the Open Science Grid (OSG) [14], which is running a similar CernVM-FS infrastructure, it is proposed that experiments with a worldwide presence will be given access to a consolidated software distribution mechanism across both organizations. In parallel, the RAL Tier-1 group is involved in the verification of the CernVM-FS client software at scale before it is released into production.

3. enmr.eu CernVM-FS Stratum-0 at RAL Tier-1
The CernVM-FS Stratum-0 deployed at the RAL Tier-1 offers its services to non-LHC experiments belonging not only to the High Energy Physics (HEP) community, but also to other communities such as Life and Earth Sciences. WeNMR [15] is a Virtual Research Community (VRC) supported by EGI which aims to build a worldwide e-infrastructure for Nuclear Magnetic Resonance (NMR) spectroscopy and Structural Biology. Its VO, enmr.eu, is currently the largest in the area of Life Sciences, with over 570 registered users and a steady growth rate. enmr.eu has an active presence across the Grid at more than 25 sites, which makes the distribution of new software releases time consuming. The deployment and configuration of the CernVM-FS Stratum-0 repository for enmr.eu at the RAL Tier-1 has reduced the effort required to maintain multiple NFS software areas at all the other supporting sites. Installation jobs run by enmr.eu/role=manager at the RAL Tier-1 batch farm upload and configure new software releases within a standard NFS area, then a synchronization mechanism updates the /cvmfs/wenmr.gridpp.ac.uk directory within the Stratum-0 repository. Work is ongoing to consolidate the enmr.eu Stratum-0 at RAL and to move towards a CernVM-FS-only environment.
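
As an indication of what the synchronization mechanism mentioned above could look like, the following minimal sketch copies a release from a shared NFS area into the repository and publishes a new revision. It assumes the repository is maintained with the standard cvmfs_server toolkit; the paths are illustrative and the actual mechanism used at RAL may differ.

#!/usr/bin/env python3
"""Sketch of a Stratum-0 synchronization step: copy a software release from
a shared NFS area into the repository and publish a new revision.
Assumes the standard cvmfs_server toolkit; the NFS path is a hypothetical
example and the actual mechanism used at RAL may differ."""
import subprocess

REPO = "wenmr.gridpp.ac.uk"             # repository named in the text
NFS_AREA = "/exports/wenmr/software/"   # hypothetical NFS software area
REPO_PATH = "/cvmfs/%s/" % REPO


def synchronize():
    # Open a writable transaction on the repository.
    subprocess.check_call(["cvmfs_server", "transaction", REPO])
    try:
        # Mirror the NFS software area into the repository working copy.
        subprocess.check_call(["rsync", "-a", "--delete", NFS_AREA, REPO_PATH])
    except Exception:
        # Roll back the open transaction if the copy fails.
        subprocess.check_call(["cvmfs_server", "abort", "-f", REPO])
        raise
    # Sign and publish the new repository revision (updates the file catalog).
    subprocess.check_call(["cvmfs_server", "publish", REPO])


if __name__ == "__main__":
    synchronize()
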
4. CernVM-FS Task Force
The CernVM-FS Task Force [16] is a working group set up by EGI to establish a CernVM-FS infrastructure that allows EGI VOs to use it as a standard method of distributing their software across the EGI computing resources. It has an active role in promoting the use of CernVM-FS technology by VOs and in creating a network of sites providing the core services (Stratum-0, Stratum-1, Squid).

1, Squid). It also encourages the institutions running local Stratum-0 and servers to become part of the EGI CernVM-FS infrastructure by offering their services at a regional or larger level. The Task Force also promotes cooperation with other organizations, such as OSG and the WLCG, on monitoring tools for CernVM-FS and by cross-replicating repositories for VOs supported by multiple collaborations. 5. A Relaxed Topology for the EGI CernVM-FS Infrastructure Deployment The CernVM-FS Task Force s main task is the development by the beginning of 2014 of a network of Stratum-0 repositories and replicas modelled upon the infrastructure established to support the WLCG computing. The CernVM-FS software distribution model for the LHC VOs is shown in Figure 1. In this model the software is installed by the LHC VOs at the Stratum-0 instance hosted at CERN and replicated to replica servers hosted by WLCG Tier-1 sites. The CernVM-FS clients connect to one of the services via a distributed hierarchy of proxy servers. Figure 1: CernVM-FS WLCG deployment The proposed CernVM-FS infrastructure model for the EGI community is presented in Figure 2. This model consists of more than one Stratum-0 instances which are disjoint and represent the source repositories where software is installed by the EGI VOs. Stratum-0 and servers can be geographically co-located or not, depending on the resources made available by a specific site. Another main characteristic of this model is the fact that a can replicate a whole Stratum-0 or can partially replicate, depending again on the site resources and on what VOs are being supported. This is the relaxed model designed for the EGI experiments, where the software repositories are spread across several Stratum-0 locations and where instances can participate with less than full commitment in the replication process. Of course, similar to the WLCG model, a distributed and consolidated network of s makes CernVM-FS software distribution more resilient, so each EGI should aim to replicate as much as possible. It is worth mentioning one advantage of the CernVM-FS EGI model which is that it can make use of the existent hierarchy of proxy servers used for the LHC software distribution. 3

6. CernVM-FS Stratum-0 Web Frontend
The CernVM-FS Stratum-0 repositories at RAL were initially maintained by means of installation jobs in the case of VOs that had a CPU allocation at the RAL Tier-1. As more VOs without any CPU allocation started to use the CernVM-FS service, manual handling of software tarballs was only a temporary solution and alternatives were investigated. One option was to give VO Software Grid Managers (SGMs) privileges to log in to the Stratum-0 servers and upload the software (a method used at other Stratum-0 sites), but this was not agreed at RAL as good practice because of potential security issues with non-local user accounts. To remove the need for privileged roles and jobs, a web interface that allows VO Software Managers to upload and publish their software to the Stratum-0 was developed during the summer of 2013 (see Figure 3). It authenticates users (mainly VO SGMs) based on their X509 certificates, which are managed by a web server, and it is due to be made available to all VOs by the end of 2013. The application allows tarballs to be uploaded and unpacked within the /cvmfs/<repo_name> space; users can also remove obsolete software or visualize the entire directory structure. At the backend, a synchronization process with the real CernVM-FS Stratum-0 repository is regularly triggered whenever new changes are detected.

Figure 3: CernVM-FS Stratum-0 web frontend screenshot
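
The following sketch indicates how such an X509-authenticated upload endpoint could be structured. It is not the actual RAL implementation: it assumes a front-end web server (for example Apache with mod_ssl) has already verified the client certificate and exposes the subject DN in the SSL_CLIENT_S_DN variable, and the DN-to-repository mapping is hypothetical.

#!/usr/bin/env python3
"""Sketch of an X509-authenticated tarball upload endpoint, in the spirit of
the web frontend described above (not the actual RAL implementation).
Assumes a front-end web server (e.g. Apache with mod_ssl and
'SSLOptions +StdEnvVars') has verified the client certificate and passes the
subject DN as SSL_CLIENT_S_DN; the DN-to-repository mapping is hypothetical."""
import os
import tarfile
import tempfile

UPLOAD_ROOT = "/cvmfs"                  # staging copy of the /cvmfs/<repo_name> space
AUTHORIZED_DNS = {                      # hypothetical VO software manager DNs
    "/DC=org/DC=example/CN=wenmr sgm": "wenmr.gridpp.ac.uk",
}


def application(environ, start_response):
    """Minimal WSGI app: accept a POSTed tarball and unpack it into the repo area."""
    dn = environ.get("SSL_CLIENT_S_DN", "")
    repo = AUTHORIZED_DNS.get(dn)
    if repo is None:
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"certificate not authorized\n"]

    # Spool the uploaded tarball to disk, then unpack it under /cvmfs/<repo_name>.
    # (Member path sanitization is omitted for brevity.)
    length = int(environ.get("CONTENT_LENGTH") or 0)
    with tempfile.NamedTemporaryFile(suffix=".tar.gz") as spool:
        spool.write(environ["wsgi.input"].read(length))
        spool.flush()
        with tarfile.open(spool.name) as tar:
            tar.extractall(os.path.join(UPLOAD_ROOT, repo))

    # A separate backend process would detect the change and trigger the
    # synchronization with the real Stratum-0 repository.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"upload unpacked; synchronization will follow\n"]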

7. Summary
As of October 2013, CernVM-FS has proved itself very scalable in distributing the LHC experiment software to nearly 150 WLCG sites with more than 400,000 logical CPUs. Following this successful model, the EGI CernVM-FS Task Force is coordinating efforts to develop a similar infrastructure that will serve the EGI-supported VOs and beyond. The RAL Tier-1 centre is leading this activity by hosting Stratum-0 and Stratum-1 services, but it is envisaged that a consolidated EGI CernVM-FS infrastructure will include additional Stratum-0 and Stratum-1 sites and will also make use of the similar existing services within OSG. All HEP and non-HEP communities will benefit from this project, as it standardizes the computing environment across the Grid and frees valuable resources at supporting sites to be reused for other purposes.

Acknowledgments
Many thanks to Jakob Blomer for his advice during the initial implementation of the CernVM-FS Stratum-0 at the RAL Tier-1. Many thanks also go to Michal Knapik for his invaluable work in developing the CernVM-FS Stratum-0 web frontend.

References
[1] Blomer J, Aguado-Sanchez C, Buncic P and Harutyunyan A 2011 Distributing LHC application software and conditions databases using the CernVM file system Journal of Physics: Conference Series 331 042003
[2] CERN Virtual Machine Project - http://cernvm.cern.ch
[3] FUSE - http://fuse.sourceforge.net
[4] SQLite - http://www.sqlite.org
[5] Squid - http://www.squid-cache.org
[6] NFS - http://nfs.sourceforge.net
[7] AFS - http://www.openafs.org
[8] RAL Tier-1 - http://www.stfc.ac.uk
[9] GridPP UK - http://www.gridpp.ac.uk
[10] EGI - http://www.egi.eu
[11] WLCG - http://wlcg.web.cern.ch
[12] MICE - http://mice.iit.edu
[13] NA62 - http://na62.web.cern.ch
[14] OSG - http://www.opensciencegrid.org
[15] WeNMR - http://www.wenmr.eu
[16] EGI CernVM-FS Task Force - https://wiki.egi.eu/wiki/cvmfs-tf:cvmfs_task_force