RADU POPESCU IMPROVING THE WRITE SCALABILITY OF THE CERNVM FILE SYSTEM WITH ERLANG/OTP

THE EUROPEAN ORGANISATION FOR PARTICLE PHYSICS RESEARCH (CERN) 2 THE LARGE HADRON COLLIDER

THE LARGE HADRON COLLIDER 3 TUNNEL VISION
- 27 km circumference
- 100 m underground
- 180 MW power consumption
- 7 TeV per beam

THE LARGE HADRON COLLIDER 4 ALICE, ATLAS, CMS AND LHCb DETECTORS

THE LARGE HADRON COLLIDER 5 CMS DETECTOR INNER BARREL

THE LARGE HADRON COLLIDER 6 SUPER COLLIDER
(Image: Super Collider, Mustaine et al., 2013, Universal Records)

THE LARGE HADRON COLLIDER 7 EXPERIMENT DATA CHALLENGE
- 100 million channels, bunch crossing every 25 ns
- 1 PB/s internal data rate
- 5 PB of data recorded per year (plus derived data sets)
- 100 PB/year by 2025 (x20)
- 5 million lines of code per experiment

THE WORLDWIDE LHC COMPUTING GRID 8 GLOBALLY DISTRIBUTED
(Image: Worldwide LHC Computing Grid live map)
- 42 countries, 170 computing centres, 2 million jobs run each day

LHC EXPERIMENT SOFTWARE STACKS 9 KEY FIGURES
- Hundreds of developers
- ~10^8 binaries
- ~1 TB/day of nightly builds
- ~100 000 machines worldwide
- Daily production releases, which remain available

THE CERNVM FILE SYSTEM 10

THE CERNVM FILE SYSTEM 11 A FILE SYSTEM APPROACH TO DISTRIBUTING SOFTWARE
(Diagram: client stack from basic system utilities and the OS kernel through the CernVM-FS FUSE module, a file system memory buffer (~100 MB) and the CernVM-FS persistent cache (~20 GB), then up through a global HTTP cache hierarchy to the repository (HTTP or S3), ~1-10 TB)
- ~100 000 clients
- FUSE based, independent mount points, e.g. /cvmfs/atlas.cern.ch
- Clients have a read-only view; a single writer publishes into the repository
- HTTP transport, with on-demand access and caching
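
For context on the client side, a minimal configuration sketch follows. CVMFS_REPOSITORIES, CVMFS_HTTP_PROXY and CVMFS_QUOTA_LIMIT are standard CernVM-FS client parameters; the proxy host and the exact quota value are illustrative assumptions, not taken from the slides.

# /etc/cvmfs/default.local -- values are illustrative
CVMFS_REPOSITORIES=atlas.cern.ch                    # repositories to mount under /cvmfs
CVMFS_HTTP_PROXY="http://squid.example.org:3128"    # site-local HTTP cache (assumed host)
CVMFS_QUOTA_LIMIT=20000                             # persistent cache limit in MB (~20 GB)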

THE CERNVM FILE SYSTEM 12 MAIN COMPONENTS
- Client: FUSE module (with cache plugins)
- Server tools (command-line utilities)
- Standard HTTP server
- HTTP caches

THE CERNVM FILE SYSTEM 13 DESIGN
- Data store: immutable content-addressed blobs, with compression and deduplication
- Metadata: catalogs encode the state of the entire repository at a given moment in time as a Merkle tree
- Digitally signed manifest
- Versioning, snapshots, etc.
- PULL based!

CVMFS PUBLICATION WORKFLOW 14

THE CERNVM FILE SYSTEM 15 PUBLISHING
- Single writer (stateless command-line utilities)
- A read/write view is constructed with a union mount (OverlayFS, AUFS)
- Files are compressed, hashed and written to repository storage
- New metadata catalogs are created and published
- The repository manifest is updated (an atomic operation)

PUBLISHING TO CVMFS REPOSITORIES 16 EXISTING WORKFLOW
- Centralised release manager machine
- Direct interaction with the release manager:

$ ssh my-cvmfs-server.cern.ch
$ cvmfs_server transaction
$ vim /cvmfs/my-cvmfs-server.cern.ch/some_file.org
  (make changes to files in the R/W mount)
$ cvmfs_server publish

PUBLISHING TO CVMFS REPOSITORIES 17 EXISTING WORKFLOW
PROS:
- Straightforward to use
- Good for scripting
- Somewhat hides the distributed nature of the system
CONS:
- No support for concurrent writing
- Can be unsafe (shell access to the machine holding repository storage)
- Performance issues for large change-sets

PUBLISHING TO CVMFS REPOSITORIES 18 PROPERTIES AND CONSTRAINTS
1. The system (repository + caches + clients) is eventually consistent
2. Concurrency can be further exploited thanks to:
   - the immutability of the content-addressed store (CAS)
   - pushing objects being idempotent (see the sketch below)
   - the directory tree structure
3. The critical section involves updating the metadata catalogs and swapping the manifest
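
To make the idempotency point concrete, here is a minimal Erlang sketch of a content-addressed put; the module cas_store and its map-backed store are illustrative, not part of CernVM-FS:

-module(cas_store).
-export([put/2]).

%% The key is the hash of the content itself, so storing the same
%% blob twice is a no-op: pushes are idempotent, and concurrent
%% uploads of the same object cannot conflict.
put(Store, Blob) when is_binary(Blob) ->
    Hash = crypto:hash(sha, Blob),
    case maps:is_key(Hash, Store) of
        true  -> Store;                 %% already present: nothing to do
        false -> Store#{Hash => Blob}
    end.

Calling cas_store:put(#{}, <<"data">>) yields a one-entry store; applying the same call to the result returns it unchanged.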

PUBLISHING TO CVMFS REPOSITORIES 19 EXISTING ARCHITECTURE
(Diagram: a user machine connects over SSH to the release manager/gateway node, which runs the CernVM-FS FUSE client and server tools; the release manager writes to authoritative storage over NFS or S3, and stratum 1 replicas serve clients over HTTP)

PUBLISHING TO CVMFS REPOSITORIES 20 AN IMPROVED ARCHITECTURE
(Diagram: multiple user machines, each acting as a release manager with the CernVM-FS FUSE client and server tools, talk to a shared storage gateway through the CVMFS service API; the gateway services write to authoritative storage, from which stratum 1 replicas are synchronised)

PUBLISHING TO CVMFS REPOSITORIES 21 AN IMPROVED WORKFLOW
Two concurrent sessions on different release managers:

$ ssh my-cvmfs-1.cern.ch
$ cvmfs_server transaction /lcg/58
$ vim /cvmfs/my-cvmfs.cern.ch/lcg/58/some_file.org
  (make changes to files in the R/W mount)
$ cvmfs_server publish

$ ssh my-cvmfs-2.cern.ch
$ cvmfs_server transaction /lcg/60
$ vim /cvmfs/my-cvmfs.cern.ch/lcg/60/some_file.org
  (make changes to files in the R/W mount)
$ cvmfs_server publish

CVMFS SERVICE ARCHITECTURE 22 CVMFS STORAGE GATEWAY
- Serves as a distributed lock manager
- Checks the rights of clients to modify repositories
- Assigns exclusive leases to clients on repository subpaths (see the sketch below)
- Receives files (object packs) from clients and writes them to authoritative storage
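
A minimal sketch of the lease logic as a gen_server; lease_manager, the in-memory map and the string-prefix overlap check are illustrative simplifications (the real gateway persists its state in Mnesia):

-module(lease_manager).
-behaviour(gen_server).
-export([start_link/0, acquire/2, release/2]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

acquire(User, Path) -> gen_server:call(?MODULE, {acquire, User, Path}).
release(User, Path) -> gen_server:call(?MODULE, {release, User, Path}).

init([]) -> {ok, #{}}.          %% map of Path => User

handle_call({acquire, User, Path}, _From, Leases) ->
    %% Grant the lease only if no held path overlaps the request.
    case lists:any(fun(Held) -> overlaps(Held, Path) end, maps:keys(Leases)) of
        true  -> {reply, {error, path_busy}, Leases};
        false -> {reply, ok, Leases#{Path => User}}
    end;
handle_call({release, User, Path}, _From, Leases) ->
    %% Only the lease holder may release it.
    case maps:find(Path, Leases) of
        {ok, User} -> {reply, ok, maps:remove(Path, Leases)};
        _          -> {reply, {error, invalid_lease}, Leases}
    end.

handle_cast(_Msg, Leases) -> {noreply, Leases}.

%% Two leases conflict when one path is a prefix of the other
%% (a simplification of per-component path matching).
overlaps(A, B) -> lists:prefix(A, B) orelse lists:prefix(B, A).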

CVMFS SERVICES IMPLEMENTATION 23 ERLANG/OTP: DISTRIBUTED GLUE
A language (Erlang) and framework (OTP) designed for concurrent and distributed applications:
- Actor model: lightweight processes with memory isolation (see the sketch below)
- Immutability of values
- Supervision trees
- Erlang/OTP/BEAM are battle-tested: 30+ years of use at Ericsson
- Excellent C/C++ interoperability
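
A minimal sketch of the actor model in plain Erlang (echo_actor is an illustrative name, not part of the gateway):

-module(echo_actor).
-export([start/0, loop/0]).

%% Spawn a lightweight, memory-isolated process whose only
%% interface is asynchronous message passing.
start() -> spawn(?MODULE, loop, []).

loop() ->
    receive
        {From, Msg} ->
            From ! {self(), Msg},   %% messages are immutable values
            loop();
        stop ->
            ok
    end.

Usage: Pid = echo_actor:start(), then Pid ! {self(), hello}; the caller receives {Pid, hello} in its own mailbox.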

CVMFS SERVICES IMPLEMENTATION 24 GATEWAY APPLICATION ARCHITECTURE
(Diagram: an HTTP front-end built on Cowboy feeds a back-end multiplexer, which dispatches to auth, lease and receiver components; the receiver drives a pool of C++ workers, and state is persisted in Mnesia)
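
A sketch of how such a front-end can be wired up, assuming Cowboy 2.x; the route /api/v1/leases, the listener name gw_http and the back-end call gw_backend:new_lease/1 are hypothetical illustrations, not the gateway's actual API:

%% Starting the listener (e.g. from the application start callback):
Dispatch = cowboy_router:compile([
    {'_', [{"/api/v1/leases", lease_handler, []}]}
]),
{ok, _} = cowboy:start_clear(gw_http, [{port, 8080}],
                             #{env => #{dispatch => Dispatch}}),

%% A handler module that forwards the request body to the back-end:
-module(lease_handler).
-export([init/2]).

init(Req0, State) ->
    {ok, Body, Req1} = cowboy_req:read_body(Req0),
    Reply = gw_backend:new_lease(Body),   %% hypothetical back-end call
    Req2 = cowboy_req:reply(200,
                            #{<<"content-type">> => <<"application/json">>},
                            Reply, Req1),
    {ok, Req2, State}.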

CVMFS SERVICES IMPLEMENTATION 25 DEVELOPER EXPERIENCE WITH ERLANG/OTP
Great:
- OTP: tracing, inspection, etc.
- Immutability; a functional language
- Very simple to write concurrent programs
- Dialyzer, Common Test, QuickCheck, etc.
- Easy integration with C++

CVMFS SERVICES IMPLEMENTATION 26 DEVELOPER EXPERIENCE WITH ERLANG/OTP
Less great:
- Dynamic typing is strange coming from C++
- Deciphering Erlang errors is an acquired taste (use Lager for logging)
- OTP's APIs are large, and some parts feel less clearly documented

CVMFS SERVICES IMPLEMENTATION 27 DEVELOPER EXPERIENCE WITH ERLANG/OTP
- Overall impression is very positive!
- Would definitely use it for other new components
- Looking forward to more operational experience

OTHER CERNVM-FS PROJECTS AND ACTIVITIES 28 DOCKER GRAPHDRIVER PLUGIN
- Docker graphdriver plugin for CernVM-FS (Nikola Hardi): https://github.com/cvmfs/docker-graphdriver
- Stores the contents of Docker image layers inside CernVM-FS repositories
- Instead of downloading entire layers, mount a CernVM-FS repository and fetch individual files on demand

OTHER CERNVM-FS PROJECTS AND ACTIVITIES 29 CERNVM 10TH ANNIVERSARY!
- Next year, CernVM is turning 10
- Jan 30th to Feb 1st 2018: CernVM workshop @ CERN
- Open to anyone
- Talks by users and developers of CernVM and related projects

THE CERNVM TEAM 30
(Photo, left to right: Radu Popescu, Jakob Blomer, Gerardo Ganis, Petr Jirout (former), Nikola Hardi (former))

THANK YOU 31
- CernVM-FS: https://github.com/cvmfs and http://cvmfs.readthedocs.io/en/stable/
- Contact: radu.popescu@cern.ch, https://github.com/radupopescu, @iradupopescu

ERLANG/OTP CONCURRENCY PATTERNS 32 CRITICAL SECTIONS
- Erlang provides (only) processes and message passing for concurrency
- No locks, semaphores, condition variables, etc.
- What if exclusive access to a resource is needed?
- An OTP gen_server works as a critical section (see the sketch below)
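
A minimal sketch of the pattern (counter_server is an illustrative name): a gen_server processes one message at a time, so the body of handle_call is effectively a critical section:

-module(counter_server).
-behaviour(gen_server).
-export([start_link/0, increment/0]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, 0, []).

%% All callers funnel through the single server process, so the
%% counter update is serialised without explicit locks.
increment() -> gen_server:call(?MODULE, increment).

init(N) -> {ok, N}.

handle_call(increment, _From, N) ->
    {reply, N + 1, N + 1}.

handle_cast(_Msg, N) -> {noreply, N}.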

ERLANG/OTP CONCURRENCY PATTERNS 33 MULTIPLEXING REQUESTS/REPLIES ON A GEN_SERVER
- An OTP gen_server with concurrency?
- In gen_server:handle_call, spawn a process per request and return {noreply, State}
- The spawned process later returns a value with gen_server:reply
- Does not maintain the order of requests
- Used as the concurrency adaptor between Cowboy and the C++ worker pool (see the sketch below)
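
A sketch of this multiplexing pattern (mux_server and do_work/1 are illustrative stand-ins for the gateway's back-end and its C++ worker dispatch):

-module(mux_server).
-behaviour(gen_server).
-export([start_link/0, request/1]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

request(Work) -> gen_server:call(?MODULE, {request, Work}).

init([]) -> {ok, no_state}.

%% Spawn a process per request and return {noreply, State}: the
%% server is immediately free to accept the next call, and the
%% spawned process answers the original caller directly.
handle_call({request, Work}, From, State) ->
    spawn_link(fun() ->
        Result = do_work(Work),          %% stand-in for the C++ worker pool
        gen_server:reply(From, Result)
    end),
    {noreply, State};
handle_call(_Other, _From, State) ->
    {reply, {error, unknown_request}, State}.

handle_cast(_Msg, State) -> {noreply, State}.

%% Hypothetical unit of work.
do_work(Work) -> {done, Work}.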