ATLAS & Google "Data Ocean" R&D Project


ATLAS & Google "Data Ocean" R&D Project Authors: Mario Lassnig (CERN), Karan Bhatia (Google), Andy Murphy (Google), Alexei Klimentov (BNL), Kaushik De (UTA), Martin Barisits (CERN), Fernando Barreiro (UTA), Thomas Beermann (CERN), Ruslan Mashinistov (UTA), Torre Wenaus (BNL), Sergey Panitkin (BNL) Project overview 2 Use cases 2 User analysis 2 Data placement, replication, and popularity 2 Data streaming 2 Work packages 3 WP1 - Data management 3 WP2 - Workflow management 3 WP3 - Google Cloud Storage Global Redirection 3 WP4 - Cost Model 3 Addendum 3 List of key personnel and PI s 3 Timeline 4 Resources from ATLAS and Google 4 ATL-SOFT-PUB-2017-002 29 December 2017 Objectives and key results 5 Namespace handling 5 Connecting ATLAS grid storage with Google storage for third-party-copy 5 Monitoring third-party-copy 6 Reading data from Google storage to Grid worker nodes File copy-to-scratch 7 Monitoring copy-to-scratch transfers 7 Reading data from Google storage to Grid worker nodes Streaming random-io 7 Monitoring random-io 7 Deletion of data on Google Storage 7 Reading data inside Google data centres Jobs running on Google compute 7 Network provisioning 8 Transparent global redirection between inter-regional zones on Google Cloud Storage 8 Development of an economic cost model 8 Appendix 1 Brainstorming document 8 Appendix 2 Group photo 10 Bibliography 10

Project overview

ATLAS [1] is facing several challenges with respect to its computing requirements for LHC [2] Run-3 (2020-2023) and the HL-LHC runs (2025-2034). The challenges are not specific to ATLAS or the LHC, but are common to the HENP computing community. Most importantly, storage continues to be the driving cost factor and at the current growth rate cannot absorb the increased physics output of the experiment. Novel computing models with a more dynamic use of storage and computing resources need to be considered. This project aims to start an R&D effort for evaluating and adopting novel IT technologies for HENP computing.

ATLAS and Google plan to launch an R&D project to integrate Google cloud resources (Storage and Compute) into the ATLAS distributed computing environment. After a series of teleconferences, a face-to-face brainstorming meeting in Denver, CO at the Supercomputing 2017 conference resulted in this proposal for a first prototype of the "Data Ocean" project. The idea is threefold: (a) to allow ATLAS to explore the use of different computing models to prepare for the High-Luminosity LHC, (b) to allow ATLAS user analysis to benefit from the Google infrastructure, and (c) to give Google real science use cases to improve their cloud platform.

Use cases

User analysis

When analysts use the distributed analysis services to run on the grid, the outputs are deposited on the grid. Making 100% of those outputs available to the analyst quickly is a difficult problem and remains one of the weak points of distributed analysis. Through this R&D, analysis outputs generated on worker nodes around the world could be directed to Google Cloud Storage, where they become uniformly and reliably available to the analyst anywhere in the world. Analysis data products are small, and the GCS-resident outputs could be regarded as a cache with a limited lifetime, and thus a limited storage footprint, while the value of reliable accessibility of this hot data to analysts would be enormous.

Data placement, replication, and popularity

The final stages of data analysis by users require access to multiple petabytes of data storage. To ensure a high level of access, ATLAS replicates multiple copies of this data to worldwide computing resources. The Google Cloud Storage service could be an alternative location for these highly used data formats. We plan to store the final derivations of the full ATLAS MC and/or reprocessing data campaigns. This data will then be available to users worldwide through Google Compute and ATLAS Compute resources.

Data streaming

ATLAS Computing is investigating the use of sub-file data products in the analysis chain. A prototype of this "Event Streaming Service" is currently in development and could benefit from fine-grained cloud storage. This use case will evaluate the compute needed to generate the sub-file data products ("events") from their original files at the scale required by the HL-LHC, and the performance gains of highly parallel, small-size data delivery to the analysis software.

products ("events") from their original files at the scale required by HL-LHC, and the performance gains of highly parallel small size data delivery to the analysis software. Work packages The proof-of-concept phase of the "Data Ocean" project will consist of four major parts (Work Packages - WP). We envision that these packages will have well defined common milestones and overlaps. Both ATLAS and Google will commit software engineering effort to this project, initially at the level of 3 FTE s total. The expected official project start is early 2018. Additional partners from US National Laboratories and Universities, and CERN/WLCG are likely to join this project. WP1 - Data management This work package connects Google Cloud Storage with the ATLAS Data Management system "Rucio" [3], which will allow writing a full multi-petabyte physics sample to Google Cloud Storage. By taking advantage of Google's and ESnet fast networks, the sample is then distributed by Google between their continental regions/zones and made available to ATLAS Compute across the globe. ATLAS and Google will work together to understand data popularity and cache the most popular physics data vs geographical access pattern. WP2 - Workflow management ATLAS user analysis jobs, brokered by the "PanDA" workflow management system [4], should be able to run using either file-copy or direct-io with Google Cloud Storage. A strategy of using container formats for user analysis jobs will be developed. In addition, this work package will involve running jobs on Google Compute Platform, accessing either data from ATLAS storage or Google Cloud Storage. WP3 - Google Cloud Storage Global Redirection The third work package will involve an improvement to Google Cloud Storage itself. Right now, the ATLAS jobs needs to retain knowledge which Google Cloud region is to be used. Google will implement a global redirection between their regions to expose Google Cloud Storage as a single global entity. WP4 - Cost Model The fourth work package will deal with the economic model necessary for sustainable commercial clouds resource usage. For example, using adaptive pricing for cloud resource costs (storage, compute, network). Addendum List of key personnel and PI s Google

Google: Karan Bhatia, Andy Murphy
ATLAS:
  BNL: Alexei Klimentov, Torre Wenaus, Sergey Panitkin
  CERN: Mario Lassnig, Martin Barisits, Thomas Beermann, Tobias Wegner
  UTA: Kaushik De, Fernando Barreiro, Ruslan Mashinistov

Project Management

The project will be managed jointly by the Google and ATLAS PIs. Progress will be reported and followed up on a weekly basis. Two Technical Interchange Meetings will be organized during the project: once by Google, once by ATLAS.

Timeline

The expected official project start is early 2018.
- X+1 month: detailed description of objectives and key results
- X+2 months: test ATLAS/Google data transfer
- X+3 months: test ATLAS/Google analysis job access
- X+4 months: full ATLAS derived data replica stored by Google
- X+6 months: end-user analysis test
- X+8 months: commissioning and pre-production for selected ATLAS users

Resources from ATLAS and Google

Both ATLAS and Google will commit software engineering effort to this project, initially at the level of 3 FTEs total. It would be highly desirable to have a Google software engineer at CERN to work together with the Rucio and PanDA teams during the PoC implementation and commissioning phase. Additional partners from US National Laboratories and Universities, and CERN/WLCG, are likely to join this project. An estimate of the Google computing resources (storage, bandwidth, and CPUs) for the PoC phase will be made within one month after the project is launched and the WPs are approved by both parties.

Objectives and key results

These OKRs are only loosely coupled and should be doable in parallel after the two initial steps ("Namespace handling" and "Connecting grid storage") are finished. Names and ETAs are tentative and subject to the official project start.

Namespace handling

Google Storage would become a new endpoint for ATLAS, able to address all derived MC and processing campaign data.
- Set up Google storage authorisation and authentication
- Add the Google storage hosts to the ATLAS topology system (AGIS)
- Synchronise the topology with the ATLAS data management system (Rucio)
There should be two available buckets, one in the US (available: 100G Chicago, 10G San Jose, 10G Ashburn; coming: 100G NY, 100G Seattle) and one in the EU (no ESnet peering with Google).

Rucio Data Identifiers (DIDs)
- Are a globally unique tuple <Scope:Name>, e.g., mc16_13tev:12345.hits.pool.root
- Have an associated collection of metadata, e.g., project, datatype, #events
- Can be either a file, a dataset (collection of files), or a container (collection of datasets)
- Are unique among all three categories and cannot be reused
- We put replication rules on DIDs (declarative data management, e.g., 3 copies of this DID, one must be on tape and all should be on different continents)
- DIDs are resolved to files and then to actual replicas (root://hostname/storage/file.123)

RSE (Rucio Storage Element)
- Unique logical unit of data storage
- Has different attributes, e.g., is_tape, geoip, ...
- There is a topological split between the endpoint name (e.g., CERN-PROD_SCRATCHDISK) and the associated hosts behind the name (which could be many, each with a different protocol)
- So, e.g., we could have GCS_EUROPE, GCS_USEAST, GCS_USWEST, ... (and once the GCS global redirector exists, just a single GCS); each one would then have an associated storage endpoint, gcs://bucket/... or (more likely) s3://bucket/... (a sketch of how such an RSE could be registered follows below)
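As a minimal sketch, the following shows how a Google-backed RSE and a replication rule on a DID might be set up through the Rucio client API. The RSE name GCS_USEAST, the bucket path, and the dataset name are illustrative assumptions, and exact method signatures and protocol parameters may differ between Rucio releases.

```python
# Sketch only: registering a hypothetical Google Cloud Storage RSE in Rucio
# and placing a replication rule on a DID. Names (GCS_USEAST, bucket, dataset)
# are illustrative assumptions; exact client signatures may vary by Rucio release.
from rucio.client import Client

client = Client()

# Declare the logical storage element and an attribute.
client.add_rse('GCS_USEAST')
client.add_rse_attribute('GCS_USEAST', 'is_tape', False)

# Attach an S3-style protocol pointing at the (hypothetical) bucket.
client.add_protocol('GCS_USEAST', {
    'scheme': 's3',
    'hostname': 'storage.googleapis.com',
    'port': 443,
    'prefix': '/atlas-dataocean-bucket/',
    'impl': 'rucio.rse.protocols.s3boto.Default',
    'domains': {'wan': {'read': 1, 'write': 1, 'delete': 1, 'third_party_copy': 1}},
})

# Declarative data management: ask for one replica of a dataset DID on the new RSE.
client.add_replication_rule(
    dids=[{'scope': 'mc16_13tev', 'name': 'some.derivation.dataset'}],
    copies=1,
    rse_expression='GCS_USEAST',
)

# Resolve the DID to concrete replicas (e.g., s3://... or root://... URLs).
for replica in client.list_replicas([{'scope': 'mc16_13tev', 'name': 'some.derivation.dataset'}]):
    print(replica['rses'])
```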

Connecting ATLAS grid storage with Google storage for third-party-copy

We can transfer data to/from Google storage using our orchestrated mechanisms in Rucio.
- Verify the Rucio transfertool implementation for S3 compatibility with the Google RSE
- Implement changes to the Rucio transfertool if necessary
- Set rules for DIDs on grid storage to create replicas on the Google RSE
- Set rules for DIDs on the Google RSE to create replicas on grid storage

The proposed input volume is between 1 and 6 petabytes, for two possible scenarios:
- 1 PB of NTUP for end-user analysis only
- 4-5 PB for a complete copy of derivation data for one campaign

The current full analysis produces roughly ~500'000 files of ~40 MB each per day, equalling a growth rate of ~20 TB/day. For the proof-of-concept it should be sufficient if a small percentage of the jobs (<1%) can be rerouted to write their output to GCS (5000 files, <200 GB per day growth rate).

Google has network peering with ESnet, which has connections to several ATLAS Tier-1 and Tier-2 centres in the US and Europe. The connections to US ATLAS sites are very good, whereas the EU peerings are less reliable. BNL might serve as a bridge for EU transfers if necessary. Network monitoring should be considered, especially for the ESnet peering, e.g., using perfSONAR.

Rucio has a multi-queue transfer system (conveyor + transfertool):
- The conveyor decides which transfer requests to take off the queue and process
- The transfertool submits transfer requests to the third-party-copy component; FTS supports WebDAV-to-S3 push third-party copy from DPM and dCache
- It receives acknowledgements and polls the status of transfers
- It updates DIDs, replicas, and rules, does the retries, etc.

Monitoring third-party-copy

We are able to understand the performance differences between our existing transfer infrastructure and Google Storage.
- Ensure instrumentation events are properly forwarded to the monitoring system
- Create dedicated dashboards
We ship all our transfer events into HDFS and ElasticSearch:
- Dashboards, computed durations, historical views, accounting, etc.
- Also the source for our analytics system, e.g., to estimate transfer-time-to-complete using machine learning
The most important metrics (see the sketch below):
- #files/second transferred and deleted
- Mbps per file and per link
- space usage over time
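As a minimal sketch of how those per-link metrics could be derived from the shipped transfer events, assuming each event is a dictionary with fields such as bytes, started_at, transferred_at, src_rse, and dst_rse (illustrative field names, not the exact Rucio/FTS event schema):

```python
# Sketch only: aggregating files/second and mean per-file throughput per link
# from transfer events. Field names (bytes, started_at, transferred_at,
# src_rse, dst_rse) are illustrative assumptions, not the exact event schema.
from collections import defaultdict
from datetime import datetime

def aggregate(events, window_seconds):
    """Per-link metrics over a monitoring window: files/second and mean Mbps per file."""
    per_link = defaultdict(lambda: {'files': 0, 'mbps_sum': 0.0})
    for ev in events:
        started = datetime.fromisoformat(ev['started_at'])
        finished = datetime.fromisoformat(ev['transferred_at'])
        duration = max((finished - started).total_seconds(), 1e-6)
        link = (ev['src_rse'], ev['dst_rse'])
        per_link[link]['files'] += 1
        per_link[link]['mbps_sum'] += ev['bytes'] * 8 / 1e6 / duration
    return {
        link: {'files_per_s': agg['files'] / window_seconds,
               'mean_mbps_per_file': agg['mbps_sum'] / agg['files']}
        for link, agg in per_link.items()
    }

# Example with two hypothetical ~40 MB transfers into Google RSEs.
events = [
    {'src_rse': 'BNL-OSG2_DATADISK', 'dst_rse': 'GCS_USEAST', 'bytes': 40_000_000,
     'started_at': '2018-03-01T10:00:00', 'transferred_at': '2018-03-01T10:00:20'},
    {'src_rse': 'CERN-PROD_DATADISK', 'dst_rse': 'GCS_EUROPE', 'bytes': 40_000_000,
     'started_at': '2018-03-01T10:00:00', 'transferred_at': '2018-03-01T10:01:00'},
]
print(aggregate(events, window_seconds=3600))
```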

Reading data from Google storage to Grid worker nodes - File copy-to-scratch

Jobs can download full input files for processing using rucio-clients.
- Access protocols might differ from those used for third-party-copy
- If new protocols are needed, they can be implemented

Monitoring copy-to-scratch transfers

We can follow the job transfers with our existing monitoring. Every job sends a trace for every file it accesses. A trace is a dictionary containing information such as the location of the file and timestamps (start of the copy, end of the copy). These traces are used (see the sketch below):
- To build the popularity of our data
- To monitor the volume processed by the jobs
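A minimal sketch of what such traces might look like and how popularity and processed volume could be derived from them; the field names below (dataset, rse, filesize, transferStart, transferEnd) are illustrative assumptions rather than the exact ATLAS trace schema:

```python
# Sketch only: deriving data popularity (accesses per dataset) and processed
# volume per storage endpoint from job traces. Field names are illustrative
# assumptions, not the exact schema of ATLAS traces.
from collections import Counter

# One hypothetical trace per accessed file, as sent by a job.
traces = [
    {'scope': 'user.jdoe', 'filename': 'output.0001.root', 'dataset': 'user.jdoe.analysis_v1',
     'rse': 'GCS_USEAST', 'filesize': 42_000_000,
     'transferStart': 1519900000.0, 'transferEnd': 1519900012.5},
    {'scope': 'user.jdoe', 'filename': 'output.0002.root', 'dataset': 'user.jdoe.analysis_v1',
     'rse': 'GCS_EUROPE', 'filesize': 41_000_000,
     'transferStart': 1519900003.0, 'transferEnd': 1519900020.0},
]

# Popularity: number of accesses per dataset.
popularity = Counter(t['dataset'] for t in traces)

# Volume processed by the jobs, per storage endpoint, in GB.
volume_gb = Counter()
for t in traces:
    volume_gb[t['rse']] += t['filesize'] / 1e9

print(dict(popularity))   # {'user.jdoe.analysis_v1': 2}
print(dict(volume_gb))    # {'GCS_USEAST': 0.042, 'GCS_EUROPE': 0.041}
```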

Reading data from Google storage to Grid worker nodes - Streaming random-io

- It might be necessary to add the Google Cloud Network to LHCONE

Monitoring random-io

Deletion of data on Google Storage

- Allow the deletion of data on Google Cloud Storage using Rucio

Reading data inside Google data centres - Jobs running on Google compute

Network provisioning

- Ensure that the full network capacity is used for data ingress from grid storage to Google Cloud Storage
- Ensure that jobs running in Google Cloud Compute do not overwhelm our research networks

Transparent global redirection between inter-regional zones on Google Cloud Storage

- Retrieve a file from Google Cloud Storage using a unique identifier, regardless of which region/zone was used for the initial data ingress

Development of an economic cost model

- Control the cost of ATLAS data on Google Cloud Storage
- Control the cost of ATLAS jobs on Google Cloud Compute
A parameterised sketch of such a cost estimate follows below.
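A minimal sketch of how such a cost model could be parameterised, estimating a monthly cost from storage volume, network egress, and compute usage; the per-unit prices below are placeholders for illustration, not actual Google Cloud pricing:

```python
# Sketch only: a simple parameterised cost estimate for the PoC. All per-unit
# prices are placeholder assumptions, not actual Google Cloud pricing.
def monthly_cost_usd(storage_tb, egress_tb, vcpu_hours,
                     price_storage_per_tb=20.0,   # hypothetical $/TB-month stored
                     price_egress_per_tb=80.0,    # hypothetical $/TB of network egress
                     price_per_vcpu_hour=0.03):   # hypothetical $/vCPU-hour of compute
    """Estimate the monthly cost of keeping data and running jobs in the cloud."""
    return (storage_tb * price_storage_per_tb
            + egress_tb * price_egress_per_tb
            + vcpu_hours * price_per_vcpu_hour)

# Example: a 1 PB sample kept in Google Cloud Storage, 100 TB/month read back
# to grid sites, and 50'000 vCPU-hours of analysis jobs on Google Compute.
print(monthly_cost_usd(storage_tb=1000, egress_tb=100, vcpu_hours=50_000))
```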

Appendix 1 - Brainstorming document

Full resolution:
https://drive.google.com/open?id=1os6zn1c1n1xocyy-a_mmw71mdtqzsnks
https://drive.google.com/open?id=1ayylhvgiiv5h2_aeyqb-yoisdxeqw75m

Appendix 2 - Group photo

Left to right: Karan Bhatia, Alexei Klimentov, Horst Severini, Kaushik De, Thomas Beermann, Mario Lassnig, Sergey Panitkin, Ruslan Mashinistov, Martin Barisits, Fernando Barreiro, Matteo Turilli

Full resolution: https://drive.google.com/open?id=1uqutcaga0rboljavoty6ncohj0nyds13

Bibliography

[1] ATLAS Collaboration, G. Aad, et al. The ATLAS Experiment at the CERN Large Hadron Collider. J. Instrum., 3:S08003, 2008.
[2] LHC - The Large Hadron Collider. http://lhc.web.cern.ch/lhc/.
[3] Rucio
[4] T. Maeno, P. Nilsson, K. De, A. Klimentov, and T. Wenaus. PanDA Production and Analysis Backend. Journal of Physics: Conference Series, vol. 219, 2009.