New data access with HTTP/WebDAV in the ATLAS experiment

New data access with HTTP/WebDAV in the ATLAS experiment
Johannes Elmsheuser on behalf of the ATLAS collaboration, Ludwig-Maximilians-Universität München
13 April 2015, 21st International Conference on Computing in High Energy and Nuclear Physics (CHEP2015)
Co-authors: Rodney Walker, Cedric Serfon, Sylvain Blunier, Vincenzo Lavorini, Paul Nilsson

Outline
1. Introduction
2. Setup status and WebDAV PanDA functional tests with HammerCloud
3. Access protocol comparisons

Introduction
- Fast and reliable data access in reasonably long jobs is essential for any kind of (HEP) data analysis
- ATLAS Grid setup: workload management system (PanDA) with pilot jobs
- Input data access is configured site by site, depending on the storage element type: dcap, copy-to-scratch, file, xrootd, FAX (federated xrootd)
  - DPM sites usually use copy-to-scratch or, lately, xrootd
  - dCache sites usually use dcap
- Explore WebDAV/HTTP input access:
  - Industry standard
  - Same client and access protocol everywhere
  - Use aria2c for stage-in
  - Use the Davix plugin for ROOT I/O

PanDA data access
Two access modes in analysis jobs - a PanDA analysis usually consists of two job types: one build job for source code compilation and many subsequent analysis jobs processing the input data:
- First, local stage-in of the analysis source code dataset from the previous build job (aria2c)
- Then the ROOT/Athena (ATLAS offline software) event loop with local (or remote) input ROOT file access (Davix)
Setup:
- aria2c stage-in has been used successfully at a couple of German (DE) sites for Monte Carlo production for some time
- Davix: remote I/O library for WebDAV/HTTP/S3, including a ROOT I/O plugin
- The ATLAS data management system (Rucio) provides a redirector with a very easy syntax
- Redirection rules per site are kept in the ATLAS grid information system (AGIS)

Workflow schema
- It is easy to construct a TURL with additional optional parameters to configure the site choice (site, geoip, spacetoken):
  https://rucio-lb-prod.cern.ch/redirect/mc14_8TeV/AOD.01507240._010001.pool.root.2?site=LRZ-LMU
- The Rucio redirector returns the actual local WebDAV TURL (sketched in code below), e.g.:
  https://lcg-lrz-dc66.grid.lrz.de:443/pnfs/lrz-muenchen.de/data/atlas/dq2/atlaslocalgroupdisk/rucio/mc14_8TeV/99/dd/AOD.01507240._010001.pool.root.2
- Schema: PanDA server -> job on a grid site worker node (application: ROOT/Athena with Davix, or standalone aria2c) -> local or remote SE
- Total Rucio: 705 endpoints, 159.8 PB, 637 million files; WebDAV accessible: 442 endpoints, 82.6 PB, 491 million files
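A minimal sketch of how such a redirect could be resolved outside the pilot, assuming the redirector answers a plain GET with an HTTP 302 whose Location header carries the site-local TURL; the helper name is illustrative, and the X.509 proxy and grid CA bundle a real job presents are omitted:

import http.client
from urllib.parse import urlsplit

def resolve_turl(redirect_url):
    """Return the Location header the Rucio redirector replies with."""
    # Illustrative only: real grid jobs authenticate with an X.509 proxy
    # and trust the grid CA bundle, both omitted in this sketch.
    parts = urlsplit(redirect_url)
    conn = http.client.HTTPSConnection(parts.netloc)
    path = parts.path + ("?" + parts.query if parts.query else "")
    conn.request("GET", path)
    resp = conn.getresponse()
    if resp.status in (301, 302, 303, 307):
        return resp.getheader("Location")
    raise RuntimeError("unexpected HTTP status %d" % resp.status)

if __name__ == "__main__":
    print(resolve_turl(
        "https://rucio-lb-prod.cern.ch/redirect/"
        "mc14_8TeV/AOD.01507240._010001.pool.root.2?site=LRZ-LMU"))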

Setup Rucio and PanDA I
Step I: Information system:
- Mapping of site name to WebDAV door, e.g. LRZ-LMU: https://lcg-lrz-dc66.grid.lrz.de
- Verify the WebDAV configurations in the information system
- ATLAS DDM had earlier run an SE renaming campaign
- Full table: http://atlas-agis.cern.ch/agis/service/table_view/?type=SE/HTTP&state=ACTIVE
Step IIa: PanDA pilot update - aria2c:
- Add an aria2c wrapper method to the PanDA pilot that creates and uses a metalink XML file for the source code download in the job (invocation sketched below):
  <metalink>
    <file name="user.gangarbt.0302032633.699303.5577.lib.tgz">
      <identity>user.gangarbt:user.gangarbt.0302032633.699303.5577.lib.tgz</identity>
      <hash type="adler32">22b8cbbd</hash>
      <size>11505</size>
      <url location="lrz-lmu_scratchdisk" priority="1">
        https://lcg-lrz-dc66.grid.lrz.de:443/pnfs/lrz-muenchen.de/data/atlas/dq2/atlasscratchdisk/rucio/user/g
      </url>
    </file>
  </metalink>
- aria2c supports adler32 checksums since release 1.18.10
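For illustration, a sketch of how such a metalink file could be handed to aria2c; this is not the actual pilot site-mover code, the file name is a placeholder, and the --check-integrity option relies on the adler32 support mentioned above:

import subprocess

# Illustrative sketch, not the PanDA pilot's aria2c site mover:
# download the build-job tarball described by a metalink file and let
# aria2c verify size and adler32 checksum. Paths are placeholders.
METALINK_FILE = "panda_stagein.metalink"   # hypothetical file written by the pilot

cmd = [
    "aria2c",
    "--metalink-file=" + METALINK_FILE,    # sources, size and checksum come from the metalink
    "--check-integrity=true",              # verify the adler32 hash after download
    "--dir=.",                             # stage into the job working directory
]
subprocess.check_call(cmd)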

Setup Rucio and PanDA II
Step IIb: PanDA pilot update - ROOT with Davix:
- Iterated with the Davix developers over several versions; the current version 0.4.0 works stably and fine
- Deployed and enabled the Davix library with the ATLAS ROOT installation in CVMFS
- Usage: TFile::Open("https://myse/scope/myfile.root")
- Together with Davix, Rucio can redirect to any WebDAV-configured ATLAS SE
- Options: ?site=LRZ-LMU, ?rse=LRZ-LMU_LOCALGROUPDISK, ?select=geoip to select the best replica
- Currently used in the PanDA pilot: ?site=LRZ-LMU
- Add a configurable wrapper to create the TURL input file list for the ROOT/Athena job (a PyROOT sketch follows below):
  ...
  https://rucio-lb-prod.cern.ch/redirect/data12_8TeV/NTUP_COMMON.01306922._000007.root.1?site=LRZ-LMU
  https://rucio-lb-prod.cern.ch/redirect/data12_8TeV/NTUP_COMMON.01306922._000013.root.1?site=LRZ-LMU
  ...
- Future: an option for metalink usage - as already used in the PanDA aria2c site mover - would allow multiple streams from different sites and automatic failover
See also the talks by Vincent Garonne on Rucio (#205) and Tadashi Maeno on PanDA (#144)
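A minimal PyROOT sketch of the TURL construction with the option variants listed above; the helper function is an assumption for illustration, and only one option would be set per file, as the pilot does with ?site:

import ROOT

def rucio_turl(scope, name, site=None, rse=None, geoip=False):
    """Return a Rucio redirector TURL for scope/name with at most one option."""
    url = "https://rucio-lb-prod.cern.ch/redirect/%s/%s" % (scope, name)
    if site:
        url += "?site=%s" % site        # pin the replica to a site (pilot default)
    elif rse:
        url += "?rse=%s" % rse          # pin to a specific storage endpoint
    elif geoip:
        url += "?select=geoip"          # let Rucio choose the closest replica
    return url

turl = rucio_turl("data12_8TeV", "NTUP_COMMON.01306922._000007.root.1",
                  site="LRZ-LMU")
f = ROOT.TFile.Open(turl)               # ROOT dispatches https:// URLs to the Davix plugin
if not f or f.IsZombie():
    raise RuntimeError("could not open %s via WebDAV/Davix" % turl)
print("opened", f.GetName())
f.Close()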

WebDAV functional tests with PanDA
Step III: Continuous HammerCloud tests:
- Probing aria2c/Davix access to the local SE via WebDAV at more than 70 sites, using a HammerCloud functional test with ATLAS PanDA jobs
- A short PyROOT 5.34.22 job reads a couple of variables from a ROOT ntuple (sketched below)
- Iterative improvement process - about 1/3 of the sites systematically failed because of old SE software versions or configurations; updates are needed at the site level
See also the poster by Michael Böhler about HammerCloud (#159)
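The kind of minimal read job the functional test runs could look roughly like the following; the tree and branch names are hypothetical placeholders, and only the access pattern (open one WebDAV TURL, read a few variables) reflects the test:

import ROOT

# Rough sketch of a short functional-test job: open one input file over
# WebDAV and read a couple of variables. Tree and branch names below are
# hypothetical; the real test uses whatever ntuple it is given.
turl = ("https://rucio-lb-prod.cern.ch/redirect/"
        "data12_8TeV/NTUP_COMMON.01306922._000013.root.1?site=LRZ-LMU")

f = ROOT.TFile.Open(turl)
if not f or f.IsZombie():
    raise RuntimeError("WebDAV open failed for " + turl)

tree = f.Get("physics")                  # hypothetical tree name
n_read = 0
for event in tree:
    _ = event.RunNumber                  # hypothetical branch
    _ = event.EventNumber                # hypothetical branch
    n_read += 1

print("read %d events over WebDAV" % n_read)
f.Close()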

Access protocol comparisons
Comparison of different input file protocols:
- Compare nfs, dcap, xrootd/FAX and WebDAV/Davix
- Use a real-life analysis based on the new ATLAS xAOD data format and the EventLoop framework; it reads and processes muons and jets and stores some output
- Local tests at LRZ-LMU with access to the dCache SE, and Grid tests using PanDA jobs at different sites
- Different analysis modes: a rather CPU-intensive analysis and a rather fast I/O analysis, with ROOT 5.34.24 (x86_64-slc6-gcc48-opt)
- Access via the xAOD class or branch access mode
- TTreeCache is enabled in the code with a 10 events / 10 MB learning phase
- In addition, ROOT_TTREECACHE_PREFILL=1 and ROOT_TTREECACHE_SIZE=1 are set to patch the missing pre-fetch buffer in Davix (see the sketch below)
See also the talks by Thomas Maier (#171), Scott Snyder (#182) and James Catmore (#164)
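A small sketch of these client-side cache settings, assuming the ROOT_TTREECACHE_* environment variables are exported before ROOT starts and showing explicit per-tree calls for the 10 events / 10 MB learning phase; the TURL and tree name are placeholders:

import os
# Read at ROOT startup, so set before importing ROOT:
os.environ["ROOT_TTREECACHE_PREFILL"] = "1"   # prefill the cache (works around the missing Davix pre-fetch)
os.environ["ROOT_TTREECACHE_SIZE"] = "1"      # force a TTreeCache for every tree

import ROOT

f = ROOT.TFile.Open("https://myse/scope/myfile.root")   # placeholder TURL
tree = f.Get("CollectionTree")                           # hypothetical tree name

# Explicit per-tree equivalent of the in-code setup quoted above:
tree.SetCacheSize(10 * 1024 * 1024)    # 10 MB TTreeCache
tree.SetCacheLearnEntries(10)          # learn the branch set over 10 events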

Local access to LRZ-LMU
Measured event rate for local access at LRZ-LMU; repeated several times, uncertainties ~10%.

Class access mode [Hz]:
              Slow   Medium   Fast   Fast w/o init
  local/nfs     80      210    195             200
  dcap          75      196    147             155
  FAX           69      193    182             194
  Davix         72      192    190             205

Branch access mode [Hz]:
              Slow   Medium   Fast   Fast w/o init
  local/nfs     89      230    550            1050
  dcap          94      228    495             850
  FAX           86      211    497             830
  Davix         84      205    483             815

- Davix suffers from the missing buffering at start-up; PREFILL=1 cures the problem
- No differences between the access protocols visible within the uncertainties and scale of these tests

Local access to LRZ-LMU II - eth0 rate
[Plot: received bytes/s vs. time for the medium-fast analysis, branch vs. class access, with dcap, xrootd and webdav; TTreeCache 10 events / 10 MB; 2 input files at LRZ-LMU]
- 10-12 MB/s throughput, no large difference between branch and class access, a read spike at every file open, not much difference between protocols

Local access to LRZ-LMU III - eth0 rate
[Plot: received bytes/s vs. time for the fast analysis, branch vs. class access, with dcap, xrootd and webdav; TTreeCache 10 events / 10 MB; 5 input files at LRZ-LMU]
- 10-16 and 15-30 MB/s throughput, large difference between branch and class access, not much difference between protocols

HammerCloud - Fast analysis, local replica I
- Default access: 24±15 % CPU, 280±190 Hz
- WebDAV access: 23±9 % CPU, 228±130 Hz
- Sites: LRZ-LMU (dcap), FZK (dcap), Prague/FZU (DPM/copy-to-scratch); also BNL (failed in the WebDAV test), IN2P3-CC, UIO

HammerCloud - Fast analysis, local replica II
CPU: 24±15 %
            LRZ-LMU        FZK           Prague
  Default   550±205 Hz     331±74 Hz     (313±80 Hz)*
  WebDAV    503±102 Hz     223±88 Hz     265±76 Hz
  XRootD    516±215 Hz     332±78 Hz     259±57 Hz
  (* copy-to-scratch)

HammerCloud - Fast analysis, replica at LRZ
WebDAV access to the remote replica at LRZ-LMU; CPU: 20±7 %
  BNL from LRZ-LMU:      243±43 Hz
  FZK from LRZ-LMU:      274±54 Hz
  Prague from LRZ-LMU:   185±61 Hz
  LRZ-LMU from LRZ-LMU:  542±141 Hz

HammerCloud - Slow analysis, local replica
CPU: 63±14 %
            LRZ-LMU       FZK          Prague
  Default   55±17 Hz      55±8 Hz      (65±13 Hz)*
  WebDAV    51±13 Hz      40±19 Hz     49±14 Hz
  XRootD    50±17 Hz      53±11 Hz     48±11 Hz
  (* copy-to-scratch)

Summary and Conclusions
- Presented the setup and how to use WebDAV access for PanDA jobs in the ATLAS experiment
- aria2c and Davix, together with the Rucio redirector, are the key components
- Stabilising the external code components took several long iterations
- WebDAV access now works fine at up-to-date sites
- No differences between the tested access protocols are visible within the uncertainties and scale of the tests
- WebDAV is a very good candidate for a unified access mode at HEP sites