Influence of Distributing a Tier-2 Data Storage on Physics Analysis

ACAT Conference 2013
Influence of Distributing a Tier-2 Data Storage on Physics Analysis
Jiří Horký (1,2) (horky@fzu.cz), Miloš Lokajíček (1), Jakub Peisar (2)
(1) Institute of Physics ASCR, (2) CESNET
17 May 2013

Background and Motivation
- FZU is a Tier-2 site based in Prague, Czech Republic, mainly used for the ATLAS, ALICE and D0 experiments
- 4000 CPU cores, 2.5 PB of usable disk space
- DPM storage element for ATLAS and xrootd for ALICE
- Decreasing financial support from grants
- Increasing demand for capacity from the CERN experiments foreseen
- New resource providers must be looked for

Background and Motivation
- New e-infrastructure projects in the Czech Republic
  - The Czech NREN (CESNET), but not only a plain network provider
  - The NGI CZ computing infrastructure, but with limited resources for HEP experiments
  - A new service: a data storage facility with three distributed HSM-based storage sites, designed for the research and science community
- An opportunity for collaboration

Site Locations
- FZU Tier-2 in Prague
- CESNET storage site in Pilsen

Implementation Design Choices
- Under which site should the storage resources be published?
  - Neither the ATLAS nor the ALICE experiment is supported on CESNET's computing infrastructure (prague_cesnet_lcg2)
  - Therefore: another SE under FZU's Tier-2 site (praguelcg2)
  - Part of the site is operated by someone else, raising concerns about the influence on reliability, monitoring, etc.
- Which SE implementation?
  - An HSM system (DMF) with 500 TB of disk and 3 PB of tape space in Pilsen
  - Only ~35 TB of disk space could be devoted to grid services
  - An SE that can handle tapes was needed
  - dCache was chosen, with the gsidcap and gsiftp protocols

Implementation Details
- FZU <-> Pilsen: 10 Gbit link with ~3.5 ms latency
  - Public Internet connection, shared with other institutes in the area
  - A dedicated link from FZU to the CESNET backbone is to be installed soon
- Concerns about chaotic use of the HSM system by users (migrations and recalls to/from tape)
  - Disk-only space token (ATLASLOCALGROUPDISK) provided for user analysis data
  - Tape-only space token (ATLASTAPE) as an archive for users' datasets
  - Similar setup for the Auger experiment

Implementation: ATLAS
- A new DDM (ATLAS data management system) endpoint was created (PRAGUELCG2_PPSLOCALGROUPDISK)
  - A user selects which endpoint to send the data to in the ATLAS DaTRI/DQ2 system
- The same PanDA (ATLAS job submission system) queue as for the rest of the praguelcg2 site
  - Transparent job submission for data on both the local and the remote SE

Operational Challenges
- A 3rd-party service provider to FZU
  - Problems with dCache affect FZU's reliability
  - SAM Nagios messages regarding dCache go to FZU's team instead of CESNET; the same holds for GGUS tickets
  - Nagios was reconfigured; GGUS tickets still need to be reposted (otherwise CESNET would receive all the unrelated tickets)
  - CESNET's members were given the possibility to add scheduled downtimes in GOCDB (but only for the whole site)
- Some trust is necessary

Impact on User Analysis
- Initial user analysis tests were very slow
  - 14% job efficiency, compared with 71% against the local storage, with a single job
  - A manual iperf test showed only 30 Mbps throughput
- A Cisco FWSM module was identified as the issue, even with its CPU load close to 0
  - A hardware filtering limit! Only in effect on the public Internet link, the one to dCache
  - 2.5 Gbit/s hard limit in one direction, much less on a single connection
  - Moving away from the FWSM to ACLs fixed the speed issue

Impact on User Analysis
- Still concerned about job efficiency due to network latency
  - ~3.5 ms instead of 0.2 ms locally
  - The line bandwidth is an obvious limitation as well
- Several factors to be tested (see the sketch below):
  - TTreeCache on/off
  - dcap read-ahead (DCACHE_RAHEAD) on/off
  - Number of round trips and network bandwidth used
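The slides do not show the job configuration itself; the ROOT macro below is only a minimal sketch of how these two client-side knobs can be varied. The dcap URL, tree name and cache size are placeholder assumptions, not values from the measurements.

```cpp
// Minimal sketch (not the actual analysis job): varying the two client-side
// knobs discussed above, the ROOT TTreeCache and the dcap read-ahead buffer.
// The dcap URL, tree name and cache size below are placeholders.
#include "TFile.h"
#include "TTree.h"
#include "TSystem.h"

void read_remote(bool useTTreeCache = true, bool useDcapReadAhead = false)
{
   // dcap read-ahead: DCACHE_RAHEAD switches it on, DCACHE_RA_BUFFER sets its
   // size; it cannot be fully disabled, only shrunk to a tiny value.
   // Set before the file is opened so libdcap picks the values up.
   gSystem->Setenv("DCACHE_RAHEAD", useDcapReadAhead ? "1" : "0");
   if (!useDcapReadAhead)
      gSystem->Setenv("DCACHE_RA_BUFFER", "1");   // effectively "off"

   TFile *f = TFile::Open("dcap://dcache.example.org/pnfs/example/user/data.root");
   if (!f || f->IsZombie()) return;
   TTree *t = (TTree *)f->Get("physics");         // hypothetical tree name
   if (!t) return;

   if (useTTreeCache) {
      // TTreeCache learns which branches are used and prefetches their
      // baskets in a few large, vectored requests.
      t->SetCacheSize(30 * 1024 * 1024);           // 30 MiB
      t->AddBranchToCache("*", true);              // cache all branches
   }

   for (Long64_t i = 0, n = t->GetEntries(); i < n; ++i)
      t->GetEntry(i);                              // the analysis would run here

   f->Close();
}
```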

Impact on User Analysis
- An example ATLAS analysis job was selected and its IO access pattern examined (see the measurement sketch below):
  - Number of read requests, size of the requests, sequential/random access
  - -> access pattern diagram and a size histogram of the requests
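The slides do not say which tool produced these access-pattern numbers; one way to collect similar per-job statistics directly from ROOT is TTreePerfStats, sketched below with the same placeholder file and tree names as above.

```cpp
// Sketch only: collecting read-call counts, bytes read and the file-offset
// access pattern for one job with ROOT's TTreePerfStats.  This is not
// necessarily how the numbers in the talk were obtained.
#include "TFile.h"
#include "TTree.h"
#include "TTreePerfStats.h"

void io_pattern()
{
   TFile *f = TFile::Open("dcap://dcache.example.org/pnfs/example/user/data.root");
   if (!f || f->IsZombie()) return;
   TTree *t = (TTree *)f->Get("physics");        // hypothetical tree name
   if (!t) return;

   TTreePerfStats ps("ioperf", t);               // must exist while the tree is read

   for (Long64_t i = 0, n = t->GetEntries(); i < n; ++i)
      t->GetEntry(i);                            // the analysis would run here

   ps.Print();                                   // read calls, bytes read, ...
   ps.SaveAs("ioperf.root");                     // keeps the offset-vs-time graph
   f->Close();
}
```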

Impact on User Analysis
- TTreeCache off, in detail:
  - 485 MiB out of 2 GB read
  - 15471 IO calls
  - Median request size 2.6 KiB, average size 30 KiB

Impact on User Analysis
- TTreeCache on, in detail:
  - 566 MiB out of 2 GB read (16% more)
  - 2311 IO calls
  - Median request size 90 KiB, average size 251 KiB

Impact on User Analysis
Network utilization using dcap, no RA cache tuning (bytes/s vs. time plots):
- TTreeCache OFF: a constant ~10 MB/s flow, average throughput 76.5 Mbit/s, and the job takes longer
- TTreeCache ON: ~50 MB/s bursts of about 3 seconds, average throughput 61.4 Mbit/s
- Reason for this behavior: the dcap RA cache (detailed on the next slide)
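For scale (my own arithmetic, not a statement from the slides): a steady 10 MB/s corresponds to 80 Mbit/s, consistent with the 76.5 Mbit/s average, while with TTreeCache ON the link is presumably idle between the ~50 MB/s prefetch bursts as the job processes the cached data, which is why the average drops to 61.4 Mbit/s even though the peaks are higher.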

Impact on User Analysis
dcap ReadAhead cache in ROOT:
- Enabled by default with 128 KiB read-ahead; the cache size is the same
- It cannot actually be fully disabled, but it can be set to a very small value via the DCACHE_RA_BUFFER environment variable
- Quite strict behavior: for every IO request not within 128 KiB of the last read, at least 128 KiB is transferred; bigger requests are split into 256 KiB chunks
- A performance killer for small random reads, i.e. the TTreeCache OFF case
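A rough back-of-envelope bound (my own estimate, not a number from the slides): if each of the roughly 15-16 thousand small requests of the TTreeCache OFF case fell outside the 128 KiB window, the client would transfer on the order of 15500 x 128 KiB ≈ 2 GiB; the server-side measurement on the following slide (770 MiB in 5376 calls) shows that many requests do land inside the window, yet the data actually transferred is still almost 60% more than the 485 MiB the job needs.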

Impact on User Analysis
dcap ReadAhead cache in ROOT, TTreeCache OFF:
- Local access (no RA): 485 MiB in 16471 calls
- dcap with the default 128 KiB RA, measured on the server side: 770 MiB in 5376 calls
- 3x fewer IO requests, but no IO vectorization and 58% more data read

Impact on User Analysis
dcap ReadAhead cache in ROOT, TTreeCache ON:
- Local access (no RA): 566 MiB in 2311 calls
- dcap with the default 128 KiB RA, measured on the server side: 490 MiB in 7772 calls
- Non-cache reads are vectorized

Side by Side Comparison
The same analysis job was run under different conditions, reading 100 files of ~2 GB each:

remote/local | Method | TTreeCache | dcap RA | events/s (%) | Bytes transferred (%) | CPU efficiency
local        | rfio   | ON         | N/A     | 100%         | 117%                  | 98.9%
local        | rfio   | OFF        | N/A     | 74%          | 100%                  | 72.7%
remote       | dcap   | ON         | off     | 76%          | 117%                  | 75.0%
remote       | dcap   | ON         | 128 KiB | 75%          | 101%                  | 73.5%
remote       | dcap   | OFF        | off     | 46%          | 100%                  | 46.9%
remote       | dcap   | OFF        | 128 KiB | 54%          | 159%                  | 59.7%

Observations:
- TTreeCache helps a lot, both for local and for remote access: it couples efficiently with the dcap RA mechanism, with almost no extra bandwidth overhead
- Without TTreeCache, the dcap RA can cause considerable bandwidth overhead
- Remote jobs with TTreeCache are faster than local ones without the cache

Conclusion
- Operating CESNET's dCache SE under FZU's Tier-2 site works well
- Several issues were identified and fixed (firewall and software issues)
- Proper job settings (TTreeCache) are needed to ensure reasonable performance and link utilization

Thank you for your attention. Questions?

Acknowledgements: This work was supported by the CESNET Storage group members as well as by the computing center team at FZU. The work was supported by grant INGO LG13031.

Jiří Horký (horky@fzu.cz)