Data Transfers Between LHC Grid Sites Dorian Kcira


Data Transfers Between LHC Grid Sites Dorian Kcira dkcira@caltech.edu Caltech High Energy Physics Group hep.caltech.edu/cms

CERN Site: LHC and the Experiments. The Large Hadron Collider, 27 km in circumference near Lake Geneva, hosts the four main experiments: CMS, LHCb, ALICE and ATLAS.

Compact Muon Solenoid. Magnetic length 12.5 m, free bore diameter 6 m, central B field 4 T, temperature 4.2 K, nominal current 20 kA, radial pressure 64 atm, stored energy 2.7 GJ. The stored energy is comparable to the kinetic energy of a Nimitz-class 117,000-ton carrier at 20 mph.

Compact Muon Solenoid. CMS handles several tens of petabytes of data with more than 70,000 cores spread globally in the LHC Grid (all LHC experiments together: more than 250,000 cores). Computing resources are used on a 24/7/365 basis with high efficiency (estimated 85%), with some fluctuations due to data-taking variations of the experiment and analysis variations driven by interesting results and/or conferences. Two main types of workflows:
1. Monte Carlo simulation of physics events: high CPU, small I/O
2. Reconstruction of meaningful physical quantities plus analysis: high CPU, high I/O

The LHC Data Grid Hierarchy: Developed at Caltech (1999), a global dynamic system. [Diagram: the Online System feeds the Tier 0+1 center at CERN (PBs of disk, tape robots) at ~PByte/sec; Tier 1 centers such as IN2P3, RAL, INFN and FNAL are reached over 10-40 to 100 Gbps links at ~300-1500 MBytes/sec; 11 Tier 1 and 300 Tier 2 and Tier 3 centers in total; Tier 2 centers connect at 10 to N x 10 Gbps, institutes and workstations (Tier 3 physics data caches, Tier 4) at 1 to 10 Gbps. Petabytes to exabytes per year, more than 100 petabytes per year over 100 Gbps+ data networks, a new generation of networks, synergy with the US LHCNet transatlantic network and state-of-the-art data transfer tools.]

Location Independent Access: Blurring the Boundaries Among Sites + Analysis vs Computing. Once the archival functions are separated from the Tier-1 sites, the Tier-2/Tier-1 functional difference blurs; similarly for the analysis/computing-operations distinction. Connections and functions of sites are defined by their capability, including the network (Maria Girone, CMS Computing). [Diagram: Tier-0/CAF, Tier-1 and Tier-2 sites interconnected in a cloud model.] AAA scale tests ongoing: 20% of data across the WAN: 200k jobs, 60k files, of order 100 TB/day.

Any Data, Any Time, Any Where (AAA). AAA aims at removing data locality and increasing the number of sites where jobs can run; it is a US CMS effort based at Nebraska/UCSD/Wisconsin. The solution is a federated data storage: a collection of disparate storage resources transparently accessible across a wide area network via a common namespace. The underlying technology is Xrootd, which provides a uniform interface in front of heterogeneous storage systems: applications query a redirector that points them to the sites serving the data. The fallback mode is very useful, allowing missing data to be accessed at other sites. Access is authenticated. This is possible because of reliable access to data over the WAN and because of the globally consistent namespace of CMS. A minimal sketch of accessing a file through a redirector is shown below.
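
As an illustration (not taken from the slides), here is a minimal sketch of how an application might open a file through an Xrootd redirector using the XRootD Python bindings. The redirector hostname and the logical file name are hypothetical placeholders, and a valid grid proxy is assumed for authenticated access.

```python
# Minimal sketch, assuming the XRootD Python bindings are installed and a
# valid grid proxy is available. The redirector host and file name below are
# illustrative placeholders, not values from the presentation.
from XRootD import client
from XRootD.client.flags import OpenFlags

REDIRECTOR = "cms-xrd-global.example.org"        # hypothetical global redirector
LFN = "/store/data/ExampleDataset/file.root"     # hypothetical logical file name


def read_first_bytes(redirector, lfn, nbytes=1024):
    """Open a file via the redirector; Xrootd resolves which site serves it."""
    url = "root://{}//{}".format(redirector, lfn.lstrip("/"))
    f = client.File()
    status, _ = f.open(url, OpenFlags.READ)
    if not status.ok:
        raise RuntimeError("open failed: " + status.message)
    status, data = f.read(offset=0, size=nbytes)
    f.close()
    return data


if __name__ == "__main__":
    print(len(read_first_bytes(REDIRECTOR, LFN)), "bytes read over the federation")
```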

Any Data, Any Time, Any Where. [Diagrams: Xrootd local-region redirection and Xrootd fallback access.]

Any Data, Any Time, Any Where. Aggregated Xrootd traffic over the last 3 months: average 1.757 GB/s, corresponding to roughly 148 TB/day over the WAN.

Any Data, Any Time, Any Where. Xrootd traffic at Caltech in September: peaks at 6.8 Gbps and non-negligible sustained traffic, the largest part coming from FNAL but with contributions also from other T2 sites.

The Caltech Tier 2. The very first LHC Tier 2 site, located on the Caltech campus at 3 different locations: 3.1 PB of Hadoop storage, 5200 job slots, 300 physical servers, and an upgraded 100G link to Internet2. Caltech leads key areas of LHC computing and software aimed at enabling grid-based data analysis, as well as global-scale networking and collaborative systems.

The Caltech Tier 2

CMS Data Transfer Volume. 42 petabytes transferred over 12 months (Feb. 2012 to Jan. 2013) = 10.6 Gbps average (> 20 Gbps peak). 2012 versus 2011: +45%.
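
As a quick back-of-the-envelope check of the quoted average rate (an added illustration, assuming decimal units and a 365-day window):

```python
# Back-of-the-envelope check of the quoted average transfer rate.
# Assumes decimal units (1 PB = 1e15 bytes) and a 365-day window.
petabytes = 42
seconds = 365 * 24 * 3600
avg_gbps = petabytes * 1e15 * 8 / seconds / 1e9
print(f"{avg_gbps:.1f} Gbps average")  # ~10.7 Gbps, consistent with the ~10.6 Gbps quoted
```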

Data to and from the Caltech Tier 2. Transfers over 52 weeks, a relatively calm period due to the LHC shutdown: 370 TB outgoing from Caltech, 240 TB incoming. Outgoing data is of the same order as the incoming data. This is due to:
- MC generation happening at Tier 2s
- Data transfers no longer happening centrally: full mesh model
- More transfers through the federated data system AAA (any data, anytime, anywhere): fallback or overflow, especially at the regional level

CMS Data Transfer Tools.
- PhEDEx: the CMS data transfer management system
- FTS: grid data transfer service
- SRM: Storage Resource Manager; grid storage services providing interfaces to storage resources
- GridFTP: a high-performance, secure, reliable data transfer protocol optimized for high-bandwidth wide-area networks
- FDT: a Java application for efficient data transfers, capable of reading and writing at disk speed over wide area networks
A sketch of querying PhEDEx for the location of a dataset's blocks is shown below.
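
As an illustration (not part of the slides), a sketch of asking the PhEDEx data service which sites hold the blocks of a dataset. The dataset name is a hypothetical placeholder, and the endpoint assumes the PhEDEx JSON data service as it was deployed on cmsweb at the time.

```python
# Sketch of a PhEDEx data-service query, assuming the public JSON endpoint on
# cmsweb. The dataset name is an illustrative placeholder.
import json
from urllib.parse import quote
from urllib.request import urlopen

PHEDEX_URL = "https://cmsweb.cern.ch/phedex/datasvc/json/prod/blockreplicas"
dataset = "/ExamplePrimaryDataset/ExampleEra-v1/AOD"   # hypothetical dataset name

query = PHEDEX_URL + "?dataset=" + quote(dataset, safe="/")
with urlopen(query) as resp:
    data = json.load(resp)

# Each block entry lists the sites (nodes) holding a replica of that block.
for block in data["phedex"]["block"]:
    nodes = sorted({r["node"] for r in block["replica"]})
    print(block["name"], "->", ", ".join(nodes))
```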

Steering CMS Data Transfers. [Diagram: on the CMS central services side, PhEDEx together with FTS and the FTS Optimizer; at a site such as T2_US_Caltech, the PhEDEx client drives SRM, which fans transfers out over several GridFTP servers in front of the site storage; from there data is transferred to other sites.] A sketch of how a single transfer might be submitted to FTS is shown below.
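
For illustration (not part of the slides), a sketch of how a single file transfer might be submitted to an FTS3 server, assuming the fts-rest Python "easy" bindings and a valid X.509 proxy in the environment. The FTS endpoint and the source/destination URLs are hypothetical placeholders.

```python
# Sketch of an FTS3 job submission, assuming the fts-rest "easy" Python
# bindings. Endpoint and URLs are illustrative placeholders.
import fts3.rest.client.easy as fts3

FTS_ENDPOINT = "https://fts3.example.org:8446"                     # hypothetical FTS3 server
SOURCE = "gsiftp://gridftp.site-a.example.org/store/file.root"     # hypothetical source
DESTINATION = "gsiftp://gridftp.site-b.example.org/store/file.root"

context = fts3.Context(FTS_ENDPOINT)        # picks up the grid proxy from the environment
transfer = fts3.new_transfer(SOURCE, DESTINATION)
job = fts3.new_job([transfer], verify_checksum=True)

job_id = fts3.submit(context, job)
print("submitted FTS job", job_id)
print("state:", fts3.get_job_status(context, job_id)["job_state"])
```

In production, PhEDEx submits such jobs in bulk and the FTS Optimizer decides how many of them run in parallel on each link.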

20 Gbps and Beyond. The links of US CMS sites were upgraded to 100 Gbps, and US CMS asked all sites to commission the links and demonstrate the ability to perform transfers of at least 20 Gbps in production, hence the name "20 Gbps Challenge". There are 8 US CMS Tier 2 sites plus the Tier 1 site at Fermilab. Caltech was the first site to pass the challenge and took a leading role in this effort. The results presented here are based on the work of Samir Cury of Caltech.

20 Gbps and Beyond: Transfer Commissioning. As expected, 20 Gbps transfers did not work out of the box; most problems were related to configuration at different levels.
- PhEDEx: at Caltech we had to change the number of transfers to the different sites in the client configuration; these configuration changes were then propagated to other sites.
- FTS: the migration to FTS 3 took place in June 2014 and was transparent until we started the high-rate transfers. The FTS Optimizer changes the number of active transfers (GridFTP processes) depending on transfer rates (or failures), and work went into tuning it.

FTS Optimizer. [Plots: FTS Optimizer behaviour as a function of time.]

20 Gbps and Beyond: Bypassing the FTS Optimizer. The FTS developers agreed that we can trade lower transfer efficiency (higher failure rates) for better rates, so we performed tests bypassing the optimizer.

FTS Optimizer. There are three auto-tuning modes for the FTS Optimizer:
- Conservative: try to maintain 99%-100% transfer efficiency; no TCP stream optimization; finite range per file size.
- Moderate: allow transfer efficiency to go down to 97%; probe for no less than 15 minutes with 1-16 TCP streams; after converging, stick to the best settings for half a day.
- Aggressive: allow transfer efficiency to go down to 95%; probe in 15-minute windows with 1-16 TCP streams; after converging, maintain the settings for half a day; set the TCP buffer size to 8 MB (N.B. this has not been tested in production).
A socket-level illustration of the 8 MB buffer setting is sketched below.
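
For illustration only (not from the slides), this is roughly what requesting an 8 MB TCP buffer corresponds to at the socket level; real GridFTP/FDT deployments set this through their own configuration and the host's kernel limits rather than application code like this.

```python
# Illustrative only: requesting an 8 MB TCP buffer at the socket level.
# Transfer services (GridFTP, FDT) configure this via their own settings and
# the kernel limits (e.g. net.core.rmem_max / net.core.wmem_max on Linux).
import socket

BUF_SIZE = 8 * 1024 * 1024  # 8 MB

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BUF_SIZE)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUF_SIZE)

# The kernel may clamp or adjust the request; report what was actually granted.
print("send buffer:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
print("recv buffer:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
sock.close()
```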

20 Gbps and Beyond: Transfer Commissioning (PhEDEx). [Plot: transfers from Caltech to other sites.]

20 Gbps and Beyond: Design Considerations. 8 x 10 Gbps GridFTP servers => high cost; 2 x 40 Gbps GridFTP servers => at least half the cost. Challenges: the 40 Gbps servers do not perform as well, and we are evaluating their performance for different setups. Max rate so far: 18 Gbps per GridFTP server, with a lot of tuning options still to be tested.

20 Gbps and Beyond: Next Steps. US CMS sites that have already passed the challenge: Caltech, Purdue. US CMS sites very close to passing: Florida, Nebraska. Efforts are being coordinated with the other US CMS sites. Development is needed for Xrootd to be used transparently through FTS at any site, and work continues on optimizing the GridFTP setup.

Summary & Conclusions. The LHC computing and data models continue to evolve towards more dynamic, richly structured, on-demand data movement. Large data transfers (requiring high throughput) are complemented by remote data access (latency sensitive). The uplinks of US CMS LHC Grid sites have been upgraded to 100 Gbps, and Caltech is coordinating the effort to commission faster transfers over the new links. Work is ongoing on the optimization of transfer tools and the development of new tools. Caltech leads projects that engage the LHC experiments in implementing increasing network awareness and interaction in their data management and workflow software stacks. See also the presentation in this track, Wednesday at 9:15.

Questions?