ANSE: Advanced Network Services for [LHC] Experiments

ANSE: Advanced Network Services for [LHC] Experiments
Artur Barczyk, California Institute of Technology
Joint Techs 2013, Honolulu, January 16, 2013

Introduction
- ANSE is a project funded by NSF's CC-NIE program
- Two years of funding, started in January 2013, ~3 FTEs
- Collaboration of 4 institutes:
  - Caltech (CMS)
  - University of Michigan (ATLAS)
  - Vanderbilt University (CMS)
  - University of Texas at Arlington (ATLAS)
- Goal: enable strategic workflow planning that includes network capacity, along with CPU and storage, as a co-scheduled resource
- Path: integrate advanced network-aware tools (provisioning and monitoring) into the mainstream production workflows of ATLAS and CMS

Some Background: LHC Computing Model Evolution
- The original MONARC model was strictly hierarchical; changes have been introduced gradually since 2010
- Main evolutions:
  - Meshed data flows: any site can use any other site as a source of data
  - Dynamic data caching: analysis sites pull datasets from other sites on demand, including from Tier 2s in other regions, in combination with strategic pre-placement of datasets
  - Remote data access: jobs executing locally, using data cached at a remote site in quasi-real time, possibly in combination with local caching
- Variations by experiment
- The net effect: increased reliance on network performance!

Computing Site Roles (so far)
- Tier 0 (CERN): prompt calibration and alignment; reconstruction; store the complete set of RAW data
- Tier 1: data reprocessing; archive RAW and reconstructed data
- Tier 2: Monte Carlo production; physics analysis; store analysis objects
- Tier 3: physics analysis, interactive studies

ANSE Objectives and Approach
- Deterministic, optimized workflow is the goal
  - Use network resource allocation along with storage and CPU resource allocation in planning data and job placement
  - Improve overall throughput and task times to completion
- Integrate advanced network-aware tools into the mainstream production workflows of ATLAS and CMS
  - Use tools and deployed installations where they exist, i.e. build on previous manpower investment in R&E networks
  - Extend the functionality of the tools to match the experiments' needs
  - Identify and develop tools and interfaces where they are missing
- Build on several years of invested manpower, tools and ideas (some dating from the MONARC era)

ANSE - Methodology
- Use agile, managed bandwidth for tasks with levels of priority, along with CPU and disk storage allocation; this:
  - Allows one to define goals for time-to-completion, with a reasonable chance of success
  - Allows one to define metrics of success, such as the rate of work completion with reasonable resource use
  - Allows one to define and achieve consistent workflow
- Dynamic circuits are a natural match (as in DYNES for Tier 2s and Tier 3s)
- Process-oriented approach:
  - Measure resource usage and job/task progress in real time
  - If resource use or rate of progress is not as requested/planned, diagnose, analyze and decide if and when task replanning is needed
- Classes of work: defined by resources required, estimated time to complete, priority, etc.
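The time-to-completion goal above can be made concrete: given a task's data volume and its deadline, the bandwidth that must be allocated follows directly. A minimal sketch; the efficiency derating factor and its 0.7 default are illustrative assumptions, not ANSE parameters:

```python
def required_gbps(dataset_tb, deadline_hours, efficiency=0.7):
    """Bandwidth (Gbps) needed to move dataset_tb terabytes
    within deadline_hours.

    `efficiency` derates the raw link rate for protocol and disk
    overheads; the 0.7 default is an illustrative assumption.
    """
    bits = dataset_tb * 8e12            # decimal TB -> bits
    seconds = deadline_hours * 3600.0
    return bits / seconds / 1e9 / efficiency

# e.g. a 100 TB dataset due in 24 hours needs ~9.3 Gbps on an
# ideal link, ~13.2 Gbps at the assumed 70% efficiency
```

A planner could compare this figure against the capacity a circuit service is willing to grant, and flag the task for replanning when the request is infeasible.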

Tool Categories
- Monitoring: allows reactive use, i.e. reacting to events or situations in the network
  - Throughput measurements; possible actions: raise an alarm and continue, abort/restart transfers, choose a different source
  - Topology monitoring; possible actions: influence source selection, raise an alarm (e.g. extreme cases like site isolation)
- Network control: allows pro-active use
  - Reserve bandwidth -> prioritize transfers, remote access flows, etc.
  - Co-schedule CPU, storage and network resources
  - Create custom topologies -> optimize the infrastructure to operational conditions, e.g. during LHC running periods vs. reconstruction/re-distribution
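The reactive actions listed for throughput monitoring can be sketched as a small decision function; the thresholds (80% and 20% of the expected rate) are illustrative assumptions, not values from the project:

```python
def reactive_action(measured_gbps, expected_gbps, alternate_sources):
    """Map a throughput measurement to one of the reactive actions
    above. The 0.8/0.2 thresholds are illustrative assumptions."""
    ratio = measured_gbps / expected_gbps
    if ratio >= 0.8:
        return "continue"                  # within tolerance
    if ratio >= 0.2:
        return "raise-alarm-and-continue"  # degraded but usable
    if alternate_sources:                  # severe: restart elsewhere
        return "restart-from:" + alternate_sources[0]
    return "abort"                         # severe, no alternative
```

Topology events (e.g. site isolation) would feed the same kind of logic, but acting on source selection rather than on a single transfer.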

ATLAS and CMS Computing: PanDA
- PanDA Workflow Management System (ATLAS): a highly automated, flexible, unified system for organised production and user analysis jobs
- Uses an asynchronous Distributed Data Management system (DQ2, Rucio)
- PanDA's basic unit of work is a job; physics tasks are split into jobs by the ProdSys layer above PanDA
- Automated brokerage based on CPU and storage resources
  - Tasks are brokered among ATLAS clouds
  - Jobs are brokered among sites
- Here's where network information can and will be useful!
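How network information could enter the brokerage can be sketched as a scoring function over candidate sites. The weights, field names, and normalizations below are hypothetical illustrations, not PanDA's actual algorithm:

```python
def broker_site(sites, w_cpu=1.0, w_net=1.0):
    """Pick the site with the best combined CPU/network score.

    `sites` maps a site name to a dict with hypothetical fields:
    free_slots (idle CPU slots) and gbps_to_input (measured bandwidth
    toward the site holding the job's input data).
    """
    def score(s):
        # crude normalizations: 1000 slots and 10 Gbps count as 1.0
        return (w_cpu * s["free_slots"] / 1000.0 +
                w_net * s["gbps_to_input"] / 10.0)
    return max(sites, key=lambda name: score(sites[name]))

# example with made-up numbers: the better network path wins
choice = broker_site({
    "SiteA": {"free_slots": 800, "gbps_to_input": 2.0},
    "SiteB": {"free_slots": 400, "gbps_to_input": 9.0},
})  # -> "SiteB"
```

The same idea applies one level up, for brokering tasks among clouds rather than jobs among sites.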

ATLAS Production Workflow
(diagram: Kaushik De, UTA)

ATLAS and CMS Computing: PhEDEx
- PhEDEx is the CMS data-placement management tool
  - A reliable and scalable dataset (fileblock-level) replication system, with a focus on robustness
  - Responsible for scheduling the transfer of CMS data across the grid, using FTS, SRM, FTM, or any other transport package
- PhEDEx typically queues data in blocks for a given src-dst pair: tens of TB, up to several PB
- A natural candidate for using dynamic circuits; could be extended to make use of a dynamic-circuit API like NSI

Possible Approaches within PhEDEx (from T. Wildish, PhEDEx team lead)
There are essentially four possible approaches within PhEDEx for booking dynamic circuits:
1. Do nothing, and let the fabric take care of it (similar to LambdaStation): trivial, but lacks a prioritization scheme, and it is not clear the result will be optimal
2. Book a circuit for each transfer job, i.e. per FDT or gridftp call: effectively executed below the PhEDEx level, so management and performance optimization are not obvious
3. Book a circuit at each download agent and use it for multiple transfer jobs: maintains a stable circuit for all the transfers on a given src-dst pair, but offers only local optimization, with no global view
4. Book circuits at the (dataset) router level: maintains a global view, so global optimization and advance reservation are possible
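The router-level option can be sketched as follows: aggregate the queued blocks per src-dst pair (the granularity at which PhEDEx queues data) and book circuits only for the pairs whose backlog justifies one. The record fields and the 10 TB threshold are illustrative assumptions:

```python
from collections import defaultdict

def circuit_candidates(queued_blocks, min_backlog_tb=10.0):
    """Aggregate per src-dst backlog and return the pairs worth a
    circuit, largest backlog first.

    `queued_blocks` is a list of dicts with hypothetical fields
    src, dst and size_tb.
    """
    backlog = defaultdict(float)
    for blk in queued_blocks:
        backlog[(blk["src"], blk["dst"])] += blk["size_tb"]
    # sorting by total backlog is where the global view pays off
    return sorted((p for p, tb in backlog.items() if tb >= min_backlog_tb),
                  key=lambda p: -backlog[p])
```

Approaches 2 and 3 would make the same decision per transfer job or per download agent, which is exactly why they cannot optimize globally.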

(Some) Questions to Answer (also raised at the LHCONE point-to-point workshop last December)
- Call-blocking response: what happens when a request is denied? How are reasons, alternatives, etc. propagated, and at what level of detail?
- What granularity of capacity allocation makes sense? From the application side, e.g. why not allocate full pipes, all the time? What is good/acceptable/desired from the networks' side?
- Are request priorities desired (or necessary)?
- Should applications (PanDA, PhEDEx) be multi-domain aware, or is a black-box approach preferable?

Ideas for a WMS-Network Interface
Possible use cases, as evidenced at the LHCONE point-to-point workshop (all of them relevant to ANSE):
- Use network information for replica selection (a.k.a. data routing)
- Use network information for task/job brokerage, given the locations of the input data and the desired output destination
- Use provisioning to improve workflow through known network capacity and a better ETA
- If a transfer between A and B does not work, try A -> C -> B? (Requires knowing the topology as well as the status)
- Point-to-multipoint replication?

Where to Attach?
- To be further investigated in a later stage of ANSE
- ANSE initial main thrust axis
- Can do now with DYNES/FDT and PhEDEx (CMS): the first step in ANSE
(diagram: D. Bonacorsi, CMS, at the LHCONE P2P Service Workshop, Dec. 2012)

Relation to DYNES
- In brief, DYNES is an NSF-funded project to deploy a cyberinstrument linking up to 50 US campuses through the Internet2 dynamic circuit backbone and regional networks, based on the ION service and using OSCARS technology
- The DYNES instrument can be viewed as a production-grade starter kit: it comes with a disk server, an inter-domain controller (server) and an FDT installation
- The FDT code includes the OSCARS IDC API -> it reserves bandwidth and moves data through the created circuit
- Bandwidth on demand, i.e. get it now or never, with the routed GPN as fallback; the DYNES system is naturally capable of advance reservation
- We need the right agent code inside CMS/ATLAS to call the API whenever transfers involve two DYNES sites
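The "right agent code" amounts to a simple gate: attempt a circuit only when both endpoints are DYNES sites, and fall back to the routed GPN otherwise. A sketch; the site registry and the `reserve_circuit` callback (standing in for the OSCARS IDC API call embedded in FDT) are hypothetical:

```python
# hypothetical registry of DYNES-instrumented sites
DYNES_SITES = {"T2_SiteA", "T2_SiteB", "T2_SiteC"}

def plan_transfer(src, dst, reserve_circuit):
    """Decide the path type for a transfer between two sites.

    `reserve_circuit(src, dst)` is a hypothetical callback wrapping
    the bandwidth reservation; it returns True on success. Since the
    service is get-it-now-or-never, a denial falls back immediately
    to the routed GPN.
    """
    if src in DYNES_SITES and dst in DYNES_SITES and reserve_circuit(src, dst):
        return "dynamic-circuit"
    return "routed-gpn"
```

In CMS this gate would sit in the PhEDEx download agent; in ATLAS, in the corresponding PanDA data movement machinery.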

Components for a Working System (I): The DYNES Instrument
(see the presentation by Shawn McKee last Tuesday)

Fast Data Transfer (FDT)
- The DYNES instrument includes a storage element, with FDT as the transfer application
- FDT is an open-source Java application for efficient data transfers
- Easy to use: syntax similar to SCP, iperf/netperf
- Based on an asynchronous, multithreaded system using the Java New I/O (NIO) interface, FDT is able to:
  - stream a list of files continuously
  - use independent threads to read and write on each physical device
  - transfer data in parallel on multiple TCP streams, when necessary
  - use appropriately sized buffers for disk I/O and networking
  - resume a file transfer session
- FDT uses the IDC API to request dynamic circuit connections
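Streaming a file list over several parallel streams implies partitioning the list so that each stream carries a similar number of bytes. A greedy size-balanced partition is one plausible scheme; this is an illustration of the idea, not FDT's actual logic:

```python
def partition_streams(file_sizes, n_streams):
    """Greedily assign files (name -> bytes) across n_streams so the
    per-stream byte totals stay roughly balanced. Illustrative only."""
    streams = [[] for _ in range(n_streams)]
    loads = [0] * n_streams
    # place the largest files first, each onto the lightest stream
    for name, size in sorted(file_sizes.items(), key=lambda kv: -kv[1]):
        i = loads.index(min(loads))
        streams[i].append(name)
        loads[i] += size
    return streams
```

Each resulting sublist would then be handed to an independent reader/writer thread feeding one TCP stream.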

DYNES/FDT/PhEDEx
- FDT integrates the OSCARS IDC API to reserve network capacity for data transfers
- FDT has been integrated with PhEDEx at the level of the download agent
- Basic functionality is OK; more work is needed to understand performance issues with HDFS
- Interested sites are welcome to test
- With FDT deployed as part of DYNES, this makes one possible entry point for ANSE

Components for a Working System (II)
To be useful for the LHC community, it is mandatory to build on current and emerging standards deployed on a global scale.
(diagram: Jerry Sobieski, NORDUnet)

Components for a Working System (III): Monitoring with perfSONAR and MonALISA
- All LHCOPN and many LHCONE sites have perfSONAR deployed; the goal is to have all of LHCONE instrumented for perfSONAR measurement
- Regularly scheduled tests between configured pairs of end points: latency (one-way) and bandwidth
- Currently used to construct a dashboard; could provide input to the algorithms developed in ANSE for PhEDEx and PanDA
- The ALICE and CMS experiments use the MonALISA monitoring framework: accurate bandwidth availability and a complete topology view
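The regularly scheduled bandwidth and latency results could feed source selection directly, for example by ranking candidate replica sources by measured bandwidth and breaking ties on one-way latency. A sketch; the field names are assumptions for illustration, not the perfSONAR schema:

```python
def rank_sources(measurements):
    """Order candidate source sites best-first.

    `measurements` maps site -> {"gbps": ..., "latency_ms": ...};
    these field names are illustrative, not the perfSONAR schema.
    Higher bandwidth wins; lower latency breaks ties.
    """
    return sorted(measurements,
                  key=lambda s: (-measurements[s]["gbps"],
                                 measurements[s]["latency_ms"]))
```

This is the kind of input an ANSE algorithm inside PhEDEx or PanDA could consume from the same data that currently drives the dashboard.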

Summary
- The ANSE project will integrate advanced network services with the LHC experiments' software stacks, through interfaces to:
  - monitoring services (perfSONAR-based, MonALISA)
  - bandwidth reservation systems (NSI, IDCP)
- Working with the PanDA system in ATLAS and PhEDEx in CMS
- The goal is to make deterministic workflows possible

THANK YOU! QUESTIONS? Artur.Barczyk@cern.ch