
LHC @ PRP
Frank Würthwein (SDSC/UCSD), Jason Nielsen (UCSC), Owen Long (UCR), Chris West (UCSB), Anyes Taffard (UCI), Maria Spiropulu (Caltech)
10/14/15 PRP Workshop

ATLAS & CMS
The ATLAS and CMS collaborations each span ~3000 scientists across ~200 institutions in ~40 countries. Each experiment comprises ~100M electronic channels recording proton-proton collisions every 25 ns.

The Path to Discovery
[Workflow diagram: Detector / Simulation -> Reconstruction -> Public Data Sets -> Private Data -> Publication]
- Centrally organized production of 10s of PB of data per collaboration; all members have roughly equal access.
- Each group produces its own private data, ~4-40 TB per publication. More than one group may contribute to a paper, and a group may use its private data to contribute to more than one publication.
- ~1000 publications came out of Run 1 data (2010-12).
PRP makes public data accessible from home, and focuses on the "last mile" problem: from private data to publication.

[Map] LHC scientists across nine West Coast universities (UW Seattle, UCSC, UCD, CSU Fresno, UCSB, Caltech, UCI, UCSD, UCR) are to benefit from petascale data & compute resources across PRP: NERSC (compute), SLAC (data & compute), Caltech (data & compute), and UCSD & SDSC (data & compute).

LHC @ West Coast
- The LHC community may use five major data & compute resources in CA: SLAC, NERSC, Caltech, UCSD, SDSC. In aggregate: petabytes of disk space and petaflops of compute power.
- LHC scientists across the West Coast want to transparently compute on data at their home institutions and at these five major centers, to accelerate science from idea to discovery:
  - Uniform execution environment.
  - Xrootd data federations for ATLAS & CMS: serving local disks outbound to remotely running jobs, and caching remote data inbound for locally running jobs.
  - HTCondor overflow of jobs from the local cluster to the major centers, satisfying peak needs to accelerate the path from idea to publication.
- A collaboration of PRP, SDSC, and the Open Science Grid; PRP builds on the SDSC LHC@UC project.

The DTN we ship(ped)
- HTCondor system with 40 batch slots, fully integrated into the campus cluster and the 5 major centers.
- Login node for researchers.
- 12 x 4 TB data disks: apps & libs cache, data cache, origin server.
- All services remotely administered.
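A researcher on the DTN login node would feed the 40 batch slots through an ordinary HTCondor submit description. The sketch below is hypothetical (the executable and file names are illustrative, not taken from the PRP configuration):

```
# analyze.sub -- minimal, hypothetical HTCondor submit description
universe   = vanilla
executable = run_analysis.sh          # user's analysis wrapper (illustrative)
arguments  = input.root
request_cpus   = 1
request_memory = 2GB
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
output = job.$(Cluster).$(Process).out
error  = job.$(Cluster).$(Process).err
log    = job.$(Cluster).log
queue 10
```

Submitted with `condor_submit analyze.sub`; with overflow (flocking/glideins) configured, jobs that idle in the local 40 slots can spill out to the major centers.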

Xrootd Data Federation (example: UCI)
[Diagram] The ATLAS FAX global XRootd federation sits above an ATLAS@UC redirector running on hardware at SDSC (with an SDSC XRootd data server). Services on the new DTN hardware at UCI: an XRootd data server, a local XRootd redirector, and an XRootd data cache. These federate the pre-existing XRootd data servers at UCI behind the data cache.
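In practice, a job reads federated data with the standard XRootD client tools. The commands below are an illustrative sketch only; the redirector hostname and file path are hypothetical, not the actual ATLAS@UC endpoints:

```
# Copy a file through a federation redirector (hostname and path are hypothetical)
xrdcp root://redirector.example.edu//atlas/user/dataset/file.root /tmp/file.root

# Or open it directly from ROOT without a local copy
root -l 'TFile::Open("root://redirector.example.edu//atlas/user/dataset/file.root")'
```

The redirector forwards each open to whichever data server (or cache) holds the file, which is what lets jobs run far from the storage.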

OSG Compute Federation (example: UCI)
[Diagram] The new DTN hardware at UCI runs an HTCondor batch system; the pre-existing UCI infrastructure is a SLURM batch system, reached via ssh. Through the OSG glideinWMS (gwms) on hardware at SDSC, jobs can also reach other compute resources, including OSG SLURM @ Comet.

Jason Nielsen - UCSC
ATLAS Data Flow (R. Reece / K. Cranmer):
- The 3-level trigger & DAQ reduce the 20 MHz collision rate through 60 kHz and 6 kHz down to 500 Hz of recorded raw data, ~10 PB/year.
- The Worldwide LHC Computing Grid plus local resources: ~100k CPUs, over 100 PB of storage.
- Monte Carlo production in the Athena framework: generator (HepMC) -> detector simulation -> reconstruction, producing generated, simulated, and reconstructed MC (with MC truth carried through) alongside data, ending in ntuples, plots/tables, and results.
Ryan Reece (UCSC)

Jason Nielsen - UCSC
Data Reduction (R. Reece):
- ATLAS software: Athena reconstruction and derivation/skim (DerivationFramework) run on the Worldwide LHC Computing Grid: ~PB of xAOD -> ~TB of DxAOD.
- CP tools (QuickAna, SUSYTools, CxAOD, ...) and the event loop run on a Tier-3 cluster or the Grid: ~TB of CxAOD.
- Merge/scale and visualization with (Py)ROOT ("wild-west" condor) on a Tier-3 cluster or a desktop/laptop: hists.root -> hists.merged.root -> plots, ~GB.

Jason Nielsen - UCSC
Preparing for Larger LHC Datasets
- LHC Run 2 (2015-2018): 5x the current dataset, at roughly double the energy (8 -> 14 TeV).
- Unique physics opportunities with the new data: measure Higgs boson properties; search for rare new particle production (supersymmetry, exotica).
- The challenge: scaling computing access to allow repeated filtering and analysis of the dataset.
- LHC Run 4 (2025-): 100x the current dataset!

UCR CMS Physics
- Searches for supersymmetry (SUSY). These address big questions: dark matter, grand unification, stabilization of the Higgs mass. Raising the CM energy from 8 TeV to 13 TeV means a significant enhancement in sensitivity. Possible outcomes from analyzing Run 2 data: we find SUSY, or no sign of SUSY; the latter won't kill SUSY, but it will make it less relevant.
- Searches for heavy Majorana neutrinos.
- Higgs physics: H -> γγ, µµ, 4τ.
- Top quark physics: precision mass and cross section, rare processes (4t production).
Owen Long, UCR

The UCR CMS T3 Cluster
- 512 computing cores total, half new, half old:
  - Old: 256 cores (16 16-core boards), 2.4 GHz AMD Opteron 6136, 32 GB RAM/node, ~400 W/node.
  - New: 256 cores (8 32-core boards), 2.8 GHz AMD Opteron 6320, 128 GB RAM/node, ~1000 W/node.
- 2 GridFTP servers connected to the Science DMZ at 10 Gb/s.
- HDFS and NFS interconnects at 10 Gb/s; management network at 1 Gb/s.
- 240 TB raw HDFS disk, ~30 TB other disk.
- To be added: an xrootd cache appliance.
Owen Long, UCR

UCR CMS Analysis and PRP
Analysis workflow in the past (Run 1):
- Submit 1000s of jobs running on reconstructed real and simulated data. Jobs run all over the world at various CMS computing sites; results trickle in to the UCR T3.
- Bottleneck issues with file transfers; often needed a few resubmissions to get the last few %. A long, tedious, painful process.
- Because that step is so painful, the output is large (everything you can think of wanting later) to avoid having to do it again and again.
- Further data reduction at the UCR T3, eventually down to 10s of GB (laptop size).
Current situation and impact of PRP:
- A more compact analysis format ("miniAOD") is now centrally produced. No need for giant analysis-specific ntuples; this eliminates one significant intermediate step.
- If the miniAOD for important datasets is stored on the PRP network, we expect vast improvements in speed and reliability: access miniAOD through PRP xrootd servers, backed by a very large pool of local computing resources on the PRP network.
- The barrier to analysis iterations is significantly lowered: a faster pace for innovation.
Owen Long, UCR

Overview of the UCSB CMS Tier-3 computing center
- CMS groups using the computing resources at UCSB focus on SUSY searches, particularly for gluinos.
- ~200 cores, ~200 TB disk. 1 Gbps NICs on nodes in the data center, some bonded to provide 2 Gbps. 100 Gbps campus WAN connection via CENIC (recently upgraded from 10 Gbps).
- Usage:
  - Run I: primarily for processing bare ROOT ntuples generated at other sites.
  - Run II: the creation of a smaller CMS data tier, MINIAOD (~15-50 kB/event), makes it possible to run the same analysis on the CMS data itself.
  - Also used by LUX/LZ colleagues, who generate LUX MC and occasionally transfer ~1 TB data samples from SLAC/UCD/Brown/SURF.
- The small size of our site makes our needs somewhat different from those of other institutions.
Chris West, October 15, 2015, Pacific Research Platform Workshop

Transfers to UCSB
Two main types of transfers: transfers of MINIAOD to process at UCSB, and the output of jobs run on MINIAOD at other sites.

                     CMS data                      Processed data
Frequency            When CMS is taking data       Irregular
Rate                 ~30-60 MB/s                   Up to 1 Gbps
Tool                 PhEDEx (srm-cp/gridftp)       CRAB3/FTS (gfal-cp/gridftp)
Data received from   Mainly US sites               Wherever data is located/processed
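For scale, the quoted rates translate into transfer times as follows. This is a back-of-the-envelope sketch; the 1 TB sample size is taken from the LUX/LZ transfers mentioned in the previous slide:

```python
# Back-of-the-envelope: hours to move a dataset at the quoted transfer rates.
def transfer_hours(size_tb: float, rate_mb_s: float) -> float:
    """Hours to move size_tb terabytes at rate_mb_s megabytes/second."""
    size_mb = size_tb * 1e6            # 1 TB = 1e6 MB (decimal units)
    return size_mb / rate_mb_s / 3600

# PhEDEx-style transfer of a ~1 TB sample at 30-60 MB/s
print(round(transfer_hours(1.0, 30), 1))   # -> 9.3 hours
print(round(transfer_hours(1.0, 60), 1))   # -> 4.6 hours

# A fully utilized 1 Gbps link is 125 MB/s
print(round(transfer_hours(1.0, 125), 1))  # -> 2.2 hours
```

So even at the best quoted rates, a single ~1 TB sample occupies the link for hours, which is why transfer reliability matters as much as peak speed.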

Wish list
- Minimal maintenance. Manpower is an important limitation, particularly for a small site; performance optimizations are sometimes not worth the effort.
- We are not currently connected to the LHCONE network, due to the additional work needed to guarantee that only LHC data travels across it. It would be nice to have the PRP simplify the connection to LHCONE.
- Improved performance in transfers from distant nodes.
- The ability to use resources (disk, and particularly CPU) at UCSD semi-transparently. All CMS groups at UCSB also use resources at UCSD. Example use case: compute-intensive jobs (such as systematics computations) on data stored at UCSB.
- We will have a node dedicated exclusively to CMS connections to UCSD (thanks to Frank W. et al.), but we expect LUX/LZ needs to grow, and PRP will be important for that connectivity.

How UCI Works
UCI is active in searches for supersymmetry (SUSY) at ATLAS.
Modus operandi:
- Develop a lightweight analysis framework and custom data format ("SusyNtuple").
- Process the ATLAS-wide data format ("xAOD") on the grid to produce SusyNtuple: the full dataset is O(10 TB) in xAOD, O(50 GB) in SusyNtuple.
- Download the output data to the local T3 @ UCI.
- Develop the analysis and search for new physics.
- A typical submission is ~O(1000) jobs. Submission failures are not uncommon (faults on the side of the grid sites), and downloads can take ~days (unresponsive grid sites, unreliable grid-ware used to process the downloads). The result: constant babysitting of submission & download status.
Several institutes in the ATLAS-SUSY group use the UCI analysis framework and rely on smooth operation and production turnover.
Anyes Taffard & Daniel Antrim (UC Irvine)

Experience so Far
Brick installed at the UCI T3:
- Bottlenecks in setting up a complete work area are promptly addressed and fixed thanks to the support team at UCSD (thanks Edgar and Jeff!).
Tested Condor + XRootD (FAX) jobs using the cached datasets:
- The painful download step is essentially removed from the user's point of view; output datasets produced on the grid are registered to FAX automatically.
- We can simultaneously begin analysis and caching of datasets in a time span shorter than the time needed to download the same datasets locally.
- The ability to distribute compute power over many sites removes the bottleneck of our T3's queue system: we can easily run CPU/IO-intensive Monte Carlo simulation processes simultaneously with processing large analysis n-tuples.
The ability to use cached datasets, in addition to distributing cluster/batch resources, already looks like a game changer for our typical operations.

Ongoing Tests
Tests to ~remove user interaction with the grid are underway:
- Cache and process ATLAS-wide datasets (typically processed on the grid) using Condor, and cache the output datasets for easy access later on. Output files are registered locally and accessed via FAX later on ("local" = in the user's work area on the brick).
- All steps of our data-processing will be more directly under our control, with the potential to avoid the layers of obfuscation and grid-management that can disrupt smooth data flow.
- Less downtime between when new data from ATLAS becomes available and when we can access it. Processing can run in the background: more time for thinking about and doing physics!

Maria Spiropulu (Caltech)
126 GeV Higgs and Other Puzzles
- Is the Higgs the SM one? Are there more? Where is SUSY? Without SUSY we don't understand how the Higgs boson can exist without violating basic mechanisms of quantum physics.
- Is the Higgs connected with neutrinos? Dark matter? Dark energy?
- More data from many sources (particle, astro, cosmo) will guide us.
Oct 14 2015, PRP Big Data Freeway, smaria@caltech.edu

Maria Spiropulu (Caltech)
Data Hyperloops
- The largest data- & network-intensive programs (LHC and HL-LHC, LSST, DESI, LCLS-II, Joint Genome Institute, etc.) face unprecedented challenges in global data distribution, processing, access, analysis, and the coordinated use of CPU, storage, and network resources.
- High-performance networking is a key enabling technology for this research: global science collaborations depend on fast and reliable data transfers and access on regional, national, and international scales.
- Total traffic handled (petabytes per month) is projected to reach 1 exabyte per month by ~2020 and 10 EB/month by ~2024; the rate of increase follows or exceeds the historical trend of 10x per 4 years.
- HEP traffic will compete with BES, BER, and ASCR exascale CSN ecosystems: a great opportunity for HEP (e.g., CMS CPU needs will grow by 65-200x by the HL-LHC).

Maria Spiropulu (Caltech)
Intelligent CFN Systems
- Allocate guaranteed bandwidth to high-priority flows (dynamic circuit networking: ESnet/FNAL, Internet2); point-to-point circuits across the LHCONE multi-domain fabric.
- Deeply programmable, agile software-defined network (SDN) infrastructures are emerging as multi-service, multi-domain network operating systems interconnecting science teams across regional, national, and global distances.
- Worldwide distributed systems developed by the data-intensive science programs harness the global workflow, scheduling, and data management systems they have built, enabled by distributed operations and security infrastructures riding on high-capacity (but still passive) networks.
- New computing models: network-aware data operations; strategic data distribution/placement/management via dynamic network provisioning (more in H. Newman's presentation on Friday).

Size of data & frequency of transfers
- Caching of experiment data is limited by local CPU power, and ad hoc. 5 Gbps is probably plenty initially (see the caching benchmark).
- Serving data out is limited by remote CPU power, and ad hoc. 10 Gbps is probably enough to feed 1-2k remote CPUs (see the read-only benchmark). More will most likely be needed later.
- Data is exchanged within PRP and with LHCONE; LHCONE is the most important connectivity external to PRP.
- Tools used: Xrootd, HTCondor, gridftp.
- Speed achieved: 10 Gbps read-only, 5 Gbps caching (see benchmarking).
What is screwed up? We don't know yet: we have exercised the infrastructure on the LAN, but not yet sufficiently on the WAN. Concerns:
- CPU elasticity: can we grow fast enough to have a serious impact? Are there enough CPU resources on PRP to scale out?
- Infrastructure operational cost. Lacking experience! What are the failure modes? How do we monitor against failure? How much human intervention is required to debug and fix things when they break? How do we deal with effort-limited operations? How stable is the infrastructure against abuse (= unexpected loads; see also the next point)?
- Detailed IO performance requirements. Lacking experience!
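The claim that 10 Gbps can feed 1-2k remote CPUs is simple arithmetic over the per-core read rate. A quick sketch; the per-client rates shown are the range implied by the claim, not measured values:

```python
# How many single-threaded clients can a link feed, given that each
# client's read rate is CPU-limited? (10 Gbps = 1250 MB/s, decimal units)
def clients_fed(link_gbps: float, per_client_mb_s: float) -> int:
    link_mb_s = link_gbps * 1000 / 8   # Gbps -> MB/s
    return int(link_mb_s / per_client_mb_s)

print(clients_fed(10, 1.25))   # -> 1000 clients at 1.25 MB/s each
print(clients_fed(10, 0.625))  # -> 2000 clients at 0.625 MB/s each
```

Since LHC analysis applications are single-threaded and CPU-bound, per-client IO of roughly 0.6-1.25 MB/s is what makes a single 10 Gbps server sufficient for 1-2k remote cores.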

Benchmarking

Read-only
- 1000 to 2000 simultaneous clients; aggregate throughput peaks at 90-100% of the available 10 Gbps on the Xrootd data server.
- Use case: many clients each read small amounts of data at a time, because IO per client is limited by the CPU available. Recall: all apps are single-threaded!
- This test was run with synthetic workflows simulating realistic read patterns, in a LAN environment, before the server was shipped.

Caching behavior Synthetic load to simulate typical cache use case: 200 jobs read 2.4MB every ~ 10 seconds. cache loads up in parallel to reads all files requested are cached no more writes while reads continue Write performance at ~ 5Gbps in parallel with reads. Reads almost unaffected by caching & writes. (ignore spikes at 20:30 additional unrelated tests active at that time.) 10/14/15 PRP Workshop 27