
LHC @ PRP
Frank Würthwein (SDSC/UCSD), Jason Nielsen (UCSC), Owen Long (UCR), Chris West (UCSB), Anyes Taffard (UCI), Maria Spiropulu (Caltech)
10/14/15 PRP Workshop

ATLAS & CMS
The ATLAS and CMS collaborations each span ~3000 scientists across ~200 institutions in ~40 countries. Each experiment comprises ~100M electronic channels recording proton-proton collisions every 25 ns.

The Path to Discovery
[Workflow diagram: Detector / Simulation -> Reconstruction -> Public Data Sets -> Private Data -> Publication]
- Centrally organized production of 10s of PB of data per collaboration; all members have roughly equal access.
- Each group produces its own private data, ~4-40 TB per publication. More than one group may contribute to a paper, and a group may use its private data to contribute to more than one publication.
- ~1000 publications came out of Run 1 data (2010-12).
PRP makes public data accessible from home, and focuses on the "last mile" problem: from private data to publication.

[Map] LHC scientists across nine West Coast universities (UW Seattle, UCSC, UCD, CSU Fresno, UCSB, Caltech, UCI, UCSD, UCR) are to benefit from petascale data & compute resources across PRP: NERSC (compute), SLAC (data & compute), Caltech (data & compute), and UCSD & SDSC (data & compute).

LHC @ West Coast
- The LHC community may use five major data & compute resources in CA: SLAC, NERSC, Caltech, UCSD, SDSC. In aggregate: petabytes of disk space and petaflops of compute power.
- LHC scientists across the West Coast want to transparently compute on data at their home institutions and at these five major centers, to accelerate science from idea to discovery:
  - Uniform execution environment.
  - Xrootd data federations for ATLAS & CMS: serving local disks outbound to remotely running jobs, and caching remote data inbound for locally running jobs.
  - HTCondor overflow of jobs from the local cluster to the major centers, satisfying peak needs to accelerate the path from idea to publication.
- A collaboration of PRP, SDSC, and the Open Science Grid; PRP builds on the SDSC LHC@UC project.

The DTN we ship(ped)
- HTCondor system with 40 batch slots, fully integrated into the campus cluster and the 5 major centers.
- Login node for researchers.
- 12 x 4 TB data disks: apps & libs cache, data cache, origin server.
- All services remotely administered.
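A researcher on the DTN login node would feed the 40 batch slots through an ordinary HTCondor submit description. The sketch below is hypothetical (the executable and file names are illustrative, not taken from the PRP configuration):

```
# analyze.sub -- minimal, hypothetical HTCondor submit description
universe   = vanilla
executable = run_analysis.sh          # user's analysis wrapper (illustrative)
arguments  = input.root
request_cpus   = 1
request_memory = 2GB
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
output = job.$(Cluster).$(Process).out
error  = job.$(Cluster).$(Process).err
log    = job.$(Cluster).log
queue 10
```

Submitted with `condor_submit analyze.sub`; with overflow (flocking/glideins) configured, jobs that idle in the local 40 slots can spill out to the major centers.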

Xrootd Data Federation (example: UCI)
[Diagram] The ATLAS FAX global XRootd federation sits above an ATLAS@UC redirector running on hardware at SDSC (with an SDSC XRootd data server). Services on the new DTN hardware at UCI: an XRootd data server, a local XRootd redirector, and an XRootd data cache. These federate the pre-existing XRootd data servers at UCI behind the data cache.
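In practice, a job reads federated data with the standard XRootD client tools. The commands below are an illustrative sketch only; the redirector hostname and file path are hypothetical, not the actual ATLAS@UC endpoints:

```
# Copy a file through a federation redirector (hostname and path are hypothetical)
xrdcp root://redirector.example.edu//atlas/user/dataset/file.root /tmp/file.root

# Or open it directly from ROOT without a local copy
root -l 'TFile::Open("root://redirector.example.edu//atlas/user/dataset/file.root")'
```

The redirector forwards each open to whichever data server (or cache) holds the file, which is what lets jobs run far from the storage.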

OSG Compute Federation (example: UCI)
[Diagram] The new DTN hardware at UCI runs an HTCondor batch system; the pre-existing UCI infrastructure is a SLURM batch system, reached via ssh. Through the OSG glideinWMS (gwms) on hardware at SDSC, jobs can also reach other compute resources, including OSG SLURM @ Comet.

Jason Nielsen - UCSC
ATLAS Data Flow (R. Reece / K. Cranmer):
- The 3-level trigger & DAQ reduce the 20 MHz collision rate through 60 kHz and 6 kHz down to 500 Hz of recorded raw data, ~10 PB/year.
- The Worldwide LHC Computing Grid plus local resources: ~100k CPUs, over 100 PB of storage.
- Monte Carlo production in the Athena framework: generator (HepMC) -> detector simulation -> reconstruction, producing generated, simulated, and reconstructed MC (with MC truth carried through) alongside data, ending in ntuples, plots/tables, and results.
Ryan Reece (UCSC)

Jason Nielsen - UCSC
Data Reduction (R. Reece):
- ATLAS software: Athena reconstruction and derivation/skim (DerivationFramework) run on the Worldwide LHC Computing Grid: ~PB of xAOD -> ~TB of DxAOD.
- CP tools (QuickAna, SUSYTools, CxAOD, ...) and the event loop run on a Tier-3 cluster or the Grid: ~TB of CxAOD.
- Merge/scale and visualization with (Py)ROOT ("wild-west" condor) on a Tier-3 cluster or a desktop/laptop: hists.root -> hists.merged.root -> plots, ~GB.

Jason Nielsen - UCSC
Preparing for Larger LHC Datasets
- LHC Run 2 (2015-2018): 5x the current dataset, at roughly double the energy (8 -> 14 TeV).
- Unique physics opportunities with the new data: measure Higgs boson properties; search for rare new particle production (supersymmetry, exotica).
- The challenge: scaling computing access to allow repeated filtering and analysis of the dataset.
- LHC Run 4 (2025-): 100x the current dataset!

UCR CMS Physics
- Searches for supersymmetry (SUSY). These address big questions: dark matter, grand unification, stabilization of the Higgs mass. Raising the CM energy from 8 TeV to 13 TeV means a significant enhancement in sensitivity. Possible outcomes from analyzing Run 2 data: we find SUSY, or no sign of SUSY; the latter won't kill SUSY, but it will make it less relevant.
- Searches for heavy Majorana neutrinos.
- Higgs physics: H -> γγ, µµ, 4τ.
- Top quark physics: precision mass and cross section, rare processes (4t production).
Owen Long, UCR

The UCR CMS T3 Cluster
- 512 computing cores total, half new, half old:
  - Old: 256 cores (16 16-core boards), 2.4 GHz AMD Opteron 6136, 32 GB RAM/node, ~400 W/node.
  - New: 256 cores (8 32-core boards), 2.8 GHz AMD Opteron 6320, 128 GB RAM/node, ~1000 W/node.
- 2 GridFTP servers connected to the Science DMZ at 10 Gb/s.
- HDFS and NFS interconnects at 10 Gb/s; management network at 1 Gb/s.
- 240 TB raw HDFS disk, ~30 TB other disk.
- To be added: an xrootd cache appliance.
Owen Long, UCR

UCR CMS Analysis and PRP
Analysis workflow in the past (Run 1):
- Submit 1000s of jobs running on reconstructed real and simulated data. Jobs run all over the world at various CMS computing sites; results trickle in to the UCR T3.
- Bottleneck issues with file transfers; often needed a few resubmissions to get the last few %. A long, tedious, painful process.
- Because that step is so painful, the output is large (everything you can think of wanting later) to avoid having to do it again and again.
- Further data reduction at the UCR T3, eventually down to 10s of GB (laptop size).
Current situation and impact of PRP:
- A more compact analysis format ("miniAOD") is now centrally produced. No need for giant analysis-specific ntuples; this eliminates one significant intermediate step.
- If the miniAOD for important datasets is stored on the PRP network, we expect vast improvements in speed and reliability: access miniAOD through PRP xrootd servers, backed by a very large pool of local computing resources on the PRP network.
- The barrier to analysis iterations is significantly lowered: a faster pace for innovation.
Owen Long, UCR

Overview of the UCSB CMS Tier-3 computing center
- CMS groups using the computing resources at UCSB focus on SUSY searches, particularly for gluinos.
- ~200 cores, ~200 TB disk. 1 Gbps NICs on nodes in the data center, some bonded to provide 2 Gbps. 100 Gbps campus WAN connection via CENIC (recently upgraded from 10 Gbps).
- Usage:
  - Run I: primarily for processing bare ROOT ntuples generated at other sites.
  - Run II: the creation of a smaller CMS data tier, MINIAOD (~15-50 kB/event), makes it possible to run the same analysis on the CMS data itself.
  - Also used by LUX/LZ colleagues, who generate LUX MC and occasionally transfer ~1 TB data samples from SLAC/UCD/Brown/SURF.
- The small size of our site makes our needs somewhat different from those of other institutions.
Chris West, October 15, 2015, Pacific Research Platform Workshop

Transfers to UCSB
Two main types of transfers: transfers of MINIAOD to process at UCSB, and the output of jobs run on MINIAOD at other sites.

                     CMS data                      Processed data
Frequency            When CMS is taking data       Irregular
Rate                 ~30-60 MB/s                   Up to 1 Gbps
Tool                 PhEDEx (srm-cp/gridftp)       CRAB3/FTS (gfal-cp/gridftp)
Data received from   Mainly US sites               Wherever data is located/processed
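For scale, the quoted rates translate into transfer times as follows. This is a back-of-the-envelope sketch; the 1 TB sample size is taken from the LUX/LZ transfers mentioned in the previous slide:

```python
# Back-of-the-envelope: hours to move a dataset at the quoted transfer rates.
def transfer_hours(size_tb: float, rate_mb_s: float) -> float:
    """Hours to move size_tb terabytes at rate_mb_s megabytes/second."""
    size_mb = size_tb * 1e6            # 1 TB = 1e6 MB (decimal units)
    return size_mb / rate_mb_s / 3600

# PhEDEx-style transfer of a ~1 TB sample at 30-60 MB/s
print(round(transfer_hours(1.0, 30), 1))   # -> 9.3 hours
print(round(transfer_hours(1.0, 60), 1))   # -> 4.6 hours

# A fully utilized 1 Gbps link is 125 MB/s
print(round(transfer_hours(1.0, 125), 1))  # -> 2.2 hours
```

So even at the best quoted rates, a single ~1 TB sample occupies the link for hours, which is why transfer reliability matters as much as peak speed.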

Wish list
- Minimal maintenance. Manpower is an important limitation, particularly for a small site; performance optimizations are sometimes not worth the effort.
- We are not currently connected to the LHCONE network, due to the additional work needed to guarantee that only LHC data travels across it. It would be nice to have the PRP simplify the connection to LHCONE.
- Improved performance in transfers from distant nodes.
- The ability to use resources (disk, and particularly CPU) at UCSD semi-transparently. All CMS groups at UCSB also use resources at UCSD. Example use case: compute-intensive jobs (such as systematics computations) on data stored at UCSB.
- We will have a node dedicated exclusively to CMS connections to UCSD (thanks to Frank W. et al.), but we expect LUX/LZ needs to grow, and PRP will be important for that connectivity.

How UCI Works
UCI is active in searches for supersymmetry (SUSY) at ATLAS.
Modus operandi:
- Develop a lightweight analysis framework and custom data format ("SusyNtuple").
- Process the ATLAS-wide data format ("xAOD") on the grid to produce SusyNtuple: the full dataset is O(10 TB) in xAOD, O(50 GB) in SusyNtuple.
- Download the output data to the local T3 @ UCI.
- Develop the analysis and search for new physics.
- A typical submission is ~O(1000) jobs. Submission failures are not uncommon (faults on the side of the grid sites), and downloads can take ~days (unresponsive grid sites, unreliable grid-ware used to process the downloads). The result: constant babysitting of submission & download status.
Several institutes in the ATLAS-SUSY group use the UCI analysis framework and rely on smooth operation and production turnover.
Anyes Taffard & Daniel Antrim (UC Irvine)

Experience so Far
Brick installed at the UCI T3:
- Bottlenecks in setting up a complete work area are promptly addressed and fixed thanks to the support team at UCSD (thanks Edgar and Jeff!).
Tested Condor + XRootD (FAX) jobs using the cached datasets:
- The painful download step is essentially removed from the user's point of view; output datasets produced on the grid are registered to FAX automatically.
- We can simultaneously begin analysis and caching of datasets in a time span shorter than the time needed to download the same datasets locally.
- The ability to distribute compute power over many sites removes the bottleneck of our T3's queue system: we can easily run CPU/IO-intensive Monte Carlo simulation processes simultaneously with processing large analysis n-tuples.
The ability to use cached datasets, in addition to distributing cluster/batch resources, already looks like a game changer for our typical operations.

Ongoing Tests
Tests to ~remove user interaction with the grid are underway:
- Cache and process ATLAS-wide datasets (typically processed on the grid) using Condor, and cache the output datasets for easy access later on. Output files are registered locally and accessed via FAX later on ("local" = in the user's work area on the brick).
- All steps of our data-processing will be more directly under our control, with the potential to avoid the layers of obfuscation and grid-management that can disrupt smooth data flow.
- Less downtime between when new data from ATLAS becomes available and when we can access it. Processing can run in the background: more time for thinking about and doing physics!

Maria Spiropulu (Caltech)
126 GeV Higgs and Other Puzzles
- Is the Higgs the SM one? Are there more? Where is SUSY? Without SUSY we don't understand how the Higgs boson can exist without violating basic mechanisms of quantum physics.
- Is the Higgs connected with neutrinos? Dark matter? Dark energy?
- More data from many sources (particle, astro, cosmo) will guide us.
Oct 14 2015, PRP Big Data Freeway, smaria@caltech.edu

Maria Spiropulu (Caltech)
Data Hyperloops
- The largest data- & network-intensive programs (LHC and HL-LHC, LSST, DESI, LCLS-II, Joint Genome Institute, etc.) face unprecedented challenges in global data distribution, processing, access, analysis, and the coordinated use of CPU, storage, and network resources.
- High-performance networking is a key enabling technology for this research: global science collaborations depend on fast and reliable data transfers and access on regional, national, and international scales.
- Total traffic handled (petabytes per month) is projected to reach 1 exabyte per month by ~2020 and 10 EB/month by ~2024; the rate of increase follows or exceeds the historical trend of 10x per 4 years.
- HEP traffic will compete with BES, BER, and ASCR exascale CSN ecosystems: a great opportunity for HEP (e.g., CMS CPU needs will grow by 65-200x by the HL-LHC).

Maria Spiropulu (Caltech)
Intelligent CFN Systems
- Allocate guaranteed bandwidth to high-priority flows (dynamic circuit networking: ESnet/FNAL, Internet2); point-to-point circuits across the LHCONE multi-domain fabric.
- Deeply programmable, agile software-defined network (SDN) infrastructures are emerging as multi-service, multi-domain network operating systems interconnecting science teams across regional, national, and global distances.
- Worldwide distributed systems developed by the data-intensive science programs harness the global workflow, scheduling, and data management systems they have built, enabled by distributed operations and security infrastructures riding on high-capacity (but still passive) networks.
- New computing models: network-aware data operations; strategic data distribution/placement/management via dynamic network provisioning (more in H. Newman's presentation on Friday).

Size of data & frequency of transfers
- Caching of experiment data is limited by local CPU power, and ad hoc. 5 Gbps is probably plenty initially (see the caching benchmark).
- Serving data out is limited by remote CPU power, and ad hoc. 10 Gbps is probably enough to feed 1-2k remote CPUs (see the read-only benchmark). More will most likely be needed later.
- Data is exchanged within PRP and with LHCONE; LHCONE is the most important connectivity external to PRP.
- Tools used: Xrootd, HTCondor, gridftp.
- Speed achieved: 10 Gbps read-only, 5 Gbps caching (see benchmarking).
What is screwed up? We don't know yet: we have exercised the infrastructure on the LAN, but not yet sufficiently on the WAN. Concerns:
- CPU elasticity: can we grow fast enough to have a serious impact? Are there enough CPU resources on PRP to scale out?
- Infrastructure operational cost. Lacking experience! What are the failure modes? How do we monitor against failure? How much human intervention is required to debug and fix things when they break? How do we deal with effort-limited operations? How stable is the infrastructure against abuse (= unexpected loads; see also the next point)?
- Detailed IO performance requirements. Lacking experience!
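The claim that 10 Gbps can feed 1-2k remote CPUs is simple arithmetic over the per-core read rate. A quick sketch; the per-client rates shown are the range implied by the claim, not measured values:

```python
# How many single-threaded clients can a link feed, given that each
# client's read rate is CPU-limited? (10 Gbps = 1250 MB/s, decimal units)
def clients_fed(link_gbps: float, per_client_mb_s: float) -> int:
    link_mb_s = link_gbps * 1000 / 8   # Gbps -> MB/s
    return int(link_mb_s / per_client_mb_s)

print(clients_fed(10, 1.25))   # -> 1000 clients at 1.25 MB/s each
print(clients_fed(10, 0.625))  # -> 2000 clients at 0.625 MB/s each
```

Since LHC analysis applications are single-threaded and CPU-bound, per-client IO of roughly 0.6-1.25 MB/s is what makes a single 10 Gbps server sufficient for 1-2k remote cores.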

Benchmarking

Read-only
- 1000 to 2000 simultaneous clients; aggregate throughput peaks at 90-100% of the available 10 Gbps on the Xrootd data server.
- Use case: many clients each read small amounts of data at a time, because IO per client is limited by the CPU available. Recall: all apps are single-threaded!
- This test was run with synthetic workflows simulating realistic read patterns, in a LAN environment, before the server was shipped.

Caching behavior Synthetic load to simulate typical cache use case: 200 jobs read 2.4MB every ~ 10 seconds. cache loads up in parallel to reads all files requested are cached no more writes while reads continue Write performance at ~ 5Gbps in parallel with reads. Reads almost unaffected by caching & writes. (ignore spikes at 20:30 additional unrelated tests active at that time.) 10/14/15 PRP Workshop 27