Computing at the Large Hadron Collider. Frank Würthwein, Professor of Physics, University of California San Diego. November 15th, 2013


Outline: The Science, Software & Computing Challenges, Present Solutions, Future Solutions. 2

The Science

The Universe is a strange place! ~68% of the universe is dark energy; we have no clue what this is. ~27% is dark matter; we have some ideas, but no proof, of what this is! All of what we know makes up only about 5% of the universe. 4

[Aerial view of the LHC ring between Lake Geneva and Mont Blanc, with the ATLAS, CMS, LHCb, and ALICE detector sites marked.] To study Dark Matter we need to create it in the laboratory. 5

[Same aerial view, labelled.] Four experimental collaborations: ATLAS, CMS, LHCb, ALICE. To study Dark Matter we need to create it in the laboratory. 6

The Large Hadron Collider (LHC): 27 km in circumference. Colliding protons on protons at energies of 7, 8, 13, 14 TeV. 2808 bunches colliding every 25 ns, with ~115 billion protons per bunch. Beam size ~30 cm in z and ~30 micron transverse. The proton is a composite object => gluons & quarks collide => the p_z of the rest frame is unknown.

Big bang in the laboratory: We gain insight by colliding particles at the highest energies possible to measure production rates, masses & lifetimes, and decay rates. From this we derive the spectroscopy as well as the dynamics of elementary particles. Progress is made by going to higher energies and brighter beams. 9

Explore Nature over ~13 orders of magnitude: perfect agreement between theory & experiment. [Plot: CMS Preliminary double-differential inclusive jet cross section d^2σ/(dp_T dy) in pb/(GeV/c) versus jet p_T, for pp collisions at √s = 8 TeV (L_int = 5.8 pb^-1 low-PU runs, 10.71 fb^-1 high-PU runs), in rapidity bins from 0.0 < |y| < 0.5 to 3.2 < |y| < 4.7, compared to NNPDF 2.1 NLO ⊗ NP. Dark Matter is expected somewhere below this line.] 10

And for the Sci-Fi buffs: Imagine our 3D world to be confined to a 3D surface in a 4D universe. Imagine this surface to be curved such that the 4th-dimensional distance is short for locations light years away in 3D. Imagine space travel by tunneling through the 4th dimension. The LHC is searching for evidence of a 4th dimension of space. 11

Recap so far: During the 2010-12 data-taking period, the beams crossed in the ATLAS and CMS detectors at a rate of 20 MHz, and each crossing contained ~10 collisions. We were looking for rare events that are expected to occur in roughly 1 in 10^13 collisions, or less. We are now starting our 2nd running period (2015-18), with beams crossing at 40 MHz and ~30 collisions per crossing. 12
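A rough back-of-the-envelope check of these rates, using only the numbers quoted on this slide (the rarity of 1 in 10^13 and the crossing rates are illustrative, not official figures), can be written in a few lines of Python:

```python
# Back-of-the-envelope rate estimate from the numbers on this slide.
# All inputs are approximate values quoted in the talk, not official figures.
SECONDS_PER_DAY = 86_400

def rare_events_per_day(crossing_rate_hz, collisions_per_crossing, rarity):
    """Expected number of 'interesting' collisions per day of running."""
    collisions_per_second = crossing_rate_hz * collisions_per_crossing
    return collisions_per_second * rarity * SECONDS_PER_DAY

run1 = rare_events_per_day(20e6, 10, 1e-13)   # 2010-12 running conditions
run2 = rare_events_per_day(40e6, 30, 1e-13)   # 2015-18 running conditions

print(f"Run 1: ~{run1:.1f} rare events/day out of ~{20e6 * 10 * SECONDS_PER_DAY:.1e} collisions")
print(f"Run 2: ~{run2:.1f} rare events/day out of ~{40e6 * 30 * SECONDS_PER_DAY:.1e} collisions")
```

At a rarity of 1 in 10^13 this comes out to only a couple of interesting events per day in Run 1, out of roughly 10^13 collisions per day, which is what drives the filtering and computing requirements discussed next.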

Software & Computing Challenges

The CMS Experiment

The CMS Experiment: 80 million electronic channels x 4 bytes x 40 MHz ≈ ~10 Petabytes/sec of information; after zero suppression (x 1/1000) and online event filtering (x 1/100,000), ~100-1000 Megabytes/sec of raw data go to tape. 1 to 10 Petabytes of raw data per year are written to tape, not counting simulations. 2000 scientists (1200 with a Ph.D. in physics), ~180 institutions, ~40 countries. The detector weighs 12,500 tons and is 21 m long and 16 m in diameter (a person is shown for scale). 15
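The arithmetic of this reduction chain is easy to reproduce; the sketch below simply multiplies out the factors quoted on the slide and is not an official CMS throughput number:

```python
# Reproduce the data-rate cascade quoted on the slide (approximate numbers).
channels = 80e6            # electronic channels
bytes_per_channel = 4
crossing_rate_hz = 40e6    # 40 MHz beam crossings

raw_rate = channels * bytes_per_channel * crossing_rate_hz   # bytes/sec off the detector
after_zero_suppression = raw_rate / 1_000                    # keep only channels above threshold
after_trigger = after_zero_suppression / 100_000             # online event filtering

print(f"off the detector:       {raw_rate / 1e15:5.1f} PB/s")
print(f"after zero suppression: {after_zero_suppression / 1e12:5.1f} TB/s")
print(f"raw data to tape:       {after_trigger / 1e6:5.1f} MB/s")
# -> about 12.8 PB/s, 12.8 TB/s and ~130 MB/s, consistent with the
#    "~10 PB/s" and "~100-1000 MB/s" figures on the slide.
```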

Example of an interesting event: a Higgs to γγ candidate. 16

Zoomed-in R-Z view of a busy event: the yellow dots indicate individual collisions, all during the same beam crossing. 17

Active scientists in CMS: 5-40% of the scientific members are actively doing large-scale data analysis in any given week. About 1/4 of the collaboration, scientists and engineers, has contributed to the common source code of ~3.6M C++ SLOC. 18

Evolution of the LHC science program: [Timeline: integrated luminosity L_int ~75-100 fb^-1, with the event rate written to tape growing from 150 Hz to 1000 Hz to 10000 Hz across run periods.] 19

The Challenge: How do we organize the processing of 10s to 1000s of Petabytes of data by a globally distributed community of scientists, and do so with manageable change costs for ~35 years (2005-2040)? Guiding principles for solutions: Choose technical solutions that allow computing resources to be as distributed as the human resources. Support distributed ownership and control, within a global single sign-on security context. Design for heterogeneity and adaptability. 20

Present Solutions

Principles of collaboration: Each collaboration shares responsibility for construction & maintenance of the detector, creation of all primary data samples, and review of all results prior to making them public in any way, shape, or form. Within each collaboration, groups fiercely compete for any physics results derived from the primary data samples. 22

Competition & Computing: In addition to the shared resources, groups within the collaborations also have private resources and produce private data that needs to be stored & analyzed, in groups ranging from individual scientists to 10s of collaborators. The ~1000 papers published so far were typically based on 4-400 TB of private data per paper. 23

Software performance: Production of the shared data is heavily CPU limited, dominated by pattern recognition, and likely to benefit from many-core architectures in the future. Production of private data is still often CPU limited, since the public data are optimized for size and versatility and thus require unpacking & non-negligible computation. Analysis of private data tends to be IO limited, consisting mostly of fast filters to make plots and tables. A wide range of application requirements! 24

Federation of National Infrastructures. In the U.S.A.: Open Science Grid 25

Among the top 500 supercomputers there are only two that are bigger when measured by power consumption. 26

Tier-3 Centers: Locally controlled resources not pledged to any of the 4 collaborations; large clusters at major research universities that are time-shared; small clusters inside departments and individual research groups. This requires the global sign-on system to be open to dynamically added resources, with APIs that are easy to support, and that are easy to work around where unsupported. 27

Me -- My friends -- The grid/cloud: [Layered diagram: O(10^4) users ('Me', a thin client), O(10^1-2) VOs ('My friends', with thick, domain-science-specific VO middleware & support), and O(10^2-3) sites (the anonymous Grid or Cloud, common to all sciences and industry, behind a thin Grid API).] 28

Example 'My friends' services: dynamic resource provisioning; workload management (schedule the resource, establish the runtime environment, execute the workload, handle the results, clean up); data distribution and access (input, output, and relevant metadata); file catalogue. 29
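Purely as an illustration of the workload-management steps listed above, a pilot-style job cycle might look like the minimal sketch below; the function names, commands, and directories are invented for this example and do not correspond to any actual CMS or OSG service (resource provisioning itself is assumed to have already happened):

```python
# Minimal sketch of the workload-management cycle from the slide:
# establish runtime environment, execute workload, handle results, clean up.
# All names and paths are illustrative only.
import os, shutil, subprocess, tempfile

def run_workload(job_script: str, output_dir: str) -> int:
    workdir = tempfile.mkdtemp(prefix="pilot_")          # establish runtime environment
    try:
        result = subprocess.run(["bash", job_script],    # execute the workload
                                cwd=workdir, capture_output=True, text=True)
        os.makedirs(output_dir, exist_ok=True)
        with open(os.path.join(output_dir, "job.log"), "w") as log:
            log.write(result.stdout)                     # handle the results
        return result.returncode
    finally:
        shutil.rmtree(workdir, ignore_errors=True)       # clean up
```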

Optimize the data structure for events with many collections of objects, each with many components: Track collection = {Track_1, Track_2, ..., Track_N}, a few 100s per event; Jet collection = {Jet_1, Jet_2, ..., Jet_M}, a few 10s per event; Electron collection = {Electron_1, ...}, a couple, if at all. Partial reads: each component (Track p_T, Track p_x, ..., Jet E_hadronic, Electron E_EM, ...) is stored in 'baskets' covering ranges of events (events 1 to E_1, E_1 to E_2, ...), and each basket is compressed separately. This is optimized to efficiently read a partial event (e.g. only the Tracks) or a partial object (e.g. only Electron E_EM, to decide whether the event is interesting at all). Baskets are grouped into clusters of events on disk; for AOD a cluster corresponds to a few tens of MB, and total file sizes range from 100 MB to 10 GB. 30
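The layout described here, with each component of each collection compressed separately per event range so that a reader only touches the columns it needs, can be mimicked with a toy column store; this is a sketch of the idea, not the actual ROOT/CMS file format:

```python
# Toy model of column-wise "baskets": every component of every collection is
# stored and compressed separately per range of events, so a partial read
# (e.g. only electron energies) only decompresses what it actually needs.
import struct
import zlib
from collections import defaultdict

EVENTS_PER_BASKET = 1000

class ColumnStore:
    def __init__(self):
        self.baskets = defaultdict(list)   # column name -> list of compressed baskets

    def write(self, column: str, values):
        for i in range(0, len(values), EVENTS_PER_BASKET):
            chunk = values[i:i + EVENTS_PER_BASKET]
            raw = struct.pack(f"{len(chunk)}d", *chunk)
            self.baskets[column].append(zlib.compress(raw))   # each basket compressed separately

    def read(self, column: str, first_basket=0, n_baskets=None):
        out = []
        for blob in self.baskets[column][first_basket:][:n_baskets]:
            raw = zlib.decompress(blob)
            out.extend(struct.unpack(f"{len(raw) // 8}d", raw))
        return out

store = ColumnStore()
store.write("Electron.E_EM", [10.0 + i for i in range(5000)])
store.write("Track.pT", [0.5 * i for i in range(5000)])
# Partial read: only the electron energies of the first 1000 events are decompressed.
first_electrons = store.read("Electron.E_EM", first_basket=0, n_baskets=1)
```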

Fraction of a file that is read: [Histogram: number of files read versus fraction of the file that is read, for AOD from all sources; the median is at 8.5% (Entries 6932521, Mean 0.3814, RMS 0.3401, with an overflow bin at 1).] Statistics on 23M files read via the WAN, Jan. 2012 to Feb. 2014; 12 Petabytes read in total. For half the files, less than 8.5% of the file is read. 31

Future Solutions

From present to future: Initially we operated a largely static system: data was placed quasi-statically before it could be analyzed, analysis centers had contractual agreements with the collaboration, and all reconstruction was done at centers with custodial archives. Increasingly, we have too much data to afford this. Dynamic data placement: data is placed at T2s based on the job backlog in global queues. WAN access (Any Data, Anytime, Anywhere): jobs start on the same continent as the data instead of on the same cluster as the data, and data is cached on the local cluster while it is read via the WAN. Dynamic creation of data processing centers: elastic scale-out to support peak needs, and agile use/sharing of a more diverse set of resources. 33
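A minimal sketch of the 'cache on the local cluster while reading via the WAN' idea, under the assumption of a generic fetch function; the cache directory, URL scheme, and fetch call are placeholders, not the actual CMS caching infrastructure:

```python
# Read-through cache sketch: serve a file from local storage if present,
# otherwise fetch it over the WAN and keep a copy for later jobs.
# The cache directory and plain-HTTP fetch are placeholders for illustration.
import os
import shutil
import urllib.request

CACHE_DIR = "/var/cache/wan-data"   # hypothetical local cache area

def wan_fetch(url: str, dest: str):
    """Placeholder WAN transfer (plain HTTP here; the real system uses xrootd)."""
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)

def open_cached(logical_name: str, wan_url: str):
    local_path = os.path.join(CACHE_DIR, logical_name.lstrip("/"))
    if not os.path.exists(local_path):                          # cache miss: read via WAN...
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        wan_fetch(wan_url, local_path)                          # ...and cache locally
    return open(local_path, "rb")                               # later jobs hit the local copy
```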

Any Data, Anytime, Anywhere: [Diagram: a user application asks 'Open /store/foo', the global Xrootd redirector answers 'Check Site A', and the file is opened there successfully; Sites A, B, and C each run an Xrootd/Cmsd pair in front of their local Lustre, Hadoop, or dCache storage.] A global redirection system unifies all CMS data into one globally accessible namespace. It is made possible by paying careful attention to the IO layer, to avoid inefficiencies due to IO-related latencies. 34
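The question-and-answer flow in the diagram can be modelled with a tiny redirector class; the site names and catalogue contents below are invented for the sketch, and this is not the real Xrootd protocol implementation:

```python
# Toy model of the global redirection flow: a client asks the global
# redirector for a logical file name and is pointed to a site that hosts it.
# Site names and file catalogues are purely illustrative.

class SiteServer:
    def __init__(self, name, files):
        self.name, self.files = name, set(files)

    def has(self, lfn):                  # the site's cmsd answering: do I hold this file?
        return lfn in self.files

class GlobalRedirector:
    def __init__(self, sites):
        self.sites = sites

    def open(self, lfn):
        for site in self.sites:          # ask the subscribed sites which one has the file
            if site.has(lfn):
                return f"redirect to {site.name}"   # the client then opens it there
        raise FileNotFoundError(lfn)

redirector = GlobalRedirector([
    SiteServer("Site A (Lustre)", {"/store/foo"}),
    SiteServer("Site B (Hadoop)", {"/store/bar"}),
    SiteServer("Site C (dCache)", set()),
])
print(redirector.open("/store/foo"))     # -> "redirect to Site A (Lustre)"
```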

Vision going forward: [Diagram: steady-state processing and the tape archive at FNAL; simulated data produced on Cloud and/or OSG resources; peak processing at SDSC, or at AWS, or elsewhere; Tier-2 centers on the OSG.] 35

Summary & Conclusions: Guided by the principles of supporting distributed ownership and control in a global single sign-on security context, designing for heterogeneity and adaptability, and 'submit locally & compute globally', the LHC experiments very successfully implemented a globally distributed cyberinfrastructure to deal with their BigData. Looking towards the future, this originally largely static system is becoming more agile and more elastic. 36