FINDING THE HIGGS IN THE HAYSTACK(S)
Stephen J. Gowdy (CERN), 12th September 2012, XLDB Conference

Overview
- Large Hadron Collider (LHC)
- Compact Muon Solenoid (CMS) experiment
- The Challenge
- Worldwide LHC Computing Grid (WLCG)
- Data Organisation
- Analysis Techniques
- Databases
- Future Trends

Large Hadron Collider
(a hadron is a composite particle made of quarks)

Big machine characteristics
- 17-mile circular tunnel, 100 m underground, straddling the French-Swiss border
- Protons currently travel at 99.9999964% of the speed of light
- Each proton crosses into Switzerland (CH) over 11,000 times per second
- Will not reach design beam energy until 2014
- Bunch crossings potentially every 25 ns (40 MHz)
- Each crossing produces multiple proton-proton collisions, called pileup; currently around 30 collisions per event
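
A quick back-of-the-envelope check of those numbers (a minimal sketch; the ring circumference and bunch spacing are taken from public LHC parameters, not from the slide):

```python
# Rough check of the revolution and bunch-crossing rates quoted above.
# Assumed inputs (public LHC parameters, not from the slide):
C_LHC = 26_659.0          # ring circumference in metres (~17 miles)
BETA = 0.999999964        # proton speed as a fraction of c (from the slide)
C = 299_792_458.0         # speed of light in m/s
BUNCH_SPACING = 25e-9     # nominal bunch spacing in seconds

revolutions_per_second = BETA * C / C_LHC
crossing_rate = 1.0 / BUNCH_SPACING

print(f"revolutions per second: {revolutions_per_second:,.0f}")   # ~11,245
print(f"bunch-crossing rate:    {crossing_rate / 1e6:.0f} MHz")   # 40 MHz
```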

Accelerator Complex
- Older machines feed newer machines
- Protons start in LINAC2, then go to the PS via the BOOSTER
- From the PS they are injected into the SPS
- Injected into the LHC at 450 GeV
- Accelerated to 4 TeV in the LHC
- New fills are needed roughly once per day

[Aerial view labelling the LHC and SPS rings, the CMS site and the CERN Main Site]

Compact Muon Solenoid
(a muon is a comparatively long-lived big brother to the electron)

Particle Identification 101

Trigger Architecture
[Level-1 trigger diagram: matching of ECAL/HCAL trigger towers (E_T), electron isolation and jet detection, muon track segments in endcap and barrel, sorting of candidates (E_T miss, E_T total; p_T and quality; 4 candidates), final decision and partitioning, interface to the TTC and the TTS (Trigger Throttling System)]

Data Rates
- RAW (i.e. unprocessed) data is about ~1 MB per event
- Potential detector acquisition rate: 1 MB * 40 MHz = 40 TB/s
- The actual data volume is even larger, but not all detectors are able to read out at 40 MHz
- The hardware trigger decision allows a 100 kHz rate; it looks at individual detectors to make a fast choice
- Data rate after the hardware trigger is up to 100 GB/s
- The High Level Trigger runs on a filter farm; its output rate is nominally 300 Hz ~= 300 MB/s
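
The rates above are simple multiplication; a minimal sketch using the round numbers quoted on the slide:

```python
# Reproduce the data-rate arithmetic quoted above (round numbers from the slide).
EVENT_SIZE = 1e6          # ~1 MB per event, in bytes

stages = {
    "collision rate (no trigger)":    40e6,   # 40 MHz bunch crossings
    "after Level-1 hardware trigger": 100e3,  # 100 kHz
    "after High Level Trigger":       300.0,  # ~300 Hz written to storage
}

for name, rate_hz in stages.items():
    bytes_per_second = rate_hz * EVENT_SIZE
    print(f"{name:32s} {bytes_per_second / 1e9:10.3f} GB/s")
# -> 40,000 GB/s (40 TB/s), 100 GB/s, 0.3 GB/s (300 MB/s)
```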

The Challenge
(why it isn't easy)

A Higgs event

A Haystack
40 reconstructed vertices (high-pileup run, 25th October 2011)

Haystacks
- So that was one event; the 2012 average is 30 collisions per event
- By the end of 2012 we will have almost 7 billion events recorded, after the reduction from 40 MHz to O(300 Hz); this doesn't include simulated data
- We are looking for about half a million Higgs particles, assuming the predicted cross sections are correct, i.e. roughly one in every 14,000 recorded events
- Many decay modes are much, much harder to find than four muons

Worldwide LHC Computing Grid (WLCG)
(like an electric grid that supplies computing power)

Tiered System
- Tier-0 at CERN: data gets sorted and receives its first-pass reconstruction
- Tier-1 centres: CMS has seven large regional facilities; they provide custodial tape storage and large-scale re-reconstruction
- Tier-2 centres: frequently universities or groups of universities; simulation and end-user analysis

Schematic
[Data-flow diagram: CMS Detector -> Filter Farm -> CERN Tier-0 -> Tier-1 centres (Fermilab, IN2P3, ASGC, KIT, CNAF, ...) -> Tier-2 centres (Florida, UCSD, UCLA, ...) -> Tier-3 (e.g. MyLaptop)]

Traffic on a CERN Holiday
[LHCOPN (Optical Private Network) traffic plot; CMS is shown in green]

Resources
- CPU (kHS06): Tier-0 121 (21%), Tier-1 137 (23%), Tier-2 324 (56%); total 582 kHS06 (~150 kSI2k)
- Disk (TB): Tier-0 4,800 (9%), Tier-1 21,000 (40%), Tier-2 27,000 (51%); total 51,800 TB
- Tape (TB): Tier-0 23,000 (33%), Tier-1 47,000 (67%); total 90,000 TB

Data Organisation
(lining up the bytes in a consumable order)

Data Tiers
- Streamer files are written to disk by the filter farm
- They are read and reorganised into Primary Datasets (PDs), based on trigger selections (physics motivation); the output is the custodial RAW data
- Reconstruction runs on the RAW PDs and outputs RECO and AOD (Analysis Object Data)
- Simulation also produces similar data tiers, plus truth information

Data Ordering
- ROOT is used as the persistency framework
- The ordering of data in files is adjusted according to the expected reading pattern
- RAW & RECO are expected to be read whole-event, so the file is ordered event by event (event 1, 2, 3, ... n)
- For AOD only a subset of the data may be read, e.g. passing frequently over a single variable to make plots, so data are grouped attribute by attribute (attribute 1 for events 1..n, then attribute 2, ...)
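
A minimal sketch of the two layouts (illustrative only, using plain Python lists rather than the actual ROOT file format):

```python
# Illustrative only: contrast event-ordered (row-wise) and attribute-ordered
# (column-wise) layouts; real CMS files use ROOT, not Python lists.
events = [
    {"pt": 25.1, "eta": 0.3, "charge": +1},
    {"pt": 48.7, "eta": -1.2, "charge": -1},
    {"pt": 13.4, "eta": 2.0, "charge": +1},
]

# RAW/RECO-style: everything for one event is stored together,
# so reading a whole event touches one contiguous block.
event_ordered = events

# AOD-style: each attribute is stored contiguously, so making a plot of a
# single variable only has to read that one column.
attribute_ordered = {key: [ev[key] for ev in events] for key in events[0]}

print(attribute_ordered["pt"])   # read only the 'pt' column: [25.1, 48.7, 13.4]
```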

Skims
- "Train model" event selection: various analyses attach their own event selections to a common pass over the data (see the sketch below)
- Selection is done using the reco output, which is more detailed and accurate than the trigger information, so cuts can be a lot harder
- The first skims are done at the Tier-1s on the Tier-0 output; these are called PromptSkims as they start as soon as possible
- Currently 81 datasets are written out from the Tier-0 output
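
A minimal sketch of the train idea, with made-up selection functions (not the actual CMS skim configuration):

```python
# Minimal sketch of train-style skimming: several analyses' selections run
# in a single pass over the input events. Selection functions are invented.
def z_to_mumu_selection(event):
    return len(event["muons"]) >= 2 and max(event["muons"]) > 20.0  # muon pT in GeV

def high_met_selection(event):
    return event["met"] > 100.0

skims = {"ZMuMuSkim": z_to_mumu_selection, "HighMETSkim": high_met_selection}

events = [
    {"muons": [32.0, 25.5], "met": 12.0},
    {"muons": [8.0], "met": 140.0},
]

outputs = {name: [] for name in skims}
for event in events:                       # one pass over the data...
    for name, selection in skims.items():  # ...feeds every skim on the train
        if selection(event):
            outputs[name].append(event)

print({name: len(evs) for name, evs in outputs.items()})
```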

Datasets
- Files are collected in datasets, which should be processed together
- This actually uses a database (Oracle)
- Each dataset has provenance attached to it and can be superseded by a reprocessing
- The end-user tool queries the database and creates jobs to process the dataset, typically across all the Tier-2s hosting it (toy sketch below)
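
A toy sketch of that workflow; the catalogue layout, dataset name and site names are invented for illustration (not the real DBS schema or submission tool):

```python
# Toy dataset catalogue lookup and job creation. The in-memory "catalogue"
# stands in for the Oracle database; names and fields are invented.
catalogue = {
    "/DoubleMu/Run2012B/AOD": {
        "files": ["file_%03d.root" % i for i in range(6)],
        "hosted_at": ["T2_US_Florida", "T2_US_UCSD"],
        "provenance": {"parent": "/DoubleMu/Run2012B/RAW", "release": "CMSSW_5_2"},
    }
}

def make_jobs(dataset, files_per_job=2):
    entry = catalogue[dataset]
    chunks = [entry["files"][i:i + files_per_job]
              for i in range(0, len(entry["files"]), files_per_job)]
    # spread the jobs over the Tier-2 sites hosting the dataset
    return [{"site": entry["hosted_at"][i % len(entry["hosted_at"])], "files": chunk}
            for i, chunk in enumerate(chunks)]

for job in make_jobs("/DoubleMu/Run2012B/AOD"):
    print(job)
```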

Analysis Techniques
(narrowing the haystacks)

Discriminating Variables
- Each analysis will find the variables that enhance its signal-to-noise ratio
- A high-energy muon is an easy one, i.e. something going really fast doesn't bend so much in the magnetic field
- You may end up losing a lot of signal to reduce the background by a larger factor
- Optimise S/sqrt(B) or S/sqrt(S+B)
[Plot: pseudo-data histogram of muon momentum (GeV) with signal and background components]
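
A minimal sketch of such an optimisation on toy numbers (invented event counts, not CMS data): scan a momentum cut and pick the one maximising S/sqrt(S+B).

```python
# Toy cut optimisation: scan a muon-momentum threshold and maximise
# S / sqrt(S + B). The surviving event counts below are invented.
import math

# expected signal and background counts surviving each candidate cut (GeV)
cut_table = {
    10: (100, 4000),
    20: (90, 900),
    30: (70, 200),
    40: (40, 30),
}

def significance(s, b):
    return s / math.sqrt(s + b)

best_cut = max(cut_table, key=lambda c: significance(*cut_table[c]))
for cut, (s, b) in cut_table.items():
    print(f"p > {cut:2d} GeV: S={s:3d} B={b:4d} S/sqrt(S+B)={significance(s, b):5.2f}")
print("best cut:", best_cut, "GeV")
```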

Multivariate Analysis
- Many different types:
- Simple rectangular cuts (multiple 1-d cuts)
- Maximum-likelihood approaches: combine the probabilities of all input variables
- Fisher discriminants: input variables are projected into another space to remove correlations (see the sketch below)
- Neural networks
- Most of these methods rely on training
- Some packages can apply many methods
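
As an illustration of the Fisher-discriminant idea, a toy sketch on generated numbers (not the training used in any particular CMS analysis):

```python
# Toy Fisher discriminant on two correlated input variables.
# Data are generated here for illustration; a real analysis would train
# on simulated signal and background events.
import numpy as np

rng = np.random.default_rng(42)
signal = rng.multivariate_normal([2.0, 1.0], [[1.0, 0.5], [0.5, 1.0]], size=1000)
background = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=1000)

# Fisher direction: w = S_W^-1 (mu_sig - mu_bkg), S_W = within-class scatter
mean_s, mean_b = signal.mean(axis=0), background.mean(axis=0)
within = np.cov(signal, rowvar=False) + np.cov(background, rowvar=False)
w = np.linalg.solve(within, mean_s - mean_b)

# Project onto the Fisher axis: one variable that separates the two classes
proj_s, proj_b = signal @ w, background @ w
print(f"signal mean {proj_s.mean():.2f}, background mean {proj_b.mean():.2f}")
```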

TMVA (Toolkit for MVA in ROOT)

New Boson Plot
H -> ZZ -> llll: uses five angles and two masses as discriminators

Databases
(not XLDBs, though)

Conditions Database
- The largest database use (not in size: ~300 GB)
- Provides calibration, geometry and alignment information
- Used by all running jobs; there can be more than 100k jobs worldwide
- A network of squid caches is used: database queries are transformed into HTTP requests
- Home-grown technology (Frontier) achieves this
- Works because the data are written once, read many times
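
A minimal sketch of the idea behind this caching, with an invented endpoint, query encoding and an in-memory dict standing in for a squid cache (not the actual Frontier protocol):

```python
# Conceptual sketch of query-over-HTTP caching, in the spirit of Frontier.
# The server URL, query encoding and cache are all invented for illustration.
import base64

FRONTIER_SERVER = "http://conditions.example.org/Frontier"  # hypothetical endpoint
cache = {}  # stands in for a site-local squid cache

def conditions_lookup(sql_query, fetch_from_server):
    """Turn a read-only query into a cacheable GET URL and reuse cached replies."""
    encoded = base64.urlsafe_b64encode(sql_query.encode()).decode()
    url = f"{FRONTIER_SERVER}?type=frontier_request&p1={encoded}"
    if url not in cache:                      # miss: one trip to the central DB
        cache[url] = fetch_from_server(url)
    return cache[url]                         # hit: served locally, DB never sees it

# Toy "server" so the sketch runs stand-alone.
reply = conditions_lookup("SELECT * FROM ecal_alignment WHERE iov=42",
                          fetch_from_server=lambda url: "...payload...")
print(reply, "| cached entries:", len(cache))
```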

Squids
- Aggregate across the squid caches: 500k requests/min, 500 MB/s
- Offline database servers: 4k requests/min, 0.5 MB/s

Other Databases
- PhEDEx: manages file transfers; a single Oracle instance at CERN
- DBS (Dataset Bookkeeping System): contains metadata about datasets and files; main instance in Oracle at CERN, user instances available elsewhere with MySQL
- Job-tracking databases: use both Oracle and MySQL; a recent system archives information in CouchDB

Reading Rate
[Plot of reading rate; annotated values 6 TB/day and 250 TB/day]

Future Trends
(need to wear shades)

Federated Storage
- Aiming towards an architecture where all storage is visible globally
[Diagram: a user application opens /store/foo; the request is redirected through regional redirectors (US Region, EU Region, ...) and a global redirector, which query the sites (Site A-D) and redirect the open to the site that actually hosts /store/foo, here Site C]
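
A minimal sketch of the redirection idea, with hypothetical site and region names (not the actual xrootd federation code):

```python
# Toy model of hierarchical redirection: ask the local region first,
# then fall back to the global level, which asks the other regions.
# All names and catalogue contents are invented for illustration.
REGIONS = {
    "US": {"SiteA": {"/store/bar"}, "SiteB": {"/store/baz"}},
    "EU": {"SiteC": {"/store/foo"}, "SiteD": {"/store/bar"}},
}

def region_lookup(region, path):
    for site, catalogue in REGIONS[region].items():
        if path in catalogue:
            return site
    return None

def open_file(path, home_region):
    site = region_lookup(home_region, path)      # try the local region first
    if site is None:                             # redirect to the global level
        for region in REGIONS:
            if region != home_region:
                site = region_lookup(region, path)
                if site:
                    break
    return site

print(open_file("/store/foo", home_region="US"))  # -> 'SiteC' (found via EU)
```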

Clouds: for a rainy day
- Helix Nebula: a European initiative to provide a unified system; shows the importance of standards
- Proof of concept demonstrated on Amazon
- Costs are still prohibitively expensive, estimated at an order of magnitude more: running our own data centres is more cost effective
- May be interesting for adding short-term capacity

Clouds: internal cloud
- CERN is moving to an agile infrastructure and commissioning a new data centre in Hungary
- The filter farm will run as a cloud during the LHC shutdown, using OpenStack across 15k cores
- Allows flexibility for redeployment; the farm is also needed for detector work

Summary
- Database technology is used in various roles; the total size is around 10 TB: not huge
- Our Big Data: 20 PB of RAW data
- CMS uses a worldwide computing infrastructure to deliver physics results
- We've found a needle; now we need to figure out what kind it is: http://lanl.arxiv.org/abs/1207.7235

XLDB Europe 2013 @ CERN
- CERN will be happy to host a European satellite XLDB
- Planned date: 25-26 June 2013, during the LHC long shutdown, which will also allow discussions of LHC data management issues
- We invite everyone to help reach out to places in Europe with challenging XLDB-related issues; please contact dirk.duellmann@cern.ch and becla@slac.stanford.edu