handling of LHE files in the CMS production and usage of MCDB

Similar documents
Experience with Data-flow, DQM and Analysis of TIF Data

LCG MCDB Knowledge Base of Monte Carlo Simulated Events

Latest development for ME-PS matching with MadGraph/MadEvent

Improving Generators Interface to Support LHEF V3 Format

CMS Simulation Software

Real-time dataflow and workflow with the CMS tracker data

The CMS data quality monitoring software: experience and future prospects

Workload Management. Stefano Lacaprara. CMS Physics Week, FNAL, 12/16 April Department of Physics INFN and University of Padova

arxiv:hep-ph/ v1 27 Mar 2007

Recasting LHC analyses with MADANALYSIS 5

Collider analysis recasting with Rivet & Contur. Andy Buckley, University of Glasgow Jon Butterworth, University College London MC4BSM, 20 April 2018

Tracking POG Update. Tracking POG Meeting March 17, 2009

Tutorial for CMS Users: Data Analysis on the Grid with CRAB

Data Analysis in ATLAS. Graeme Stewart with thanks to Attila Krasznahorkay and Johannes Elmsheuser

ALICE ANALYSIS PRESERVATION. Mihaela Gheata DASPOS/DPHEP7 workshop

CMS Analysis Workflow

Challenges and Evolution of the LHC Production Grid. April 13, 2011 Ian Fisk

CEDAR: HepData, JetWeb and Rivet

Rivet. July , CERN

CRAB tutorial 08/04/2009

Offline Tutorial I. Małgorzata Janik Łukasz Graczykowski. Warsaw University of Technology

PoS(ACAT08)101. An Overview of the b-tagging Algorithms in the CMS Offline Software. Christophe Saout

LHCb Computing Status. Andrei Tsaregorodtsev CPPM

CMS Computing Model with Focus on German Tier1 Activities

DESY at the LHC. Klaus Mőnig. On behalf of the ATLAS, CMS and the Grid/Tier2 communities

Reducing CPU Consumption of Geant4 Simulation in ATLAS

LAr Event Reconstruction with the PANDORA Software Development Kit

Installation of CMSSW in the Grid DESY Computing Seminar May 17th, 2010 Wolf Behrenhoff, Christoph Wissing

The CMS Computing Model

FAMOS: A Dynamically Configurable System for Fast Simulation and Reconstruction for CMS

glite Grid Services Overview

Jet algoritms Pavel Weber , KIP Jet Algorithms 1/23

Analysis preservation and recasting with Rivet & Contur. Andy Buckley, University of Glasgow FNAL Reinterpretation Workshop, 16 Oct 2017

8.882 LHC Physics. Track Reconstruction and Fitting. [Lecture 8, March 2, 2009] Experimental Methods and Measurements

Early experience with the Run 2 ATLAS analysis model

Challenges of the LHC Computing Grid by the CMS experiment

The NOvA software testing framework

From raw data to new fundamental particles: The data management lifecycle at the Large Hadron Collider

Physics and Detector Simulations. Norman Graf (SLAC) 2nd ECFA/DESY Workshop September 24, 2000

Worldwide Production Distributed Data Management at the LHC. Brian Bockelman MSST 2010, 4 May 2010

Computing at Belle II

Spanish Tier-2. Francisco Matorras (IFCA) Nicanor Colino (CIEMAT) F. Matorras N.Colino, Spain CMS T2,.6 March 2008"

Data and Analysis preservation in LHCb

The evolving role of Tier2s in ATLAS with the new Computing and Data Distribution model

LCG MCDB a Knowledgebase of Monte-Carlo Simulated Events.

EventStore A Data Management System

Open data and scientific reproducibility

CMS Belgian T2. G. Bruno UCL, Louvain, Belgium on behalf of the CMS Belgian T2 community. GridKa T1/2 meeting, Karlsruhe Germany February

CouchDB-based system for data management in a Grid environment Implementation and Experience

Monte Carlo Production on the Grid by the H1 Collaboration

Scientific data processing at global scale The LHC Computing Grid. fabio hernandez

DVCS software and analysis tutorial

Computing. DOE Program Review SLAC. Rainer Bartoldus. Breakout Session 3 June BaBar Deputy Computing Coordinator

Computing / The DESY Grid Center

Conference The Data Challenges of the LHC. Reda Tafirout, TRIUMF

VISPA: Visual Physics Analysis Environment

Reprocessing DØ data with SAMGrid

ATLAS Simulation Computing Performance and Pile-Up Simulation in ATLAS

CompHEP 4.5 Status Report

Data Transfers Between LHC Grid Sites Dorian Kcira

ComPWA: A common amplitude analysis framework for PANDA

The CMS experiment workflows on StoRM based storage at Tier-1 and Tier-2 centers

The High-Level Dataset-based Data Transfer System in BESDIRAC

Suez: Job Control and User Interface for CLEO III

ATLAS PILE-UP AND OVERLAY SIMULATION

Data handling and processing at the LHC experiments

LHCb Distributed Conditions Database

8.882 LHC Physics. Analysis Tips. [Lecture 9, March 4, 2009] Experimental Methods and Measurements

GRID COMPUTING APPLIED TO OFF-LINE AGATA DATA PROCESSING. 2nd EGAN School, December 2012, GSI Darmstadt, Germany

Stephen J. Gowdy (CERN) 12 th September 2012 XLDB Conference FINDING THE HIGGS IN THE HAYSTACK(S)

Status and Plans for PYTHIA 8

Monitoring the Usage of the ZEUS Analysis Grid

Distributed Data Management on the Grid. Mario Lassnig

Software and computing evolution: the HL-LHC challenge. Simone Campana, CERN

A Geometrical Modeller for HEP

CMS - HLT Configuration Management System

A data handling system for modern and future Fermilab experiments

Hall D and IT. at Internal Review of IT in the 12 GeV Era. Mark M. Ito. May 20, Hall D. Hall D and IT. M. Ito. Introduction.

ATLAS DQ2 to Rucio renaming infrastructure

The INFN Tier1. 1. INFN-CNAF, Italy

I Tier-3 di CMS-Italia: stato e prospettive. Hassen Riahi Claudio Grandi Workshop CCR GRID 2011

PoS(High-pT physics09)036

Performance Pack. Benchmarking with PlanetPress Connect and PReS Connect

Data Quality Monitoring at CMS with Machine Learning

Virtualizing a Batch. University Grid Center

Event cataloguing and other database applications in ATLAS

Agents and Daemons, automating Data Quality Monitoring operations

ARC integration for CMS

A L I C E Computing Model

ATLAS Analysis Workshop Summary

Andrea Sciabà CERN, Switzerland

Streamlining CASTOR to manage the LHC data torrent

Monte Carlo Production Management at CMS

Web-interface for Monte-Carlo event generators

PROOF-Condor integration for ATLAS

The full detector simulation for the ATLAS experiment: status and outlook

A time machine for the OCDB

DIRAC pilot framework and the DIRAC Workload Management System

Simulation and Visualization in Hall-D (a seminar in two acts) Thomas Britton

SeU Certified Selenium Engineer (CSE) Syllabus

Transcription:

handling of LHE files in the CMS production and usage of MCDB Christophe Saout CERN, University of Karlsruhe on behalf of the CMS physics event generators group Christophe M. Saout, CERN, Uni Karlsruhe LCG Generator Services 09.04.08 1

Outline Traditional MC production in CMS Split MC / PS generator procedure The approach so far A common accord: the Les Houches Event files LHE files in CMS? Where theory & experiment meets: the MCDB Issues with MCDB and CMS production chain Possible solutions Conclusion & Outlook Christophe M. Saout, CERN, Uni Karlsruhe LCG Generator Services 09.04.08 2

config file Traditional MC Production in CMS The big cmsrun executable (CMSSW) source Physics Event Generator (GeneratorInterface) Pythia6Interface Herwig6Interface C++... HepMC::GenEvent One big CMS executable for everything Steered by config files One generic data file format for everything: EDM (based on ROOT) Built out of modules (shared libraries) Source can either be: an Event Generator another EDM file Detector Simulation (Geant4) Reconstruction EDM ROOT file (Event Data Model) Christophe M. Saout, CERN, Uni Karlsruhe LCG Generator Services 09.04.08 3

CMS Production Workflow (I) Official Monte-Carlo production decentralised over the GRID ( ProdAgent ) Samples divided in datasets Unique name: /wz2j-alpgen/cmssw_1_6_7-csa07-1205907776/reco split into file blocks file blocks split into individual EDM files (<GUID>.root) one file per cmsrun (O(300) events each) Data stored locally (dcache, CASTOR) Local file URLs translation from worldwide logical filename using site-local trivial file catalog (/store/xxx rfio://.../xxx) Logical file names registered on central DBS server File block information: sites which hold the datasets Dataset transfers using PhEDEx (currently based on SRM) Christophe M. Saout, CERN, Uni Karlsruhe LCG Generator Services 09.04.08 4

CMS Production Workflow (II) Site 1 file block Node Node Node ABCD...root EF01...root 2345...root DBS server (CMS dataset registration) Node Node Node... dataset dataset CASTOR dcache file block... Site 2... Site 3... file block... file block... Note: DBS is a CMS-internal tool for dataset bookkeeping! Christophe M. Saout, CERN, Uni Karlsruhe LCG Generator Services 09.04.08 5

Matrix Element generators Typical physics event generation consists of roughly three steps Matrix Element calculation: the hard process Parton Shower: evolution of partons into jets Hadronisation: Create final-state particles (also all sorts of radiation and underlying event) General-purpose generators like Pythia and Herwig provide all three steps together, but their Matrix Elements are only leading order (LO) Almost the only generators with PS / hadronisation models A lot of alternate generators exist that provide only ME Improved ME (more accurate description of hard emissions) Other physics processes (SM, SUSY, exotics,...) need Pythia/Herwig afterwards to generate full events! Christophe M. Saout, CERN, Uni Karlsruhe LCG Generator Services 09.04.08 6

The CMS approach so far Production job (on grid site) e.g. AlpGen executable text file parton-level events CmsRun AlpgenInterface ME generator executed directly on-site Production workflow can be kept Not integrated in cmsrun Additional binaries/scripts needed Parton-level files thrown away Some generators (e.g. MG/ME) need to be compiled for the process (per-sample binaries!) Need preparatory, time-consuming warm-up calculations before starting event generation Pythia 6 simulation reconstruction manual preparation needed anyway! (example for one possible generator combination) Christophe M. Saout, CERN, Uni Karlsruhe LCG Generator Services 09.04.08 7

Les Houches Event files A lot of event generators only provide ME calculations Output needs to be fed to Pythia / Herwig subsequently common Les Houches Fortran common blocks defined Common blocks only suitable if code is directly glued into executable A common file format was defined: LHE files Allows complete separation of ME production and subsequent generation chain (PS, hadronisation,...) Easier interchangability of generators (e.g. Pythia Herwig) Lowers the hurdle for adoption of new ME generators Another advantage: Parton-level events are very small (handy to keep around) LHE files can be provided by theorists (done so for Spring07 independently from experiment MadGraph production) Christophe M. Saout, CERN, Uni Karlsruhe LCG Generator Services 09.04.08 8

Generators status in CMS Currently available modules Pythia6Interface Herwig6Interface MC@NLOInterface MadGraphInterface ALPGENInterface ExHumeInterface PomwigInterface CosmicMuonGenerator Pythia8Interface Tauola/Photos EvtGenInterface HyjdjetInterface PyquenInterface BeamHaloGenerator ParticleGuns MCFileReader CompHepInterface TopRexInterface (SherpaInterface) (Herwig++Interface) a colourful mixture Pythia used in some, Herwig in others Some modules simply read (local) files Several modules read generator-specific LHE files (from local disk) Considering LHE files a standard, a simple overall working plain LHE interface is not officially available LHE-based interfaces could share LHE interfacing (reader, MCDB) Pythia6/8, Herwig(++) interfacing could be factorized (where applicable) Christophe M. Saout, CERN, Uni Karlsruhe LCG Generator Services 09.04.08 9

Modular LHEInterface ParameterSet (config) CMSSW HepMC::GenEvent LHESource plugins/ Storage Factory (CASTOR, dcache, files, TFC...) ( MCDB ) file URLs LHEReader (HepML) LHEEvent (HEPEUP) LHECommon (HEPRUP) LHEInterface ME Hadronisation plugins Pythia 6 Pythia 8 Herwig 6 (PS, ISR/FSR, hadr., UE) src/ interface/ jet/parton matching (optional) MLM KT Cone weight veto (fastjet) (Working prototype available, needs to be validated and polished) Christophe M. Saout, CERN, Uni Karlsruhe LCG Generator Services 09.04.08 10

Where does MCDB fit in? The basic idea: First step of ME production is completely decoupled Resulting LHE files (small!) are uploaded to MCDB Documentation of ME generation step (no throwing away) Independent of experiment (and experiment software) Can also be provided / validated by theory colleagues directly) (instead of fiddling with integration of code into CMS chain) Issues: Does not fit well into existing CMS production chain Needs new setup for producing ME separately and uploading Christophe M. Saout, CERN, Uni Karlsruhe LCG Generator Services 09.04.08 11

MCDB open issues (I) Issues concerning reading the LHE events (in production) MCDB is located at CERN - CMS production anywhere on the Grid LHE data transfer issue CMS production is done in chunks of O(300) events assuming LHE file contains 30000 events, this would mean 100 jobs accessing the same file and counting events to find the correct starting point potential I/O bandwith waste Possible solutions: CMSSW StorageFactory supports arbitrary I/O protocols! rfio:// only works locally at CERN, gsiftp:// will be turned off? (and srmcp doesn't work behind firewalls) Register LHE files into DBS and use our PhEDEx site replication? Files in DBS are expected to be EDM conform (i.e. ROOT format) Text files aren't seekable by event number (I/O overhead) Christophe M. Saout, CERN, Uni Karlsruhe LCG Generator Services 09.04.08 12

MCDB open issues (II) Preferred solution: Before going into production convert LHE files into EDM files Solution preferred by EDM experts C++ representation of LHE contents trivial (both header, per-event and possible additional HepML information) Converter in both directions trivial Full information and production history (book-keeping) directly in EDM file! (no loss of information) Per-event exact reproducability of event generation step I/O overhead negligible, ROOT is seekable Registration with DBS makes it transparent to the system (and independence from yet another grid transport system) Some open framework issues (likely to be solved) Christophe M. Saout, CERN, Uni Karlsruhe LCG Generator Services 09.04.08 13

MCDB open issues (III) Producing and uploading LHE event data to MCDB How is the authentication done? grid certificate probably sufficient, all CMS production jobs run with the VOMS CMS production role LHE event file production is likely to be done in a distributed way on the Grid some sort of automated distributed upload mechanism Like 100 jobs uploading LHE files belonging to the same dataset (sample) in parallel Create MCDB article on the fly on first upload attempt, merge all other files into same article? ( to be discussed!) Possible via unique identifier of sample (like in DBS?) possibility to have an automatic ID string article ID mapping (or something similar) would be perfect Christophe M. Saout, CERN, Uni Karlsruhe LCG Generator Services 09.04.08 14

Proposed Architecure Theory other experiment MCDB CASTOR CERN (T0) LHE to EDM converter DBS CASTOR... 10k 10k 10k (PhEDEx)... some Grid site LHE file production Node some Grid site dcache,... Node Node Split into small jobs [ Production Manager] Christophe M. Saout, CERN, Uni Karlsruhe LCG Generator Services 09.04.08 15

Planned Productions The next Monte Carlo production for physics in CMS should bring us to the interpretation of the first data (hopefully). CMS is currently planning to focus on: Spring08 (April 08) a fast simulation production of the order of ~ 500M events. 3-6 months of data taking at 20% efficiency and 300 Hz storage rate full SM coverage for understanding PD overlaps, trigger tables, training the analyses icsa08 (May 08) a full simulation production of the order of 100M events, where the main component is QCD+MB. DPG oriented. (simpler than Spring08) mimic the first weeks of data taking with startup simulation conditions test of the computing flow and basic object reconstruction https://twiki.cern.ch/twiki/bin/view/cms/detectorperformancemcproduction2008 fcsa08 (July 08 if no beam). mimic the first 10-100pb -1 of data taking generator plans to be announced, readiness driven by Spring08 + signal MC packages (from P. Bartalini (NTU), R. Chierici (IPNL-Lyon), CMS software meeting 08.04.08) Christophe M. Saout, CERN, Uni Karlsruhe LCG Generator Services 09.04.08 16

Ongoing Production The 500 million Fast-Sim events that will be produced before the icsa08 exercise should roughly consist of: Min bias Pythia 100 Mevt QCD jets Madgraph 200 Mevt tt + jets Madgraph 10 Mevt t+ jets Madgraph 2 Mevt Photon+jets Madgraph 25 Mevt Z/W + jets Madgraph 50 Mevt Enriched e Madgraph or Pythia+Filter 25 Mevt Enriched µ Madgraph or Pythia+Filter 25 Mevt Enriched γ Madgraph or Pythia+Filter 10 Mevt Bbbar Madgraph or Pythia+Filter 50 Mevt Onia Pythia 5 Mevt time scale: April 08 + O(1M) fake µ, fake γ + O(50M) QCD jets with Pythia (for x-checks). Further smaller Pythia samples? (from P. Bartalini (NTU), R. Chierici (IPNL-Lyon), CMS software meeting 08.04.08) Christophe M. Saout, CERN, Uni Karlsruhe LCG Generator Services 09.04.08 17

Conclusion & Outlook A variety of generators employing LHE already in use by CMS Plain LHE/MCDB reading already possible (working!) for private purposes Generic LHE interface in preparation (probably a good place to start factorization / code sharing) A few technical obstacles for large-scale official CMS production (mostly on CMS side for the moment) Integration into CMS production workflow I/O issues prefer a robust solution without adding dependencies Distributed LHE file generation and upload Complicated, but hopefully feasible solution proposed Reusing LHE files and proper book-keeping is a must for future CMS productions, MCDB really is the most proper way! aiming for integration into CMS workflow before end of 2008! On the MCDB side everything seems to be there Feedback (and hands-on tests) especially on the upload issue welcome Evaluation and integration tests are ongoing Christophe M. Saout, CERN, Uni Karlsruhe LCG Generator Services 09.04.08 18