Experience with Data-flow, DQM and Analysis of TIF Data


Experience with Data-flow, DQM and Analysis of TIF Data
G. Bagliesi, R.J. Bainbridge, T. Boccali, A. Bocci, V. Ciulli, N. De Filippis, M. De Mattia, S. Dutta, D. Giordano, L. Mirabito, C. Noeding, F. Palla, S. Sarkar
ECAL-DPG Meeting, CERN

Tracker Integration & Slice Test (~25%)
- verify long-term stable operation of the system and the data acquisition
- finalise PS, DAQ, DCS and safety systems
- deploy the online software and the Data Quality Monitor
- collect a significant cosmic data sample, useful to develop and deploy track reconstruction, cosmic tracking and alignment algorithms
- establish data movement and analysis in the Grid computing environment

[Diagram: Tracker Analysis Center, showing DAQ, DQM/IGUANA and analysis]

The CMS tracker has been fully integrated at the TIF. Commissioning of 25% of the tracker with cosmic muons has been on-going since Feb '07.

TIF Data Processing Overview
1. Conversion of Raw data to an EDM-compatible format
2. Copying of Raw and EDM data to Castor
3. Registration of the EDM data in DBS-1/DLS (DBS-2 registration underway, for both Raw and EDM)
4. Injection in PhEDEx for the transfer
5. Reconstruction with ProdAgent: data registered in Bari/CERN/FNAL, publication of the Reco data
6. Data analysis via CRAB

[Diagram: DAQ + Filter Farm and StorageManager write to local disk storage (temporary before transfer to Castor); from there: DQM, copying to Castor / registration in DBS/DLS, local reconstruction, shipping to Tier-1/2, visualization (IGUANA), publication, injection, subscription, skimming and user analysis]

Global Monitoring tracks the TIF data-flow in real time. Migration of the TIF data processing to the Tier-0 is in progress.

Data Quality Monitor
DQM is an indispensable tool to:
- continuously monitor the performance of a large number of detectors
- find problems, and find them early: this saves a lot of headache downstream
- send instant feedback to the hardware and reconstruction experts

Smooth data-taking is ensured only if all of the above are under control. DQM provides summary information for the shifters and all the imaginable details for the experts.

Different Modes of Running DQM
- Online, during data taking: events read from the Storage Manager (EventStreamHttpReader), 1 event out of 10 (configurable), through a Source - Collector - Client chain
- Quasi-online and offline DQM: the source reads events from files stored on disk (local/Castor)
- Standalone: client together with the source modules in a single process, to achieve full statistics and bookkeeping

In all cases the full reconstruction of the runs is performed together with the DQM source. For all modes the output is a ROOT file with histograms arranged in folders, supporting collation, accumulation and summaries. A statistical tool grades the quality-test results as OK / Warning / Failed, with web-based visualization.
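
As a hedged illustration of "a ROOT file with histograms arranged in folders", the PyROOT sketch below walks such a file and lists every histogram it contains; the file name and folder layout are assumptions, not the actual DQM output format.

```python
# Illustrative sketch: recurse through the folder hierarchy of a DQM
# output ROOT file and print the path of every histogram found.
import ROOT

def walk(directory, path=""):
    """Visit every key of a TDirectory, descending into sub-folders."""
    for key in directory.GetListOfKeys():
        obj = key.ReadObj()
        name = path + "/" + key.GetName()
        if obj.InheritsFrom("TDirectory"):
            walk(obj, name)                        # sub-folder: recurse
        elif obj.InheritsFrom("TH1"):
            print(name, "entries:", int(obj.GetEntries()))

f = ROOT.TFile.Open("DQM_tracker.root")            # hypothetical file name
walk(f)
f.Close()
```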

DQM Shifter View
[Screenshot: shifter view with global tracks against a reference, a pre-defined layout, a slide-show option, lite or detailed text, and a summary of the QTest results]

DQM Expert View
[Screenshot: expert view; select the part of the detector showing a problem, navigate down the folder tree and pinpoint the culprit(s)]

Tracker Map
A 2D representation of the tracker, painted with the generated alarms [M. Mennea, G. Zito]: track down the culprit modules, click and see the detail.

Local Resources: CPU and Storage
Two dedicated PCs at the Tracker Analysis Center (TAC):

cmstkstorage-giga (storage)
- two data volumes, each ~1 TB, used alternately during data-taking
- temporary storage only; a robust clean-up mechanism is in place
- the data volumes are exported to each TAC machine via NFS
- copies the Raw and EDM files to Castor
- performs the DBS/DLS registration

cmstac11 (processing)
- converts the Raw data to the EDM format
- loads the pedestal and noise values from the online to the offline DB (o2o), crucial for the offline processing of the data
- hosts the global monitoring of the TIF data

Conversion, Copying to Castor
Fully automated with cron jobs and daemons; all types of runs are converted (physics, pedestal etc.).
- Raw files are archived in Castor under /castor/cern.ch/cms/testbeam/tac
- EDM files are archived in Castor under /castor/cern.ch/cms/store/tac
- once the EDM files for a run have been copied to Castor, a catalog is prepared for the DBS/DLS registration in the next step of the chain

Experience:
- flat-file based book-keeping; only one conversion process runs at a time (see the sketch below)
- Castor has its own well-known problems
- the code was developed in a production environment; initially we had difficult moments
- NFS slows down the processing when a large number of clients access the data volumes
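
A minimal sketch of the "single conversion process with flat-file book-keeping" pattern described above; the lock path, raw-data area and converter script are hypothetical, not the actual TAC tooling.

```python
# Sketch of a cron-driven conversion step with flat-file book-keeping.
# Only one instance may run at a time, enforced by a non-blocking lock.
import fcntl
import os
import subprocess
import sys

lock = open("/tmp/raw2edm.lock", "w")              # hypothetical lock file
try:
    fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
except OSError:
    sys.exit(0)                                    # another conversion is running

# flat file listing the runs already converted
try:
    done = set(open("converted_runs.txt").read().split())
except FileNotFoundError:
    done = set()

for run in sorted(os.listdir("/data1/raw")):       # hypothetical raw-data area
    if run in done:
        continue
    subprocess.check_call(["./convert_raw_to_edm.sh", run])  # hypothetical wrapper
    with open("converted_runs.txt", "a") as bookkeeping:
        bookkeeping.write(run + "\n")              # record the success
```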

Registration in DBS-1/DLS
A number of daemon processes continuously look for new runs, i.e. the DBS catalog files created by the previous steps.

Technicalities:
- a Grid certificate with the production role is required to connect to DLS: voms-proxy-init -voms cms:/cms/role=production
- the registration scripts are based on the DBS/DLS APIs
- one DBS and one DLS instance: MCLocal_4/Writer for DBS, prod-lfc-cms-central.cern.ch/grid/cms/dls/mclocal_4 for DLS
- for the EDM files, the file size, the number of events per file and a checksum are provided

Experience:
- no hiccups at all; registration is fast and robust
- repeated for a few runs due to problems in the previous steps
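
Since the registration needs a proxy with the production role, a daemon would typically guard each cycle with a validity check. A sketch of such a guard, built around the voms-proxy-init command quoted above (the one-hour threshold is an arbitrary choice for the example):

```python
# Sketch: renew the VOMS proxy with the production role when it is about
# to expire, before running the DBS/DLS registration scripts.
import subprocess

def proxy_timeleft():
    """Seconds of validity left on the current proxy, 0 if there is none."""
    out = subprocess.run(["voms-proxy-info", "-timeleft"],
                         capture_output=True, text=True)
    return int(out.stdout) if out.returncode == 0 and out.stdout.strip() else 0

if proxy_timeleft() < 3600:
    # command taken directly from the slide
    subprocess.check_call(["voms-proxy-init", "-voms", "cms:/cms/role=production"])
```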

Raw Tracker Data in DBS/DLS
http://cmsdbs.cern.ch/discovery/expert
[Screenshot: DBS discovery page showing the Tracker data and the MTCC data]

Injection in PhEDEx for Transfer
Data published to DBS/DLS are injected into the official CMS data movement tool, PhEDEx.
- data injection is the procedure that writes into the PhEDEx database; it can be run at a remote Tier site
- presently the injection is performed from Bari, using an official PhEDEx agent and a component of ProdAgent, modified to close blocks at the end of the transfer so that the automatic publication to DLS works
- several daemons continuously look for new tracker data being published to DBS/DLS
- once the datasets are injected in PhEDEx, any Tier-n site can subscribe to them and PhEDEx will eventually deliver them

Tracker data are available at CERN, FNAL, Bari and Pisa.

Tracker Data in PhEDEx
http://cmsdoc.cern.ch/cms/aprom/phedex
[Screenshot: PhEDEx transfer page for the tracker data]

PhEDEx Experience
- if Castor fails to deliver files, PhEDEx may wait indefinitely: PhEDEx is not supposed to identify and work around mass-storage problems, and for efficiency it assigns a group of files at a time for transfer
- file-size mismatch between Castor and the PhEDEx TMDB: some EDM files were overwritten after injection into PhEDEx, because multiple Raw-to-EDM conversion processes created a problem with the file-based book-keeping; we reverted to a single process a couple of months ago and updated the PhEDEx TMDB; less than 0.1% of the files were affected
- CERN-to-FNAL Raw-data transfers were affected by other transfers with higher priority (MC production); eventually the importance of the tracker data was recognised and the transfer was streamlined
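
The size-mismatch problem could be detected with a check along these lines, comparing the size recorded at injection time against what Castor currently reports via nsls; the record-file format is an assumption for the example.

```python
# Sketch: flag files whose current Castor size differs from the size that
# was recorded when they were injected into PhEDEx.
import subprocess

def castor_size(path):
    """Size in bytes reported by `nsls -l` for a Castor file."""
    out = subprocess.check_output(["nsls", "-l", path], text=True)
    return int(out.split()[4])                     # size column, as in `ls -l`

# hypothetical record file with one "path size" pair per line
for line in open("injected_files.txt"):
    path, recorded = line.split()
    if castor_size(path) != int(recorded):
        print("size mismatch, TMDB needs updating:", path)
```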

Standard Reconstruction
Reconstruction of the Raw data is run in a standard and official way, i.e. using a CMSSW release or pre-release without user patches.
- ProdAgent can be used in the same way as for MC production, and can run at any remote Tier-1/2
- running with ProdAgent ensures that the reconstructed data are automatically registered in DBS/DLS, ready to be shipped via PhEDEx to other Tiers and analysed with the standard distributed computing tools
- the offline DB (pedestal, noise, cabling information) is accessed at the remote sites via a Frontier/Squid cache (sketched below)
- currently, the reconstruction of new runs is triggered automatically by a ProdAgent instance in Bari
- jobs run at CERN, FNAL, Bari and Pisa, where the Raw data are available
- the reconstructed data are registered in DBS/DLS from the sites where they are produced
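
A hedged sketch of the Frontier/Squid access, written in the (later) CMSSW Python configuration syntax rather than the .cfg files of the time; the connect string, database name and tag are invented for the example and are not the actual TIF configuration.

```python
# Sketch of conditions access through Frontier in a CMSSW configuration
# fragment; the site Squid transparently caches the responses.
import FWCore.ParameterSet.Config as cms

process = cms.Process("RECO")

process.trackerConditions = cms.ESSource("PoolDBESSource",
    connect = cms.string("frontier://FrontierProd/CMS_COND_STRIP"),  # hypothetical DB
    toGet = cms.VPSet(
        cms.PSet(
            record = cms.string("SiStripPedestalsRcd"),   # pedestals, filled by o2o
            tag = cms.string("SiStripPedestals_TIF_v1")   # hypothetical tag
        )
    )
)
```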

Reconstruction in the Development Environment
Performed at FNAL using ProdAgent, with releases or pre-releases patched with the latest developments and bug fixes.
- provides immediate feedback to the tracking developers: the corrected geometry and the latest algorithm changes are exercised on physics runs, for the track reconstruction and alignment algorithms
- not fully compatible with the official naming convention for the reconstructed data
- the reconstructed data are transferred to the other sites in the usual way

Details at https://twiki.cern.ch/twiki/bin/view/cms/tifdataanalysis

Reco Tracker Data in DBS/DLS
http://cmsdbs.cern.ch/discovery/expert
[Screenshot: DBS discovery page showing the reconstructed tracker data]

Data Analysis with CRAB
Both the Raw and the Reco data published in DBS/DLS can be analysed with CRAB at the different Tiers.
- edit crab.cfg and insert the dataset path of the run to be analysed; CRAB automatically gets the file list (a sketch follows)
- follow the usual steps: set up the CMS environment, compile your code, and provide the usual CMSSW configuration to be used by cmsRun
- the offline DB is accessed via Frontier at the Tier-1/2 sites
- the analysis steps can be automated via CRAB
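
A hedged sketch of the crab.cfg edit, generating the file programmatically; the dataset path, splitting parameters and output names are placeholders, and the section/key names follow the CRAB configuration of that era.

```python
# Sketch: write a minimal crab.cfg pointing at one run's dataset.
import configparser

cfg = configparser.ConfigParser()
cfg["CRAB"] = {"jobtype": "cmssw", "scheduler": "glite"}
cfg["CMSSW"] = {
    "datasetpath": "/TAC-TIF-.../RAW",     # placeholder dataset path from DBS
    "pset": "analysis.cfg",                # the usual CMSSW configuration
    "total_number_of_events": "-1",        # all events of the run
    "events_per_job": "10000",
    "output_file": "analysis.root",
}
cfg["USER"] = {"return_data": "1"}         # bring the output back with the job

with open("crab.cfg", "w") as f:
    cfg.write(f)
```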

Automated Analysis with CRAB
The automated process repeats the following steps for all the interesting physics runs (a sketch of the loop follows the list):
- run discovery: combine information from the Run Summary page and DBS (and eventually a custom list of runs)
- change of the CMSSW analysis release: the code evolves, and newer releases bring new reconstruction features
- assignment of an Analysis Flag when something changes: an analysis condition, a parameter, ...
- creation and submission of all the CMSSW jobs with CRAB
- monitoring and output retrieval of the jobs on the Grid: the CRAB user is expected to retrieve the job output from the worker nodes, as there is no automatic retrieval on job completion
- eventual merging of the output, if CRAB divided the job into multiple sub-jobs
- running the analysis ROOT macros on the output
- representing the results in a way useful to identify trends, problems, etc.: publish on the web, allowing easy navigation through the various Flags, runs and results, with open access for everyone interested in contributing to the analysis
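
The create/submit/retrieve steps map onto the standard CRAB command line; a minimal sketch of the per-run loop, with the run list standing in for the output of run discovery:

```python
# Sketch: for each newly discovered run, create, submit, poll and retrieve
# a CRAB task. Assumes crab.cfg has been written for the run, e.g. with
# the ConfigParser sketch above.
import subprocess
import time

runs = ["7227", "7255"]                    # hypothetical run-discovery result

for run in runs:
    print("processing run", run)
    subprocess.check_call(["crab", "-create", "-cfg", "crab.cfg"])
    subprocess.check_call(["crab", "-submit"])    # acts on the latest task
    # poll until CRAB reports the jobs as done, then fetch the output
    while b"Done" not in subprocess.check_output(["crab", "-status"]):
        time.sleep(600)
    subprocess.check_call(["crab", "-getoutput"])
```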

Automated Analysis (continued)
http://cmstac11.cern.ch:8080/.../stable/bari/130_v5/allsummaries/asummary_pedth.html
[Screenshot: web page with the automated analysis summaries]

TIF Data Processing Statistics
http://cmstac11.cern.ch:8080/analysis/crabanalysis/rate/rateshort_integrated.gif
[Plot: integrated event rate, approaching 5M physics events; a dip is annotated as a technical problem]
Several factors affect the reconstruction phase (finding Grid resources, availability of the proper CMSSW release, etc.).

Global Monitoring
Follow the TIF data movement in real time, from the local storage to the Tier sites. We have our own monitor at http://cmstac11.cern.ch:8080/tacmon
- essential to track problems early in the long chain
- detailed information for the TIF data in each phase
- useful as an easy reference for a run
- new ideas are still flowing in

[Screenshot: monitoring page with filters]

Summary
- DQM is a crucial component to find problems early, optimised for both end-users and experts
- fully automated data movement, reconstruction and analysis of the TIF data: a somewhat ad-hoc yet robust design, still evolving with time; no problems encountered in the last couple of months
- data movement and processing are limited by the conversion efficiency, by the Castor copy efficiency and by Castor delivering files to PhEDEx, as well as by the lack of a DB-based book-keeping that would allow multiple conversion processes to run in parallel and speed up the whole chain (see the sketch below)
- documentation: CMS IN-2007/014 and https://twiki.cern.ch/twiki/bin/view/cms/tifdataanalysis

It was a challenging exercise for the community, and it is very satisfying that we made it work in the best possible way.
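
The "DB-based book-keeping" the summary asks for could be as simple as an SQLite table with an atomic claim step, which would let several converters run in parallel without the overwrite problem seen earlier. A minimal sketch under those assumptions (table and script names are hypothetical):

```python
# Sketch: replace the flat "done" file with a small SQLite table whose
# PRIMARY KEY constraint lets several conversion processes claim runs
# without stepping on each other.
import sqlite3
import subprocess

db = sqlite3.connect("bookkeeping.db")
db.execute("CREATE TABLE IF NOT EXISTS runs (run TEXT PRIMARY KEY, state TEXT)")

def claim(run):
    """Atomically claim a run; False if another process already has it."""
    try:
        with db:
            db.execute("INSERT INTO runs VALUES (?, 'converting')", (run,))
        return True
    except sqlite3.IntegrityError:
        return False

for run in ["7227", "7255", "7301"]:       # hypothetical list of new runs
    if claim(run):
        subprocess.check_call(["./convert_raw_to_edm.sh", run])  # hypothetical
        with db:
            db.execute("UPDATE runs SET state='done' WHERE run=?", (run,))
```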