Experience with Data-flow, DQM and Analysis of TIF Data
G. Bagliesi, R.J. Bainbridge, T. Boccali, A. Bocci, V. Ciulli, N. De Filippis, M. De Mattia, S. Dutta, D. Giordano, L. Mirabito, C. Noeding, F. Palla, S. Sarkar
ECAL-DPG Meeting, CERN

Tracker Integration & Slice Test (~25%)
The CMS tracker has been fully integrated at the TIF. Commissioning of 25% of the tracker with cosmic muons has been on-going since Feb '07, to
- verify long-term stable operation of the system & data-acquisition
- finalise PS, DAQ, DCS, Safety systems
- deploy Online software, Data Quality Monitor
- collect a significant cosmic data sample useful to develop and deploy track reconstruction, cosmic tracking and alignment algorithms
- establish data movement and analysis in the Grid computing environment
[Diagram labels: Tracker Analysis Center, DAQ, DQM/IGUANA, Analysis]

TIF Data Processing Overview
1. Conversion of Raw data to EDM compatible format
2. Copying Raw and EDM to Castor
3. Registration of EDM data in DBS-1/DLS
4. Injection in PhEDEx for the transfer
5. Reconstruction with ProdAgent
6. Data analysis via CRAB
Status
- DBS-2 registration underway (both Raw and EDM)
- Data registered in Bari/CERN/FNAL
- Publication of Reco data
- Global Monitoring: track TIF data-flow in real-time
- Migration of TIF data processing to Tier-0 in progress
[Data-flow diagram labels: DAQ + Filter Farm, StorageManager, Local Disk Storage (temporary before transfer to CASTOR), DQM, Visualization (IGUANA), Copying to CASTOR / Registration in DBS/DLS, Local reconstruction, Shipping to Tier 1/2, Reconstruction, Publication, Injection, Subscription, Skimming, User analysis]

Data Quality Monitor
DQM is an indispensable tool to
- continuously monitor the performance of a large number of detectors
- find problems early, which saves a lot of headache downstream
- send instant feedback to hardware and reconstruction experts
Smooth data-taking is ensured only if all of the above are under control.
Provides summary information for the shifters and all the imaginable details for the experts.

Different Modes of Running DQM
Modes: Online, Quasi-Online, Offline
- Online, during data taking: events from Storage Manager (EventStreamHttpReader), 1 event out of 10 (configurable)
- Source reading events from file stored on disk (local/castor)
- Standalone client together with source modules in a single process to achieve full statistics and bookkeeping
- In all cases, full reconstruction of runs together with DQM source
- For all modes, the output is a Root file with histograms arranged in folders
[Diagram labels: DQM Source, Collector, Client; Collation, Accumulation, Summary; Statistical Tool: OK / Warning / Failed; Web based visualization]

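The OK / Warning / Failed outcome of a quality test can be illustrated with a minimal sketch. This is not the DQM framework's own code: the function name `range_test`, the choice of a "fraction of entries inside an allowed range" test and both thresholds are assumptions made for illustration only.

```python
from enum import Enum


class Status(Enum):
    OK = "OK"
    WARNING = "Warning"
    FAILED = "Failed"


def range_test(values, low, high, warn_fraction=0.95, fail_fraction=0.80):
    """Classify a set of monitored values by the fraction lying in [low, high].

    The thresholds are hypothetical defaults, not values from the TIF setup.
    """
    if not values:
        # Nothing was monitored, so the test cannot pass.
        return Status.FAILED
    inside = sum(1 for value in values if low <= value <= high)
    fraction = inside / len(values)
    if fraction >= warn_fraction:
        return Status.OK
    if fraction >= fail_fraction:
        return Status.WARNING
    return Status.FAILED
```

A shifter-level summary would then only need to count how many monitored quantities end up in each of the three states, while the expert view keeps the individual results.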
DQM Shifter View
- Global Tracks with Reference
- Pre-defined Layout
- Start a slide show
- Check Lite or Detailed text Summary of QTest results

DQM Expert View
- Select a part of the detector
- Navigate down the folder tree
- Pinpoint the culprit(s)
[Screenshot callout: problem!!]

Tracker Map [M. Mennea, G. Zito]
- 2D representation of the tracker, painted with the generated alarms
- track down culprit modules, then click to see the details

Local Resources: CPU and Storage
Two dedicated PCs at Tracker Analysis Center (TAC)
cmstkstorage-giga (storage)
- 2 data volumes, each ~1 TB, used alternately during data-taking
- temporary storage; a robust clean-up mechanism in place
- data volumes are exported to each TAC machine via NFS
- copies Raw and EDM files to Castor
- performs DBS/DLS registration
cmstac11 (processing)
- converts Raw data to EDM format
- loads pedestal and noise values from online to offline DB (o2o), crucial for offline processing of data
- hosts global monitoring of TIF data

Conversion, Copying to Castor
- Fully automated with cron jobs and daemons
- all types of runs are converted (physics, pedestal etc.)
- Raw files archived in Castor under /castor/cern.ch/cms/testbeam/tac
- EDM files archived in Castor under /castor/cern.ch/cms/store/tac
- once the EDM files for a run are copied to Castor, a catalog is prepared for DBS/DLS registration in the next step of the chain
Experience
- flat file based book-keeping; only one conversion process runs at a time
- Castor has its own well-known problems
- code developed in production environment; initially we had difficult moments
- NFS slows down processing when a large number of clients access the data volumes

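The "only one conversion process at a time" constraint that comes with flat-file book-keeping can be sketched with a lock file. This is an illustrative sketch, not the TAC scripts: the helper name `single_instance` and the lock path are hypothetical, and exclusive file creation may not be reliable on every NFS setup.

```python
import os
from contextlib import contextmanager


@contextmanager
def single_instance(lock_path):
    """Hold an exclusive lock file while the block runs.

    Raises RuntimeError if the lock file already exists, i.e. if another
    conversion process appears to be running.
    """
    try:
        # O_EXCL makes creation fail if the file is already there.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise RuntimeError(f"lock file exists: {lock_path}")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))  # record the owner for debugging
        yield
    finally:
        os.remove(lock_path)
```

A cron job would wrap its conversion step in `with single_instance(...)` and simply exit when the lock is already held, so that two jobs never update the same book-keeping file concurrently.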
Registration in DBS-1/DLS
- A number of daemon processes look continuously for new runs, i.e. DBS catalog files created by the previous steps
Technicalities
- a Grid certificate with production role is required to connect to DLS
    voms-proxy-init -voms cms:/cms/role=production
- registration scripts based on DBS/DLS API
- one DBS and DLS instance
  - MCLocal_4/Writer for DBS
  - prod-lfc-cms-central.cern.ch/grid/cms/dls/mclocal_4 for DLS
- for EDM files provide file size, number of events in a file, checksum
Experience
- no hiccups at all; fast and robust
- registration repeated for a few runs due to problems in the previous steps

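The per-file information needed for registration (file size, number of events, checksum) can be gathered along the following lines. This is a sketch under assumptions: the slide does not say which checksum algorithm was used, so Adler-32 from the standard library stands in for it, the event count is taken as an input rather than read from the EDM file, and the record layout is hypothetical.

```python
import os
import zlib


def file_record(path, n_events, chunk_size=1 << 20):
    """Return a dict with the size, event count and checksum of one file.

    The checksum algorithm (Adler-32) and the key names are assumptions.
    """
    checksum = 1  # Adler-32 starting value
    with open(path, "rb") as handle:
        # Read in chunks so that large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            checksum = zlib.adler32(chunk, checksum)
    return {
        "path": path,
        "size": os.path.getsize(path),
        "events": n_events,
        "adler32": f"{checksum & 0xFFFFFFFF:08x}",
    }
```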
Raw Tracker data in DBS/DLS
http://cmsdbs.cern.ch/discovery/expert
[Screenshot labels: Tracker data, MTCC data]

Injection in PhEDEx for Transfer
- Data published to DBS/DLS are injected into the official CMS data movement tool, PhEDEx
- Data injection is the procedure that writes into the PhEDEx database; it can be run at a remote Tier site
- Presently, the injection is performed from Bari
  - an official PhEDEx agent and a component of ProdAgent, modified to close blocks at the end of the transfer so that automatic publication to DLS works
- Several daemons continuously look for new tracker data being published to DBS/DLS
- Once datasets are injected in PhEDEx, any Tier-n site can subscribe and PhEDEx will eventually deliver them
- Tracker data available at: CERN, FNAL, Bari, Pisa

Tracker Data in PhEDEx
http://cmsdoc.cern.ch/cms/aprom/phedex

PhEDEx Experience
- If Castor fails to deliver files, PhEDEx may wait indefinitely
  - PhEDEx is not supposed to identify and work around mass storage problems
  - for efficiency, PhEDEx assigns a group of files at a time for transfer
- File size mismatch between Castor and PhEDEx TMDB
  - some EDM files were overwritten after injection to PhEDEx
  - multiple Raw-to-EDM conversion processes created problems with the file-based book-keeping
  - reverted to a single process a couple of months ago; PhEDEx TMDB updated
  - less than 0.1% of files affected
- CERN to FNAL Raw data transfer affected by other transfers with higher priority (MC Production)
  - eventually the importance of tracker data was recognised and the transfer streamlined

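A size mismatch of the kind described above can be spotted by comparing two listings, one from mass storage and one from the transfer database. The sketch below assumes both listings are already available as plain `{file name: size}` dictionaries; how they are obtained from Castor or TMDB is outside its scope, and the function name is hypothetical.

```python
def find_size_mismatches(storage_sizes, catalog_sizes):
    """Return {name: (storage_size, catalog_size)} for files that disagree.

    A file present in only one listing is reported with None on the
    missing side.
    """
    mismatches = {}
    for name in sorted(set(storage_sizes) | set(catalog_sizes)):
        on_storage = storage_sizes.get(name)
        in_catalog = catalog_sizes.get(name)
        if on_storage != in_catalog:
            mismatches[name] = (on_storage, in_catalog)
    return mismatches
```

Running such a comparison before injection, rather than after, would catch files that were overwritten by a second conversion pass.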
Standard Reconstruction
- Run reconstruction of Raw data in a standard and official way, i.e. using a CMSSW release or pre-release but without user patches
- ProdAgent can be used in the same way it is used for MC production
  - can be run at any remote Tier-1/2
- Running with ProdAgent ensures that the reconstructed data are automatically registered to DBS/DLS, ready to be shipped via PhEDEx to other Tiers and analysed with standard distributed computing tools
- Offline DB (pedestal, noise, cabling information) accessed at remote sites via Frontier/Squid cache
- Currently, reconstruction of new runs is triggered automatically by a ProdAgent instance in Bari
- Jobs run at CERN, FNAL, Bari and Pisa, where Raw data are available
- Reconstructed data are registered to DBS/DLS from the sites where they are produced

Reconstruction in Development Environment
- Performed at FNAL using ProdAgent with releases or pre-releases patched with the latest development/bug fixes
- Provides immediate feedback to the tracking developers, with physics runs, on track reconstruction algorithms and alignment
  - incorporates corrected geometry, latest algorithm changes
- Not fully compatible with the official naming convention for the reconstructed data
- Reconstructed data transferred to other sites in the usual way
- Details at https://twiki.cern.ch/twiki/bin/view/cms/tifdataanalysis

Reco Tracker Data in DBS/DLS
http://cmsdbs.cern.ch/discovery/expert

Data Analysis with CRAB
- Both Raw and Reco data published in DBS/DLS can be analysed with CRAB at different Tiers
- Edit crab.cfg and insert the dataset path of the run to be analysed; CRAB automatically gets the file list
- Follow the usual steps
  - set up the CMS environment
  - compile your code
  - provide the usual CMSSW cfg to be used by cmsRun
- Offline DB accessed via Frontier at Tier 1/2
- Automate analysis steps via CRAB

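Editing crab.cfg by hand for every run is the step that lends itself to automation. The sketch below writes a minimal INI-style configuration for one dataset path. The section and key names (`[CRAB]`, `[CMSSW]`, `[USER]`, `datasetpath`, `pset`, and so on) are assumptions that should be checked against the CRAB release in use, and the dataset path and cfg file name in the usage lines are hypothetical placeholders.

```python
import configparser
import io


def make_crab_cfg(dataset_path, pset, events_per_job=1000):
    """Return the text of a minimal crab.cfg-style file for one dataset.

    Section and key names are assumed, not taken from the slides.
    """
    cfg = configparser.ConfigParser(interpolation=None)
    cfg["CRAB"] = {"jobtype": "cmssw", "scheduler": "glite"}
    cfg["CMSSW"] = {
        "datasetpath": dataset_path,
        "pset": pset,
        "total_number_of_events": "-1",  # assumed to mean "all events"
        "events_per_job": str(events_per_job),
    }
    cfg["USER"] = {"return_data": "1"}
    buffer = io.StringIO()
    cfg.write(buffer)
    return buffer.getvalue()


# Hypothetical dataset path and cfg name, for illustration only.
text = make_crab_cfg("/HYPOTHETICAL/TIF-Run0000/RECO", "analysis.cfg")
```

Generating the file from a template keeps the per-run difference down to the dataset path, which is what makes looping over many runs practical.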
Automated Analysis with CRAB
The automated process repeats the following steps for all the interesting physics runs:
- Run Discovery: combine information from Run Summary Page, DBS (and optionally a custom list of runs)
- Change of CMSSW analysis release: evolution of code; new reconstruction features in a newer release
- Analysis Flag assignment when something changes: an analysis condition, a parameter, etc.
- Creation/Submission of all CMSSW jobs with CRAB
- Monitoring / Output Retrieval of jobs on the Grid
  - the CRAB user is supposed to retrieve the job output from the WN; there is no automatic retrieval on job completion
  - output merging, if CRAB divided the job into multiple sub-jobs
- Run the analysis Root macros on the output
- Represent results in a way useful to identify trends, problems, etc.
  - publish on the web, allowing
    * easy navigation through several Flags, runs, results
    * open access for everyone interested in contributing to the analysis

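The run discovery and flag logic can be sketched as set operations. The exact rule used by the automated system is not given on the slide, so the one below is an assumption: a run is a candidate if it is a physics run (or on the custom list) and is published in DBS, and it is skipped if it was already analysed under the current Analysis Flag. All names are hypothetical.

```python
def runs_to_analyse(physics_runs, published_runs, custom_runs=(), done=(), flag="v1"):
    """Return the sorted run numbers still to be analysed under `flag`.

    `done` is an iterable of (flag, run) pairs recording finished analyses.
    """
    # A run must be of interest (physics or custom) and available in DBS.
    candidates = (set(physics_runs) | set(custom_runs)) & set(published_runs)
    # Only analyses done under the current flag count as finished.
    finished = {run for done_flag, run in done if done_flag == flag}
    return sorted(candidates - finished)
```

Changing the flag (for example after a new CMSSW release or a changed parameter) makes every candidate run eligible again, which matches the idea of repeating the analysis whenever something changes.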
Automated Analysis continued
http://cmstac11.cern.ch:8080/.../stable/bari/130_v5/allsummaries/asummary_pedth.html

TIF Data Processing Statistics
- approaching 5M physics events
- several factors affect the reconstruction phase (finding Grid resources, availability of the proper CMSSW release, etc.)
[Plot: http://cmstac11.cern.ch:8080/analysis/crabanalysis/rate/rateshort_integrated.gif; annotation: "Technical problem"]

Global Monitoring
Follow TIF data movement in real time, from local storage to Tier sites.
We have our own monitor at http://cmstac11.cern.ch:8080/tacmon
- essential to track problems early in the long chain
- useful as an easy reference for the TIF data
- new ideas still flowing in
[Screenshot callouts: Filters; Detailed information for a run in each phase]

Summary
- DQM: a crucial component to find problems early, optimised for both end-users and experts
- Fully automated data movement, reconstruction and analysis of TIF data
  - somewhat ad-hoc yet robust design; still evolving with time
  - no problems encountered in the last couple of months
- Data movement/processing limited by
  - conversion efficiency
  - Castor copy efficiency and Castor delivering files to PhEDEx
  - lack of a better DB based book-keeping that would allow multiple conversion processes to run in parallel to speed up the whole chain
- Documentation
  - CMS IN-2007/014
  - https://twiki.cern.ch/twiki/bin/view/cms/tifdataanalysis
It was a challenging exercise for the community, and it is very satisfying that we carried it out in the best possible way.