The ATLAS Trigger Simulation with Legacy Software

Catrin Bernius
SLAC National Accelerator Laboratory, Menlo Park, California
E-mail: Catrin.Bernius@cern.ch

Gorm Galster
The Niels Bohr Institute, University of Copenhagen, Copenhagen, Denmark
E-mail: Gorm.Galster@cern.ch

Andy Salnikov
SLAC National Accelerator Laboratory, Menlo Park, California
E-mail: Salnikov@slac.stanford.edu

Joerg Stelzer
CERN, Geneva, Switzerland
E-mail: Joerg.Stelzer@cern.ch

Werner Wiedenmann
University of Wisconsin, Madison, Wisconsin
E-mail: Werner.Wiedenmann@cern.ch

ATL-DAQ-PROC-2017-038, 17 October 2017

Abstract. Physics analyses at the LHC require accurate simulations of the detector response and of the event selection processes, generally performed with the most recent software releases. The simulation of the trigger response is crucial for the determination of overall selection efficiencies and signal sensitivities, and should ideally be done with the same software release with which the data were recorded. This potentially requires running software dating many years back, the so-called legacy software, in which algorithms and configuration may differ from their current implementation. A strategy for running legacy software in a modern environment therefore becomes essential as data simulated for past years start to represent a sizeable fraction of the total. The requirements and possibilities for such a simulation scheme within the ATLAS software framework were examined, and a proof-of-concept simulation chain has been successfully implemented. One of the greatest challenges was the choice of a data format that promises long-term compatibility between old and new software releases. Over the time periods envisaged, data format incompatibilities are also likely to emerge in databases and other external support services, and software availability may become an issue, e.g. when support for the underlying operating system stops. The encountered problems and the developed solutions are presented, and proposals for future development are discussed. Some of these ideas reach beyond the retrospective trigger simulation scheme in ATLAS, as they also touch more general aspects of data preservation.

1. Introduction
To analyse the data taken with the ATLAS detector [1] at the Large Hadron Collider (LHC), many analyses require a corresponding sample of simulated Monte Carlo (MC) data. This means that the production of new MC data needs to be maintained for all data-taking periods. While MC data are generally produced with the newest software release, the simulated trigger response needs to correspond to the time the data were taken, ideally by using the same trigger algorithms and selections. This is necessary to reproduce selection efficiencies and the trigger response as closely as possible to those in the accumulated real data sample. Accurately re-simulating in particular the trigger response (e.g. because of an improved detector response description or the simulation of new physics processes) for data-taking periods from several years ago therefore poses a challenge. A solution is to use legacy software: the trigger software and conditions data that match the simulated data-taking period, dating potentially many years back.

2. The ATLAS Monte Carlo Simulation Chain
A simplified view of the ATLAS MC simulation chain [2] is shown in Figure 1. Generated physics processes are put through a detailed simulation of the detector response (Hits). After the digitization step they are available as simulated raw data, which are the input to the trigger simulation; this step adds the trigger response record to the event data. The MC simulation is completed by the event reconstruction. Data are exchanged between digitization, trigger simulation and reconstruction in the form of Raw Data Objects (RDO) [3], a ROOT/POOL [4, 5] based data format.

Figure 1. Simplified view of the ATLAS MC simulation chain: Hits, digitization, RDO, trigger simulation, RDOTrig, reconstruction (RDO = Raw Data Object, RDOTrig = RDO with trigger response record). The generated physics processes are first put through a detailed simulation of the detector response (Hits), then digitized. The data are then passed on to the trigger simulation and further reconstruction. The same common simulation release is used for the entire chain, and the data are exchanged in the form of Raw Data Objects (RDO).
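As a minimal illustration of the data flow in Figure 1, the following Python sketch models the chain as a sequence of steps that all run within one common simulation release and exchange events in a single RDO-like format. All type and function names here are invented for illustration; the real chain is driven by ATLAS job transforms.

    # Minimal sketch of the single-release MC simulation chain of Figure 1.
    # All names are invented; the real chain is driven by ATLAS job transforms.
    from dataclasses import dataclass, field
    from typing import Optional


    @dataclass
    class Event:
        """Toy stand-in for one simulated event in an RDO-like format."""
        hits: list = field(default_factory=list)      # detector response (Hits)
        raw_data: list = field(default_factory=list)  # digitized raw data (RDO)
        trigger_record: Optional[dict] = None         # trigger response record
        reconstructed: bool = False


    def simulate(event: Event) -> Event:          # detailed detector simulation
        event.hits = ["hit_1", "hit_2"]
        return event

    def digitize(event: Event) -> Event:          # Hits -> simulated raw data
        event.raw_data = [f"digit({h})" for h in event.hits]
        return event

    def simulate_trigger(event: Event) -> Event:  # adds the trigger response
        event.trigger_record = {"L1": True, "HLT": True}   # RDO -> RDOTrig
        return event

    def reconstruct(event: Event) -> Event:       # final event reconstruction
        event.reconstructed = True
        return event


    # One common release: every step reads and writes the same RDO-based
    # format, so data compatibility between the steps holds by construction.
    event = Event()
    for step in (simulate, digitize, simulate_trigger, reconstruct):
        event = step(event)

The point of the single-release design is visible in the loop: because every step consumes and produces the same format, no conversion is ever needed, which is exactly the property that breaks once one step uses a different release.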

3. Considerations and Options for Simulation of the Trigger Response
There are several reasons that may require a re-simulation of the trigger response, e.g. an improved description and/or understanding of the detector response, improved offline reconstruction software in terms of algorithms and methods, an increase of the MC sample size for future studies, or the introduction of new event generators.

While the detector response simulation and the event reconstruction should be done with the newest software, the version used for the simulation of the trigger response needs to match the one used during data taking. A mismatch between the software versions poses a number of challenges, the most direct one being data format compatibility with respect to the trigger response simulation. Either forward compatibility of the format and content produced by the new detector simulation, or the possibility of converting the detector response to an older format readable by the old trigger simulation, needs to be guaranteed. Similarly, backward compatibility or a conversion step needs to be provided for the reconstruction, which has to be able to read the old trigger response record. Besides the data format challenges, changes to hardware architectures, operating systems, core components and compilers can make it impossible to run the old trigger software as is today.

The option of porting old trigger selection code to new simulation releases poses various problems, as it would be necessary to keep old selection lines, algorithms, their configuration and conditions data operational alongside the most recent trigger selections. This is problematic not only because it requires significant maintenance and manpower effort, but also because of conflicting requirements in the trigger selections, the need to preserve knowledge, and the maintenance of the infrastructure services.

The reuse of unmodified legacy trigger selection code from old releases can provide the most accurate trigger simulation of past data-taking periods. Rerunning legacy code allows for a strategy in which the representative releases for a data-taking period are chosen while all the required information and expertise are still available. The same selection algorithms and trigger configurations as during data taking can be applied, and little or no maintenance effort is required to conserve the legacy trigger selection lines. Due to these advantages, this option has been investigated further and a simulation chain has been developed.

4. Trigger Simulation with Legacy Trigger Selection Software
To realise running the legacy trigger selection code, various considerations have to be addressed. Running with legacy releases makes the long-term conservation of trigger software releases, configuration data and conditions data a necessity. In ATLAS, the software distribution, which also includes all required external software components, the compilers with their run-time environments, and the software configuration and development tools, is done via the CVMFS file system [6]. Furthermore, the ATLAS simulation chain has to be split into sub-steps which can use different software release versions. In the present simulation chain, data are exchanged between the simulation modules as RDO data containing MC truth information and metadata with data processing parameters, using a single software release. This guarantees data compatibility between all steps, which is no longer given when the trigger simulation is run with a different (legacy) software release. In this case data compatibility becomes an issue with respect to both forward compatibility (output data from the newer detector simulation need to be readable by the older trigger release) and backward compatibility (output data from the old trigger release need to be readable by the newer reconstruction release).
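The split into sub-steps can be pictured as each step being launched in its own process, with the environment of the desired release taken from the CVMFS software distribution. The following sketch is purely illustrative: the release names, the setup script path and the job scripts are invented placeholders, not the actual ATLAS production tooling.

    # Hypothetical sketch: each sub-step of the chain runs in its own process,
    # set up for a (possibly different) software release taken from CVMFS.
    # Release names, paths and the setup command are invented placeholders.

    STEPS = [
        # (sub-step,           release,               job script)
        ("digitization",       "CurrentRelease-21.0", "run_digitization.py"),
        ("trigger simulation", "LegacyRelease-17.2",  "run_trigger_sim.py"),
        ("reconstruction",     "CurrentRelease-21.0", "run_reconstruction.py"),
    ]

    for name, release, script in STEPS:
        # Source the release environment from the CVMFS software distribution,
        # then run this sub-step's job script in that environment, e.g. via
        # subprocess.run(["/bin/bash", "-c", cmd], check=True).
        cmd = (f"source /cvmfs/atlas.cern.ch/setup.sh {release} && "
               f"python {script}")
        print(f"[{name}] runs under {release}: {cmd}")

Because each sub-step now runs in its own release environment, the data exchanged between the steps must use a format that all involved releases can read, which motivates the byte stream format discussed next.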
Data which are directly read out from the detector hardware are available as byte stream (BS) data. This format is based on containers of 32-bit integers with a simple payload structure and is described by the ATLAS raw data format [7]. It is tightly coupled to the detector readout hardware and has seen only very few changes over time. Since it is a requirement that all ATLAS software releases can read BS data from all data-taking periods, backward compatibility is guaranteed. Due to the very simple structure of the format, providing forward compatibility is also much easier. The scripts to convert RDO data to BS data are already available in the simulation releases, and some ATLAS detectors already provide forward compatibility.

The BS data format is therefore well suited to achieve compatibility between different software release versions. However, unlike the RDO data it provides no structures to hold MC truth information or simulation metadata. The trigger simulation step only needs the detector raw data as input, without any additional MC information, and provides the trigger decision record in BS format as output. For the subsequent event reconstruction step this output can be merged with the other event reconstruction input data. The additionally introduced sub-steps, the conversion of RDO data to byte stream format before the trigger simulation and the merging of the data afterwards, integrate the trigger simulation with a legacy release into the simulation chain; they are shown in Figure 2. In the merging step the trigger decision record is converted from BS format back to RDO format and added to the simulated data.

Figure 2. Simplified view of the ATLAS MC simulation chain together with the legacy simulation chain, which uses a different (legacy) software release for the trigger simulation. In the legacy chain the data are converted from RDO to byte stream format and passed through the trigger simulation with a legacy release; the resulting byte stream with the trigger response record is then merged with the information from the initial RDO and converted back to RDO format (RDO = Raw Data Object, RDOTrig = RDO with trigger response record).
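The following sketch, with entirely invented function names (the real steps are separate ATLAS job-transform steps), illustrates the modified chain of Figure 2: the MC-specific content stays behind in the RDO, while only the raw-data part takes the detour through the legacy trigger simulation in BS format.

    # Hypothetical sketch of the legacy simulation chain of Figure 2.
    # Function names are invented; in ATLAS these are job-transform steps.

    def convert_rdo_to_bs(rdo: dict) -> list:
        """Extract the detector raw data from the RDO into a byte-stream-like
        payload; MC truth and metadata stay behind in the RDO."""
        return rdo["raw_data"]

    def legacy_trigger_simulation(bs: list) -> dict:
        """Run the legacy trigger release on plain BS input; it needs no MC
        information and returns the trigger decision record in BS format."""
        return {"L1": True, "HLT": True, "format": "BS"}

    def merge_trigger_record(rdo: dict, trigger_bs: dict) -> dict:
        """Convert the BS trigger decision record back to RDO format and add
        it to the simulated event, preserving MC truth and metadata."""
        rdo["trigger_record"] = {k: v for k, v in trigger_bs.items()
                                 if k != "format"}
        return rdo


    # RDO from the (current-release) digitization, with MC truth kept intact.
    rdo = {"raw_data": [0x1A2B, 0x3C4D], "mc_truth": ["truth"], "metadata": {}}

    bs = convert_rdo_to_bs(rdo)                 # RDO -> BS (current release)
    trigger_bs = legacy_trigger_simulation(bs)  # trigger sim (legacy release)
    rdo_trig = merge_trigger_record(rdo, trigger_bs)  # merge (current release)

The design choice captured here is that only the simple, stable BS payload ever crosses the release boundary, so neither the legacy release nor the new releases need to understand each other's RDO layout.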

5. Use of Virtualisation - An Outlook
In the medium term, the use of older releases on new operating systems can be achieved with a compatibility layer. In the long term, however, using legacy code as is will become impossible on modern computing platforms: new computing hardware technologies, operating system changes and updates to the compilers and core run-time libraries will require changes to the legacy trigger releases. Virtualisation enables the abstraction of the hardware and software run-time layer from the underlying computing platform, at the expense of some computational and resource overhead. Furthermore, external infrastructure services, e.g. for data input and output, will most likely undergo substantial changes on a time scale of 10 years, requiring adaptations of the legacy trigger release. Patch releases to the legacy code should collect all these changes and allow for the integration with the environment external to the virtual machine. Encapsulating the legacy trigger selection code in a virtual machine image thus extends the time span over which it can be used.
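As one possible realisation (a sketch under assumed names, not an established ATLAS workflow), the legacy trigger simulation sub-step could be wrapped so that it runs inside a virtual machine or container image that freezes the legacy operating system and run-time environment; only the thin BS-file interface crosses the virtualisation boundary. The image name, mount point and job script below are invented, and a generic container runtime stands in for whichever virtualisation technology is chosen.

    # Hypothetical sketch: run the legacy trigger simulation inside an image
    # that preserves its original OS and run-time environment. The image
    # name, mount points and the job command are invented examples.
    import subprocess

    IMAGE = "legacy-trigger-release-17.2.img"   # frozen legacy environment
    WORKDIR = "/data"                           # exchanged BS files live here

    def run_legacy_trigger_in_vm(input_bs: str, output_bs: str) -> None:
        # Only the BS-file interface crosses the virtualisation boundary;
        # release, OS and libraries are all encapsulated in the image.
        cmd = [
            "apptainer", "exec",              # generic container runtime
            "--bind", f"{WORKDIR}:{WORKDIR}", # expose the data directory
            IMAGE,
            "run_trigger_sim.py",             # legacy job script (placeholder)
            "--input", input_bs,
            "--output", output_bs,
        ]
        subprocess.run(cmd, check=True)

    run_legacy_trigger_in_vm("/data/events.bs", "/data/events_trig.bs")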

6. Conclusion
The challenges of simulating the trigger response precisely in MC simulation campaigns for data-taking periods dating back many years have been discussed, and a strategy to overcome these difficulties has been laid out. While detector simulation and reconstruction use the newest available software releases, the simulation of the trigger response is done with the older software that was used during the particular data-taking period being simulated. Issues with data format compatibility between the different simulation steps have been studied, and a modified simulation chain has been identified and implemented. Further work is still needed to fully integrate this modified simulation chain into the production workflow for large-scale simulation campaigns in ATLAS.

References
[1] ATLAS Collaboration, JINST 3, S08003 (2008).
[2] ATLAS Collaboration, Eur. Phys. J. C 70, 823 (2010).
[3] P. van Gemmeren and D. Malon, for the ATLAS Collaboration, Supporting High-Performance I/O at the Petascale: The Event Data Store for ATLAS at the LHC, Proceedings of IEEE Cluster 2010, Greece, September 2010.
[4] ROOT - A C++ framework for petabyte data storage, statistical analysis and visualization, Computer Physics Communications, Anniversary Issue, Volume 180, Issue 12, Dec. 2009, pp. 2499-2512.
[5] POOL - Persistency Framework, http://pool.cern.ch.
[6] CernVM Software Appliance, http://cernvm.cern.ch/portal.
[7] C. P. Bee, D. Francis, L. Mapelli, R. McLaren, G. Mornacchi, J. Petersen and F. J. Wickens, ATLAS-DAQ-98-129 (2004), https://cds.cern.ch/record/683741.