ATLAS NOTE
ATL-SOFT-PUB-2014-004
December 4, 2014

ATLAS offline reconstruction timing improvements for run-2

The ATLAS Collaboration

Abstract

From 2013 to 2014 the LHC underwent an upgrade to boost the available centre-of-mass energy for collisions from 8 TeV to 13 TeV. During this interval of time, known as Long Shutdown 1 (LS1), the ATLAS software group began a campaign to substantially reduce the CPU time needed to process data. This reduction could not come at the expense of physics performance. The campaign was undertaken to prepare for the increase of the trigger bandwidth from 500 Hz to 1 kHz and for the increase in the number of interactions per LHC proton bunch crossing, which is commonly referred to as pile-up. This article summarises the main improvements and presents measurements of the data processing time and of a key performance indicator, the tracking efficiency, as a function of major software releases.

© Copyright 2014 CERN for the benefit of the ATLAS Collaboration. Reproduction of this article or parts of it is allowed as specified in the CC-BY-3.0 license.

1 Introduction

The performance of the LHC in run-1 exceeded the design specifications for pile-up, mainly because proton bunch crossings occurred every 50 ns. The average number of interactions per bunch crossing, µ, which corresponds to the mean of the Poisson distribution of the number of interactions per crossing calculated for each bunch, is a direct measure of pile-up. It is calculated from the instantaneous per-bunch luminosity as µ = L_bunch σ_inel / (n_bunch f_r), where L_bunch is the per-bunch instantaneous luminosity, σ_inel is the inelastic cross-section, assumed to be 71.5 mb for 7 TeV collisions and 73.0 mb for 8 TeV collisions, n_bunch is the number of colliding bunches and f_r is the LHC revolution frequency. More details can be found in Ref. [1].

During run-1 the pile-up benchmark of µ ≈ 20 was exceeded in a majority of fills, with µ ≈ 35 fills being common in the latter part of run-1. In run-2 pile-up will increase further: with the first 1 fb⁻¹ expected to be taken with a 50 ns bunch-crossing interval, one can expect µ ≈ 40 fills in the near term, and one should be prepared for fills with µ ≈ 60. Coupled with the increased collision energy and pile-up in run-2 is an increase of the trigger bandwidth to 1 kHz, which is required to keep the important single-lepton triggers near their run-1 transverse energy and momentum thresholds.

Given the run-2 requirements, the following goals were set: reduce the data processing time by a factor of 3 without compromising physics performance; increase the maintainability of the code; and validate that the physics results are invariant under the code changes. The data processing time of interest here is the time taken to process Raw Data Object (RDO) files into Event Summary Data (ESD) files, in what is known as the reconstruction step. The data processing time is therefore referred to as the reconstruction time.

Reconstruction time measurements were conducted on both simulated and real data samples. The Monte Carlo simulated samples consist of top-quark pair production (tt̄) events generated at a collision energy of 14 TeV with a bunch crossing (BC) spacing of 25 ns and an average number of interactions per BC of µ = 0, 20 and 40. Run-1 data sets were sourced from the JetTauEtmiss stream and span measured values of µ = 16.3, 20.1, 25.0, 30.0 and 35.4. These samples provide events with the highest track multiplicities and therefore provide an upper bound on the reconstruction time. Unless otherwise stated, measurements of the reconstruction time are performed on a machine with a HEPSPEC06 (HS06) scaling factor of 11.95; specifically, a two-processor, 16-core machine with Intel Xeon L5520 CPUs at 2.26 GHz.

The following sections describe the updates and improvements made to the reconstruction software. Bear in mind that the estimates of the improvements in reconstruction time are approximate: it is very difficult to factorise out the improvement due to a single change, as many changes were implemented concurrently. The entire ATLAS software suite consists of around two thousand software packages and is currently maintained by around 400 developers. The software is especially complex because it necessarily matches the complex nature of the ATLAS detector; moreover, further complexity is needed for the very sophisticated analysis of proton-proton collision data.

Reconstruction times were measured for three versions of the software release:

- 17.2.7.9: the version used to reconstruct data at the end of run-1;
- 19.0.3.3: a version with updates in software technology and an optimised Inner Detector track-seeding strategy for 8 TeV;
- 19.1.1.1: a version with updates to the track seeding for 13 TeV and region-of-interest (ROI) seeded back tracking.
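As a rough numerical illustration of the pile-up relation quoted above, the following minimal sketch evaluates µ for 2012-like running conditions, interpreting the division by n_bunch as converting a total instantaneous luminosity into the per-bunch value L_bunch. The luminosity, bunch count and revolution frequency used here are illustrative values chosen by the editor and are not taken from this note; only the 73.0 mb cross-section for 8 TeV collisions is quoted in the text:

    #include <cstdio>

    int main() {
        // Illustrative 2012-like inputs (not from this note):
        const double lumi_total    = 7.0e33;   // total instantaneous luminosity [cm^-2 s^-1]
        const double n_bunch       = 1368.0;   // number of colliding bunches
        const double f_rev         = 11245.0;  // nominal LHC revolution frequency [Hz]
        // Quoted in the text for 8 TeV collisions:
        const double sigma_inel_mb = 73.0;     // inelastic cross-section [mb]
        const double mb_to_cm2     = 1.0e-27;  // 1 mb = 1e-27 cm^2

        // mu = L_bunch * sigma_inel / f_rev, with L_bunch obtained as L_total / n_bunch
        const double lumi_bunch = lumi_total / n_bunch;
        const double mu = lumi_bunch * sigma_inel_mb * mb_to_cm2 / f_rev;

        std::printf("mu = %.1f interactions per bunch crossing\n", mu);  // ~33 for these inputs
        return 0;
    }

For these illustrative inputs the result is µ ≈ 33, in line with the µ ≈ 35 fills described above for the latter part of run-1.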

2 Upgrades and improvements in software technology

The following upgrades and improvements in software technology were made:

- A new method to read the value of the magnetic field strength within ATLAS was implemented, because this functionality was identified as a CPU bottleneck. The code was rewritten in C++, where previously it was written in FORTRAN. The field value is cached for fast lookup; unit conversions between tesla and gauss (and vice versa) were minimised, which also had the effect of reducing the call depth; and the functions were made auto-vectorisable. These changes resulted in a 20% gain in speed in detector simulation tasks. (A minimal sketch of the caching idea is shown after this list.)

- The use of the CLHEP library, which provides linear-algebra vector and matrix operations, was replaced with the Eigen C++ template library. Eigen's use of expression templates removes intermediate steps performed in calculations (see the illustrative snippet after this list). This migration affected thousands of lines of code in up to a thousand packages and took approximately eight months to complete. The CLHEP library is still necessary, however, as it is used to declare Lorentz vectors and in the description of the detector geometry.

- Millions of evaluations of trigonometric functions occur in the reconstruction. In run-1 these were handled by the GNU libm math library. A switch was made to the Intel math library (libimf), which is part of the Intel C++ compiler and contains highly optimised and very accurate mathematical functions. The average time spent evaluating trigonometric functions with libm was 2.1 seconds out of a total event processing time of 14.1 seconds; the use of libimf reduces this evaluation time by, on average, 10%.

- The build was updated from a 32-bit to a 64-bit architecture, which provided a 25% overall reduction in data processing time.

- The Google memory allocator package, tcmalloc, was updated from version 0.99 to 2.1 in order to fix an issue with unaligned memory blocks, which caused problems with Eigen. The updated version also makes effective use of single-instruction-multiple-data (SIMD) CPU functionality.

- The compiler was updated from version 4.3 to 4.7, which allows auto-vectorisation to be studied.

- The event data model was simplified, resulting in a reduction of dynamic memory allocation.
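The field-value caching mentioned in the first item can be sketched as follows. This is not the ATLAS magnetic-field service interface; the class, method names, cache tolerance and returned value are illustrative only. The point is simply that repeated lookups near the same position skip both the full field-map interpolation and any unit conversion:

    #include <array>

    // Minimal sketch of a cached field lookup (illustrative, not ATLAS code).
    class CachedFieldLookup {
    public:
        using Vec3 = std::array<double, 3>;

        // Returns the field at 'pos' in a single internal unit, so no
        // tesla<->gauss round trips are needed in the hot loop.
        Vec3 fieldAtPosition(const Vec3& pos) {
            if (m_hasCache && distance2(pos, m_cachedPos) < m_tol2) {
                return m_cachedField;              // fast path: reuse cached value
            }
            m_cachedField = evaluateFieldMap(pos); // slow path: full field-map lookup
            m_cachedPos = pos;
            m_hasCache = true;
            return m_cachedField;
        }

    private:
        static double distance2(const Vec3& a, const Vec3& b) {
            double d2 = 0.0;
            for (int i = 0; i < 3; ++i) { const double d = a[i] - b[i]; d2 += d * d; }
            return d2;
        }
        // Stand-in for the real interpolation in the measured field map:
        // here simply ~2 T along z (solenoid-like), in tesla.
        Vec3 evaluateFieldMap(const Vec3& /*pos*/) const { return {0.0, 0.0, 2.0}; }

        Vec3 m_cachedPos{};
        Vec3 m_cachedField{};
        bool m_hasCache = false;
        double m_tol2 = 1.0;  // cache validity radius squared [mm^2], illustrative
    };

The gain comes from keeping the hot path free of conversions and full map interpolation; the same idea extends naturally to per-region or per-thread caches.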

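The benefit of Eigen's expression templates mentioned in the second item can be illustrated with a generic snippet (this is not ATLAS tracking code): a chained vector or matrix expression is evaluated directly into its destination rather than through a chain of intermediate objects, which is what removes the intermediate steps referred to above:

    #include <Eigen/Dense>
    #include <iostream>

    int main() {
        Eigen::Vector3d a(1.0, 2.0, 3.0);
        Eigen::Vector3d b(0.5, 0.5, 0.5);
        Eigen::Vector3d c(0.1, 0.2, 0.3);

        // The right-hand side is a single expression object; it is evaluated
        // coefficient by coefficient directly into 'sum', without creating an
        // intermediate Vector3d for (2.0 * a) or (2.0 * a + b).
        Eigen::Vector3d sum = 2.0 * a + b - c;

        // For products, .noalias() tells Eigen that the destination does not
        // overlap the operands, so no protective temporary is introduced.
        Eigen::Matrix3d rot = Eigen::Matrix3d::Identity();
        Eigen::Vector3d rotated;
        rotated.noalias() = rot * (a + b);

        std::cout << sum.transpose() << "\n" << rotated.transpose() << std::endl;
        return 0;
    }

In an eager linear-algebra API, each '+' and '*' above would return a freshly constructed object; with expression templates the compiler sees the whole expression and can generate a single fused loop.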
3 Optimisation of track seeding and track finding in high pile-up environments

The ATLAS Inner Detector (ID) charged-particle tracking algorithms are the biggest consumers of the CPU budget in reconstruction. In a typical reconstruction job of run-1 data the Inner Detector algorithms alone consumed up to 60% of the total reconstruction time. This expense is to be expected: with more pile-up come more space points from which tracks can be formed in the ID, so the ID algorithms are susceptible to an exponential growth in reconstruction time. At µ = 40 the ID algorithms are expected to take as much as 75% of the reconstruction time¹. Tuning and renewed optimisation of ID tracking has therefore been a top priority during LS1.

ATLAS has commissioned dedicated, optimal track-seeding strategies for run-2 that depend on the level of pile-up and, moreover, were re-tuned to fully exploit the capabilities of the newly installed Insertable B-Layer (IBL). These have resulted in a factor of 2 speed-up in reconstruction time in conditions of µ = 40 and a bunch-crossing interval of 25 ns. A first optimisation using 8 TeV data sets was implemented in release 19.0; release 17.2, referred to in the figures later, was used in reconstruction at the end of run-1. Furthermore, it was found that, for the purpose of photon conversion reconstruction, dedicated tracking in the Transition Radiation Tracker sub-component of the ID (known within ATLAS as back tracking and TRT-only tracking) need only be run in regions of interest defined by the presence of an energy deposit in the calorimeter. This change resulted in a factor of three reduction in the reconstruction time expended in TRT-only tracking. The only client of TRT-only tracking, conversion finding, was not affected by the change. This change was commissioned in software release 19.1 together with a further evolution of the track seeding for 13 TeV.

¹ In general, the tracking optimisation depends on the expected level of pile-up, both for timing and for physics performance.

4 Measurements

Fig. 1 displays the measured reconstruction time for all algorithms and for the Inner Detector algorithms only, as a function of the software release, for top-quark pair production events. It shows that a factor 3 reduction in processing time has been achieved in LS1. The majority of the improvement is due to improvements in the ID algorithms.

Fig. 2 displays the ID track reconstruction efficiency as a function of the software release. It has slightly improved from release to release, indicating that the performance has not been compromised by the changes that reduced the overall reconstruction time.

Fig. 3 displays the measured reconstruction time as a function of the average number of interactions per bunch crossing in data events from the JetTauEtmiss stream. These are events triggered by the presence of jets, missing transverse energy or tau-leptons. The data were collected in the latter part of run-1. The figure shows that a factor 4 reduction in processing time has been achieved in LS1 when comparing the reconstruction time between the three software releases.

Fig. 4 displays the measured reconstruction time as a function of the average number of interactions per bunch crossing in data events from the same stream in the latter part of 2012. The reconstruction times shown are taken from the actual Tier-0 prompt reconstruction log files and are plotted separately for each CPU type deployed at the Tier-0. Fig. 4 demonstrates that real reconstruction times are consistent with the dedicated measurements made on the benchmark machine, but can fluctuate to as high as 100 seconds on some machines for data with µ = 35.4.

References

[1] G. Aad et al. [ATLAS Collaboration], Eur. Phys. J. C 71 (2011) 1630, arXiv:1101.2185 [hep-ex].

[Figure 1: plot of reconstruction time per event [s] versus software release (17.2.7.9 32-bit, 19.0.3.3 64-bit, 19.1.1.1 64-bit), showing "Full reconstruction" and "Inner Detector only" points. Plot labels: ATLAS Simulation, RDO to ESD, √s = 14 TeV, <µ> = 40, 25 ns bunch spacing, Run 1 geometry, pp → tt̄, HS06 = 11.95.]

Figure 1: Time per event, measured in seconds, to reconstruct Monte Carlo top-quark pair production (tt̄) events as a function of the ATLAS software release version. The events are generated at an LHC collision energy of 14 TeV with a bunch crossing (BC) spacing of 25 ns and an average number of interactions per BC of 40 (<µ>). Two sets of points are displayed: the full reconstruction time (red) and the reconstruction time used for the Inner Detector sub-system only (blue), which is the dominant contribution to the full reconstruction time. The simulation is performed for the run-1 ATLAS detector geometry. Measurements were performed on a machine with an HS06 scaling factor of 11.95. The data processing time of interest here is the time taken to process Raw Data Object (RDO) files into Event Summary Data (ESD) files, in what is known as the reconstruction step.

[Figure 2: plot of track reconstruction efficiency [%] versus software release (17.2.7.9 32-bit, 19.0.3.3, 19.1.1.1). Plot labels: ATLAS Simulation, √s = 14 TeV, <µ> = 40, 25 ns bunch spacing, Run 1 geometry, pp → tt̄ events.]

Figure 2: ATLAS Inner Detector track reconstruction efficiency for true charged particles from tt̄ events that originate within a radius of 20 mm from the z-axis of the ATLAS detector, which is defined along the beam line. The true charged particle must have a true transverse momentum greater than 0.8 GeV/c and create at least 7 hits in the silicon tracker. The events are generated at an LHC collision energy of 14 TeV with a bunch crossing (BC) spacing of 25 ns and an average number of interactions per BC of 40 (<µ>).

[Figure 3: plot of full reconstruction time per event [s] versus average number of interactions per bunch crossing µ, for software releases 17.2.7.9, 19.0.3.3 and 19.1.1.1. Plot label: ATLAS (Data 2012).]

Figure 3: Time per event, measured in seconds, to reconstruct data events triggered by the presence of jets, missing transverse energy or tau-leptons, as a function of the average number of interactions per bunch crossing and of the software release. The data were collected at the end of 2012, at the conclusion of LHC run-1.

[Figure 4: plot of reconstruction time per event [s] versus average number of interactions per bunch crossing µ, for different CPU types at Tier-0: Intel L5520 2.27GHz/8192 KB (15242 jobs), Intel L5640 2.27GHz/12288 KB (11631 jobs), Intel L5420 2.50GHz/6144 KB (15846 jobs), Intel E5410 2.33GHz/6144 KB (8600 jobs), Intel E5-2630L 0 2.00GHz/15360 KB (6941 jobs). Plot label: ATLAS.]

Figure 4: Time per event, measured in seconds, to reconstruct data events triggered by the presence of jets, missing transverse energy or tau-leptons, as a function of the average number of interactions per bunch crossing. The time is given for several thousand jobs measured on various CPU types. The data were collected at the end of 2012, at the conclusion of LHC run-1. Software release 17.2 was deployed at Tier-0 to reconstruct these events. The colours of the points distinguish the CPU type used for the reconstruction job, as detailed in the legend.