Computing strategy for PID calibration samples for LHCb Run 2


LHCb-PUB-2016-020
Public Note
Issue: 1, Revision: 0
Created: June 2016; Last modified: July 18, 2016

Prepared by: Lucio Anderlini (a), Sean Benson (b), Vladimir Gligorov (b), Oliver Lupton (c), Barbara Sciascia (d)
(a) INFN, Sezione di Firenze; (b) CERN; (c) University of Oxford; (d) INFN, Laboratori Nazionali di Frascati

Abstract

The samples for the calibration of the Particle Identification (PID) are sets of data collected by LHCb in which the decay candidates have a kinematic structure that allows unambiguous identification of one of the daughters with a selection strategy that does not rely on its PID-related variables. PID calibration samples are used in a data-driven technique to measure the efficiency of selection requirements on PID variables in many LHCb analyses. Since the second run of the LHCb experiment, the PID variables are computed as part of the software trigger and then refined in the offline reconstruction (which comprises two different applications, named Brunel and DaVinci). Physics analyses rely on selections combining PID requirements on the online and offline versions of the PID variables. Because of the large, but not full, correlation between the online and offline PID variables, the PID samples must be built such that both versions of the full PID information are accessible, particle by particle, so that combined requirements on the offline and online versions can be defined. The only viable solution is to write the PID calibration samples to a dedicated stream where the information evaluated in the trigger (the online version) is stored together with the full raw event, which is then reconstructed and processed offline. A dedicated algorithm matches the online and offline candidates and produces the datasets used as input for the package through which analysts access the calibration samples. This note focuses on the PID calibration samples, but the general layout can be applied to any other calibration sample in Run 2.

Contents

1  Introduction
2  PIDCalib package
3  Trigger configuration for LHCb Run 2
4  Use cases for PID calibration
   4.1  Analyses on Full stream with PID unbiased trigger
   4.2  Analyses on Turbo stream datasets
   4.3  Analyses on Full stream with PID requirements in the Trigger line
   4.4  Considerations about statistics of control samples
   4.5  Considerations about online-offline convergence
   4.6  Summary of the specifications
5  Calibration lines in Run 2
6  Operations with calibration lines during Run 2
7  Matching the online and offline candidates
8  Conclusion
9  Acknowledgements
10 References
A  Appendix

List of Figures

1  Schematic representation of the data flow for the calibration samples. Note that there is no offline selection, since candidates are produced online and resurrected using Tesla. Daughter particles are matched to the offline candidates in the DaVinci step while producing ntuples. sTables contain the sWeights needed to statistically subtract the residual background following the sPlot technique [4].

List of Tables

1  Summary of processing schemes. Each stream is routed by a specific routing bit (RB). In 2015, for validation purposes, Turbo data were duplicated as different outputs of the Turbo and Turbo validation workflows. In 2016 only the Turbo workflow is used.
2  List of PID-related variables that are inputs to the PIDANN algorithms (Run 1 version) and stored in the TurboReport.
3  List of PID-related variables that are inputs to the PIDANN algorithms (Run 2 version) and stored in the TurboReport.
4  Overlap between different samples. Each cell is normalised to the rate expected from each sample (shown in the second column).

1 Introduction

Most LHCb analyses rely on Particle Identification (PID) to determine the nature of the charged tracks reconstructed by the detector. The detectors used to identify these particles are the Rich detectors (Rich1 and Rich2), the muon system (Muon), the electromagnetic calorimeter (ECAL) and, marginally, the hadronic calorimeter (HCAL) [1]. As part of the candidate selection of physics analyses, PID requirements are applied by setting thresholds on variables built to summarize the contributions of the detectors listed above to the overall PID.

These variables are grouped in two families: the Combined Differential Log-Likelihood (DLL, defined for each particle species as the log-likelihood difference between that particle hypothesis and the pion hypothesis), and the response of an Artificial Neural Network (named PIDANN) [2]. (The network output is normalized between 0 and 1, and the corresponding variables are therefore named ProbNN.) The PIDANN algorithm is tuned on simulated signal and background samples. Depending on the composition of the input samples, on the quality of the simulation, and on the available number of simulated events, the response of the PIDANN algorithm can vary; the algorithm that calculates the ProbNN variables has therefore been retuned several times. Every time the data are processed for a specific physics analysis, the input variables are read and the ProbNN variables are calculated using the latest tuning.

Calibration samples provide pure samples of the five most common charged, long-lived particle species produced in LHCb (kaons, pions, protons, electrons and muons), used to measure the distributions of the PID variables for these five particle types. These distributions and their correlations are an essential ingredient to measure the efficiency and background rejection of the PID requirements applied as part of physics analyses.

In Section 2 we briefly introduce the PIDCalib package, the tool provided to the Collaboration to measure the efficiency of complex PID requirements. Section 3 summarizes the LHCb trigger structure during the second run of the LHC. The trigger configuration dictates the analysis needs in terms of PID calibration samples, as discussed in Section 4. Section 5 is devoted to the implementation in the computing model of the requirements on the size and availability of the calibration samples, while Section 6 complements the discussion by defining the context in which the path for this special kind of data has been defined. This includes the allocated bandwidth and the overlap between different streams. Finally, Section 7 discusses the matching of the online and offline candidates used to build the full PID information for the calibration samples, the input of the PIDCalib package. Concluding remarks and outlook compose Section 8.
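The DLL convention introduced above can be illustrated with a toy sketch (invented likelihood values, not LHCb code): for the kaon hypothesis, DLLK is the kaon log-likelihood minus the pion log-likelihood, and a kaon selection applies a threshold on it.

```python
def dll(logl_particle, logl_pion):
    """Combined Differential Log-Likelihood: log-likelihood of the given
    particle hypothesis minus that of the pion hypothesis."""
    return logl_particle - logl_pion

# Toy per-track log-likelihoods (made-up numbers for illustration).
track = {"logL_K": -12.3, "logL_pi": -17.8}

dll_k = dll(track["logL_K"], track["logL_pi"])  # DLLK = logL(K) - logL(pi)

# A kaon selection keeps tracks with DLLK above some threshold.
is_kaon_like = dll_k > 5.0
```

A positive DLLK means the kaon hypothesis is more likely than the pion one; the threshold value is an analysis choice.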

2 PIDCalib package

The PIDCalib package is the interface between the needs of analysts and the samples prepared by the PID group. It provides simple access to the calibration samples through a set of scripts that allow different studies to be performed using the PID calibration samples as input. A detailed description of the PIDCalib project is available elsewhere [3]; here only the main use case is briefly presented, to introduce the naming conventions and concepts used throughout the discussion.

The first step of the PIDCalib procedure is the statistical background subtraction of the calibration samples, achieved with the sPlot technique [4]. Most LHCb analyses that apply PID requirements as part of their candidate selection need a data-driven technique to estimate the efficiency of these requirements on their signal, and on possible backgrounds. The efficiency is measured by assigning to each i-th candidate in the calibration sample a weight w_i, chosen to ensure that the binned distributions of a simulated signal (or background) sample and of the weighted calibration sample are consistent. The considered distributions are usually the joint pdf of kinematic variables, such as momentum and pseudorapidity, and of event-complexity observables, such as the number of tracks or the number of hits in the Scintillating Pad Detector (SPD). After reweighting, the efficiency of PID requirements on the calibration samples is the same (within systematic uncertainty) as in the analysed data sample.

To ensure flexibility, the PIDCalib package allows selection requirements on different variables associated to the same track to be combined. This is essential to allow analysts to assess the efficiency of widely used requirements combining the DLLs of different hypotheses (e.g. the request DLLp > 2 and DLLp - DLLK > 2 to select protons while rejecting pions and kaons), or even requirements combining DLL and ProbNN variables.

3 Trigger configuration for LHCb Run 2

In order to face the new challenges of the second run of the LHC, the LHCb trigger [5, 6] has evolved into a heterogeneous configuration with different output data formats for different groups of trigger selections. Here we focus on the two alternative data formats for physics analyses, named Full stream and Turbo stream.

Trigger selections (also called trigger lines) that are part of the Full stream are intended for precision measurements and searches. While the software trigger fully reconstructs candidates, including vertexing and isolation requirements, those candidates are not saved. If the trigger decision is affirmative, the raw event is saved together with some summary information on the trigger decision, named the SelReport. The raw event is then reprocessed offline.

Trigger lines writing to the Turbo stream are intended for analyses of very large samples where only the information related to the candidates is needed. These lines produce a decay candidate, for which a large number of detector-related variables are computed and stored in a summarized report (named the TurboReport) together with the candidate itself. The TurboReport can be processed offline using the Tesla application [7], which is designed to process the information computed by the trigger, with the resulting output used directly for physics measurements. Any data processing that relies on the TurboReport is independent of the detector raw data, which is no longer part of the long-term computing model [8].

During the very first part of Run 2 (June-July 2015, named the Early Measurement run), to aid the commissioning of the Turbo stream mechanism, the raw information was also saved together with the TurboReport, to allow comparison between the data stored in the TurboReport and the equivalent data obtained after offline processing.

The PID information saved in the SelReport (Full stream) is limited to variables commonly used in the trigger selection strategy: the basic information from the muon system [9] and the DLL variables. More PID information is stored in the TurboReport, including all the input variables used to compute the ProbNN discriminants. (The full lists, for both the 2015 and 2016 data-taking periods, are reported in the Appendix, Tables 2 and 3.)
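The core of the efficiency measurement performed by PIDCalib (Section 2) can be sketched with plain NumPy. This is a toy illustration, not the actual PIDCalib code, and the variable names and numbers are invented: given sWeighted calibration candidates, the efficiency of a PID cut in a kinematic bin is the ratio of the sWeighted sum of candidates passing the cut to the total sWeighted sum in that bin.

```python
import numpy as np

# Toy calibration sample: momentum (GeV), a PID variable, and sWeights
# (invented numbers for illustration; sWeights may be negative).
p = np.array([10., 25., 40., 60., 12., 33., 55., 70.])
dll_k = np.array([8.0, -2.0, 12.0, 3.0, 6.0, -1.0, 9.0, 4.0])
sweights = np.array([0.9, 1.1, 0.8, 1.0, -0.2, 0.95, 1.05, 0.7])

cut = dll_k > 5.0                  # the PID requirement under study
bins = np.array([0., 30., 80.])    # a coarse momentum binning

# sWeighted sums per momentum bin, before and after the cut.
tot, _ = np.histogram(p, bins=bins, weights=sweights)
acc, _ = np.histogram(p[cut], bins=bins, weights=sweights[cut])

eff = acc / tot                    # per-bin efficiency of the PID cut
```

In the real package the binning also covers pseudorapidity and event-complexity variables, and the sWeights come from the sPlot fit rather than being given by hand.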

4 Use cases for PID calibration

Analyses based on both the Full and the Turbo stream need to measure the efficiency of PID-based selection requirements. In the former case, these requirements may be applied offline and may also have been applied online, using the respective sets of computed PID observables. Potential differences between the online- and offline-computed PID observables lead to the need for data samples that can be used to calibrate both. This section considers the physics needs in terms of calibration samples case by case: the case where no PID-based selection is applied in the trigger (4.1), the case of analyses using Turbo stream data (4.2), and the case where PID-based selections are applied both in the trigger and offline (4.3).

4.1 Analyses on Full stream with PID unbiased trigger

The highest-precision analyses, which need to fully control the systematic uncertainties related to the PID-based selection, generally do not introduce PID requirements as part of their trigger selection. The PID-based selection strategy relies on offline-reconstructed variables, which are available in both the calibration sample and the analysis sample. If neural-network-based combinations of the PID observables [2, 10] (ProbNN) are used, these may be recomputed offline as a result of the regular updates to their tunings. Therefore, it must be possible to compute the ProbNN variables on demand, both on analysis samples and on calibration samples. To allow this, all the variables that are inputs to the algorithms calculating the ProbNN variables are reconstructed offline and stored in the output file (called a (micro)DST, from the historical name Data Summary Tape) of both calibration and physics samples.

Specification: Calibration samples must include all the PID observables (see Tables 2 and 3 in the Appendix) as reconstructed online.

When PID performance is very important for an analysis, dedicated studies of the PID performance down to the detector level are often performed. In this case, assessing the PID efficiency of new algorithms on real data may require access to the detector raw data stored in the calibration samples. This includes:

- Rich and Calorimeter raw data (to study detector calibrations);
- Muon raw data (for custom association of hits to the muon candidate);
- Trigger raw data. This is particularly important because part of the PID information (namely that from the Calorimeter and Muon systems) is used in the earlier stages of the trigger, so care is required to ensure that the calibration samples are not biased by the trigger itself. A method, TisTos [5], has been implemented to allow physics analyses to factorise the PID and trigger contributions to the total efficiencies;
- Tracking raw data (for isolation studies).

Specification: Calibration samples must include Rich, Calorimeter, Muon, Trigger and Tracker raw data.

4.2 Analyses on Turbo stream datasets

For analyses based on the Turbo stream, the offline version of the PID observables does not exist. The calibration samples therefore have to provide the PID information as computed online, in order to assess the efficiency:

- of cuts applied in the trigger line on online variables;
- of cuts applied offline on the PID variables stored in the TurboReport.
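Because the PIDANN inputs are persisted, a new tuning can be evaluated offline on the stored candidates without re-running the reconstruction. The sketch below is a deliberately minimal stand-in for such a discriminant: a one-hidden-layer network with invented weights and inputs. The real PIDANN inputs are listed in Tables 2 and 3, and its actual architecture is described in Ref. [2].

```python
import math

def probnn_toy(inputs, w_hidden, b_hidden, w_out, b_out):
    """Evaluate a minimal one-hidden-layer network on stored PID inputs,
    returning a value in (0, 1) as ProbNN-style discriminants do."""
    hidden = [math.tanh(sum(w * x for w, x in zip(ws, inputs)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-z))   # sigmoid keeps the output in (0, 1)

# Invented stored inputs (standing in for DLLs, track-quality variables,
# etc.) and an invented "tuning" (the weights). A retuning would swap the
# weights without touching the stored inputs.
stored_inputs = [5.5, -1.2, 0.8]
tuning = {
    "w_hidden": [[0.4, -0.3, 0.2], [-0.1, 0.5, 0.3]],
    "b_hidden": [0.1, -0.2],
    "w_out": [1.2, -0.7],
    "b_out": 0.05,
}

score = probnn_toy(stored_inputs, **tuning)
```

The point of persisting the inputs rather than only the score is exactly this: the same stored values can be fed to any future tuning.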

Since many trigger selections use requirements on ProbNN, it is important that the information needed to compute these variables is stored in the TurboReport. Once the candidates are restored with Tesla, the online version of the ProbNN variables can be recomputed, together with different tunings of the same variables, which can be useful for analyses relying on the Turbo stream.

Specification: Calibration samples must include the input variables of the PIDANN algorithms as they are available online and stored in the TurboReport of analyses based on the Turbo stream.

4.3 Analyses on Full stream with PID requirements in the Trigger line

When PID-based selection requirements are applied as part of both the online and the offline selection, a measurement of their efficiency requires the presence in the calibration sample of both the online- and offline-reconstructed PID observables, on a candidate-by-candidate basis.

Consider, for example, a physics analysis selecting prompt Λc+ → p K− π+ decays. In order to decrease the accepted rate of the trigger selection without applying a downscale, the analysts decide to put a loose DLL requirement on the proton (DLLp > 5). This is enough to reduce the trigger rate to an acceptable level, and has very high efficiency on signal. Since the requirement is very loose, the offline analysis relies on a further cut on the proton (ProbNNp > 0.2). The analysts then want to use the PIDCalib package to assess the efficiency of the PID requirement on the proton, namely

  [DLLp]online > 5 AND [ProbNNp]offline > 0.2.

Because of the different algorithms used online and offline, this is different from the requirement

  [DLLp]online > 5 AND [ProbNNp]online > 0.2,

and, of course, from

  [DLLp]offline > 5 AND [ProbNNp]offline > 0.2.

Furthermore, as a result of the strong correlation between the two variables, the online requirement on DLLp drastically modifies the distribution of ProbNNp, so that for the efficiency ε one can write

  ε([DLLp]online > 5 AND [ProbNNp]offline > 0.2) ≠ ε([DLLp]online > 5) × ε([ProbNNp]offline > 0.2).

It is therefore clear that the calibration samples must allow the evaluation of mixed requirements, such as the one in the example above, on a candidate-by-candidate basis. This can be achieved by matching the candidates reconstructed online, and saved as part of the Turbo stream, to the candidates reconstructed offline. The easiest way to achieve this is to create a sample containing, for each event, both the online and the offline candidates.

Specification: It must be possible to match online and offline candidates stored in the calibration samples, and for each candidate the online- and offline-reconstructed PID observables must be stored.

4.4 Considerations about statistics of control samples

A key point for the calibration samples is that they must be large enough to provide an acceptable uncertainty on the determination of the selection efficiency in the binning scheme defined by the analysis needs. The challenge is to cover the wide range of kinematic variables used to measure the detector and PID performance. During Run 1, for a dataset corresponding to an integrated luminosity of 3 fb−1, each calibration sample accounted for roughly 10M events. Despite the size of the samples, some analyses were already dominated by the systematic uncertainty on the PID selection efficiency (especially when considering protons). During Run 2, more and more results will face limitations due to systematic effects, and control samples are one of the handles to keep these effects under control. Moreover, the samples need to be of a sufficient size to allow the study of correlated systematic effects.
It has been estimated that calibration samples one order of magnitude larger than those taken in Run 1, together with a binned prescaling to improve the coverage of low-statistics regions (cf. [11]), should result in systematic uncertainties of an acceptable size. Roughly 100M events per particle type (p, K, π, µ, e) implies a total of 500M events for the calibration stream. This should be achievable by allocating 100 Hz of bandwidth to the trigger output dedicated to calibration samples during Run 2 operations. The different types of events of interest for the PID calibration samples are selected by a suite of trigger lines, which are appropriately prescaled to match the above requirements [11]. For most of Run 2, the analysis of the data collected in 2015 allowed the prescale factors for the calibration samples to be determined.

4.5 Considerations about online-offline convergence

In Run 2, the availability of a larger CPU budget in the HLT and optimized code allow the same reconstruction to be run online and offline. The online and offline tracking reconstructions show very small differences, so the online reconstruction reaches the same quality as the offline one. The reconstruction of the PID observables is the same online and offline; residual differences are still present due to a slightly different reconstruction of the Calorimeter information. Although efforts are in place to reduce this difference to a negligible level, having both the online and the offline information stored in the calibration samples allows the PID performance to be measured safely.

Even in the case of identical online and offline reconstruction, it makes sense to keep the online information, as this is what is used in the trigger selections. Future developments of the offline reconstruction, and subsequent improvements of the offline performance, could change the offline-reconstructed PID observables, and the values computed and used online would be lost if not retained. To avoid this, the online-reconstructed PID observables are stored in the raw data.
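The non-factorisation of combined requirements discussed in Section 4.3 can be demonstrated numerically with a toy model (invented distributions, not LHCb data): two strongly correlated PID responses, standing in for the online DLLp and the offline ProbNNp, give a joint cut efficiency far from the product of the individual efficiencies.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy online/offline PID responses for the same tracks: strongly
# correlated through a shared underlying response, as the online DLLp
# and offline ProbNNp are in practice.
n = 100_000
common = rng.normal(size=n)
dllp_online = 5.0 * common + rng.normal(scale=1.0, size=n)
probnnp_offline = 1.0 / (1.0 + np.exp(-(2.0 * common
                                        + rng.normal(scale=0.3, size=n))))

pass_a = dllp_online > 5.0          # the online requirement
pass_b = probnnp_offline > 0.2      # the offline requirement

eff_joint = np.mean(pass_a & pass_b)             # ε(A AND B)
eff_product = np.mean(pass_a) * np.mean(pass_b)  # ε(A) × ε(B)
```

Here `eff_joint` is noticeably larger than `eff_product`: tracks passing the tight online cut almost always pass the loose offline one, which is why the mixed efficiency must be evaluated candidate by candidate.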
4.6 Summary of the specifications

The specifications discussed above are listed here for convenience:

1. Calibration samples must include all the PID observables (see Tables 2 and 3 in the Appendix) as reconstructed online.
2. Calibration samples must include the Rich, Calorimeter, Muon, Trigger and Tracking raw event.
3. Calibration samples must include the input variables of the PIDANN algorithms as they are available online and stored in the TurboReport of analyses based on the Turbo stream.
4. It must be possible to match online and offline candidates stored in the calibration samples, and for each candidate the online- and offline-reconstructed PID observables must be stored.

As a first result, calibration samples were available together with the first data collected in June and July 2015, allowing the presentation of physics results at the 2015 Summer Conferences [12, 13].

5 Calibration lines in Run 2

During the second run of the LHC, the calibration samples are selected by Hlt2 lines writing to a dedicated stream, TurboCalib, which contains the Turbo information and retains the detector raw data. Throughout Run 2, both the TurboReport and the raw event have to be stored in the TurboCalib stream. The TurboReport is processed offline using Tesla [7]. The Brunel reconstruction is run centrally, while no further offline processing, i.e. selections and streaming of events, is foreseen, because further selections in addition to those that select the events in the trigger would defeat the purpose of the sample. The creation of candidate decays is part of the ntuple production step, instead of a separate offline processing step. To produce the final samples needed by PIDCalib, three steps are needed, described in a single configuration file:

- Production and selection of the decay candidates;
- Matching of the online and offline candidates;
- Output to ntuples, providing the input of the PIDCalib package.

An extension of the PIDCalib package has been implemented to support the production of MicroDST datasets including the sWeight of each calibration candidate, obtained from a binned likelihood fit of the signal-background discriminating distributions. These light-weight calibration samples will be available to the Collaboration to study the performance of ad-hoc tunings of the existing algorithms, or to study new algorithms, without rerunning the fit and sPlot parts of PIDCalib. The data flow is sketched in Figure 1.

[Figure 1: Schematic representation of the data flow for the calibration samples. Note that there is no offline selection, since candidates are produced online and resurrected using Tesla. Daughter particles are matched to the offline candidates in the DaVinci step while producing ntuples. sTables contain the sWeights needed to statistically subtract the residual background following the sPlot technique [4].]
6 Operations with calibration lines during Run 2 The needs that have to be fullfilled by the calibration samples were different in the Early Measurement in 2015, the validation (second part of 2015 data taking), and long Run 2 phases. Throughout 2015, the rate of the calibration samples (for both PID and Tracking lines) was between 1.2 khz (Early Measurement) and 0.6 khz (validation). Data taken in 2015 allowed for a first optimization of both PID and Tracking lines that lead to a rate of 0.4 khz in the 2016 configuration. The optimization will continue in 2016 with the aim of reaching a rate 0.1 khz for TurboCalib for most of Run 2. Besides the rate, another important aspect to be kept under control is the overlap between different streams, because it can badly affects the use of computing resources, in particular the storage. Details on these aspects depend on the specific configuration used. (The situation at the time of writing can be found in the Appendix, see Table 4.) As of this writing, HLT provides different streams: Full, Turbo, TurboCalib, defined in Sect. 3, and two ancillar ones, Lumi, providing compact information about recorded luminosity, the Turbo stream goes through the Turbo and Turbo validation workflows, the latter only in 2015 for validation purposes. A schematic representation is issued in Table 1. From the beginning of Run 2 data taking, page 7

calibration data followed the Calibration workflow. The latter, schematically shown in the left part of Fig. 1, is meant to process TurboCalib data. Offline and online recreated candidates are saved separately (technically, by defining two different Transient Event Store (TES) locations in the same file), allowing for an easy matching of online and offline candidates (see Sect. 4.3). The same matching was needed during the validation of the Turbo stream reconstruction.

Trigger      RB   Workflow           Offline          Raw data          File name
Full         87   Full               Brunel+DaVinci   RAW               FULL.DST
Turbo        88   Turbo              Tesla            RAW+TurboReport   TURBO.MDST
Turbo        88   Turbo validation   Brunel+Tesla     RAW+TurboReport   FULLTURBO.DST
TurboCalib   90   Calibration        Brunel+Tesla     RAW+TurboReport   FULLTURBO.DST

Table 1 Summary of processing schemes. Each stream is routed by a specific routing bit (RB). In 2015, for validation purposes, Turbo data have been duplicated as different outputs of the Turbo and Turbo validation workflows. In 2016 only the Turbo workflow is used.

As stated in Section 4.4, for PID calibration purposes, a total trigger rate of about 100 Hz should be sufficient to guarantee the evaluation of the PID performance with the required precision. For very high yield Turbo analyses, this might not be the case. Taking advantage of the fact that only online reconstructed quantities are required to calibrate PID observables for Turbo analyses, additional selections could be added to the Turbo stream itself, where only the small TurboReport is retained, to provide the required additional calibration samples.

7 Matching the online and offline candidates

The online and offline candidates are stored separately within each processed event, so comparing their PID observables requires them to be matched. The matching can be done on the basis of the number of shared LHCbIDs (corresponding to physical hits/clusters in the detector), by exploiting the TisTos algorithm [5], or by combining the two techniques.
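The hit-overlap part of the matching can be sketched as follows. This is an illustrative stand-alone sketch, not the actual LHCb implementation: LHCbIDs are represented as plain integers, and the 70% sharing threshold is an assumed example value.

```python
def match_candidates(online_ids, offline_tracks, min_shared=0.7):
    """Match an online track to offline tracks by shared LHCbIDs.

    online_ids     -- set of LHCbIDs (detector hits) of the online track
    offline_tracks -- dict mapping an offline track key to its set of LHCbIDs
    min_shared     -- minimum fraction of online hits that must be shared
                      (hypothetical default, for illustration only)

    Returns the key of the best-matching offline track, or None.
    """
    best_key, best_frac = None, 0.0
    for key, ids in offline_tracks.items():
        frac = len(online_ids & ids) / len(online_ids)  # shared-hit fraction
        if frac > best_frac:
            best_key, best_frac = key, frac
    return best_key if best_frac >= min_shared else None

# Toy example: integers stand in for packed LHCbID words.
online = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
offline = {"trk0": {1, 2, 3, 4, 5, 6, 7, 8, 20, 21},  # 8/10 hits shared
           "trk1": {100, 101, 102}}                   # no overlap
print(match_candidates(online, offline))  # -> trk0
```

Normalising to the number of online hits makes the criterion robust against the offline track picking up extra hits that the online reconstruction never saw.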
To build the samples, first the online candidate, as selected by the trigger, is written. Then the track of the probe particle is matched to a track reconstructed in the offline processing. The PID variables of the online and offline reconstructed candidates are written into two different locations and are available for the users to create combined requirements, as described in Section 4. The Tesla-Brunel matching algorithm is now part of the LHCb software.

8 Conclusion

In order to provide calibration samples for analyses relying on the Turbo and Full streams, their selection strategy has been implemented in the trigger. Selected events need to be reconstructed offline, and therefore the detector raw data must be retained. On the other hand, the full PID information, needed to reconstruct the ProbNN variables, can only be stored for candidates selected by a line in the Turbo stream. The simplest technical solution satisfying these specifications is to write the calibration samples to a dedicated stream, TurboCalib, containing both the TurboReport and the raw event. The stream is processed centrally to restore the online candidates and, with the standard offline reconstruction, to produce the offline candidates, which are stored separately in each event of the output file. Online and offline candidates are then matched to produce ntuples containing, candidate-by-candidate, the online and offline PID variables.

The TurboCalib stream, developed for the PID calibration samples, has already been used also for the Tracking calibration samples. It played a key role in the validation of both the Turbo stream and the Tesla application. Finally, it is also a fundamental tool for developing a viable strategy for the calibration samples in the LHCb Upgrade.

9 Acknowledgements

We warmly thank our colleagues R. Aaij and P. Charpentier for the careful review of this text in the passage from Internal to Public note.

10 References

[1] LHCb Collaboration, The LHCb Detector at the LHC, J. Instrum. 3 (2008) S08005.
[2] C. Jones, ANN PID, https://indico.cern.ch/event/226062/contribution/1/material/slides/0.pdf.
[3] S. Malde, PIDCalib Packages, https://twiki.cern.ch/twiki/bin/view/lhcb/pidcalibpackage.
[4] M. Pivk and F. R. Le Diberder, sPlot: a statistical tool to unfold data distributions, arXiv:physics/0402083.
[5] R. Aaij et al., The LHCb Trigger and its Performance in 2011, JINST 8 (2013) P04022.
[6] J. Albrecht et al. [LHCb HLT project], Performance of the LHCb High Level Trigger in 2012, J. Phys. Conf. Ser. 513 (2014) 012001.
[7] R. Aaij et al., Tesla: an application for real-time data analysis in High Energy Physics, arXiv:1604.05596.
[8] LHCb Collaboration, LHCb Trigger and Online Upgrade Technical Design Report, CERN-LHCC-2014-016; LHCb-TDR-016.
[9] LHCb MuonID group, Performance of the Muon Identification at LHCb, J. Instrum. 8 (2013) P10020; arXiv:1306.0249.
[10] LHCb Collaboration, LHCb Detector Performance, Int. J. Mod. Phys. A 30 (2015) 1530022; arXiv:1412.6352.
[11] L. Anderlini, O. Lupton, B. Sciascia and V. Gligorov, Calibration samples for particle identification at LHCb in Run 2, LHCb-PUB-2016-005; CERN-LHCb-PUB-2016-005.
[12] R. Aaij et al. [LHCb Collaboration], Measurement of forward J/ψ production cross-sections in pp collisions at √s = 13 TeV, JHEP 1510 (2015) 172; arXiv:1509.00771.
[13] R. Aaij et al. [LHCb Collaboration], Measurements of prompt charm production cross-sections in pp collisions at √s = 13 TeV, JHEP 1603 (2016) 159; arXiv:1510.01707.

A Appendix

Tracking            Rich                 Calorimeters   Muon                    Velo
P                   USED R1 GAS          Ecal PIDe      Background Likelihood   VeloCharge
PT                  USED R2 GAS          Ecal PIDmu     Muon Likelihood
CHI2NDOF            ABOVE MU THRESHOLD   HCal PIDe      IsMuon
NDOF                ABOVE k THRESHOLD    HCal PIDmu     IsLooseMuon
LIKELIHOOD          DLLe                 Prs PIDe       InMuon Acceptance
GHOST PROBABILITY   DLLmu                InAccBrem      nshared
FIT MATCH CHI2      DLLk                 BremPIDe
CLONE DISTANCE      DLLp
FIT VELO CHI2       DLLbt
FIT VELO NDOF
FIT T CHI2
FIT T NDOF

Table 2 List of PID-related variables that are inputs to the PIDANN algorithms (Run 1 version) and stored in the TurboReport.

Tracking            Rich                 Calorimeters   Muon                    Velo
P                   USED R1 GAS          Ecal PIDe      Background Likelihood
PT                  USED R2 GAS          Ecal PIDmu     Muon Likelihood
CHI2NDOF            ABOVE MU THRESHOLD   HCal PIDe      IsMuon
NDOF                ABOVE k THRESHOLD    HCal PIDmu     IsLooseMuon
GHOST PROBABILITY   DLLe                 Prs PIDe       InMuon Acceptance
FIT MATCH CHI2      DLLmu                InAccBrem      nshared
FIT VELO CHI2       DLLk                 BremPIDe
FIT VELO NDOF       DLLp
FIT T CHI2          DLLbt
FIT T NDOF

Table 3 List of PID-related variables that are inputs to the PIDANN algorithms (Run 2 version) and stored in the TurboReport.

Channel      Rate    Charm   Charm   Topo    Leptons  Turbo   EW      Low     Other   Tech
                     Full    Turbo                    Calib           Mult
             (kHz)   (%)     (%)     (%)     (%)      (%)     (%)     (%)     (%)     (%)
ALL          13.0     23.1    35.4    23.1    16.2      3.1     5.6     4.4     9.0    1.3
CharmFull     3.0    100.0    16.7    13.3     3.3      3.3     6.7     0.0     4.4    0.0
CharmTurbo    4.6     10.9   100.0    10.9     0.7      2.2     4.3     0.0     8.0    0.0
Topo          3.0     13.3    16.7   100.0     5.6      4.4     4.4     0.0    11.1    0.0
Leptons       2.1      4.8     1.6     7.9   100.0      1.6     4.8     0.0     1.6    0.0
TurboCalib    0.4     25.0    25.0    33.3     8.3    100.0     0.0     0.0     0.0    0.0
EW            0.7     27.3    27.3    18.2    13.6      0.0   100.0     0.0     4.5    0.0
LowMult       0.6      0.0     0.0     0.0     0.0      0.0     0.0   100.0     0.0    0.0
Other         1.2     11.4    31.4    28.6     2.9      0.0     2.9     0.0   100.0    0.0
Technical     0.2      0.0     0.0     0.0     0.0      0.0     0.0     0.0     0.0  100.0

Table 4 Overlap between different samples. Each cell is normalised to the rate expected from the row sample (shown in the second column).
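To read Table 4, the absolute overlap rate between two samples is obtained by multiplying the row percentage by the row rate; computing it from either row must give the same answer, which is a handy consistency check. A minimal sketch, using numbers taken from the table:

```python
# Rates from the second column of Table 4, in kHz.
rates_khz = {"TurboCalib": 0.4, "Topo": 3.0}

# 33.3% of the TurboCalib rate also fires a Topo line...
overlap_a = rates_khz["TurboCalib"] * 33.3 / 100.0
# ...which must match 4.4% of the Topo rate firing a TurboCalib line.
overlap_b = rates_khz["Topo"] * 4.4 / 100.0

print(f"TurboCalib/Topo overlap: {overlap_a:.2f} kHz vs {overlap_b:.2f} kHz")
```

Both directions give about 0.13 kHz, confirming that the percentages in the two rows describe the same underlying overlap.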