Computing strategy for PID calibration samples for LHCb Run 2


LHCb-PUB-2016-020
Public Note
Issue: 1, Revision: 0
Created: June 2016; Last modified: July 18, 2016

Prepared by: Lucio Anderlini (a), Sean Benson (b), Vladimir Gligorov (b), Oliver Lupton (c), Barbara Sciascia (d)
(a) INFN, Sezione di Firenze; (b) CERN; (c) University of Oxford; (d) INFN, Laboratori Nazionali di Frascati

Abstract

The samples for the calibration of the Particle Identification (PID) are sets of data collected by LHCb in which the decay candidates have a kinematic structure that allows unambiguous identification of one of the daughters with a selection strategy that does not rely on its PID-related variables. PID calibration samples are used in a data-driven technique to measure the efficiency of selection requirements on PID variables in many LHCb analyses. Since the second run of the LHCb experiment, the PID variables are computed as part of the software trigger and then refined in the offline reconstruction (which comprises two different applications, named Brunel and DaVinci). Physics analyses rely on selections combining PID requirements on the online and offline versions of the PID variables. Because of the large, but not full, correlation between the online and offline PID variables, the PID samples must be built such that both versions of the full PID information are accessible, particle by particle, so that combined requirements on the offline and online versions can be defined. The only viable solution is to write the PID calibration samples to a dedicated stream where the information evaluated in the trigger (the online version) is stored together with the full raw event, which is then reconstructed and processed offline. A dedicated algorithm matches the online and offline candidates and produces the datasets used as input for the package through which analysts access the calibration samples. This note focuses on the PID calibration samples, but the general layout can be applied to any other calibration sample in Run 2.

Contents

1  Introduction
2  PIDCalib package
3  Trigger configuration for LHCb Run 2
4  Use cases for PID calibration
   4.1  Analyses on Full stream with PID unbiased trigger
   4.2  Analyses on Turbo stream datasets
   4.3  Analyses on Full stream with PID requirements in the Trigger line
   4.4  Considerations about statistics of control samples
   4.5  Considerations about online-offline convergence
   4.6  Summary of the specifications
5  Calibration lines in Run 2
6  Operations with calibration lines during Run 2
7  Matching the online and offline candidates
8  Conclusion
9  Acknowledgements
10 References
A  Appendix

List of Figures

1  Schematic representation of the data flow for the calibration samples. Note that there is no offline selection, since candidates are produced online and resurrected using Tesla. Daughter particles are matched to the offline candidates in the DaVinci step while producing ntuples. sTables contain the sWeights needed to statistically subtract the residual background following the sPlot technique [4].

List of Tables

1  Summary of processing schemes. Each stream is routed by a specific routing bit (RB). In 2015, for validation purposes, Turbo data were duplicated as different outputs of the Turbo and Turbo validation workflows. In 2016 only the Turbo workflow is used.
2  List of PID-related variables that are inputs to the PIDANN algorithms (Run 1 version) and stored in the TurboReport.
3  List of PID-related variables that are inputs to the PIDANN algorithms (Run 2 version) and stored in the TurboReport.
4  Overlap between different samples. Each cell is normalised to the rate expected from each sample (shown in the second column).

1 Introduction

Most LHCb analyses rely on Particle Identification (PID) to determine the nature of the charged tracks reconstructed by the detector. The detectors used to identify these particles are the Rich detectors (Rich1 and Rich2), the muon system (Muon), the electromagnetic calorimeter (ECAL) and, marginally, the hadronic calorimeter (HCAL) [1]. As part of the candidate selection of physics analyses, PID requirements are applied by setting thresholds on variables built to summarize the contributions of the detectors listed above to the overall PID.

These variables are grouped in two families: the Combined Differential Log-Likelihood (DLL, defined for each particle species as the log-likelihood difference between that particle hypothesis and the pion hypothesis), and the response of an Artificial Neural Network (named PIDANN) [2]. (The network output is normalized between 0 and 1, and the corresponding variables are therefore named ProbNN.) The PIDANN algorithm is tuned on simulated signal and background samples. Depending on the composition of the input samples, on the quality of the simulation, and on the available number of simulated events, the response of the PIDANN algorithm can vary; the algorithm that calculates the ProbNN variables has therefore been retuned several times. Every time the data are processed for a specific physics analysis, the input variables are read and the ProbNN variables are calculated using the latest tuning.

Calibration samples provide pure samples of the five most common charged, long-lived particle species produced in LHCb (kaons, pions, protons, electrons and muons), used to measure the distributions of the PID variables for these five particle types. These distributions and their correlations are an essential ingredient to measure the efficiency and background rejection of the PID requirements applied as part of physics analyses.

In Section 2 we briefly introduce the PIDCalib package, the tool provided to the Collaboration to measure the efficiency of complex PID requirements. Section 3 summarizes the LHCb trigger structure during the second run of the LHC. The trigger configuration dictates the analysis needs in terms of PID calibration samples, as discussed in Section 4. Section 5 is devoted to the implementation in the computing model of the requirements on the size and availability of the calibration samples, while Section 6 complements the discussion by defining the context in which the path for this special kind of data has been defined. This includes the allocated bandwidth and the overlap between different streams. Finally, Section 7 discusses the matching of the online and offline candidates used to build the full PID information for the calibration samples, the input of the PIDCalib package. Concluding remarks and outlook compose Section 8.
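The DLL convention introduced above can be illustrated with a toy sketch (invented likelihood values, not LHCb code): for the kaon hypothesis, DLLK is the kaon log-likelihood minus the pion log-likelihood, and a kaon selection applies a threshold on it.

```python
def dll(logl_particle, logl_pion):
    """Combined Differential Log-Likelihood: log-likelihood of the given
    particle hypothesis minus that of the pion hypothesis."""
    return logl_particle - logl_pion

# Toy per-track log-likelihoods (made-up numbers for illustration).
track = {"logL_K": -12.3, "logL_pi": -17.8}

dll_k = dll(track["logL_K"], track["logL_pi"])  # DLLK = logL(K) - logL(pi)

# A kaon selection keeps tracks with DLLK above some threshold.
is_kaon_like = dll_k > 5.0
```

A positive DLLK means the kaon hypothesis is more likely than the pion one; the threshold value is an analysis choice.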

2 PIDCalib package

The PIDCalib package is the interface between the needs of analysts and the samples prepared by the PID group. It provides simple access to the calibration samples through a set of scripts that allow different studies to be performed using the PID calibration samples as input. A detailed description of the PIDCalib project is available elsewhere [3]; here only the main use case is briefly presented, to introduce the naming conventions and concepts used throughout the discussion.

The first step of the PIDCalib procedure is the statistical background subtraction of the calibration samples, achieved with the sPlot technique [4]. Most LHCb analyses that apply PID requirements as part of their candidate selection need a data-driven technique to estimate the efficiency of these requirements on their signal, and on possible backgrounds. The efficiency is measured by assigning to each i-th candidate in the calibration sample a weight w_i, chosen to ensure that the binned distributions of a simulated signal (or background) sample and of the weighted calibration sample are consistent. The considered distributions are usually the joint pdf of kinematic variables, such as momentum and pseudorapidity, and of event-complexity observables, such as the number of tracks or the number of hits in the Scintillating Pad Detector (SPD). After reweighting, the efficiency of PID requirements on the calibration samples is the same (within systematic uncertainty) as in the analysed data sample.

To ensure flexibility, the PIDCalib package allows selection requirements on different variables associated to the same track to be combined. This is essential to allow analysts to assess the efficiency of widely used requirements combining the DLLs of different hypotheses (e.g. the request DLLp > 2 and DLLp - DLLK > 2 to select protons while rejecting pions and kaons), or even requirements combining DLL and ProbNN variables.

3 Trigger configuration for LHCb Run 2

In order to face the new challenges of the second run of the LHC, the LHCb trigger [5, 6] has evolved into a heterogeneous configuration with different output data formats for different groups of trigger selections. Here we focus on the two alternative data formats for physics analyses, named Full stream and Turbo stream.

Trigger selections (also called trigger lines) that are part of the Full stream are intended for precision measurements and searches. While the software trigger fully reconstructs candidates, including vertexing and isolation requirements, those candidates are not saved. If the trigger decision is affirmative, the raw event is saved together with some summary information on the trigger decision, named the SelReport. The raw event is then reprocessed offline.

Trigger lines writing to the Turbo stream are intended for analyses of very large samples where only the information related to the candidates is needed. These lines produce a decay candidate, for which a large number of detector-related variables are computed and stored in a summarized report (named the TurboReport) together with the candidate itself. The TurboReport can be processed offline using the Tesla application [7], which is designed to process the information computed by the trigger, with the resulting output used directly for physics measurements. Any data processing that relies on the TurboReport is independent of the detector raw data, which is no longer part of the long-term computing model [8].

During the very first part of Run 2 (June-July 2015, named the Early Measurement run), to aid the commissioning of the Turbo stream mechanism, the raw information was also saved together with the TurboReport, to allow comparison between the data stored in the TurboReport and the equivalent data obtained after offline processing.

The PID information saved in the SelReport (Full stream) is limited to variables commonly used in the trigger selection strategy: the basic information from the muon system [9] and the DLL variables. More PID information is stored in the TurboReport, including all the input variables used to compute the ProbNN discriminants. (The full lists, for both the 2015 and 2016 data-taking periods, are reported in the Appendix, Tables 2 and 3.)
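The core of the efficiency measurement performed by PIDCalib (Section 2) can be sketched with plain NumPy. This is a toy illustration, not the actual PIDCalib code, and the variable names and numbers are invented: given sWeighted calibration candidates, the efficiency of a PID cut in a kinematic bin is the ratio of the sWeighted sum of candidates passing the cut to the total sWeighted sum in that bin.

```python
import numpy as np

# Toy calibration sample: momentum (GeV), a PID variable, and sWeights
# (invented numbers for illustration; sWeights may be negative).
p = np.array([10., 25., 40., 60., 12., 33., 55., 70.])
dll_k = np.array([8.0, -2.0, 12.0, 3.0, 6.0, -1.0, 9.0, 4.0])
sweights = np.array([0.9, 1.1, 0.8, 1.0, -0.2, 0.95, 1.05, 0.7])

cut = dll_k > 5.0                  # the PID requirement under study
bins = np.array([0., 30., 80.])    # a coarse momentum binning

# sWeighted sums per momentum bin, before and after the cut.
tot, _ = np.histogram(p, bins=bins, weights=sweights)
acc, _ = np.histogram(p[cut], bins=bins, weights=sweights[cut])

eff = acc / tot                    # per-bin efficiency of the PID cut
```

In the real package the binning also covers pseudorapidity and event-complexity variables, and the sWeights come from the sPlot fit rather than being given by hand.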

4 Use cases for PID calibration

Analyses based on both the Full and the Turbo stream need to measure the efficiency of PID-based selection requirements. In the former case, these requirements may be applied offline and may also have been applied online, using the respective sets of computed PID observables. Potential differences between the online- and offline-computed PID observables lead to the need for data samples that can be used to calibrate both. This section considers the physics needs in terms of calibration samples case by case: the case where no PID-based selection is applied in the trigger (4.1), the case of analyses using Turbo stream data (4.2), and the case where PID-based selections are applied both in the trigger and offline (4.3).

4.1 Analyses on Full stream with PID unbiased trigger

The highest-precision analyses, which need to fully control the systematic uncertainties related to the PID-based selection, generally do not introduce PID requirements as part of their trigger selection. The PID-based selection strategy relies on offline-reconstructed variables, which are available in both the calibration sample and the analysis sample. If neural-network-based combinations of the PID observables [2, 10] (ProbNN) are used, these may be recomputed offline as a result of the regular updates to their tunings. Therefore, it must be possible to compute the ProbNN variables on demand, both on analysis samples and on calibration samples. To allow this, all the variables that are inputs to the algorithms calculating the ProbNN variables are reconstructed offline and stored in the output file (called a (micro)DST, from the historical name Data Summary Tape) of both calibration and physics samples.

Specification: Calibration samples must include all the PID observables (see Tables 2 and 3 in the Appendix) as reconstructed online.

When PID performance is very important for an analysis, dedicated studies of the PID performance down to the detector level are often performed. In this case, assessing the PID efficiency of new algorithms on real data may require access to the detector raw data stored in the calibration samples. This includes:

- Rich and Calorimeter raw data (to study detector calibrations);
- Muon raw data (for custom association of hits to the muon candidate);
- Trigger raw data. This is particularly important because part of the PID information (namely that from the Calorimeter and Muon systems) is used in the earlier stages of the trigger, so care is required to ensure that the calibration samples are not biased by the trigger itself. A method, TisTos [5], has been implemented to allow physics analyses to factorise the PID and trigger contributions to the total efficiencies;
- Tracking raw data (for isolation studies).

Specification: Calibration samples must include Rich, Calorimeter, Muon, Trigger and Tracker raw data.

4.2 Analyses on Turbo stream datasets

For analyses based on the Turbo stream, the offline version of the PID observables does not exist. The calibration samples therefore have to provide the PID information as computed online, in order to assess the efficiency:

- of cuts applied in the trigger line on online variables;
- of cuts applied offline on the PID variables stored in the TurboReport.
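Because the PIDANN inputs are persisted, a new tuning can be evaluated offline on the stored candidates without re-running the reconstruction. The sketch below is a deliberately minimal stand-in for such a discriminant: a one-hidden-layer network with invented weights and inputs. The real PIDANN inputs are listed in Tables 2 and 3, and its actual architecture is described in Ref. [2].

```python
import math

def probnn_toy(inputs, w_hidden, b_hidden, w_out, b_out):
    """Evaluate a minimal one-hidden-layer network on stored PID inputs,
    returning a value in (0, 1) as ProbNN-style discriminants do."""
    hidden = [math.tanh(sum(w * x for w, x in zip(ws, inputs)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-z))   # sigmoid keeps the output in (0, 1)

# Invented stored inputs (standing in for DLLs, track-quality variables,
# etc.) and an invented "tuning" (the weights). A retuning would swap the
# weights without touching the stored inputs.
stored_inputs = [5.5, -1.2, 0.8]
tuning = {
    "w_hidden": [[0.4, -0.3, 0.2], [-0.1, 0.5, 0.3]],
    "b_hidden": [0.1, -0.2],
    "w_out": [1.2, -0.7],
    "b_out": 0.05,
}

score = probnn_toy(stored_inputs, **tuning)
```

The point of persisting the inputs rather than only the score is exactly this: the same stored values can be fed to any future tuning.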

Since many trigger selections use requirements on ProbNN, it is important that the information needed to compute these variables is stored in the TurboReport. Once the candidates are restored with Tesla, the online version of the ProbNN variables can be recomputed, together with different tunings of the same variables, which can be useful for analyses relying on the Turbo stream.

Specification: Calibration samples must include the input variables of the PIDANN algorithms as they are available online and stored in the TurboReport of analyses based on the Turbo stream.

4.3 Analyses on Full stream with PID requirements in the Trigger line

When PID-based selection requirements are applied as part of both the online and the offline selection, a measurement of their efficiency requires the presence in the calibration sample of both the online- and offline-reconstructed PID observables, on a candidate-by-candidate basis.

Consider, for example, a physics analysis selecting prompt Λc+ → p K− π+ decays. In order to decrease the accepted rate of the trigger selection without applying a downscale, the analysts decide to put a loose DLL requirement on the proton (DLLp > 5). This is enough to reduce the trigger rate to an acceptable level, and has very high efficiency on signal. Since the requirement is very loose, the offline analysis relies on a further cut on the proton (ProbNNp > 0.2). The analysts then want to use the PIDCalib package to assess the efficiency of the PID requirement on the proton, namely

  [DLLp]online > 5 AND [ProbNNp]offline > 0.2.

Because of the different algorithms used online and offline, this is different from the requirement

  [DLLp]online > 5 AND [ProbNNp]online > 0.2,

and, of course, from

  [DLLp]offline > 5 AND [ProbNNp]offline > 0.2.

Furthermore, as a result of the strong correlation between the two variables, the online requirement on DLLp drastically modifies the distribution of ProbNNp, so that for the efficiency ε one can write

  ε([DLLp]online > 5 AND [ProbNNp]offline > 0.2) ≠ ε([DLLp]online > 5) × ε([ProbNNp]offline > 0.2).

It is therefore clear that the calibration samples must allow the evaluation of mixed requirements, such as the one in the example above, on a candidate-by-candidate basis. This can be achieved by matching the candidates reconstructed online, and saved as part of the Turbo stream, to the candidates reconstructed offline. The easiest way to achieve this is to create a sample containing, for each event, both the online and the offline candidates.

Specification: It must be possible to match online and offline candidates stored in the calibration samples, and for each candidate the online- and offline-reconstructed PID observables must be stored.

4.4 Considerations about statistics of control samples

A key point for the calibration samples is that they must be large enough to provide an acceptable uncertainty on the determination of the selection efficiency in the binning scheme defined by the analysis needs. The challenge is to cover the wide range of kinematic variables used to measure the detector and PID performance. During Run 1, for a dataset corresponding to an integrated luminosity of 3 fb−1, each calibration sample accounted for roughly 10M events. Despite the size of the samples, some analyses were already dominated by the systematic uncertainty on the PID selection efficiency (especially when considering protons). During Run 2, more and more results will face limitations due to systematic effects, and control samples are one of the handles to keep these effects under control. Moreover, the samples need to be of a sufficient size to allow the study of correlated systematic effects.
It has been estimated that calibration samples one order of magnitude larger than those taken in Run 1, together with a binned prescaling to improve the coverage of low-statistics regions (cf. [11]), should result in systematic uncertainties of an acceptable size. Roughly 100M events per particle type (p, K, π, µ, e) implies a total of 500M events for the calibration stream. This should be achievable by allocating 100 Hz of bandwidth to the trigger output dedicated to calibration samples during Run 2 operations. The different types of events of interest for the PID calibration samples are selected by a suite of trigger lines, which are appropriately prescaled to match the above requirements [11]. For most of Run 2, the analysis of the data collected in 2015 allowed the prescale factors for the calibration samples to be determined.

4.5 Considerations about online-offline convergence

In Run 2, the availability of a larger CPU budget in the HLT and optimized code allow the same reconstruction to be run online and offline. The online and offline tracking reconstructions show very small differences, so the online reconstruction reaches the same quality as the offline one. The reconstruction of the PID observables is the same online and offline; residual differences are still present due to a slightly different reconstruction of the Calorimeter information. Although efforts are in place to reduce this difference to a negligible level, having both the online and the offline information stored in the calibration samples allows the PID performance to be measured safely.

Even in the case of identical online and offline reconstruction, it makes sense to keep the online information, as this is what is used in the trigger selections. Future developments of the offline reconstruction, and subsequent improvements of the offline performance, could change the offline-reconstructed PID observables, and the values computed and used online would be lost if not retained. To avoid this, the online-reconstructed PID observables are stored in the raw data.
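The non-factorisation of combined requirements discussed in Section 4.3 can be demonstrated numerically with a toy model (invented distributions, not LHCb data): two strongly correlated PID responses, standing in for the online DLLp and the offline ProbNNp, give a joint cut efficiency far from the product of the individual efficiencies.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy online/offline PID responses for the same tracks: strongly
# correlated through a shared underlying response, as the online DLLp
# and offline ProbNNp are in practice.
n = 100_000
common = rng.normal(size=n)
dllp_online = 5.0 * common + rng.normal(scale=1.0, size=n)
probnnp_offline = 1.0 / (1.0 + np.exp(-(2.0 * common
                                        + rng.normal(scale=0.3, size=n))))

pass_a = dllp_online > 5.0          # the online requirement
pass_b = probnnp_offline > 0.2      # the offline requirement

eff_joint = np.mean(pass_a & pass_b)             # ε(A AND B)
eff_product = np.mean(pass_a) * np.mean(pass_b)  # ε(A) × ε(B)
```

Here `eff_joint` is noticeably larger than `eff_product`: tracks passing the tight online cut almost always pass the loose offline one, which is why the mixed efficiency must be evaluated candidate by candidate.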
4.6 Summary of the specifications

The specifications discussed above are listed here for convenience:

1. Calibration samples must include all the PID observables (see Tables 2 and 3 in the Appendix) as reconstructed online.
2. Calibration samples must include the Rich, Calorimeter, Muon, Trigger and Tracking raw event.
3. Calibration samples must include the input variables of the PIDANN algorithms as they are available online and stored in the TurboReport of analyses based on the Turbo stream.
4. It must be possible to match online and offline candidates stored in the calibration samples, and for each candidate the online- and offline-reconstructed PID observables must be stored.

As a first result, calibration samples were available together with the first data collected in June and July 2015, allowing the presentation of physics results at the 2015 Summer Conferences [12, 13].

5 Calibration lines in Run 2

During the second run of the LHC, the calibration samples are selected by Hlt2 lines writing to a dedicated stream, TurboCalib, which contains the Turbo information and retains the detector raw data. Throughout Run 2, both the TurboReport and the raw event have to be stored in the TurboCalib stream. The TurboReport is processed offline using Tesla [7]. The Brunel reconstruction is run centrally, while no further offline processing, i.e. selections and streaming of events, is foreseen, because further selections in addition to those that select the events in the trigger would defeat the purpose of the sample. The creation of candidate decays is part of the ntuple production step, instead of a separate offline processing step. To produce the final samples needed by PIDCalib, three steps are needed, described in a single configuration file:

- Production and selection of the decay candidates;
- Matching of the online and offline candidates;
- Output to ntuples, providing the input of the PIDCalib package.

An extension of the PIDCalib package has been implemented to support the production of MicroDST datasets including the sWeight of each calibration candidate, obtained from a binned likelihood fit of the signal-background discriminating distributions. These light-weight calibration samples will be available to the Collaboration to study the performance of ad-hoc tunings of the existing algorithms, or to study new algorithms, without rerunning the fit and sPlot parts of PIDCalib. The data flow is sketched in Figure 1.

[Figure 1: Schematic representation of the data flow for the calibration samples. Note that there is no offline selection, since candidates are produced online and resurrected using Tesla. Daughter particles are matched to the offline candidates in the DaVinci step while producing ntuples. sTables contain the sWeights needed to statistically subtract the residual background following the sPlot technique [4].]
6 Operations with calibration lines during Run 2 The needs that have to be fullfilled by the calibration samples were different in the Early Measurement in 2015, the validation (second part of 2015 data taking), and long Run 2 phases. Throughout 2015, the rate of the calibration samples (for both PID and Tracking lines) was between 1.2 khz (Early Measurement) and 0.6 khz (validation). Data taken in 2015 allowed for a first optimization of both PID and Tracking lines that lead to a rate of 0.4 khz in the 2016 configuration. The optimization will continue in 2016 with the aim of reaching a rate 0.1 khz for TurboCalib for most of Run 2. Besides the rate, another important aspect to be kept under control is the overlap between different streams, because it can badly affects the use of computing resources, in particular the storage. Details on these aspects depend on the specific configuration used. (The situation at the time of writing can be found in the Appendix, see Table 4.) As of this writing, HLT provides different streams: Full, Turbo, TurboCalib, defined in Sect. 3, and two ancillar ones, Lumi, providing compact information about recorded luminosity, the Turbo stream goes through the Turbo and Turbo validation workflows, the latter only in 2015 for validation purposes. A schematic representation is issued in Table 1. From the beginning of Run 2 data taking, page 7

calibration data followed the Calibration workflow. The latter, schematically shown in the left part of Fig. 1, is meant to process TurboCalib data. Offline and online recreated candidates are saved separately (technically, by defining two different Transient Event Store (TES) locations in the same file), allowing for an easy matching of online and offline candidates (see Sect. 4.3). The same matching was needed during the validation of the Turbo stream reconstruction.

Trigger      RB   Workflow           Offline          Raw data          File name
Full         87   Full               Brunel+DaVinci   RAW               FULL.DST
Turbo        88   Turbo              Tesla            RAW+TurboReport   TURBO.MDST
Turbo        88   Turbo validation   Brunel+Tesla     RAW+TurboReport   FULLTURBO.DST
TurboCalib   90   Calibration        Brunel+Tesla     RAW+TurboReport   FULLTURBO.DST

Table 1 Summary of processing schemes. Each stream is routed by a specific routing bit (RB). In 2015, for validation purposes, Turbo data have been duplicated as different outputs of the Turbo and Turbo validation workflows. In 2016 only the Turbo workflow is used.

As stated in Section 4.4, for PID calibration purposes, a total trigger rate of about 100 Hz should be sufficient to guarantee the evaluation of the PID performance with the required precision. For very high yield Turbo analyses, this might not be the case. Taking advantage of the fact that only online reconstructed quantities are required to calibrate PID observables for Turbo analyses, additional selections could be added to the Turbo stream itself, where only the small TurboReport is retained, to provide the required additional calibration samples.

7 Matching the online and offline candidates

The online and offline candidates are stored separately within each processed event, so comparing their PID observables requires them to be matched. The matching can be done on the basis of the number of shared LHCbIDs (corresponding to physical hits/clusters in the detector), by exploiting the TisTos algorithm [5], or by combining the two techniques.
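The hit-overlap part of the matching can be sketched as follows. This is an illustrative stand-alone sketch, not the actual LHCb implementation: LHCbIDs are represented as plain integers, and the 70% sharing threshold is an assumed example value.

```python
def match_candidates(online_ids, offline_tracks, min_shared=0.7):
    """Match an online track to offline tracks by shared LHCbIDs.

    online_ids     -- set of LHCbIDs (detector hits) of the online track
    offline_tracks -- dict mapping an offline track key to its set of LHCbIDs
    min_shared     -- minimum fraction of online hits that must be shared
                      (hypothetical default, for illustration only)

    Returns the key of the best-matching offline track, or None.
    """
    best_key, best_frac = None, 0.0
    for key, ids in offline_tracks.items():
        frac = len(online_ids & ids) / len(online_ids)  # shared-hit fraction
        if frac > best_frac:
            best_key, best_frac = key, frac
    return best_key if best_frac >= min_shared else None

# Toy example: integers stand in for packed LHCbID words.
online = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
offline = {"trk0": {1, 2, 3, 4, 5, 6, 7, 8, 20, 21},  # 8/10 hits shared
           "trk1": {100, 101, 102}}                   # no overlap
print(match_candidates(online, offline))  # -> trk0
```

Normalising to the number of online hits makes the criterion robust against the offline track picking up extra hits that the online reconstruction never saw.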
To build the samples, first the online candidate, as selected by the trigger, is written. Then the track of the probe particle is matched to a track reconstructed in the offline processing. The PID variables of the online and offline reconstructed candidates are written into two different locations and are available for the users to create combined requirements, as described in Section 4. The Tesla-Brunel matching algorithm is now part of the LHCb software.

8 Conclusion

In order to provide calibration samples for analyses relying on the Turbo and Full streams, their selection strategy has been implemented in the trigger. Selected events need to be reconstructed offline, and therefore the detector raw data must be retained. On the other hand, the full PID information, needed to reconstruct the ProbNN variables, can only be stored for candidates selected by a line in the Turbo stream. The simplest technical solution satisfying these specifications is to write the calibration samples to a dedicated stream, TurboCalib, containing both the TurboReport and the raw event. The stream is processed centrally to restore the online candidates and, with the standard offline reconstruction, to produce the offline candidates, which are stored separately in each event of the output file. Online and offline candidates are then matched to produce ntuples containing, candidate-by-candidate, the online and offline PID variables.

The TurboCalib stream, developed for the PID calibration samples, has already been used also for the Tracking calibration samples. It played a key role in the validation of both the Turbo stream and the Tesla application. Finally, it is also a fundamental tool for developing a viable strategy for the calibration samples in the LHCb Upgrade.

9 Acknowledgements

We warmly thank our colleagues R. Aaij and P. Charpentier for the careful review of this text in the passage from Internal to Public note.

10 References

[1] LHCb Collaboration, The LHCb Detector at the LHC, J. Instrum. 3 (2008) S08005.
[2] C. Jones, ANN PID, https://indico.cern.ch/event/226062/contribution/1/material/slides/0.pdf.
[3] S. Malde, PIDCalib Packages, https://twiki.cern.ch/twiki/bin/view/lhcb/pidcalibpackage.
[4] M. Pivk and F. R. Le Diberder, sPlot: a statistical tool to unfold data distributions, arXiv:physics/0402083.
[5] R. Aaij et al., The LHCb Trigger and its Performance in 2011, JINST 8 (2013) P04022.
[6] J. Albrecht et al. [LHCb HLT project], Performance of the LHCb High Level Trigger in 2012, J. Phys. Conf. Ser. 513 (2014) 012001.
[7] R. Aaij et al., Tesla: an application for real-time data analysis in High Energy Physics, arXiv:1604.05596.
[8] LHCb Collaboration, LHCb Trigger and Online Upgrade Technical Design Report, CERN-LHCC-2014-016; LHCb-TDR-016.
[9] LHCb MuonID group, Performance of the Muon Identification at LHCb, J. Instrum. 8 (2013) P10020; arXiv:1306.0249.
[10] LHCb Collaboration, LHCb Detector Performance, Int. J. Mod. Phys. A 30 (2015) 1530022; arXiv:1412.6352.
[11] L. Anderlini, O. Lupton, B. Sciascia and V. Gligorov, Calibration samples for particle identification at LHCb in Run 2, LHCb-PUB-2016-005; CERN-LHCb-PUB-2016-005.
[12] R. Aaij et al. [LHCb Collaboration], Measurement of forward J/ψ production cross-sections in pp collisions at √s = 13 TeV, JHEP 1510 (2015) 172; arXiv:1509.00771.
[13] R. Aaij et al. [LHCb Collaboration], Measurements of prompt charm production cross-sections in pp collisions at √s = 13 TeV, JHEP 1603 (2016) 159; arXiv:1510.01707.

A Appendix

Tracking            Rich                 Calorimeters   Muon                    Velo
P                   USED R1 GAS          Ecal PIDe      Background Likelihood   VeloCharge
PT                  USED R2 GAS          Ecal PIDmu     Muon Likelihood
CHI2NDOF            ABOVE MU THRESHOLD   HCal PIDe      IsMuon
NDOF                ABOVE k THRESHOLD    HCal PIDmu     IsLooseMuon
LIKELIHOOD          DLLe                 Prs PIDe       InMuon Acceptance
GHOST PROBABILITY   DLLmu                InAccBrem      nshared
FIT MATCH CHI2      DLLk                 BremPIDe
CLONE DISTANCE      DLLp
FIT VELO CHI2       DLLbt
FIT VELO NDOF
FIT T CHI2
FIT T NDOF

Table 2 List of PID-related variables that are inputs to the PIDANN algorithms (Run 1 version) and stored in the TurboReport.

Tracking            Rich                 Calorimeters   Muon                    Velo
P                   USED R1 GAS          Ecal PIDe      Background Likelihood
PT                  USED R2 GAS          Ecal PIDmu     Muon Likelihood
CHI2NDOF            ABOVE MU THRESHOLD   HCal PIDe      IsMuon
NDOF                ABOVE k THRESHOLD    HCal PIDmu     IsLooseMuon
GHOST PROBABILITY   DLLe                 Prs PIDe       InMuon Acceptance
FIT MATCH CHI2      DLLmu                InAccBrem      nshared
FIT VELO CHI2       DLLk                 BremPIDe
FIT VELO NDOF       DLLp
FIT T CHI2          DLLbt
FIT T NDOF

Table 3 List of PID-related variables that are inputs to the PIDANN algorithms (Run 2 version) and stored in the TurboReport.

Channel      Rate    Charm   Charm   Topo    Leptons  Turbo   EW      Low     Other   Tech
                     Full    Turbo                    Calib           Mult
             (kHz)   (%)     (%)     (%)     (%)      (%)     (%)     (%)     (%)     (%)
ALL          13.0     23.1    35.4    23.1    16.2      3.1     5.6     4.4     9.0    1.3
CharmFull     3.0    100.0    16.7    13.3     3.3      3.3     6.7     0.0     4.4    0.0
CharmTurbo    4.6     10.9   100.0    10.9     0.7      2.2     4.3     0.0     8.0    0.0
Topo          3.0     13.3    16.7   100.0     5.6      4.4     4.4     0.0    11.1    0.0
Leptons       2.1      4.8     1.6     7.9   100.0      1.6     4.8     0.0     1.6    0.0
TurboCalib    0.4     25.0    25.0    33.3     8.3    100.0     0.0     0.0     0.0    0.0
EW            0.7     27.3    27.3    18.2    13.6      0.0   100.0     0.0     4.5    0.0
LowMult       0.6      0.0     0.0     0.0     0.0      0.0     0.0   100.0     0.0    0.0
Other         1.2     11.4    31.4    28.6     2.9      0.0     2.9     0.0   100.0    0.0
Technical     0.2      0.0     0.0     0.0     0.0      0.0     0.0     0.0     0.0  100.0

Table 4 Overlap between different samples. Each cell is normalised to the rate expected from the row sample (shown in the second column).
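To read Table 4, the absolute overlap rate between two samples is obtained by multiplying the row percentage by the row rate; computing it from either row must give the same answer, which is a handy consistency check. A minimal sketch, using numbers taken from the table:

```python
# Rates from the second column of Table 4, in kHz.
rates_khz = {"TurboCalib": 0.4, "Topo": 3.0}

# 33.3% of the TurboCalib rate also fires a Topo line...
overlap_a = rates_khz["TurboCalib"] * 33.3 / 100.0
# ...which must match 4.4% of the Topo rate firing a TurboCalib line.
overlap_b = rates_khz["Topo"] * 4.4 / 100.0

print(f"TurboCalib/Topo overlap: {overlap_a:.2f} kHz vs {overlap_b:.2f} kHz")
```

Both directions give about 0.13 kHz, confirming that the percentages in the two rows describe the same underlying overlap.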