LHCb Computing Resources: 2018 requests and preview of 2019 requests


LHCb Public Note
Reference: LHCb-PUB-2017-009
Issue: 0; Revision: 0
Created: 23rd February 2017
Last modified: 28th February 2017
Prepared by: LHCb Computing Project; C. Bozzi (editor)

Abstract

This document presents a reassessment of the computing resources needed by LHCb in 2018 and a preview of the computing requests for 2019, resulting from the current experience of Run 2 data taking and from recent changes in the LHCb computing model parameters.

Table of Contents

1. Introduction
2. The LHCb Computing Model
3. Processing plans for 2016 and beyond
   3.1. Simulation
   3.2. Data taking
   3.3. Processing model for Run 2
   3.4. Data distribution models
4. Resource estimates under various hypotheses
5. Resources needed in 2018 and 2019
6. Summary of requests

List of Tables

Table 3-1: Assumed LHC proton-proton and heavy-ion running time for 2017 and 2018.
Table 5-1: Estimated CPU work needed for the different activities.
Table 5-2: Disk storage needed for the different categories of LHCb data.
Table 5-3: Tape storage needed for the different categories of LHCb data.
Table 6-1: CPU power requested at the different Tier levels.
Table 6-2: LHCb disk request for each Tier level. For countries hosting a Tier1, the Tier2 contribution could also be provided at the Tier1.
Table 6-3: LHCb tape request for each Tier level.

List of Figures

Figure 4-1: Disk estimates under various hypotheses. The top plot shows the total request, the bottom plot the increment with respect to the previous year.
Figure 4-2: Tape estimates under various hypotheses. The top plot shows the total request, the bottom plot the increment with respect to the previous year.

1. Introduction

This document presents the computing resources needed by LHCb in the 2018 WLCG year [1] and a preview of the 2019 requests. It is based on the latest measurements of the LHCb computing model parameters and on the latest updates of the LHC running plans.

[1] For the purpose of this document, a given year always refers to the period between April 1st of that year and March 31st of the following year.

This document is organized as follows. The LHCb computing model, its implementation and recent changes are described in section 2, and processing plans in section 3. Resource estimates under different scenarios are given in section 4, the resources needed in 2018 and 2019 are detailed in section 5, and a summary of the requests is given in section 6.

2. The LHCb Computing Model

A detailed description of the LHCb Computing Model is given elsewhere [LHCb-PUB-2012-014, LHCb-PUB-2011-009]. Subsequent reports [LHCb-PUB-2013-002, LHCb-PUB-2013-014, LHCb-PUB-2014-014, LHCb-PUB-2015-003, LHCb-PUB-2016-003] discussed further changes and their impact on the required resources.

The 2017 requests were reassessed in autumn 2016, as the expected LHC live times in both 2016 and 2017 were considerably higher than previously foreseen, and some parameters of the computing model, measured on the 2016 datasets, differed from those used to compute the requests. Mitigation measures were put in place and the LHCb computing model was modified, as discussed in [LHCb-PUB-2016-022], with the purpose of keeping the same resource envelope for 2016, limiting the increase in 2017 with respect to the previous requests, and staying within a reasonable budget in the following years.

The most relevant features of the LHCb Computing Model are the following. Data are received from the online system in two streams:

- a TURBO stream, where the output of the online reconstruction is stored on tape and subsequently resurrected in a micro-DST format and made available to analysts;
- a FULL stream, where RAW events are reconstructed offline and then filtered according to selection criteria specific to given analyses (stripping lines). RAW data and the output of the offline reconstruction (RDST) are saved on tape; the stripping output is replicated and distributed on disk storage.

The stripping output can be in either DST or micro-DST format: the former contains the complete reconstructed event, while the latter contains only the signal candidates and possibly some additional information. Stripping lines are designed such that as many lines as possible are written in micro-DST format.

The production of simulated events runs continuously, with the aim of producing signal and background samples for a total number of simulated (and reconstructed) events of the order of 15% of the total number of collected real data events.

In the previous reports, preliminary estimates of the resources required for 2018 and 2019 were computed under the following assumptions:

- Running time for proton collisions of 7.8 million seconds in 2017 and 7.8 million seconds in 2018, corresponding to an efficiency for physics of about 60%.
- A week of proton-argon collisions in fixed-target mode in 2017, assuming an efficiency of 30%. In 2018, a month of heavy-ion collisions, with concurrent heavy-ion proton collisions in fixed-target configuration, will also take place.
- Trigger output rates of 8 kHz and 4 kHz for the FULL and TURBO streams, respectively.
- Two copies of the most recent processing of both data and simulation are kept on disk. For the next-to-most-recent data processing, the number of copies is two for data and one for simulation.
- The stripping process produces an output of 120 MB per live second of the LHC.
- The event sizes for RAW, RDST, TURBO, DST and micro-DST in data are 65, 50, 20, 120 and 10 kB, respectively.

3. Processing plans for 2016 and beyond

3.1. Simulation

Simulation will continue to produce samples for the LHCb Upgrade studies, to simulate events according to the observed Run 2 data-taking conditions, and to implement the latest updates to generators and decay processes.

The implementation of fast MC simulation options is also in progress, with either the simplification of time-consuming algorithms (e.g. the propagation and detection of photons in the RICH detectors) or a parameterized detector response, in order to mitigate the CPU impact of the full GEANT4 detector simulation. As an alternative, the "particle gun" approach, in which only the signal decay is generated, simulated and reconstructed, rather than the entire pp collision, is also available. A reduction factor of about 20 in both CPU and disk space is estimated with this technique. Other techniques have been deployed to reduce the storage requirements, by running the trigger and stripping steps in so-called "filtering" mode, in which generated MC events that fail the trigger or stripping criteria are thrown away.

Until very recently, simulation was saved in the DST format only. The latest simulation cycle, which started in June 2016, also includes a micro-DST format for MC, in which only the signal part of the event is saved, with considerable disk space savings.

3.2. Data taking

Table 3-1 shows the assumptions made concerning the availability of the LHC for physics running in 2017 and 2018 [2].

In 2016, a detailed investigation optimized the trigger bandwidth division between FULL and TURBO at 8 kHz and 4 kHz, respectively. However, given that LHCb takes data at a constant average number of collisions per bunch crossing, for a given trigger configuration these rates scale with the number of bunches colliding in the LHC.

[2] http://lhc-commissioning.web.cern.ch/lhc-commissioning/schedule/lhc-long-term.htm and links therein.
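As an illustration, the assumed trigger rates and event sizes translate into offline bandwidths as in the following minimal sketch (Python). The parameters are taken from the assumption list above; the derived values are indicative only:

    # Sketch: nominal offline data rates implied by the planning parameters.
    FULL_RATE_HZ = 8e3     # FULL stream trigger rate (8 kHz)
    TURBO_RATE_HZ = 4e3    # TURBO stream trigger rate (4 kHz)
    RAW_EVENT_KB = 65      # RAW event size
    TURBO_EVENT_KB = 20    # originally planned TURBO event size
    LIVE_SECONDS = 7.8e6   # proton running time per year

    raw_mb_s = FULL_RATE_HZ * RAW_EVENT_KB / 1e3       # 520 MB/s to tape
    turbo_mb_s = TURBO_RATE_HZ * TURBO_EVENT_KB / 1e3  # 80 MB/s, see below

    print(f"RAW:   {raw_mb_s:.0f} MB/s, {raw_mb_s * LIVE_SECONDS / 1e9:.2f} PB/year")
    print(f"TURBO: {turbo_mb_s:.0f} MB/s, {turbo_mb_s * LIVE_SECONDS / 1e9:.2f} PB/year")

The 80 MB/s TURBO figure is the value foreseen in the previous report; the measured throughput is discussed below.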

It is foreseen that in 2017 and 2018 this number will increase from 2076 to 2448, i.e. by a factor of 1.18. The trigger rate is therefore scaled up by this factor.

The average event size in the TURBO stream went up from the originally planned 10 kB to 50 kB in 2016. Pending an optimization of the contents of the TURBO format, it was decided to park 35% of the TURBO stream on tape for the entire Run 2. A review of the various lines entering the TURBO stream led to the conclusion that the disk throughput of 2016 TURBO data can be limited to 100 MB per live second of the LHC, to be compared with the 20 kB × 4 kHz = 80 MB/s foreseen in the previous report. This value, rescaled by the factor discussed above to take into account the increased number of bunches, is used to compute the requests.

Parameter                    2017         2018
Proton physics
  LHC run days               150          150
  LHC efficiency             0.60         0.60
  Approx. running seconds    7.8 × 10^6   7.8 × 10^6
  Number of bunches          2448         2448
Heavy-ion physics
  Approx. running seconds    0.1 × 10^6   1.4 × 10^6

Table 3-1: Assumed LHC proton-proton and heavy-ion running time for 2017 and 2018.

3.3. Processing model for Run 2

The data taking model exploited successfully so far in Run 2 will continue in 2018. No offline reprocessing of RAW data is foreseen during Run 2. If necessary, this can be done during Long Shutdown 2; however, no resources are currently foreseen for this activity.

The processing scenario for proton physics in the previous report assumed a full re-stripping at the end of each year of data taking and another one a year later, with two incremental strippings performed in between; in all cases, the entire dataset accumulated up to that point was supposed to be involved. In the meantime, given the data volumes involved, the limited bandwidth for tape recalls at the Tier1 sites, and the need to avoid running additional stripping campaigns concurrently with data taking, it became clear that only the data accumulated in a given year can be re-stripped at the end of that year. Nevertheless, a full re-stripping of all Run 2 data will take place during LS2, in WLCG year 2019.

As a temporary mitigation measure, at the end of 2016 data taking it was decided to perform a full re-stripping of the 2016 data and an incremental stripping of the 2015 data. The re-stripped 2016 data will completely replace the previous stripping cycle and avoid the incremental strippings that would otherwise have been performed in 2017. The same strategy is foreseen for the 2018 data taking.

The stripping throughput used in this report is 165 MB per live second of the LHC, as obtained from recent measurements in preparation for the upcoming stripping campaign. This is very similar to the 160 MB/s measured in the 2016 data taking, and significantly higher than the 120 MB/s used in the previous report.

3.4. Data distribution models

In the LHCb computing model used in the present document, the following policy is used for the distribution of proton physics data on disk at Tier0, Tier1 and Tier2-D sites [3]:

- For real data, two copies of each dataset are available to analysts, for both the most recent and the previous version of the processing.
- For simulated data, two copies of each dataset of the most recent processing are kept on disk; for the previous processing version, there is only one disk copy.

Further reducing the number of copies would impact the efficiency of data analysis, as scheduled and unscheduled outages of computing centers, as well as the increase in the number of jobs at any single site, would slow down the time needed by analysts to process data.

[3] Tier-2D sites are a selection of Tier2 sites with disk that host part of the most recent version of the official datasets for analysis, both for real and simulated data, and are therefore available for running user analysis jobs requiring those data. The ensemble of Tier-2D sites provides a total disk space equivalent to that of an average Tier1 site.
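Before moving to the resource estimates, the running-time and rescaling arithmetic of sections 3.2 and 3.3 can be restated in a few lines (Python; a sketch of the stated parameters, not the official accounting):

    # Sketch: live time per year and bunch-count rescaling (sections 3.2-3.3).
    days, efficiency = 150, 0.60
    live_seconds = days * 86400 * efficiency
    print(f"live seconds per year: {live_seconds:.2e}")  # ~7.8e6, as in Table 3-1

    bunch_factor = 2448 / 2076                           # ~1.18
    print(f"stripping: {165 * bunch_factor:.0f} MB/s")   # -> ~195 MB/s
    print(f"TURBO:     {100 * bunch_factor:.0f} MB/s")   # -> ~118 MB/s
    print(f"FULL rate: {8 * bunch_factor:.1f} kHz")      # -> ~9.4 kHz

The rescaled 195 MB/s and 118 MB/s values are the throughputs used in section 4.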

4. Resource estimates under various hypotheses

This section shows the evolution of the computing resources under different assumptions. Figure 4-1 and Figure 4-2 show the total and incremental disk and tape requirements for 2017-2019. The variations in CPU requirements are small and are not reported.

In each plot, the leftmost entry in the 2017, 2018 and 2019 requests corresponds to the values reported in LHCb-PUB-2016-022. The other entries show the cumulative effects (on top of each other) of:

- the reorganization of the stripping campaigns (see Section 3.3);
- the variation of the stripping throughput (120 → 165 MB/s, see Section 3.3);
- the variation of the TURBO throughput (80 → 100 MB/s, see Section 3.2);
- the rescaling of the FULL, stripping (165 → 195 MB/s) and TURBO (100 → 118 MB/s) throughputs due to the different number of bunches (2076 → 2448, see Section 3.2).

For 2017, the last entry represents the WLCG pledge, taken from REBUS.

The resources needed in 2018 and 2019 are computed using the last model, which implements the new scheme for stripping and takes into account the various increases in the stripping and TURBO throughputs, rescaled with the number of bunches expected in 2018 and 2019. The detailed requests are presented in the next section.
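To first order, each increment in these plots is a product of a throughput, the live time, and a number of disk copies. The following sketch (Python) illustrates the scale of such terms; it is indicative only, as the full model also tracks previous processing versions, archives, buffers and simulation:

    # First-order scale of yearly volume increments (indicative only; the
    # full model also tracks previous processings, archives and simulation).
    LIVE_SECONDS = 7.8e6   # proton running per year (Table 3-1)
    COPIES = 2             # disk copies of the most recent processing (3.4)

    def petabytes(mb_per_s, seconds, copies=1):
        # Integrated volume in PB for a constant throughput in MB/s.
        return mb_per_s * seconds * copies / 1e9

    print(petabytes(195, LIVE_SECONDS, COPIES))  # stripped data: ~3.0 PB/yr on disk
    print(petabytes(118, LIVE_SECONDS, COPIES))  # TURBO data:    ~1.8 PB/yr on disk
    print(petabytes(65 * 9.4, LIVE_SECONDS))     # RAW (65 kB x ~9.4 kHz): ~4.8 PB/yr to tape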

Figure 4-1: Disk estimates under various hypotheses. The top plot shows the total request, the bottom plot the increment with respect to the previous year.

Figure 4-2: Tape estimates under various hypotheses. The top plot shows the total request, the bottom plot the increment with respect to the previous year.

5. Resources needed in 2018 and 2019

Table 5-1 presents, for the different activities, the CPU work estimates obtained when applying the baseline model defined above.

CPU work in WLCG year (kHS06.years)   2018   2019
Prompt reconstruction                   49      0
First-pass stripping                    20      0
Full re-stripping                        0     61
Incremental (re-)stripping              10     15
Processing of heavy-ion collisions      38      0
Simulation                             342    411
VoBoxes and other services               4      4
User analysis                           32     38
Total work (kHS06.years)               495    529

Table 5-1: Estimated CPU work needed for the different activities.

Table 5-2 presents, for the different data classes, the forecast total disk space usage at the end of 2018 and of 2019 when applying the baseline model described in the previous section. Table 5-3 shows, for the different data classes, the forecast total tape usage at the end of the same years.

Disk storage usage forecast (PB)   2018   2019
Stripped real data                 16.7   22.6
TURBO data                          5.3    5.3
Simulated data                     13.1   15.5
User data                           1.2    1.2
Heavy-ion data                      4.2    4.2
RAW and other buffers               1.2    1.2
Other                               0.6    0.6
Total                              42.3   50.7

Table 5-2: Disk storage needed for the different categories of LHCb data.

Tape storage usage forecast (PB)   2018   2019
RAW data                           47.9   47.9
RDST                               15.4   15.4
MDST.DST                            6.8    6.8
Heavy-ion data                      3.7    3.7
Archive                            24.1   31.3
Total                              97.9  105.1

Table 5-3: Tape storage needed for the different categories of LHCb data.
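As a consistency check, the totals of Tables 5-1 to 5-3 can be recomputed from the individual rows (Python; row values copied from the tables, with quoted totals possibly differing by 0.1 due to rounding of the displayed entries):

    # Recompute the totals of Tables 5-1, 5-2 and 5-3 from their rows.
    cpu_2018 = [49, 20, 0, 10, 38, 342, 4, 32]         # Table 5-1 (kHS06.years)
    cpu_2019 = [0, 0, 61, 15, 0, 411, 4, 38]
    disk_2018 = [16.7, 5.3, 13.1, 1.2, 4.2, 1.2, 0.6]  # Table 5-2 (PB)
    disk_2019 = [22.6, 5.3, 15.5, 1.2, 4.2, 1.2, 0.6]
    tape_2018 = [47.9, 15.4, 6.8, 3.7, 24.1]           # Table 5-3 (PB)
    tape_2019 = [47.9, 15.4, 6.8, 3.7, 31.3]

    print(sum(cpu_2018), sum(cpu_2019))                        # 495, 529 as quoted
    print(round(sum(disk_2018), 1), round(sum(disk_2019), 1))  # 42.3, 50.6 (50.7 quoted)
    print(round(sum(tape_2018), 1), round(sum(tape_2019), 1))  # 97.9, 105.1 as quoted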

6. Summary of requests

Table 6-1 shows the CPU requests at the various tiers, as well as for the HLT farm and Yandex. We assume that the HLT and Yandex farms will provide the same level of computing power as in the past; we therefore subtract the contributions of these two sites from our requests to WLCG. The required resources are apportioned among the different Tiers taking into account the capacities that are already installed.

The disk and tape estimates shown in the previous section have to be broken down into fractions to be provided by the different Tiers, using the distribution policies described in LHCb-PUB-2013-002. The results of this sharing are shown in Table 6-2 and Table 6-3.

CPU power (kHS06)   2018   2019
Tier 0                81     86
Tier 1               253    271
Tier 2               141    152
Total WLCG           475    509
HLT farm              10     10
Yandex                10     10
Total non-WLCG        20     20
Grand total          495    529

Table 6-1: CPU power requested at the different Tier levels.

Disk (PB)   2018   2019
Tier0       12.0   14.6
Tier1       24.5   29.0
Tier2        5.8    7.1
Total       42.3   50.7

Table 6-2: LHCb disk request for each Tier level. For countries hosting a Tier1, the Tier2 contribution could also be provided at the Tier1.

Tape (PB)   2018   2019
Tier0       36.4   37.7
Tier1       61.5   67.4
Total       97.9  105.1

Table 6-3: LHCb tape request for each Tier level.

There is a slight (2%) increase of CPU resources with respect to the previously scrutinized requests. For disk, there is a small (6%) decrease in 2018. For tape, there is a 10% increase, mainly due to the increased trigger rate.
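As a final check, the tier-level apportioning can be compared with the section 5 totals using a short sketch (Python; numbers copied from Tables 6-1 to 6-3), which makes the WLCG / non-WLCG split explicit:

    # Per-tier CPU requests vs. the grand totals of Table 5-1.
    cpu = {"Tier 0": (81, 86), "Tier 1": (253, 271), "Tier 2": (141, 152)}
    non_wlcg = {"HLT farm": (10, 10), "Yandex": (10, 10)}

    for year, i in (("2018", 0), ("2019", 1)):
        wlcg = sum(v[i] for v in cpu.values())
        grand = wlcg + sum(v[i] for v in non_wlcg.values())
        print(f"{year}: WLCG {wlcg} kHS06, grand total {grand} kHS06")
        # 2018: WLCG 475, grand total 495; 2019: WLCG 509, grand total 529

    # The disk and tape tier sums likewise reproduce the section 5 totals:
    assert abs(12.0 + 24.5 + 5.8 - 42.3) < 0.01  # disk 2018 (PB)
    assert abs(36.4 + 61.5 - 97.9) < 0.01        # tape 2018 (PB)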