LHCb Computing Resource usage in 2017


LHCb Public Note
Reference: LHCb-PUB-2018-002
Issue: First version
Revision: 0
Created: 1st February 2018
Last modified: 12th April 2018
Prepared by: LHCb Computing Project, C. Bozzi (editor)


Abstract

This document reports the usage of computing resources by the LHCb collaboration during the period January 1st – December 31st 2017. The data in the following sections have been compiled from the EGI Accounting portal: https://accounting.egi.eu. For LHCb-specific information, the data are taken from the DIRAC Accounting at the LHCb DIRAC Web portal: http://lhcb-portal-dirac.cern.ch.

Table of Contents

1. Introduction
2. Computing activities during 2017
3. Usage of CPU resources
   3.1. WLCG accounting
   3.2. LHCb DIRAC CPU accounting
4. Usage of storage resources
   4.1. Tape storage
   4.2. Disk at Tier0, Tier1s and Tier2s
5. Data popularity
6. Throughput rates
7. Summary
Appendix

List of Figures

Figure 3-1: Summary of LHCb computing activities (used CPU power) from Jan 1st to December 31st 2017. The line represents the 2017 pledge.
Figure 3-2: Summary of LHCb site usage (used CPU power) from Jan 1st until December 31st 2017.
Figure 3-3: Average monthly CPU power provided by the Tier1s (and Tier0) to LHCb from Jan 1st until December 31st 2017.
Figure 3-4: Average monthly CPU power provided by the Tier2s to LHCb from Jan 1st until December 31st 2017.
Figure 3-5: CPU time consumed by LHCb from Jan 1st until December 31st 2017 (in days), for WLCG sites, excluding the HLT farm (top, per country), and non-WLCG sites, including the HLT farm and the CERN cloud (bottom, per site).
Figure 3-6: Usage of LHCb Tier0/1s during Jan 1st – Dec 31st 2017. The top plot shows the used CPU power as a function of the different activities, while the bottom plot shows the contributions from the different sites.
Figure 3-7: Usage of resources in WLCG sites other than the Tier0 and Tier1s, pledging to LHCb, during Jan 1st – Dec 31st 2017. The top plot shows the used CPU power as a function of the different activities, while the bottom plot shows the contributions from the different sites. User jobs (shown in magenta in the top plot) are further detailed in Figure 3-8.
Figure 3-8: Running user jobs on Tier0 and Tier1 sites (top), and outside (bottom) during Jan 1st – Dec 31st 2017.
Figure 3-9: CPU time as a function of the job final status for all jobs (top), and as a function of activity for stalled jobs (bottom).
Figure 4-1: Tape space occupancy for RAW data, January 1st – December 31st 2017.
Figure 4-2: Tape occupancy for RDST data, January 1st – December 31st 2017.
Figure 4-3: Tape occupancy for derived data, January 1st – December 31st 2017.
Figure 4-4: Usage of disk resources at CERN and Tier1s from Jan 1st to December 31st 2017. Real data and simulated data are shown in the top and bottom figures.
Figure 4-5: Usage of buffer disk from January 1st to December 31st 2017. The large increase at the end of the year is due to pre-staging from tape in view of the re-stripping campaign.
Figure 4-6: Usage of disk resources reserved for users from January 1st to December 31st 2017.
Figure 4-7: Usage of disk space at Tier-2D sites from Jan 1st to December 31st 2017, for real data (top) and simulated data (bottom).
Figure 5-1: Volume of accessed datasets as a function of the number of accesses in the last 13 weeks.
Figure 5-2: Volume of accessed datasets as a function of the number of accesses in the last 26 weeks.
Figure 5-3: Volume of accessed datasets as a function of the number of accesses in the last 52 weeks.
Figure 6-1: Throughput rates for exports of RAW data from the online storage to the CERN Tier0 (top) and from the Tier0 to Tier1s (bottom), which also includes pre-staging.

List of Tables

Table 1-1: LHCb estimated resource needs for 2017 (September 2016).
Table 1-2: Site 2017 pledges for LHCb.
Table 3-1: Average CPU power provided to LHCb during Jan-Dec 2017 (Tier0 + Tier1s).
Table 3-2: Average CPU power provided to LHCb from Jan 1st until December 31st 2017, in WLCG sites other than the Tier0 and Tier1s.
Table 4-1: Situation of disk storage resource usage as of February 8th 2018, available and installed capacity as reported by SRM, and 2017 pledge. The contribution of each Tier1 site is also reported. (*) Due to a prolonged site outage following an incident, the CNAF figures refer to November 9th, 2017.
Table 4-2: Available and used disk resources at Tier-2D sites on February 8th 2018.
Table 5-1: Volume (as percentage of total data volume) of datasets accessed at least once in the last 13, 26, 52 weeks, in various periods as reported in this and previous documents.
Table A-1: Summary of the scrutinized, pledged, deployed and used resources in the 2017 WLCG year.

1. Introduction

As part of the WLCG resource review process, experiments are periodically asked to justify the usage of the computing resources that have been allocated to them. The requests for the 2017 period were presented in LHCb-PUB-2015-018 and, following the data taking in 2015, reassessed in LHCb-PUB-2016-022. Table 1-1 shows the requests to WLCG. For CPU, an additional average power (over the year) of 10 kHS06 and 10 kHS06 was deemed to be available from the HLT and Yandex farms, respectively.

              CPU (kHS06)   Disk (PB)   Tape (PB)
Tier0              67          10.9        25.2
Tier1             207          22.1        42.0
Tier2             116           4.7         n/a
Total WLCG        390          37.7        67.2

Table 1-1: LHCb estimated resource needs for 2017 (September 2016).

The last update to the pledges of computing resources for the LHC from the different sites can be found at http://wlcg-rebus.cern.ch/apps/pledges/resources/. The LHCb numbers for 2017 are summarized in Table 1-2.

              CPU (kHS06)   Disk (PB)   Tape (PB)
Tier0              67          10.9        25.2
Tier1             199          20.9        43.3
Tier2             147           3.3         n/a
Total WLCG        423          35.1        68.5
Non-WLCG           20

Table 1-2: Site 2017 pledges for LHCb.

The present document covers the resource usage by LHCb in the period January 1st – December 31st 2017. It is organized as follows: Section 2 presents the computing activities; Section 3 shows the usage of CPU resources; Section 4 reports on the usage of storage resources; Section 5 shows the results of a study on data popularity; throughput rates are discussed in Section 6. Finally, everything is summarized in Section 7.

2. Computing activities during 2017

The usage of offline computing resources involved:
(a) the production of simulated events, which runs continuously;
(b) running user jobs, which is also continuous;
(c) stripping cycles before and after the end of data taking;
(d) processing of data taken in 2017 in proton-proton collisions, i.e. reconstruction and stripping of the FULL stream and µDST streaming of the TURBO stream;
(e) centralized production of ntuples for analysis working groups.

Activities related to the 2017 data taking were tested in May and started at the beginning of the LHC physics run in mid-June. Steady processing and export of data transferred from the pit to offline were observed throughout.

We recall that in Run2 LHCb implemented a trigger strategy in which the high-level trigger is split in two parts. The first one (HLT1), synchronous with data taking, writes events at a 150 kHz output rate into a temporary disk buffer located on the HLT farm nodes. Real-time calibrations and alignments are then performed and used in the second high-level trigger stage (HLT2), where event reconstruction algorithms as close as possible to those run offline are applied and event selection takes place. Events passing the high-level trigger selections are sent offline, either via a FULL stream of RAW events, which are then reconstructed and processed as in Run1, or via a TURBO stream, which directly records the results of the online reconstruction on tape. TURBO data are subsequently reformatted into a µDST format that does not require further processing, are stored on disk and can be used right away for physics analysis.

By default, an event of the TURBO stream contains only information related to signal candidates. In order to ease the migration of analyses that also use information from the rest of the event, it was allowed in the 2016 data taking to persist the entire output of the online reconstruction (TURBO++). This temporary measure increased the average size of a TURBO event from 10 kB to about 50 kB, with obvious consequences for the required storage space. Efforts were made in 2017 to slim the TURBO content by enabling analysts to save only the information that is really needed, resulting in an average TURBO event size of 30 kB. Some effort was also spent on streaming the TURBO output, in order to optimize data access. Besides easing and optimizing the analysis of Run2 data, these efforts are also crucial in view of the Run3 upgrade where, due to offline computing limitations, the default data format for analysis will be based on the output of the TURBO stream.

In addition to the stripping activity concurrent with data taking, an incremental stripping cycle was run in March-April 2017 on data taken in 2015 and 2016. A full re-stripping of 2015, 2016 and 2017 data was started in autumn 2017.

Since November 9th 2017, the INFN-CNAF Tier1 center has no longer been available, due to a major incident. This outage had a significant impact on the LHCb operational activities. Other sites helped by providing more resources, so that the global CPU work stayed unchanged. Nevertheless, the stripping cycles that were started in November are not complete yet, as about 20% of the data located at CNAF still need to be processed.

As in previous years, LHCb continued to make use of opportunistic resources that are not pledged to WLCG but that contributed significantly to the overall usage.
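In terms of the event sizes quoted above, persisting the full online reconstruction inflated TURBO events by a factor of five, and the 2017 slimming recovered a substantial part of that overhead:

\[
\frac{50\ \mathrm{kB}}{10\ \mathrm{kB}} = 5\,, \qquad \frac{30\ \mathrm{kB}}{50\ \mathrm{kB}} = 0.6\,,
\]

i.e. a slimmed 2017 TURBO event occupies about 40% less storage than a 2016 TURBO++ event, though still three times more than a candidate-only TURBO event.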

3. Usage of CPU resources

Computing activities (see Figure 3-1) were dominated by Monte Carlo production. Figure 3-1 also shows a continuous contribution from user jobs, and periods where re-stripping was performed, in the spring and towards the end of the year. Reconstruction of collision data was performed concurrently with data taking, from June onwards. The stripping of 2017 proton data started concurrently with data taking and kept up until the end of data taking. It should be noted that the CPU resources used for real data reconstruction and stripping represent a very low fraction of the overall CPU usage (5.7%). The LHCbDirac system is flexible enough to allow real data processing to take place at sites other than the one holding the data (so-called mesh processing); when necessary, a Tier1 can thus be helped by adding Tier2s, or even other Tier1s, to its processing capabilities.

Figure 3-1: Summary of LHCb computing activities (used CPU power) from Jan 1st to December 31st 2017. The line represents the 2017 pledge.

Thanks to the pilot job paradigm used in LHCb, jobs were executed in the various centers according to the computing resources available to LHCb (Figure 3-2). The Tier0 and the seven Tier1s contribute about 60%; the rest is due to Tier2s, with significant amounts from Tier-2D sites (Tier2s with disk) and from unpledged resources (the Yandex farm and commercial clouds among them).

The LHCb Online HLT farm was the site providing the largest amount of CPU, 22% of the total, with periods where its computing power was equivalent to that of all the other sites together. Due to the preparation for the 2017 data taking, it was no longer available as of the end of May. It was massively used again during the technical stops (August and September) and after the end of data taking. Starting in September, Monte Carlo production jobs were also executed on the HLT farm concurrently with data taking, although with lower priority, in order to use free CPU cycles when the HLT applications did not saturate the farm.

The CPU usage was significantly higher than the requests and the pledges. This is due to the steady demand for large simulated samples from the physics working groups, already observed in 2016. Since a significant fraction of these productions were filtered according to trigger and stripping selection criteria, there was no significant increase in the required storage resources.

Figure 3-2: Summary of LHCb site usage (used CPU power) from Jan 1st until December 31st 2017.

3.1. WLCG accounting

The usage of WLCG CPU resources by LHCb is obtained from the different views provided by the EGI Accounting portal. The CPU usage is presented in Figure 3-3 for the Tier1s and in Figure 3-4 for the Tier2s (including WLCG Tier2 sites that are not pledging resources to LHCb but are accepting LHCb jobs). The same data are presented in tabular form in Table 3-1 and Table 3-2. The average power used at Tier0+Tier1 sites is about 19% higher than the pledges, and the average power used at Tier2s is about 25% higher than the pledges. The average CPU power accounted for by WLCG (Tier0/1 + Tier2) amounts to 500 kHS06, to be compared to the 390 kHS06 of estimated needs quoted in Table 1-1. Additional computing power was used at the LHCb HLT farm and at non-WLCG sites, for an estimated contribution of about 230 kHS06 on average (see Figure 3-5). At CERN, the CPU usage corresponded to the request and pledge. The Tier1 usage is generally higher than the pledges. In general, the LHCb computing model is flexible enough to use computing resources for all production workflows wherever they are available.

Figure 3-3: Average monthly CPU power provided by the Tier1s (and Tier0) to LHCb from Jan 1st until December 31st 2017.

                  <Power> used (kHS06)   Pledge (kHS06)
CH-CERN                  68.5                 67
DE-KIT                   51.6                 33
ES-PIC                    8.9                 12.3
FR-CCIN2P3               31.5                 23.0
IT-INFN-CNAF             34.9                 35.9
NL-T1                    38.0                 20.5
RRC-KI-T1                24.1                 16.4
UK-T1-RAL                58.5                 58.0
Total                   315.9                266.1

Table 3-1: Average CPU power provided to LHCb during Jan-Dec 2017 (Tier0 + Tier1s).

Figure 3-4: Average monthly CPU power provided by the Tier2s to LHCb from Jan 1st until December 31st 2017.
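As a simple cross-check, the percentages quoted in the text follow directly from the totals of Table 3-1 above and Table 3-2 below:

\[
\frac{315.9}{266.1} \simeq 1.19\,, \qquad \frac{184.3}{147.0} \simeq 1.25\,, \qquad 315.9 + 184.3 \simeq 500\ \mathrm{kHS06}\,.
\]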

                  <Power> used (kHS06)   Pledge (kHS06)
Brazil                    6.9                 22.0
France                   32.7                 20.4
Germany                   6.6                  6.6
Israel                    0.1                  0
Italy                     8.4                 20.1
Poland                   10.2                  4.1
Romania                   2.1                  6.5
Russia                   13.0                 22.4
Spain                     7.8                  6.6
Switzerland              16.1                 15.6
UK                       80.4                 22.8
Total                   184.3                147.0

Table 3-2: Average CPU power provided to LHCb from Jan 1st until December 31st 2017, in WLCG sites other than the Tier0 and Tier1s.

3.2. LHCb DIRAC CPU accounting

The sharing of CPU time (in days) is shown in Figure 3-5. The top chart reports the CPU time per country provided by WLCG sites, including the ones not providing a pledge for LHCb and excluding the HLT farm. The bottom chart shows the CPU time per site provided by non-WLCG sites and the CERN cloud. The HLT farm at CERN gives the dominant contribution; the Yandex farm and the CERN cloud (including the Oracle cloud contribution) come immediately next. Other farms (Zurich, Ohio Supercomputing Center and Sibir) provide sizeable CPU time. The contribution of volunteer computing (BOINC) is also clearly visible.

The CPU power used at Tier0/1s is detailed in Figure 3-6. As seen in the top plot, simulation jobs are dominant (75% of the total), with user jobs contributing about 10% to the total. The stripping cycles are also visible, as well as the offline reconstruction of 2017 data. The contributions from the Tier2s and other WLCG sites (including sites not pledging any resources to LHCb, but excluding the HLT farm) are shown in Figure 3-7. Simulation (92%) and user jobs (5%) dominate the CPU usage. A small fraction of stripping and data reconstruction is also visible in Figure 3-7.

Figure 3-8 shows the number of user jobs per site at the Tier0 and Tier1s (top) and outside (bottom). Tier-2D sites contribute about 2/3 of the work outside the Tier0 and Tier1s.

Figure 3-9 shows pie charts of the CPU days used at all sites as a function of the final status of the jobs (top) and as a function of the activity for stalled jobs (bottom plot). About 93% of all jobs complete successfully. The major contributors to unsuccessful jobs are stalled jobs. By looking at the bottom plot of Figure 3-9, one can see that stalled jobs (i.e. jobs killed by the batch system, mostly due to excessive CPU or memory usage, or to abrupt failures of the worker nodes) are due to simulation jobs in 2/3 of the cases, the remainder being due to user jobs.

The majority of stalled jobs occurred in temporary bursts, due to a small number of misconfigured Monte Carlo productions. The remaining stalled jobs were typically due to failures in determining the proper size of a batch queue, or in evaluating the CPU work needed per event (and thus overestimating the number of events a job can run).

Figure 3-5: CPU time consumed by LHCb from Jan 1st until December 31st 2017 (in days), for WLCG sites, excluding the HLT farm (top, per country), and non-WLCG sites, including the HLT farm and the CERN cloud (bottom, per site).

Figure 3-6: Usage of LHCb Tier0/1s during Jan 1st – Dec 31st 2017. The top plot shows the used CPU power as a function of the different activities, while the bottom plot shows the contributions from the different sites.

Figure 3-7: Usage of resources in WLCG sites other than the Tier0 and Tier1s, pledging to LHCb, during Jan 1st – Dec 31st 2017. The top plot shows the used CPU power as a function of the different activities, while the bottom plot shows the contributions from the different sites. User jobs (shown in magenta in the top plot) are further detailed in Figure 3-8.

Figure 3-8: Running user jobs on Tier0 and Tier1 sites (top), and outside (bottom) during Jan 1st – Dec 31st 2017.

Figure 3-9: CPU time as a function of the job final status for all jobs (top), and as a function of activity for stalled jobs (bottom).

4. Usage of storage resources

4.1. Tape storage

Tape storage grew by about 13.0 PB. Of these, 7.5 PB were due to RAW data taken since June 2017. The rest was due to RDST (2.1 PB) and ARCHIVE (3.4 PB), the latter coming from the archival of Monte Carlo productions, the re-stripping of former real data, and new Run2 data. The total tape occupancy as of December 31st 2017 is 52.2 PB, of which 28.9 PB are used for RAW data, 10.7 PB for RDST and 12.6 PB for archived data. This is about 20% lower than the original request of 67.2 PB (LHCb-PUB-2015-018) but well in line with the revised estimate of 54.2 PB (LHCb-PUB-2017-019), which should be approached by the end of the WLCG year through an expected growth of 1 PB in the ARCHIVE space due to the end of the stripping campaigns.

Figure 4-1: Tape space occupancy for RAW data, January 1st – December 31st 2017.
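For completeness, the quoted growth and occupancy figures are internally consistent:

\[
7.5 + 2.1 + 3.4 = 13.0\ \mathrm{PB}\,, \qquad 28.9 + 10.7 + 12.6 = 52.2\ \mathrm{PB}\,.
\]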

Figure 4-2: Tape occupancy for RDST data, January 1st – December 31st 2017.

Figure 4-3: Tape occupancy for derived data, January 1st – December 31st 2017.

4.2. Disk at Tier0, Tier1s and Tier2s

The evolution of disk usage during 2017 at the Tier0 and Tier1s is presented in Figure 4-4, separately for real data DST and MC DST, in Figure 4-5 for buffer space, and in Figure 4-6 for user disk. The buffer space is used for storing temporary files, such as RDST until they are stripped, or DST files before they are merged into large (5 GB) files. The increase of the buffer space in autumn 2017 corresponds to the staging of RDST and RAW files from tape in preparation for the 2015/16 re-stripping campaigns. The increase of the data volumes during data taking periods results in a corresponding increase of the buffer space at the Tier0 and all Tier1s.

The real data DST space increased in April and May, due to the incremental stripping of previous Run2 data. An increase due to the 2017 data taking is visible starting from July. The decreases in February-March and June correspond to the removal of old datasets, following analyses of their popularity, for a total recovered space of about 2 PB. The space used for simulation also decreases in three steps due to the removal of unused datasets, which allowed about 1.8 PB of disk space to be recovered.

Table 4-1 shows the situation of disk storage resources at CERN and the Tier1s, as well as at each Tier1 site, as of February 8th 2018. The used space includes derived data, i.e. DST and micro-DST of both real and simulated data, and space reserved for users. The latter accounts for 0.8 PB in total.

Disk (PB)                CERN   Tier1s  CNAF(*) GRIDKA  IN2P3   PIC    RAL    RRCKI  SARA
LHCb accounting          4.93   14.95   2.54    2.29    2.22    0.90   4.13   1.19   1.68
SRM T0D1 used            5.13   15.33   2.41    2.45    2.39    0.95   4.10   1.25   1.78
SRM T0D1 free            0.17    2.60   0.87    0.64    0.30    0.05   0.39   0.10   0.25
SRM T1D0 (used+free)     1.55    1.85   1.12    0.14    0.03    0.02   0.42   0.09   0.03
SRM T0D1+T1D0 total(**)  6.85   19.78   4.40    3.23    2.72    1.02   4.91   1.44   2.06
2017 pledge             10.90   20.85   4.08    3.39    2.80    1.15   6.16   1.26   2.01

Table 4-1: Situation of disk storage resource usage as of February 8th 2018, available and installed capacity as reported by SRM, and 2017 pledge. The contribution of each Tier1 site is also reported. (*) Due to a prolonged site outage following an incident, the CNAF figures refer to November 9th, 2017. (**) This total includes disk visible in SRM for tape cache; it does not include invisible disk pools for dCache (stage and read pools).

The SRM used and SRM free information concerns only permanent disk storage (T0D1). The first two lines show good agreement between what the LHCb accounting reports (first line) and what the sites report (second line). The sum of the Tier0 and Tier1 2017 pledges amounts to 31.8 PB. The used disk space is 26.6 PB in total, of which 20.5 PB are used to store real and simulated datasets and 0.8 PB for user data; the remainder is used as buffer.

The available disk at Tier-2D sites is now 4.48 PB, of which 3.11 PB are used. The disk pledge at Tier-2D sites is 3.3 PB. Table 4-2 shows the situation of the disk space at the Tier-2D sites. Figure 4-7 shows the disk usage at Tier-2D sites; a pattern similar to that observed for the Tier0 and Tier1 sites is seen.
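These totals can be read off Table 4-1 directly; the pledges and the used space at the Tier0 and Tier1s are

\[
10.90 + 20.85 \simeq 31.8\ \mathrm{PB}\,, \qquad 6.85 + 19.78 \simeq 26.6\ \mathrm{PB}\,,
\]

and adding the 3.11 PB used at Tier-2D sites gives the 29.7 PB quoted in the conclusion below.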

In conclusion, LHCb is currently using 29.7 PB of disk space, 15% lower than the original request of 35.1 PB (LHCb-PUB-2015-018) and in line with the revised estimate of 29.6 PB (LHCb-PUB-2017-019).

Figure 4-4: Usage of disk resources at CERN and Tier1s from Jan 1st to December 31st 2017. Real data and simulated data are shown in the top and bottom figures.

Figure 4-5: Usage of buffer disk from January 1st to December 31st 2017. The large increase at the end of the year is due to pre-staging from tape in view of the re-stripping campaign.

Figure 4-6: Usage of disk resources reserved for users from January 1st to December 31st 2017.

Site                 SRM free disk (TB)   SRM used disk (TB)   SRM total disk (TB)
CBPF (Brazil)                49                   94                  143
CPPM (France)                94                  236                  330
CSCS (Switzerland)          179                  521                  700
Glasgow (UK)                 32                   78                  110
IHEP (Russia)                13                   51                   64
LAL (France)                 58                  142                  200
LPNHE (France)               33                   97                  130
Liverpool (UK)               95                  213                  308
Manchester (UK)              76                  224                  300
NCBJ (Poland)               205                  395                  600
NIPNE (Romania)             176                  224                  400
RAL-HEP (UK)                120                  335                  455
UKI-QMUL (UK)               107                  193                  300
UKI-IC (UK)                 133                  307                  440
Total                      1371                 3109                 4480

Table 4-2: Available and used disk resources at Tier-2D sites on February 8th 2018.

Figure 4-7: Usage of disk space at Tier-2D sites from Jan 1st to December 31st 2017, for real data (top) and simulated data (bottom).

5. Data popularity

Dataset usage patterns, routinely monitored in LHCb for some years, are used to recover a significant amount of disk space. As already mentioned, about 3.8 PB of disk space were recovered in 2017 (Figure 4-4).

Figure 5-1, Figure 5-2 and Figure 5-3 show the physical volume on disk of the accessed datasets as a function of the number of accesses in the last 13, 26 and 52 weeks, as measured on February 8th 2018. The number of usages of a dataset over a period of time is defined as the ratio of the number of files used in that dataset to the total number of files in the dataset. The first bin indicates the volume of data that were generated before and not accessed during that period (e.g. 3.3 PB in the last 52 weeks). The last bin shows the total volume of data that was accessed (e.g. 13.8 PB in the last 52 weeks); the volume in each bin is not weighted by the number of accesses.

The total amount of disk space used by the derived LHCb data is 18.0 PB. Since this number is computed by explicitly excluding files that are not usable for physics analysis, e.g. RDST or temporary unmerged files, it is not directly comparable to the used resources mentioned in Section 4. The datasets accessed at least once in the last 13, 26 and 52 weeks occupy 59%, 66% and 77% of the total disk space in use, respectively. There is a slight decrease of the fraction of used data with respect to the previous report (Table 5-1). This is due to a decrease in the usage of 2017 data, for which the stripping concurrent with data taking was affected by a software bug that made it necessary to prepare and run a re-stripping cycle of those data.

              LHCb-PUB-2015-004  LHCb-PUB-2015-019  LHCb-PUB-2016-004  LHCb-PUB-2016-023  LHCb-PUB-2017-019  This document
              (end 2014)         (mid 2015)         (end 2015)         (mid 2016)         (end 2016)         (Feb. 2018)
13 weeks          42%                52%                46%                53%                63%                59%
26 weeks          57%                63%                58%                70%                71%                66%
52 weeks          72%                72%                70%                79%                80%                77%

Table 5-1: Volume (as percentage of total data volume) of datasets accessed at least once in the last 13, 26, 52 weeks, in various periods as reported in this and previous documents.
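The popularity metric described above can be sketched in a few lines of Python. The dataset names and numbers below are invented for illustration, and the integer binning of the usage count is an assumption (the note does not specify the exact binning); in practice the inputs come from the LHCb DIRAC accounting.

    from collections import defaultdict

    # Invented example datasets: (name, size on disk in TB,
    # file usages in the period, number of files in the dataset).
    datasets = [
        ("MC/2016/SampleA",      120.0,    0, 500),
        ("Data/2017/FULL.DST",   350.0,  400, 800),
        ("Data/2016/Turbo.MDST", 210.0, 1500, 750),
    ]

    def n_usages(file_usages, n_files):
        """Usage count of a dataset: files used divided by files in the dataset."""
        return file_usages / n_files if n_files else 0.0

    # Histogram the dataset volume by (truncated) number of usages; bin 0 collects
    # datasets not accessed in the period, as in the first bin of Figure 5-1.
    volume_per_bin = defaultdict(float)
    for name, size_tb, usages, n_files in datasets:
        volume_per_bin[int(n_usages(usages, n_files))] += size_tb

    for n_acc in sorted(volume_per_bin):
        print(f"{n_acc:2d} accesses: {volume_per_bin[n_acc]:7.1f} TB")

The volume in each bin is summed without weighting by the number of accesses, mirroring the convention stated above for Figures 5-1 to 5-3.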

Figure 5-1: Volume of accessed datasets as a function of the number of accesses in the last 13 weeks.

Figure 5-2: Volume of accessed datasets as a function of the number of accesses in the last 26 weeks.

Figure 5-3: Volume of accessed datasets as a function of the number of accesses in the last 52 weeks.

6. Throughput rates

RAW data were exported from the pit to the CERN Tier0, and onwards to the Tier1s, at the design Run2 throughput rates (Figure 6-1). Transfers of up to 1.4 GB/s were observed. In order to optimize the workflow (reducing the number of files) while keeping the processing time within reasonable bounds, the size of the RAW files has been increased (5 GB for the FULL stream, 15 GB for the TURBO stream).

Figure 6-1: Throughput rates for exports of RAW data from the online storage to the CERN Tier0 (top) and from the Tier0 to Tier1s (bottom), which also includes pre-staging.

7. Summary

The usage of computing resources in the 2017 calendar year has been quite smooth for LHCb. Simulation is the dominant activity in terms of CPU work. Additional unpledged resources, as well as clouds, on-demand and volunteer computing resources, were also successfully used. They were essential in providing CPU work during the outage of one Tier1 center. The utilization of storage resources is in line with the most recent reassessment.

Appendix

As requested by the C-RSG, the following table provides a summary of the scrutinized (C-RSG), pledged, deployed and used resources in the 2017 WLCG year (April 1st 2017 – March 31st 2018), as well as some useful ratios between these quantities.

Table A-1: Summary of the scrutinized, pledged, deployed and used resources in the 2017 WLCG year.