Monitoring of Computing Resource Use of Active Software Releases at ATLAS


Antonio Limosani, on behalf of the ATLAS Collaboration
CERN, CH-1211 Geneva 23, Switzerland and University of Sydney, NSW, Australia
E-mail: antonio.limosani@cern.ch
ATL-SOFT-PROC-2017-035, 02 February 2017

Abstract. The LHC is the world's most powerful particle accelerator, colliding protons at a centre-of-mass energy of 13 TeV. As the energy and frequency of collisions have grown in the search for new physics, so too has the demand for the computing resources needed for event reconstruction. We report on the evolution of resource usage, in terms of CPU and RAM, in key ATLAS offline reconstruction workflows at the Tier0 at CERN and on the WLCG. Workflows are monitored using the ATLAS PerfMon package, the standard ATLAS performance monitoring system running inside Athena jobs. Systematic daily monitoring has recently been expanded beyond event reconstruction to cover all workflows, from Monte Carlo generation through to end-user physics analysis. Moreover, the move to a multi-process mode in production jobs has motivated the use of tools such as MemoryMonitor to measure the memory shared between the processes of a job. Resource consumption is broken down into software domains and displayed in plots generated using Python visualization libraries, collected into pre-formatted, auto-generated web pages that allow the ATLAS developer community to track the performance of their algorithms. This information is also brought directly to domain leaders and developers through JIRA issues and reports given at ATLAS software meetings. Finally, we take a glimpse of the future by reporting on the expected CPU and RAM usage in benchmark workflows associated with the High Luminosity LHC, and anticipate the ways performance monitoring will evolve to understand and benchmark future workflows.

1. Introduction
ATLAS software is a diverse and globally distributed project consisting of millions of lines of code spread across more than 2500 packages. In order to meet the research goals of the ATLAS experiment, the software stack handles the many data processing steps shown in figure 1. These steps are deployed on the heterogeneous computing systems that make up the Worldwide LHC Computing Grid (WLCG). To both measure known processes with precision and search for rare phenomena, trillions of physics events must be processed. The aim is to maximise the throughput of physics event analysis within the WLCG budgets for CPU, memory and disk storage. To that end, detailed monitoring down to the level of the resource usage of individual algorithms is systematically conducted to:
ensure workflows respect the nominal memory limit of 2 GB per core;
prevent CPU time bottlenecks from entering production workflows;
reduce the incidence of job failures due to excessive demands on resources;

give feedback to developers and allow them to target and confirm performance gains;
and, overall, promote a culture of testing, monitoring and optimisation.

Figure 1. Schematic representation of the Full Chain Monte Carlo production and real data processing.

2. PerfMon
PerfMon [1] is a toolkit developed in the context of the Athena framework [2], relying on its clear finite state machine to capture snapshots of the application's state for later analysis:
initialization stage
physics event loop
finalization stage
During each of these stages, various dedicated monitoring tools are scheduled by the PerfMonSvc, eavesdropping on all the available Gaudi components [2] during a job. This is achieved by relying on the already existing set of auditor entry points depicted in figure 2. PerfMon can be engaged through various modes, which specify the level of detail. By default in all jobs, PerfMon is run in the Semi-Detailed mode, denoted PerfMonSD [7]. PerfMonSD fills a gap between the two standard modes of PerfMon, namely the fast-monitoring mode, which has a low resource overhead but no per-component monitoring (a component being an Athena Algorithm, Tool or Service), and the detailed mode, which has both per-component and per-event monitoring. PerfMon retrieves data using the Gaudi framework auditor services, e.g. ChronoStatAuditor and MemAuditor. The Gaudi Auditor Service provides a set of auditors that can be used to monitor various characteristics of the execution of Algorithms. Each auditor is called immediately before and after each call to each Algorithm instance, and can track some resource usage of the Algorithm. The calls monitored in this way are initialize(), execute() and finalize().
PerfMon's Semi-Detailed mode features per-component but no per-event storage of performance metrics. It accumulates the data in custom data structures optimised for low memory and CPU footprints and outputs a text summary at the end of the job: a slightly truncated one inside the log file and a complete one in a separate dedicated file. PerfMonSD does not monitor as many variables as the standard PerfMon modes, recording only the CPU, VMem and malloc usage of each component. PerfMonSD adds negligible overhead to a given Athena job (typically less than 0.1% of both memory and CPU), thus making it possible to have it enabled by default.

Figure 2. Monitoring hooks at specific entry points within a Gaudi job.

PerfMonSD uses the existing PerfMon infrastructure to snapshot CPU and memory states before and after component methods such as initialize(), finalize() and execute() are invoked; a toy illustration of this before/after bookkeeping is given after the list of stages below. The following job stages are monitored for CPU, VMem and malloc use:
ini: initialisation, where measurements are made for the calls to ::initialize() of Algorithms, Tools and Services.
1st: contributions from algorithms in the first event, where typically various forms of on-demand or delayed initialisation are triggered.
cbk: callbacks, typically triggered when some conditions data change and calibration constants have to be updated in components which have registered an interest in the affected conditions.
evt: the event loop, where calls to the ::execute() method of Algorithms from the second event onwards are measured.
fin: finalisation, where measurements are made for the calls to ::finalize() of Algorithms, Tools and Services.
dso: the cost of loading libraries.
preloadproxy: the first client that retrieves certain conditions data automatically triggers a retrieval of that data (and the filling of related data structures in memory); the overhead of this retrieval is measured.
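To illustrate the auditor mechanism described above, the following Python sketch mimics the before/after hooks: it wraps a monitored call of a component, snapshots CPU time and virtual memory on either side, and accumulates the differences per component and per call type. The class and helper names are assumptions made for illustration only; the real auditors are components inside the Gaudi/Athena framework.

import time
from collections import defaultdict

def vmem_kb():
    # Current virtual memory size (VmSize, kB) read from the proc file system (Linux only).
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmSize:"):
                return int(line.split()[1])
    return 0

class ToyAuditor:
    # Accumulates CPU time and VMem growth per (component, call type), as PerfMonSD does conceptually.
    def __init__(self):
        self.stats = defaultdict(lambda: {"cpu_s": 0.0, "dvmem_kb": 0})

    def audited_call(self, component, call_name, method):
        cpu0, vm0 = time.process_time(), vmem_kb()   # "before" hook
        result = method()                            # e.g. initialize(), execute() or finalize()
        cpu1, vm1 = time.process_time(), vmem_kb()   # "after" hook
        entry = self.stats[(component, call_name)]
        entry["cpu_s"] += cpu1 - cpu0
        entry["dvmem_kb"] += vm1 - vm0
        return result

    def report(self):
        # End-of-job text summary, one line per component and call type.
        for (component, call_name), entry in sorted(self.stats.items()):
            print(f"{component:30s} {call_name:10s} cpu={entry['cpu_s']:.3f} s  dvmem={entry['dvmem_kb']} kB")

In the real framework the two hooks correspond to the auditor entry points shown in figure 2, and the accumulated per-component numbers are what PerfMonSD prints in its end-of-job summary.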

3. Systematic Performance Monitoring Infrastructure
Systematic performance monitoring of ATLAS workflows can be broken down into the following components:
Benchmark job. ATLAS has numerous concurrent open nightly releases to serve developments in each of the data processing steps shown in figure 1. For each of these steps one or several benchmark jobs are defined, chosen to be representative of the given step being monitored.
Job scheduler. Open nightly releases are built in their entirety once a day, and benchmark jobs are scheduled to run on each build. The job scheduler of choice has shifted from acron [3] (the CERN AFS cron service) to the ATLAS RunTimeTester (RTT) framework [4], in order to make use of the extensive bookkeeping and report-production facilities that RTT provides. However, with the ATLAS nightly build system migrating to the open-source application Jenkins [6], it is foreseen that jobs will in future also rely on Jenkins.
Output. PerfMon provides the results of its monitoring in the output log of the job and in ROOT format. Graphs and histograms are also produced as PDF files.
Database. Currently the results from each job are saved into a file-based database, which files the details and output of a job in a directory structure defined as /dd/mm/yyyy/release/platform/jobname. The database will be migrated to EOS [5] and, for ease of processing and interpretation, extended to include an SQLite component.
Reports. Reports in this context refer to the production of plots and of web pages that graphically display the evolution of the performance of the release and its subcomponents as a time series; comparisons are also made between production releases in order to understand changes over longer time periods. Currently the analysis and the production of web pages are done in Python, with plots made using the matplotlib module. With the data stored in SQLite, much of the current code performing the sorting and categorisation can be replaced with succinct SQL queries (a minimal sketch is given after the list of processing steps below), and the results can be published to the web in the form of dashboards, allowing a more interactive user experience. Example plots published in static web pages are shown in figure 3.
Actions. The most important task is to act on the information gathered from the reports. Actions span from the manual creation of JIRA bug issues to presentations in relevant meetings, for example to highlight that a particular algorithm or software domain has been responsible for an increase in memory, CPU time or output file size.
A typical plot is shown in figure 3, where the CPU time, categorised by software domain, is tracked over a two-week period in a given ATLAS nightly software release. Note the consistency, and therefore the ability to resolve day-to-day differences in the software performance. This consistency relies on the same benchmark machine being reserved for the job and on not overloading the machine with too many jobs at the same time. A series of such plots is organised into web pages; typically a given job type has a dedicated web page. Jobs are defined on the basis of the step engaged in the data processing workflow.
The steps monitored are defined as follows:
Generation of physics events, producing EVNT files
Simulation of the ATLAS detector response to the physics events using Geant4 (input EVNT files, output HITS files)
Digitisation, determining the signals read out from the detector (input HITS files, output RDO (Raw Data Object) files)
Trigger, determining the trigger response (RDO to RDO files)
Reconstruction of physics objects: RDO to ESD (Event Summary Data) to AOD (Analysis Object Data)
Analysis Derivations, which slim, skim and thin events for dedicated physics analysis streams: AOD to DxAOD (derived AOD)
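As anticipated in the Reports component above, storing the benchmark results in SQLite allows the sorting and categorisation currently coded by hand to be expressed as short SQL queries. The sketch below is a minimal illustration; the table layout, column names and the job and release labels are assumptions, not the actual ATLAS database schema.

import sqlite3

con = sqlite3.connect("perfmon_results.db")
con.execute("""
    CREATE TABLE IF NOT EXISTS component_metrics (
        day            TEXT,     -- e.g. '2017-02-02'
        release        TEXT,     -- nightly release name (assumed label)
        platform       TEXT,
        jobname        TEXT,     -- benchmark job, e.g. 'RDOtoESD' (assumed label)
        domain         TEXT,     -- software domain the component belongs to
        component      TEXT,     -- Algorithm, Tool or Service name
        cpu_ms_per_evt REAL,
        vmem_kb        INTEGER)""")

# CPU time per software domain as a time series for one benchmark job:
# this single query replaces the hand-written sorting and categorisation code.
rows = con.execute("""
    SELECT day, domain, SUM(cpu_ms_per_evt) AS cpu_ms
    FROM component_metrics
    WHERE jobname = ? AND release = ?
    GROUP BY day, domain
    ORDER BY day""", ("RDOtoESD", "21.0.X")).fetchall()

The resulting rows can be passed directly to matplotlib to produce the time-series plots, or served to a web dashboard.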

The resource use measured in each step of the processing of physics events is summarised in Table 1.

Figure 3. CPU time reports for ATLAS software algorithms.

Table 1. CPU time and memory use in ATLAS workflows.
Step               Time per event (s)   Memory per core (GB)   Number of cores
EVNT to HITS       180                  1.7                    1
HITS to RDO        32                   1.5                    4
RDO to RDOTrigger  10                   1.6                    4
RDO to ESD         13                   1.8                    4
ESD to AOD         0.3                  1.2                    4
AOD to DxAOD       0.4                  2.4                    1

4. Monitoring of production workflow at Tier0
The CERN Tier0 processes data live as recorded by ATLAS from collisions in the LHC. The ATLAS release used at the Tier0 is therefore continually updated to meet varying requirements, in response to changes in LHC conditions and to knowledge gained from previous data-processing runs. To improve the robustness and rebuild frequency of the Tier0 release, a script has been developed to test proposed changes, in both simulated and real data workflows, for their effect on the following (a sketch of the style of check involved follows the list):
file outputs
memory consumption
CPU time
frequency of log file messages
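The style of check such a test performs can be sketched as follows; the metric names, tolerances and helper functions are illustrative assumptions and not those of the actual ATLAS script.

import re

def count_messages(log_path, level="WARNING"):
    # Count log lines of a given severity, e.g. WARNING or ERROR.
    pattern = re.compile(rf"\b{level}\b")
    with open(log_path) as f:
        return sum(1 for line in f if pattern.search(line))

def compare_metrics(reference, test, rel_tolerance=0.02):
    # Flag metrics (e.g. CPU time per event, peak memory, message counts) that
    # grew by more than rel_tolerance with respect to the reference run.
    regressions = {}
    for name, ref_value in reference.items():
        test_value = test.get(name)
        if test_value is not None and test_value > ref_value * (1.0 + rel_tolerance):
            regressions[name] = (ref_value, test_value)
    return regressions

# Example placeholder values; in practice these would come from the PerfMon
# output of the reference run and of the run with the proposed code changes.
ref = {"cpu_s_per_event": 13.0, "peak_vmem_gb": 1.8, "warnings": 12}
new = {"cpu_s_per_event": 13.1, "peak_vmem_gb": 2.1, "warnings": 40}
print(compare_metrics(ref, new))  # -> {'peak_vmem_gb': (1.8, 2.1), 'warnings': (12, 40)}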

The script is configured to run simultaneous tests on the same machine with and without the proposed code changes. It is a set-and-forget test that takes around one hour to complete on a typical interactive node (Intel Core processor, 2.4 GHz, Haswell). There is an option to reduce the time by half if additional CPU cores can be accessed. Other options include comparisons against past production releases and supplementary job configuration instructions. This script has prevented disastrous memory leaks and new bottlenecks from entering the code base in imminent production releases.
The move to running jobs on multiple cores, so that forked processes can share memory, has necessitated the development of the MemoryMonitor tool [8], which reports the total memory consumed by the job taking this sharing into account. Specifically, the crucial metric reported as a function of time is the Proportional Set Size (PSS) memory, in addition to the usual metrics of virtual memory, Resident Set Size (RSS) memory and swap space, for a job's process ID and all of its forked child process IDs. It does so by probing the smaps file of the given process IDs in the proc file system. The tool has also been extended to measure the amount of data read and written by examining the io file. The tool is an invaluable addition to the monitoring of real workflows at the Tier0 and on the WLCG.
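To make the PSS accounting concrete, the sketch below shows one way to obtain the PSS of a job on Linux by summing the Pss fields of /proc/<pid>/smaps over the parent process and its forked children. It illustrates the approach and is not the MemoryMonitor implementation itself; in particular, using /proc/<pid>/task/<pid>/children to discover children is an assumption that requires a reasonably recent Linux kernel.

import os

def pss_kb(pid):
    # Sum the Pss fields (kB) in /proc/<pid>/smaps for one process.
    total = 0
    try:
        with open(f"/proc/{pid}/smaps") as f:
            for line in f:
                if line.startswith("Pss:"):
                    total += int(line.split()[1])
    except (FileNotFoundError, PermissionError):
        pass  # the process may have exited or be unreadable
    return total

def children_of(pid):
    # Direct children of pid, read from the proc file system (assumed interface).
    try:
        with open(f"/proc/{pid}/task/{pid}/children") as f:
            return [int(c) for c in f.read().split()]
    except FileNotFoundError:
        return []

def tree_pss_kb(pid):
    # PSS of a job: the parent plus all of its (recursively) forked children.
    # The io file under /proc/<pid>/ could be probed in the same way for bytes read and written.
    return pss_kb(pid) + sum(tree_pss_kb(child) for child in children_of(pid))

if __name__ == "__main__":
    print(f"PSS of this process tree: {tree_pss_kb(os.getpid())} kB")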

5. Optimisation studies
The current monitoring and benchmarking framework has allowed developers to:
optimise code performance, see Table 2 and figure 4, where the results of compiler and memory-allocator studies are presented, respectively;
anticipate resource usage based on current workflows, see figure 5;
plan resource requests and define limits for future workflows;
react to and measure the impact of significant code changes, for example the move from ROOT 5 to ROOT 6, the migration from the ATLAS home-grown Configuration Management Tool (CMT) to CMake, and the migration from CLHEP vectors to Eigen.

Table 2. Workflow CPU time per event (s) under different GCC compiler optimisation levels.
Workflow         -O2 (default)   -Os    -O3
Simulation       156             171    140
Digitization     22              25     20
Trigger          7.9             11.2   7.6
Reconstruction   10.0            11.4   9.4

6. Conclusion
In a distributed, multi-developer environment, software monitoring is essential to the optimisation of ATLAS workflows. The existing framework is being upgraded to handle current and future workflows. SQL will provide a powerful means of analysing the performance data and will allow greater flexibility in tracking changes. The performance monitoring infrastructure continues to evolve to help ATLAS make optimal use of the available computing resources.

Figure 4. CPU time and memory consumption for different memory allocators in the ATLAS reconstruction workflow.

Figure 5. Full reconstruction time per event as a function of the number of interactions per bunch crossing µ.

References
[1] Performance Monitoring at ATLAS, http://twiki.cern.ch/twiki/bin/view/atlascomputing/monitoringmemory
[2] ATLAS Athena Framework, http://atlas-proj-computing-tdr.web.cern.ch/atlas-proj-computingtdr/html/computing-tdr-21.htm
[3] ACRON Service, http://cern.service-now.com/service-portal/service-element.do?name=acron-service
[4] The ATLAS RunTimeTester User Guide, http://atlas-rtt.cern.ch/prod/docs/userguide/
[5] Large Disk Storage at CERN (EOS), http://eos.web.cern.ch/
[6] Jenkins User Handbook, http://jenkins.io/user-handbook.pdf
[7] Semi-Detailed mode of PerfMon (PerfMonSD), https://twiki.cern.ch/twiki/bin/viewauth/atlascomputing/perfmonsd
[8] MemoryMonitor, https://twiki.cern.ch/twiki/bin/view/itsdc/profandoptexperimentsapps#memorymonitoringtool