The NOvA software testing framework


M Tamsett and C Group 2015 J. Phys.: Conf. Ser. 664 062062

M Tamsett¹ and C Group²

¹ University of Sussex, Falmer, East Sussex, UK
² University of Virginia, Charlottesville, VA, USA

E-mail: m.tamsett@sussex.ac.uk

Abstract. The NOνA experiment at Fermilab is a long-baseline neutrino experiment designed to study νe appearance in a νμ beam. NOνA has already produced more than one million Monte Carlo and detector generated files amounting to more than 1 PB in size. These data are divided between a number of parallel streams such as far and near detector beam spills, cosmic ray backgrounds, a number of data-driven triggers and over 20 different Monte Carlo configurations. Each of these data streams must be processed through the appropriate steps of the rapidly evolving, multi-tiered, interdependent NOνA software framework. In total there are more than 12 individual software tiers, each of which performs a different function and can be configured differently depending on the input stream. In order to regularly test and validate that all of these software stages are working correctly, NOνA has designed a powerful, modular testing framework that enables detailed validation and benchmarking to be performed in a fast, efficient and accessible way with minimal expert knowledge. The core of this system is a novel series of python modules which wrap, monitor and handle the underlying C++ software framework and then report the results to a slick front-end web-based interface. This interface utilises modern, cross-platform visualisation libraries to render the test results in a meaningful way. They are fast and flexible, allowing for the easy addition of new tests and datasets. In total upwards of 14 individual streams are regularly tested, amounting to over 70 individual software processes and producing over 25 GB of output files. The rigour enforced through this flexible testing framework enables NOνA to rapidly verify configurations, results and software, and thus ensure that data is available for physics analysis in a timely and robust manner.

1. Introduction
The NOvA experiment is a currently active long-baseline neutrino oscillation experiment using the recently upgraded NuMI beam at Fermilab, USA, to measure νμ → νe, ν̄μ → ν̄e, νμ → νμ and ν̄μ → ν̄μ oscillations [1]. The experiment uses two functionally identical detectors composed of alternating horizontal and vertical planes of PVC plastic cells. Each cell has a 4 cm × 6 cm cross section and is filled with liquid scintillator and a looped wavelength-shifting fiber attached to an avalanche photodiode for light collection. The detectors are separated by 809 kilometers and located 14 milliradians off-axis from the beam center in order to produce a narrow-band 2 GeV beam near the oscillation maximum for νe appearance. The Far Detector, located in Ash River, Minnesota, is 15.6 m × 15.6 m × 60 m, totaling 14 kilotons and 344,064 individual cells. The Near Detector is located one km from the target at Fermilab and is 4.2 m × 4.2 m × 15.8 m, for a total of 300 tons and 20,192 cells.

2. NOνA software framework
The NOνA Offline Analysis Software (NOνASoft) is written in C++ and built on the Fermilab Computing Division's ART framework [2], which uses CERN's ROOT [3] analysis software. NOνASoft makes use of more than 50 external software packages, is developed by more than 50 developers and is used by more than 200 scientists from 38 institutions and 7 countries across 3 continents.

NOνA has already produced more than 1 million Monte Carlo and detector generated files amounting to more than 1 PB in size. These are divided between a number of parallel data streams such as far and near detector beam spills, cosmic ray backgrounds, several data-driven triggers and over 20 different Monte Carlo configurations. Each of these data streams must be processed through the appropriate steps of the rapidly evolving, multi-tiered, interdependent NOvA software framework. In total there are more than 12 individual software tiers, each of which performs a different function and can be configured differently depending on the input stream.

3. Testing framework
In order to regularly test and validate that all of these software stages are working correctly, NOvA has designed a powerful, modular testing framework that enables detailed validation and benchmarking to be performed in a fast, efficient and accessible way with minimal expert knowledge.

The core of this system is a novel series of python modules which wrap, monitor and handle the underlying C++ software framework. These are configured using command line options to select from a number of pre-configured chains which define the input data stream to use. Each chain is itself composed of a number of tiers, each of which consists of an individual call to the underlying NOνASoft framework. Tiers typically consist of a discrete stage of event processing, such as Monte Carlo generation, translation from the data acquisition to the offline data format, the reconstruction of physics objects or particle identification. Tier-to-tier differences are configured based on a small number of options, including the configuration file to use, the preceding tiers upon which this tier is dependent, any commands that need to be executed in advance of the tier and the output files that this tier will produce. The configuration of chains is simple and flexible, and is trivially extensible to easily cater for new data streams and tiers.

The desired tiers are run sequentially, with each NOνASoft executable call being spawned using the python subprocess module. The child process is then tracked via its process identifier. The memory, disk and CPU usage are monitored, while at the same time the output and error message streams are captured and injected with timestamps. These timestamps are later used to correlate the tracked metrics with key stages in the NOνASoft framework process, such as database calls, object initiation and the looping over events. Once a tier has concluded, successfully or otherwise, its return code is logged and any output files it produced are analysed in terms of their size and contents. This enables NOνA to check that the desired objects are being created and also to monitor the disk space requirements of these. Furthermore, checking the sanity of output files allows downstream tiers to respond accordingly and not be falsely blamed for upstream failures. Once every tier in a chain has concluded, all of the resultant metrics, output files and logs are archived, ready for later analysis.
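As an illustration of the scheme described above, the following is a minimal sketch of what a chain definition and a monitored tier invocation could look like. The chain dictionary, configuration file names and run_tier helper are hypothetical stand-ins rather than the actual NOνA modules, and the memory sampling assumes a Linux /proc filesystem.

    import datetime
    import subprocess

    # Hypothetical chain definition: each tier names its configuration file,
    # the tiers it depends on and the output files it is expected to produce.
    FD_COSMIC_CHAIN = {
        "cry":    {"config": "fd_cosmics_cry.fcl",  "depends": [],         "outputs": ["cry.root"]},
        "pchits": {"config": "fd_calib_pchits.fcl", "depends": ["cry"],    "outputs": ["pchits.root"]},
        "reco":   {"config": "fd_reco.fcl",         "depends": ["pchits"], "outputs": ["reco.root"]},
    }

    def run_tier(executable, config, log_path):
        """Spawn one framework call, timestamp its output and track peak memory."""
        proc = subprocess.Popen([executable, "-c", config],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT,
                                text=True)
        peak_rss_kb = 0
        with open(log_path, "w") as log:
            for line in proc.stdout:
                # Inject a timestamp so that tracked metrics can later be
                # correlated with key framework stages (database calls,
                # object initiation, the event loop, ...).
                stamp = datetime.datetime.now().isoformat(timespec="seconds")
                log.write(f"[{stamp}] {line}")
                # Sample resident memory via /proc/<pid>/status (Linux only).
                try:
                    with open(f"/proc/{proc.pid}/status") as status:
                        for row in status:
                            if row.startswith("VmRSS:"):
                                peak_rss_kb = max(peak_rss_kb, int(row.split()[1]))
                except FileNotFoundError:
                    pass  # the process has already exited
        return proc.wait(), peak_rss_kb  # return code is logged by the caller

A driver would then walk each chain in dependency order, skip tiers whose prerequisites failed, and archive the logs and sampled metrics for the web interface described below.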
The testing framework is executed following nightly builds of the NOνASoft development branch and also following the building of any release version. The suite of tests executed is chosen to exercise all of the core data streams and software tiers, and is run on the same batch system as production and user analysis jobs. In total upwards of 14 individual streams are regularly tested, amounting to over 70 individual software processes. These produce over 25 GB of output files each day.

4. Visualisation of results
Tests executed nightly and on stable software release versions are reported via a web-based interface. This interface utilises modern, cross-platform visualisation libraries to render the test results in a meaningful way.

Figure 1. An example of a test summary graphic showing the collected results of many tests on static snapshots of the software framework over five months (outcomes are classified as Ended, Ongoing, STDERR not empty, Killed by batch robots, Error or No output). This graph is made using the Highcharts JavaScript library [4].

Cron jobs periodically check on the outputs of all tests and use custom python modules to render the test results into HTML and JavaScript web pages formatted using the Bootstrap [5] development tools. These pages are self-documenting, providing the user with a detailed breakdown of the configuration of all tests run.

The front page of the web interface presents two overview graphics, the first showing a high-level summary of all tests performed on all NOνASoft release versions and the second similarly showing the results of the previous two weeks' worth of nightly tests. An example is shown in figure 1. Provided alongside these are links to the detailed results page of each suite of tests.

Each results page begins with a mid-level summary of the results of all chains, along with links to the details of the batch jobs used to run these tests and their associated log files. Further down the page are low-level breakdowns of the test results of each of the tiers that form each chain. These provide the configuration used to run each tier, along with the full standard output and error streams produced during their running and the returned error code for failed tests. Also presented are the peak memory usage of each tier, the integrated and per event CPU times, the efficiency of the tier in terms of the ratio of input to output events, and the number and length of database calls executed by this tier. Figure 2 shows an example of the summary table of a chain.
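As a sketch of how such a rendering step could work, the snippet below turns per-release test counts into a stacked Highcharts column chart embedded in a static HTML page. The counts, file names and HTML skeleton are illustrative assumptions, not the actual NOνA rendering modules.

    import json

    # Illustrative per-release outcomes; the real counts would come from the
    # archived results of each test suite.
    releases = ["S15-04-29", "S15-05-01", "S15-05-07"]
    series = [
        {"name": "Ended",            "data": [12, 13, 11]},
        {"name": "STDERR not empty", "data": [1, 0, 2]},
        {"name": "Error",            "data": [0, 1, 1]},
    ]

    chart_config = {
        "chart": {"type": "column"},
        "title": {"text": "Test results by release"},
        "xAxis": {"categories": releases},
        "yAxis": {"title": {"text": "Count"}},
        "plotOptions": {"column": {"stacking": "normal"}},
        "series": series,
    }

    html = f"""<!DOCTYPE html>
    <html>
    <head>
      <link rel="stylesheet"
            href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css">
      <script src="https://code.highcharts.com/highcharts.js"></script>
    </head>
    <body>
      <div class="container"><div id="summary"></div></div>
      <script>Highcharts.chart('summary', {json.dumps(chart_config)});</script>
    </body>
    </html>"""

    with open("summary.html", "w") as page:
        page.write(html)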

Figure 2. A summary table showing the results of a single test. In this case the chain tested was the Monte Carlo simulation of far detector cosmic ray events. The chain was composed of three tiers. The first, cry [6], is a cosmic ray Monte Carlo generator. The second, pchits, is the limited reconstruction tier used in the calibration of the NOνA detectors. The third, reco, is the detailed reconstruction of physics objects for use in oscillation background studies. In this case the first two tiers passed the tests while the third failed. Details on the cause of the failure are available to the user via the provided links.

Further details on the performance of each tier are provided on a linked page which displays metrics such as CPU and memory usage as a function of time. An example of such a metric is shown in figure 3. These interactive graphs are given context via the analysis of the timestamped standard output message streams of each tier. Important events identified within the processing are represented as coloured circles displayed on the metric charts at the corresponding time. These provide the user with snippets of the message stream on mouse over.

Figure 3. The memory usage of one test visualised using the Bokeh python library [7]. The coloured circles are displayed at times when important processing steps occur. These provide the user with contextual information on mouse hover.
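Since Bokeh and its hover annotations are named above, a minimal sketch of how such an annotated memory chart could be produced follows. The metric samples and event labels are invented placeholders standing in for the timestamped values collected during a test.

    from bokeh.models import ColumnDataSource, HoverTool
    from bokeh.plotting import figure, output_file, save

    # Placeholder metric samples: elapsed time (s) vs resident memory (MB).
    metrics = ColumnDataSource(data=dict(
        t=[0, 30, 60, 90, 120, 150],
        mem_mb=[150, 420, 980, 1010, 990, 640],
    ))

    # Placeholder key events parsed from the timestamped output stream.
    events = ColumnDataSource(data=dict(
        t=[30, 60, 120],
        mem_mb=[420, 980, 990],
        label=["database call", "begin event loop", "end event loop"],
    ))

    p = figure(title="Tier memory usage", x_axis_label="time [s]",
               y_axis_label="resident memory [MB]")
    p.line("t", "mem_mb", source=metrics)

    # Coloured circles mark important processing steps; the message snippet
    # is shown on mouse hover.
    markers = p.circle("t", "mem_mb", size=10, color="orange", source=events)
    p.add_tools(HoverTool(renderers=[markers],
                          tooltips=[("stage", "@label"), ("time", "@t s")]))

    output_file("memory_usage.html")
    save(p)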

Information on output files is provided in the summary table in terms of their total and per event size. A further page visualises this in much more detail by providing an interactive, zoomable sunburst and treemap graphic. An example of this is shown in figure 4.

Figure 4. A D3 [8] based graphic displaying the size of one data format produced during tests. This graphic allows users to drill down to study the particular portion of the data format they are concerned with.

The benchmarking information reported by the tests is further utilised to forecast the future computational needs of the experiment in terms of integrated CPU usage, memory footprint and storage volume. This information allows NOvA to rapidly and accurately predict its future needs and to plan production campaigns accordingly.

5. Conclusions
This paper has presented the NOνA software testing framework. The rigour enforced through this flexible testing framework enables NOvA to rapidly verify configurations, results and software, and thus ensure that data is available for physics analysis in a timely and robust manner.

Acknowledgements
The authors acknowledge the support of the Fermilab scientific and technical staff. Fermilab is operated by Fermi Research Alliance, LLC under Contract No. DE-AC02-07CH11359 with the United States Department of Energy.

References
[1] D. S. Ayres et al. [NOvA Collaboration], FERMILAB-DESIGN-2007-01.
[2] C. Green, J. Kowalkowski, M. Paterno, M. Fischler, L. Garren et al., J. Phys. Conf. Ser. 396, 022020 (2012).
[3] R. Brun and F. Rademakers, Nucl. Instrum. Meth. A 389, 81 (1997).
[4] Highcharts, http://www.highcharts.com/.
[5] Bootstrap, http://getbootstrap.com/.
[6] C. Hagmann, D. Lange and D. Wright, Cosmic-ray Shower Library (CRY), LLNL UCRL-TM-229453, Lawrence Livermore National Laboratory. Available at http://nuclear.llnl.gov/simulation/cry.pdf
[7] Bokeh, http://bokeh.pydata.org/en/latest/index.html.
[8] Data Driven Documents, http://d3js.org/.