The NOvA DAQ Monitor System

Published in Journal of Physics: Conference Series (open access). To cite this article: Michael Baird et al 2015 J. Phys.: Conf. Ser. 664 082020.

Michael Baird 1, Deepika Grover 2, Susan Kasahara 3 and Mark Messier 1, for the NOνA Collaboration

1 Indiana University, Bloomington, IN, USA
2 Banaras Hindu University, Varanasi-221005, Uttar Pradesh, India
3 University of Minnesota, Minneapolis, MN, USA

E-mail: mibaird@umail.iu.edu, deepikagrover142@gmail.com, schubert@physics.umn.edu, messier@indiana.edu

Abstract. The NOνA (NuMI Off-Axis νe Appearance) experiment is a long-baseline neutrino experiment designed to search for νμ (ν̄μ) to νe (ν̄e) oscillations using Fermilab's NuMI main injector neutrino beam. The experiment consists of two detectors, both positioned 14 mrad off the beam axis: a 220 ton Near Detector constructed in an underground cavern at Fermilab and a 14 kton Far Detector constructed in Ash River, MN, 810 km from the beam source. The health and performance of the NOνA Data Acquisition (DAQ) system are monitored with a DAQ Monitor system based on the Ganglia distributed monitoring system, an open-source third-party product which provides much of what NOνA DAQ monitoring needs out of the box. This paper discusses the use of the Ganglia system for this purpose, including augmentations we have made to the Ganglia base for the specific needs of our system. It also discusses two other systems used to monitor the quality of the data collected by the NOνA detectors: an Online Monitoring system and an Event Display, both of which leverage tools from the offline framework to provide close to real-time diagnostics of detector performance.

1. Introduction
The Near and Far detectors of the NOνA experiment share a similar design and are constructed of planes of PVC extrusion cells filled with liquid scintillator, with wavelength-shifting fibers extending the length of each cell. The fiber ends are read out by Avalanche Photodiodes (APDs). An overview of the NOνA DAQ system is shown in Figure 1.
The primary task of the DAQ system is to concentrate the data from the large number of APD channels (340,000 channels at the Far Detector, 20,000 at the Near Detector), buffer this data long enough to apply an online trigger, and record the selected data. The DAQ system consists of 180 custom data concentrator modules (DCMs), over 200 buffer farm nodes, and 40 manager nodes (used to run processes such as run control, global trigger, file transfer, and monitoring tasks) at two detector sites separated by 810 km. The DAQ system and data quality are monitored with a set of tools: a DAQ Monitor system for the health and performance of the DAQ system itself, an Online Monitoring system for the quality of the data produced by the detectors, and an Event Display for viewing triggered detector events in close to real time.

Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI. Published under licence by IOP Publishing Ltd.
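The buffer-until-trigger flow described above can be pictured with a short sketch. This is a toy illustration only; all class and method names are hypothetical, not NOνA's actual DAQ code:

```python
from collections import deque

class BufferNode:
    """Toy sketch of a buffer-farm node: hold time-slices of raw data long
    enough for a trigger decision, then record only the selected slices.
    (Names are illustrative, not NOvA's actual DAQ API.)"""
    def __init__(self, depth=4):
        self.buffer = deque(maxlen=depth)  # bounded: old slices age out
        self.recorded = []

    def receive(self, timestamp, data):
        self.buffer.append((timestamp, data))

    def trigger(self, timestamp):
        # Record the buffered slice matching the trigger time, if still held
        for t, data in self.buffer:
            if t == timestamp:
                self.recorded.append((t, data))
                return True
        return False  # slice already aged out of the buffer

node = BufferNode(depth=2)
node.receive(0, "slice0")
node.receive(1, "slice1")
node.receive(2, "slice2")  # with depth=2, slice0 has now aged out
```

In this sketch a trigger for timestamp 1 succeeds while one for timestamp 0 fails, since that slice was only buffered for a finite window.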

Figure 1. Overview of the NOνA DAQ system. The Near and Far detectors share a common DAQ design; the numbers of components shown in the figure apply to the Far Detector. The Front End Boards (FEBs) read out the data from the APDs and provide the first stage of data consolidation.

2. DAQ Monitor
The health and performance of the DAQ system are monitored with the DAQ Monitor system. Its purpose is to provide real-time information to shift operators on the status of the DAQ system, as well as historical data to DAQ experts and others for detailed analysis of data-acquisition performance. The DAQ Monitor system is based on an open-source third-party product, the Ganglia distributed monitoring system [1]. Ganglia provides much of the functionality needed for the DAQ Monitor system out of the box, with the ability to collect standard computing performance metrics such as CPU usage, memory usage, and network transfer rates. Ganglia also provides the basis for displaying this information in a web display and storing it in a Round Robin Database (RRD) [2].

Figure 2 illustrates the relationship between the single DAQ Monitor server and the monitored DAQ components from which it receives DAQ performance metric data, as well as the Ganglia tools used to distribute that metric data. Two Ganglia-provided daemons are shown in the figure:

gmond. A lightweight daemon running on every node to be monitored. It collects system as well as custom metrics and shares this information on request.

gmetad. A daemon running only on the DAQ Monitor server node. It collects data from one gmond daemon per monitored cluster and stores this data in an RRD.
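Ganglia's stock gmetric command-line tool offers one way to push a custom metric to the local gmond, which then flows to gmetad and into the RRD. A minimal sketch of building such an invocation (the wrapper class and metric name here are hypothetical):

```python
import shlex

class GangliaMetricSender:
    """Sketch of a thin wrapper that builds gmetric invocations for custom
    DAQ metrics (wrapper and metric names are hypothetical)."""
    def __init__(self, group="daq"):
        self.group = group

    def command(self, name, value, units, metric_type="float"):
        # gmetric ships with Ganglia; the local gmond picks the metric up
        # and gmetad aggregates it into the RRD on the monitor server.
        return ("gmetric --name=%s --value=%s --type=%s --units=%s --group=%s"
                % (name, value, metric_type, shlex.quote(units), self.group))

sender = GangliaMetricSender()
cmd = sender.command("dcm_data_rate", 12.5, "MB/s")
print(cmd)
```

Running the resulting command on a monitored node makes the metric appear alongside the built-in ones in the web display.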

Figure 2. Architectural overview of the DAQ Monitor server (daqmon) and the monitored components from which it receives data.

The PHP-based web front-end packaged with Ganglia allows the user to view the metrics packaged with Ganglia, such as CPU usage, memory usage, and network transfer rates. It is also extensible, allowing the user to create custom graphs; this capability is used to create graphs for NOνA-specific metrics.

2.1. Ganglia Customizations
The Ganglia base has been customized for the specific needs of NOνA DAQ monitoring. The first customization is a C++ software package providing a client interface for constructing and distributing custom metrics. This package makes use of the gmetric functions of the Ganglia package, and serves as a wrapper with some additional functionality for generating and sending custom metrics. DAQ components use the package to send monitored quantities specific to their application, at regular time intervals, to be collected by the gmond daemons running on their nodes. This custom metric data is then collected, displayed on the web display, and stored in the database like any other Ganglia metric. Examples of custom metrics created and monitored in this way are shown in Figure 3.

The second customization is an application, executed on the DAQ Monitor server, that monitors the state of the metric data stored in the RRD and reports out-of-range values to a message viewer service. This application is in communication with the DAQ system and is aware of the detector state. Thresholds for reporting monitored metrics as out-of-range are configurable at the node, cluster, or grid level; an example is shown in Figure 4. Messages are issued for data found to be out-of-range at the Warning or Error level, as configured.
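The threshold check can be pictured with a small sketch. The XML schema below is illustrative only; the actual NOνA configuration format shown in Figure 4 may differ:

```python
import xml.etree.ElementTree as ET

# Illustrative threshold configuration (element and attribute names are
# hypothetical, not the actual NOvA schema shown in Figure 4).
CONFIG = """
<thresholds>
  <metric name="corrupt_data_rate" warning="1.0" error="5.0"/>
  <metric name="missing_data_fraction" warning="0.01" error="0.05"/>
</thresholds>
"""

def evaluate(name, value, config_xml=CONFIG):
    """Return 'OK', 'Warning', or 'Error' for a metric value."""
    for metric in ET.fromstring(config_xml):
        if metric.get("name") == name:
            if value >= float(metric.get("error")):
                return "Error"
            if value >= float(metric.get("warning")):
                return "Warning"
            return "OK"
    raise KeyError(name)
```

Each metric reading pulled from the RRD would be passed through a check of this kind, with anything above the Warning or Error threshold forwarded to the message viewer.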
Figure 3. Custom metrics displayed in the Ganglia web display.

Figure 4. Left: an example XML file in which thresholds are configured to monitor corrupt and missing data. Right: the message viewer in which warning or error messages are reported.

3. Online Monitoring
While the DAQ Monitor system is focused on the health of the DAQ system itself, the Online Monitoring system is focused on close to real-time analysis of the data produced by the detector, for purposes of monitoring data quality. This system produces over 75 different data-quality metrics, from raw hit/pixel rates and ADC spectra to error codes reported by individual hardware elements. An example of one such histogram, used to monitor the state of the detector, is shown in Figure 5. The Online Monitor operates with a latency of approximately 100 ms from the production of a triggered event.

The online monitoring framework has two main programs: a producer, which unpacks the raw data and fills a suite of histograms tracking quantities pertinent to data quality, and a viewer, which organizes and displays the histograms in a meaningful way. The two communicate through a segment of shared memory, with the producer functioning as the server and the viewer attaching as a client. User requests made through the viewer are written to a queue in the shared memory and satisfied by the producer. The majority of the shared memory is used to exchange the contents of the histograms requested by the user.
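The producer/viewer exchange can be sketched with a toy in-process stand-in for the shared-memory segment. All names here are hypothetical, and the real system uses actual shared memory between two separate processes:

```python
from collections import deque

class SharedSegment:
    """Toy stand-in for the shared-memory segment: a request queue plus a
    region holding copies of histogram contents (layout is hypothetical)."""
    def __init__(self):
        self.requests = deque()
        self.histograms = {}

class Producer:
    """Unpacks data and fills histograms; acts as the server."""
    def __init__(self, shm):
        self.shm = shm
        self._hists = {"adc_spectrum": [0] * 8}

    def fill(self, name, bin_idx):
        self._hists[name][bin_idx] += 1

    def service_requests(self):
        # Satisfy queued viewer requests by copying histogram contents
        # into the shared region.
        while self.shm.requests:
            name = self.shm.requests.popleft()
            self.shm.histograms[name] = list(self._hists[name])

class Viewer:
    """Displays histograms; attaches as a client and queues requests."""
    def __init__(self, shm):
        self.shm = shm

    def request(self, name):
        self.shm.requests.append(name)

    def read(self, name):
        return self.shm.histograms.get(name)

shm = SharedSegment()
producer, viewer = Producer(shm), Viewer(shm)
producer.fill("adc_spectrum", 3)
viewer.request("adc_spectrum")
producer.service_requests()
```

The viewer only ever sees copies made by the producer, mirroring the server/client split described above.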

Figure 5. An example plot from the Online Monitoring system. This particular plot displays the average hit rate per FEB throughout the detector, for use in diagnosing channels which are hot or cold.

In addition to storing the current state of the detector, the producer automatically retains a series of histograms representing recent snapshots of the detector state, looking back as far as 24 hours. Access to these look-back histograms is provided in the viewer, so that the user can examine previous performance and generate comparison metrics such as bin-by-bin ratios or differences. Many histogram parameters, including the look-back schedule, are configurable at run time. Other configurable parameters include descriptive captions to be displayed in the viewer, as well as a set of display options for each histogram. This run-time configuration is used to handle the differences between the histograms required for the Near and Far detectors, so that identical versions of the online monitoring software can be deployed at each detector.

4. Event Display
The live event display provides a close to real-time display of triggered detector events, for the purpose of a high-level overview of detector performance. An example is shown in Figure 6. The raw information is displayed in a human-readable way, so that many global problems that are difficult to diagnose from the hit-level data become easily visible. For example, detector timing problems can create broken tracks, which are immediately apparent to a shifter looking at a few cosmic-ray events.

5. Conclusion
The NOνA DAQ system has been fully deployed and is operational at the NOνA Far and Near detectors. The DAQ system and data quality are monitored 24/7, using the tools described in this paper, to maximize both the quantity and quality of the data collected by the detectors.
In particular, we have found the use of the Ganglia distributed monitoring system as the basis of a DAQ Monitor system to be an effective means of monitoring the DAQ system components.

Figure 6. An event shown in the NOνA Event Display. This event shows all hits recorded in a 550 microsecond window in the Far Detector.

Acknowledgments
Support for the presenter was provided by the National Science Foundation, NSF RUI grant #1306944. The authors acknowledge the support of the Fermilab scientific and technical staff. Fermilab is operated by Fermi Research Alliance, LLC under Contract No. DE-AC02-07CH11359 with the United States Department of Energy. The University of Minnesota particle physics group is supported by DE-SC0007838, also from the Department of Energy. The Banaras Hindu University particle physics group is supported by IIFC-np and DST, New Delhi, India.

References
[1] Ganglia Monitoring System. http://ganglia.sourceforge.net/
[2] RRDtool. http://www.mrtg.org/rrdtool/