Journal of Physics: Conference Series (Open Access)

To cite this article: H Riahi et al 2014 J. Phys.: Conf. Ser. 513 032079

Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI. Published under licence by IOP Publishing Ltd.

CMS users data management service integration and first experiences with its NoSQL data storage

H Riahi 1, D Spiga 2, T Boccali 3, D Ciangottini 4, M Cinquilli 2, J M Hernàndez 5, P Konstantinov 6, M Mascheroni 7 and A Santocchia 4

1 INFN Perugia, Via Alessandro Pascoli, 06123 Perugia, Italy
2 European Organisation for Nuclear Research, IT Department, CH-1211 Geneva 23, Switzerland
3 INFN Pisa, Edificio C - Via F. Buonarroti 2, 56127 Pisa, Italy
4 Universita and INFN Perugia, Via Alessandro Pascoli, 06123 Perugia, Italy
5 CIEMAT, Av Complutense, 40, 28040 Madrid, Spain
6 INRNE, 72, Tzarigradsko Chaussee blvd, BG-1784, Sofia, Bulgaria
7 INFN Milano-Bicocca, Edificio U2, Piazza della Scienza, 3 - I-20126 Milano, Italy

E-mail: Hassen.Riahi@pg.infn.it

Abstract. The distributed data analysis workflow in CMS assumes that jobs run in a different location from the one where their results are finally stored. Typically the user outputs must be transferred from one site to another by a dedicated CMS service, AsyncStageOut. This service was originally developed to address the inefficient use of CMS computing resources caused by transferring the analysis job outputs synchronously, from the job execution node to the remote site, as soon as they are produced. The AsyncStageOut is designed as a thin application relying only on a NoSQL database (CouchDB) for input and data storage. It has progressed from a limited prototype to a highly adaptable service which manages and monitors all the user file handling steps, namely file transfer and publication. The AsyncStageOut is integrated with the common CMS/ATLAS Analysis Framework. It foresees the management of nearly 200k user files per day from close to 1000 individual users per month with minimal delays, while providing real-time monitoring and reports to users and service operators and remaining highly available. The associated data volume represents a new set of challenges in the areas of database scalability and service performance and efficiency. In this paper, we present an overview of the AsyncStageOut model and the integration strategy with the Common Analysis Framework. The motivations for using NoSQL technology are also presented, as well as the data design and the techniques used for efficient indexing and monitoring of the data. We describe the deployment model adopted for the high availability and scalability of the service. We also discuss the hardware requirements and the results achieved, as determined by testing with actual data and realistic loads during the commissioning and the initial production phase with the Common Analysis Framework.

1. Introduction
CMS[1], the Compact Muon Solenoid experiment located at CERN (Geneva, Switzerland), has defined a model where end-user analysis jobs, running on a worker node at a site where the data reside, store and publish their outputs in the storage element of a user-designated site for later access. The AsyncStageOut (ASO)[2] is a central service that handles the transfer and publication of user outputs to the final storage element.

It is designed as a thin application relying only on a NoSQL database (CouchDB)[3] for input and data storage. The highly adaptable design of the ASO has made its integration with the CMS/ATLAS Common Analysis Framework (CAF)[4] easy. In this paper, an overview of the ASO model and the integration strategy with the CAF are presented. The motivations for using NoSQL technology, as well as the deployment model adopted for the high availability and scalability of the service, are described. The hardware requirements and the results achieved are also discussed.

2. Overview of the AsyncStageOut model
Direct remote stage-out has been used in CMS since the first versions of the CMS Remote Analysis Builder (CRAB)[5]. With this approach, the jobs execute the analysis code on the worker nodes of the site where the data reside and, at the end of the execution, copy each output file from there to a user pre-defined location. This strategy has been the most common failure mode for analysis jobs: overall, about 25 to 30% of the analysis jobs fail, and about 30 to 50% of the failures are due to remote stage-out, so between 10 and 15% of the CMS analysis jobs fail in the remote stage-out. Since those jobs fail at the end of the processing, the overall CPU loss is even higher than 10-15%.

To address the synchronous stage-out issues, it was decided to adopt an asynchronous strategy for the remote stage-out. This has required the design and development of machinery able to stage out the outputs locally, in a temporary area of the storage of the site where the code is executed, followed by a subsequent harvesting step in which the user outputs are copied to the remote storage using the ASO tool. This has made it possible to:
- reduce the remote stage-out failure rate affecting the analysis jobs;
- avoid the storage inefficiency due to the unscheduled and potentially concurrent stage-out approach, by preventing overloads;
- limit the CPU waste caused by the remote synchronous stage-out of outputs;
- improve the automation of user file management tasks.

The ASO tool is implemented as a standalone application with a modular architecture. Its core components are the following:
- The TransferDaemon module retrieves data from the files database and then instantiates, for each user, a TransferWorker object which manages File Transfer Service (FTS)[6] jobs (a minimal sketch of this pattern is given below).
- The DBSPublisher retrieves the files ready for publication from the files database. It calls a PublisherWorker object for each user to interact with the Dataset Bookkeeping System (DBS)[7] for the publication of the files. The file metadata required for the publication are retrieved from the cache area specified in the configuration.
- The Analytics component contacts the source through a dedicated plugin to communicate the status of the file management requests.
- The ASO Web monitoring is built on top of CouchDB.

Figure 1 shows schematically the ASO architecture and the interactions between its components.

Figure 1. AsyncStageOut architecture and the interactions between its components.
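
As an illustration of the polling and per-user dispatch performed by the TransferDaemon and TransferWorker components, the following Python sketch (not taken from the ASO code base) queries a view of the files database through the CouchDB REST interface, groups the rows by user and hands each group to a per-user worker. The database name, design document and view name are illustrative assumptions rather than the actual ASO schema, and the worker is reduced to a stub where the FTS job submission would take place.

    import time
    from collections import defaultdict
    import requests

    FILES_DB = "http://localhost:5984/asynctransfer"   # assumed CouchDB files database

    def pending_files_by_user():
        """Query a (hypothetical) view keyed by user name and group its rows per user."""
        resp = requests.get(FILES_DB + "/_design/AsyncTransfer/_view/files_by_user")
        resp.raise_for_status()
        grouped = defaultdict(list)
        for row in resp.json().get("rows", []):
            grouped[row["key"]].append(row["value"])    # key = user, value = file details
        return grouped

    def transfer_worker(user, files):
        """Stand-in for the per-user TransferWorker that would build and submit FTS jobs."""
        print("would submit one FTS job with %d files for user %s" % (len(files), user))

    def transfer_daemon(polling_interval=60):
        """Daemon-style cycle: poll the files database and dispatch one worker per user."""
        while True:
            for user, files in pending_files_by_user().items():
                transfer_worker(user, files)
            time.sleep(polling_interval)          # polling interval, arbitrary for the sketch

    if __name__ == "__main__":
        transfer_daemon()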

3. Integration strategy
For the sustainability of the distributed analysis tools, CRAB3[5] has been integrated with PanDA[8], the distributed analysis tool of ATLAS, into a Common Analysis Framework. Since the management of the user outputs is mandatory for the execution of the CMS analysis workflow, the ASO has also been integrated into the CAF.

A dedicated plugin, named CMSAdderComponent, has been implemented in PanDA. It is called after the execution of the analysis job on the worker node. It pushes file documents into the ASO database (the files database) through the CouchDB REST interface. In addition, it uploads into a cache area the file metadata required for the publication. Then, the ASO machinery described previously can start.

To make the ASO communicate with the CAF, a dedicated source plugin has been implemented in the ASO. Since the push mode is used for the CAF-to-ASO communication, this plugin only needs to call back the CAF to update the job status. The message provider maintained centrally at CERN, namely ActiveMQ[9], has been used to achieve this. The interactions between the CAF and the ASO are shown in Figure 2.

Figure 2. AsyncStageOut and CAF integration.

4. The motivations for using NoSQL
The NoSQL database CouchDB is extensively used in the Data Management/Workload Management (DMWM) tools of CMS. This technology is used to persist the details of the user outputs to be managed. The user data management system, the ASO, is a new system for CMS. Its implementation was fast using this technology, since no particular database design was required. Moreover, the schema-less nature of NoSQL models satisfies the need to constantly and rapidly incorporate new types of data to enrich new applications such as the ASO.

CouchDB natively exposes a REST interface. This feature makes the communication of the ASO with external tools easy. Given also that monitoring is an important activity for the CMS computing group, CouchDB's integrated Web server has facilitated the prototyping and implementation of the ASO Web monitoring by serving the application directly to the browser. Furthermore, no particular deployment of the monitoring application is required, since it is encapsulated into CouchDB by means of the CouchApp technology.

The easy replication and the integrated caching of NoSQL databases such as CouchDB strongly influence the scalability and availability of the system. As shown in Figure 3, replication is the core of the ASO deployment model and guarantees high scalability and availability of the service.

Figure 3. AsyncStageOut deployment model.
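
The following Python sketch illustrates, under assumed names, the two CouchDB features discussed above: a schema-less file document is pushed into the files database through the plain REST interface, in the same spirit as the CMSAdderComponent described in section 3, and a small design document whose map view emits a single value per document is defined and queried, as done for the indexing and monitoring of the file states. The document fields, database name and view name are hypothetical, and a real client would also handle document revisions and conflicts.

    import json
    import requests

    COUCH = "http://localhost:5984"              # assumed CouchDB endpoint
    FILES_DB = COUCH + "/asynctransfer"          # assumed files database name

    requests.put(FILES_DB)                       # create the database if it does not exist yet

    # 1) Push a schema-less file document through the REST interface (illustrative fields).
    doc = {
        "_id": "hriahi-analysis-file-0001",      # illustrative document id
        "user": "hriahi",
        "source_lfn": "/store/temp/user/hriahi/output_1.root",
        "destination_lfn": "/store/user/hriahi/output_1.root",
        "destination_se": "T2_IT_Pisa",          # hypothetical destination storage element
        "state": "new",                          # e.g. new -> acquired -> done / failed
        "publish": True,
    }
    requests.put(FILES_DB + "/" + doc["_id"], data=json.dumps(doc)).raise_for_status()

    # 2) Define a design document whose map view emits one value per document, keyed by
    #    transfer state, so that monitoring queries hit a small, dedicated index.
    design = {
        "views": {
            "files_by_state": {
                "map": "function(doc) { if (doc.state) { emit(doc.state, doc.user); } }"
            }
        }
    }
    requests.put(FILES_DB + "/_design/monitor", data=json.dumps(design)).raise_for_status()

    # 3) Query the view, for example for all files still waiting to be transferred.
    rows = requests.get(FILES_DB + "/_design/monitor/_view/files_by_state",
                        params={"key": json.dumps("new")}).json()["rows"]
    print(len(rows), "file(s) in state 'new'")

Keeping each view focused on a single emitted value, as done here, matches the view splitting that proved necessary during the scalability test described in section 6.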

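Replication between CouchDB instances, on which the deployment model of Figure 3 relies, can be driven through the standard /_replicate endpoint. The sketch below, again only an illustration under assumed names, sets up a continuous pull replication of the files database from a second ASO/CouchDB box; the host names and the database name are placeholders.

    import json
    import requests

    LOCAL_COUCH = "http://localhost:5984"
    REMOTE_FILES_DB = "http://aso-replica.example.org:5984/asynctransfer"   # hypothetical peer

    replication = {
        "source": REMOTE_FILES_DB,     # pull changes from the peer ASO/CouchDB instance
        "target": "asynctransfer",     # into the local files database
        "continuous": True,            # keep following the peer's changes feed
        "create_target": True,         # create the local database if it is missing
    }
    resp = requests.post(LOCAL_COUCH + "/_replicate",
                         data=json.dumps(replication),
                         headers={"Content-Type": "application/json"})
    resp.raise_for_status()
    print(resp.json())                 # e.g. {"ok": true, "_local_id": "..."}
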
5. Scalability test
The ASO foresees the management of nearly 200k user analysis files per day with minimal delays. During the commissioning of the CAF, a scalability test was performed to study the ASO response to a high workload independently of the underlying services, namely FTS and DBS. The ASO has been loaded with nearly 300k files per day. Various scripts have been implemented for this test, in particular a fakefts.pl script in the PhEDEx LifeCycle agent[7] to simulate the FTS server operations.

One instance of the ASO and CouchDB have been deployed in the same box. The machine used in this test is a virtual machine with 8 VCPUs, 15 GB of RAM and 200 GB of disk storage. It is managed by a Xen hypervisor. No other machine runs in this hypervisor, to avoid resource overcommitment.

By design, the ASO performance depends strongly on the disk-based database, CouchDB. Within such databases the views are always cached in the filesystem, so the more RAM available the better. In addition, when a document is updated in CouchDB, the new version gets a new revision number, while the old one stays in the database. These caching and update mechanisms make CouchDB a very disk-hungry system. The CPU cores are used in particular for view building.

6. Results
In this test, we injected 100k files every 8 hours. Initially, we saw that the ASO spends too much time retrieving views that emit the values of several parameters. Therefore, the views used by the TransferWorker and PublisherWorker components have been split into smaller ones, each emitting the value of only one parameter. Furthermore, views have been cached in memory via crontab jobs.

The first injection of 100k files started at 16:00. As shown in Figure 4, the ASO succeeded in transferring this load in 7 hours, in particular before the start of the next injection at 00:00. After the second injection, the database starts to become slower in retrieving view information, possibly due to the number of updates made in the file collection. This is probably the reason for the second main dip, at 6:00, in the temporal plot in Figure 4. In any case, the ASO did not show any issue other than a reduced processing rate. The average processing rate for this test is approximately 9.6k files per hour, with a maximum of more than 20k files in an hour.

Figure 4. Files transferred per hour.

In this test, nearly 60 GB of disk storage has been consumed by CouchDB. Only 9% of the 60 GB has been used for the storage of the databases, while the rest was used for view caching. The average CPU idle time was almost stable at 90%. The RAM was almost always fully consumed during the processing time of the ASO.
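
As an illustration of the crontab-driven view caching mentioned above, the following sketch simply requests each view with limit=0, which makes CouchDB build and refresh the view index without returning any rows, so that the subsequent worker queries stay fast. The view names are the hypothetical ones used in the earlier sketches, not the actual ASO views.

    import requests

    FILES_DB = "http://localhost:5984/asynctransfer"       # assumed files database
    VIEWS = [
        "_design/AsyncTransfer/_view/files_by_user",        # hypothetical view names,
        "_design/monitor/_view/files_by_state",             # matching the earlier sketches
    ]

    for view in VIEWS:
        # limit=0 forces the index to be (re)built and cached without transferring rows
        requests.get(FILES_DB + "/" + view, params={"limit": 0}).raise_for_status()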

After this test, we can assert that the ASO, with the hardware configuration used, appears to be stable at this daily workload, despite some delays due to the file collection updates in CouchDB. Most probably these delays would be reduced by increasing the RAM dedicated to the virtual machine.

7. Conclusions
The ASO design, based on the NoSQL database CouchDB, has allowed an easy integration of the tool with the CAF. Scale tests of the ASO, independent of the underlying services, have been performed and have shown satisfactory performance of the tool. Some issues have emerged; they have been investigated and are, so far, well understood. Currently, the ASO can interact only with workload management tools such as PanDA. In the future, it will also be exposed directly to the users. The scalability test infrastructure is ready to be used for the highest expected load.

Acknowledgments
This work has been partially funded under contract 20108T4XTM of Programmi di Ricerca Scientifica di Rilevante Interesse Nazionale (Italy).

References
[1] CMS Collaboration, Adolphi R et al. 2008 The CMS experiment at the CERN LHC JINST 0803 S08004
[2] Riahi H 2012 Design optimization of the Grid data analysis workflow in CMS Ph.D. Thesis, University of Perugia, Italy
[3] Apache CouchDB database, http://couchdb.apache.org
[4] Girone M et al. The Common Analysis Framework Project, in these proceedings
[5] Cinquilli M et al. 2012 CRAB3: Establishing a new generation of services for distributed analysis at CMS J. Phys.: Conf. Ser. 396 032026
[6] Frohner A et al. 2010 Data management in EGEE J. Phys.: Conf. Ser. 219 062012
[7] Wildish T et al. Integration and validation testing for PhEDEx, DBS and DAS with the PhEDEx LifeCycle agent, in these proceedings
[8] Maeno T 2008 PanDA: distributed production and distributed analysis system for ATLAS J. Phys.: Conf. Ser. 119 062036
[9] ActiveMQ, http://activemq.apache.org