
Journal of Physics: Conference Series
Experience with ATLAS MySQL PanDA database service
To cite this article: Y Smirnov et al 2010 J. Phys.: Conf. Ser. 219 042059

Experience with ATLAS MySQL PanDA Database Service

Y Smirnov 1, T Wlodek 1, K De 2, J Hover 1, N Ozturk 2, J Smith 1, T Wenaus 1 and D Yu 1

1 Physics Department, Brookhaven National Laboratory, Upton, NY 11973-5000, USA
2 Department of Physics, University of Texas at Arlington, Arlington, TX 76019, USA

Abstract. The PanDA distributed production and analysis system has been in production use for ATLAS data processing and analysis since late 2005 in the US, and globally throughout ATLAS since early 2008. Its core architecture is based on a set of stateless web services served by Apache and backed by a suite of MySQL databases that are the repository for all PanDA information: active and archival job queues, dataset and file catalogs, site configuration information, monitoring information, system control parameters, and so on. This database system is one of the most critical components of PanDA, and has successfully delivered the functional and scaling performance required by PanDA, currently operating at a scale of half a million jobs per week, with much growth still to come. In this paper we describe the design and implementation of the PanDA database system, its architecture of MySQL servers deployed at BNL and CERN, its backup strategy and its monitoring tools. The system has been developed, thoroughly tested, and brought to production to provide highly reliable, scalable, flexible and available database services for ATLAS Monte Carlo production, reconstruction and physics analysis.

1. Introduction
The MySQL database [1] forms the backbone of the ATLAS [2] PanDA [3] (Production and Distributed Analysis) system. The system, including the corresponding database backend, was developed at Brookhaven National Laboratory (USA) in 2005 and has been deployed since late 2005 to support distributed Monte Carlo (MC) simulation, reprocessing production and user analysis on the Grid, originally within US ATLAS. Since early 2008 it has been used globally as the worldwide distributed production system of the ATLAS collaboration. The MySQL PanDA databases were configured, created, tuned, debugged and used first at BNL; since June 2008 a second PanDA DB instance has been installed and brought to production at CERN.

The PanDA databases are the repository for all PanDA-related information, namely:
- descriptions of MC production, reprocessing, validation, test and user analysis jobs (active and archived) and tasks;
- input and output file and dataset catalogues;
- short extracts from logfiles and some metadata;
- site, queue and pilot configuration data;
- monitoring information;
- system control parameters, etc.

2. PanDA database server structure and performance
In this section we describe the PanDA database backend. The PanDA databases at BNL are installed on several independent servers. Two of them (primary and spare) contain production information; four additional ones (two primary and two spare) contain archive information, metadata, logs and dataset data. In addition to the production servers there are two MySQL servers which support the PanDA development testbed. We use this testbed for debugging and testing new schema and configuration changes, performing scalability and performance investigations, etc.

In terms of hardware, the PanDA production database servers are two Dell PowerEdge 2950 machines, each with a 4-core Intel Xeon 5150 CPU at 2.66 GHz and 16 GB of memory, running a 64-bit RHEL4 OS with hardware RAID-10 over six 15k rpm 145 GB SAS disk drives. The four archive, metadata, log and dataset servers are Penguin Computing Relion 2600SA machines with an 8-core Intel Xeon E5335 CPU at 2.00 GHz and 16 GB of memory, running a 64-bit RHEL4 OS with software RAID-10 over six 750 GB SATA drives.

The PanDA production DB servers dbpro and dbpro02 run MySQL 5.0.X and host the main PanDA database, PandaDB, which acts as a fast buffer and stores the information about active Monte Carlo production, validation, reprocessing and data analysis jobs. Completed job records are stored in the database for up to two weeks, after which they are moved to the archive database. The database consists of 31 InnoDB tables containing up to 17 million rows. The hardware allows for 380-440 threads accessing the database simultaneously (although a maximum of about 800 connections has been reached). The average performance is 360 queries per second. Queries are approximately 35% selects, 35% updates and 25% inserts; the remainder are of other types (deletes, etc.).
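The buffer-to-archive data movement can be expressed as a simple cron-driven script. The following Python sketch only illustrates the idea: the table names, column names and credentials are hypothetical and do not reflect the actual PanDA schema.

    # Minimal sketch of a cron job that moves completed job records older
    # than two weeks from the fast buffer (PandaDB on dbpro) to the archive
    # (PandaArchiveDB on dbarch3). Table and column names are hypothetical.
    import datetime
    import MySQLdb

    cutoff = datetime.datetime.utcnow() - datetime.timedelta(days=14)

    buf = MySQLdb.connect(host="dbpro", user="panda", passwd="***", db="PandaDB")
    arc = MySQLdb.connect(host="dbarch3", user="panda", passwd="***",
                          db="PandaArchiveDB")
    bcur, acur = buf.cursor(), arc.cursor()

    # Copy the completed rows to the archive server in batches ...
    bcur.execute("SELECT * FROM jobsActive WHERE jobStatus IN "
                 "('finished','failed') AND modificationTime < %s", (cutoff,))
    fmt = ",".join(["%s"] * len(bcur.description))
    rows = bcur.fetchmany(1000)
    while rows:
        acur.executemany("INSERT INTO jobsArchive VALUES (" + fmt + ")", rows)
        arc.commit()
        rows = bcur.fetchmany(1000)

    # ... and delete them from the buffer only once the copy has succeeded.
    bcur.execute("DELETE FROM jobsActive WHERE jobStatus IN "
                 "('finished','failed') AND modificationTime < %s", (cutoff,))
    buf.commit()

Note that the cutoff is computed once on the client side, so the SELECT and the DELETE see exactly the same set of rows.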
The PanDA archive production MySQL server dbarch3 and its spare node dbarch5 support the following databases: PandaArchiveDB, PandaMetaDB and PandaLogDB. The largest archive database, PandaArchiveDB, keeps the full archive of PanDA-managed Monte Carlo production, reprocessing, validation, test and user analysis jobs since the second half of 2005. At the end of 2007 these servers were migrated to new, more powerful hardware; they run MySQL 5.0.X. Tables in PandaArchiveDB use the MyISAM engine and have no auto-increment counter, which allows us to move the data from PandaDB through cron jobs running on the servers. We have developed and successfully implemented a form of partitioning in this database: a bi-monthly structure of job/file archive tables that gives better performance by accelerating data mining and access to the required data. The database consists of 44 tables; the maximum number of rows per table is about 33,000,000.

The other two databases running on the same server are PandaLogDB and PandaMetaDB. They store the archive of log-extract files for jobs, some monitoring information about pilots, and autopilot and scheduler-configuration support (the schedconfig table). The same MyISAM engine and bi-monthly partitioning scheme are used. These databases have ~52-54 tables with a maximum of ~4,600,000 rows per table. The access pattern on this server is ~400-450 parallel threads open simultaneously (max ~740). The average query rate is ~1300-1600 queries per second (maximum ~2800). The query distribution on this server is roughly 80% selects and 20% inserts (we usually have no delete or update queries in the archive).

The dataset, metadata-prime and dataset browser databases occupy the server dbarch4 and its spare node dbarch6. They store metadata and dataset information as well as some statistics about pilots, service information and monitoring. The 55 database tables use the MyISAM engine and allow for 100-200 parallel threads to access the database (250 threads during peak load periods). The databases serve 3-5 queries per second (up to 12 during peak periods). About 40% of the queries are selects, 30% are updates, and the rest are deletes, inserts, etc.
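The gain from the bi-monthly table structure comes from routing each lookup to the single table covering the requested period instead of scanning the whole multi-year archive. A minimal sketch of such routing is shown below; the jobsArchive_YYYY_MM naming scheme is an assumption made for illustration, not the actual PanDA table naming.

    # Sketch of query routing over bi-monthly archive tables: a lookup for
    # a given date touches only the table covering its two-month window.
    # The jobsArchive_YYYY_MM naming scheme is hypothetical.
    import datetime

    def archive_table(when):
        """Return the name of the bi-monthly table covering `when`."""
        # Round the month down to the start of its two-month window:
        # Jan-Feb -> 01, Mar-Apr -> 03, ..., Nov-Dec -> 11.
        first_month = when.month - (when.month - 1) % 2
        return "jobsArchive_%04d_%02d" % (when.year, first_month)

    def lookup_job(cursor, panda_id, when):
        """Fetch one archived job record from the table for its period."""
        sql = "SELECT * FROM %s WHERE PandaID = %%s" % archive_table(when)
        cursor.execute(sql, (panda_id,))
        return cursor.fetchone()

    # A job that finished on 15 July 2008 is looked up in jobsArchive_2008_07.
    print(archive_table(datetime.date(2008, 7, 15)))

A side benefit of this layout is that schema changes can be introduced in new bi-monthly tables without touching the old ones, which is the flexibility in schema evolution mentioned in the conclusions.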

The PanDA databases at CERN have recently been migrated to Oracle. From January 2008 until March 2009 they were based on MySQL, using an architecture similar to that of the BNL PandaDB. There were three primary and three fail-over servers. PandaDB used the machine pandadb1 with fail-over machine pandadb4; it could handle 130-150 threads (max 250) at 10-20 queries per second. The archive, meta and log databases were installed on the machines pandadb2 and pandadb3, with pandadb5 and pandadb6 as backup servers; they could handle 10-25 parallel threads (a maximum of 40 was reached) at 50-60 queries per second.

The database performance was monitored using a dedicated PanDA web interface called the PanDA monitor, which forms an integral part of the PanDA production software package. Another very helpful package we used for MySQL statistics monitoring is the open-source tool mysqlstat.

3. Database backup and replication
At BNL we have developed several database backup policies and implemented them as part of the PanDA database service.
1. Full monthly backup. Once a month a full text dump of all databases is performed. The dump is then loaded into the fail-over databases, and an additional copy is stored on tape.
2. Daily backup. A text dump of selected databases is performed daily. The dump is then transferred to the fail-over node and loaded into the fail-over servers; an extra copy is stored on tape.
3. Combined text and binary backup. The text backup procedure is slow: for large databases it may take considerable time and can even create unexpected issues on the production servers, so it cannot be done more often than once per day. In between the text backups, binary (incremental) backup is used: the MySQL server writes a record of all write operations performed on the database to a special log file, which is periodically transferred to the fail-over server, where the fail-over database is then updated. This procedure reduces the amount of data that can be lost in the event of a catastrophic database failure.

The full text backup procedure uses extensive table locking, which affects database performance: queries issued during the backup cannot be executed, are held waiting, and frequently time out. To avoid this problem some databases are backed up using a so-called "hot backup", performed with the commercially available software "innobackup" [4], which backs up InnoDB tables while locking rows only. This allows us to avoid an interruption of the production service. The resulting hot backup file is then transferred to the fail-over machine, loaded into the fail-over server, and a copy is sent to tape storage. Since hot backup is less disruptive from the production point of view, it can be performed online more frequently than the text backup. Hot backup is not used in combination with incremental backup because of complications in synchronising the two; in principle it is possible to synchronise them, but we have found that in practice a similar reduction of data-loss risk can be achieved simply by increasing the hot backup frequency.
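As an illustration of the combined text-plus-binary scheme, the sketch below drives a daily text dump with mysqldump and ships the closed binary logs to the fail-over node. The host names, paths and options are illustrative assumptions, not the actual BNL procedures.

    # Sketch of the combined backup scheme: a daily full text dump plus
    # periodic shipping of closed binary logs to the fail-over node.
    # Hosts, paths and credentials are illustrative only.
    import os
    import subprocess
    import time

    BINLOG_DIR = "/var/lib/mysql"
    FAILOVER = "dbpro02:/backup/binlogs/"

    def daily_text_dump(database):
        """Full text dump of one database, compressed on the fly."""
        out = "/backup/%s-%s.sql.gz" % (database, time.strftime("%Y%m%d"))
        # --single-transaction gives a consistent snapshot of InnoDB tables
        # without locking them; MyISAM tables would still require locking.
        dump = subprocess.Popen(["mysqldump", "--single-transaction", database],
                                stdout=subprocess.PIPE)
        with open(out, "wb") as f:
            subprocess.check_call(["gzip", "-c"], stdin=dump.stdout, stdout=f)
        if dump.wait() != 0:
            raise RuntimeError("mysqldump failed for " + database)
        return out

    def ship_binlogs():
        """Rotate the binary log, then copy all closed logs to the spare."""
        subprocess.check_call(["mysql", "-e", "FLUSH LOGS"])
        logs = sorted(f for f in os.listdir(BINLOG_DIR)
                      if f.startswith("mysql-bin.") and f[10:].isdigit())
        for log in logs[:-1]:  # the newest log is still being written to
            subprocess.check_call(["rsync", "-a",
                                   os.path.join(BINLOG_DIR, log), FAILOVER])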
4. Slow query monitoring
We have also implemented slow query monitoring for the PanDA database servers. Slow or badly constructed queries can cause severe performance degradation: a slow query can lock MyISAM tables for an extended period, thus holding up the execution of other queries. It is therefore necessary to monitor queries and eliminate bad ones. A custom program written in Python monitors the queries being executed and identifies slow ones. Those which execute longer than the allowed time are killed and logged into a report that is periodically submitted to the database administrators for analysis. This report identifies the users and sites which often submit long, inefficient or badly constructed queries; the responsible persons are then contacted, and ways to improve their queries are suggested.
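The general idea of this query killer can be sketched in a few lines of Python; the threshold, credentials and log path below are illustrative assumptions, not the actual production settings.

    # Sketch of the slow-query killer: poll the server's process list, kill
    # client queries that exceed the allowed execution time, and log them
    # for the periodic report to the database administrators.
    import logging
    import time
    import MySQLdb

    MAX_SECONDS = 300  # illustrative threshold

    logging.basicConfig(filename="/var/log/panda/slow-queries.log",
                        format="%(asctime)s %(message)s", level=logging.INFO)

    conn = MySQLdb.connect(host="dbpro", user="admin", passwd="***")
    cur = conn.cursor()

    while True:
        cur.execute("SHOW FULL PROCESSLIST")
        for pid, user, host, db, command, secs, state, info in cur.fetchall():
            # Only long-running client queries; leave replication etc. alone.
            if command == "Query" and secs > MAX_SECONDS and info:
                cur.execute("KILL %d" % pid)
                # The report identifies the users and sites behind bad queries.
                logging.info("killed %s@%s after %ss: %s",
                             user, host, secs, info)
        time.sleep(60)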

The MySQL server keeps a log of slow queries, but the format of this log makes it very hard to read. A dedicated program has been written which provides a GUI for analysing the slow query logs. It allows MySQL queries to be followed on a minute-by-minute basis and helps to identify exactly which query causes slowdowns of the system, even when the query executes fast enough to avoid being killed by the slow query killer. The users can then be contacted with suggested changes to their client-side MySQL usage patterns that could improve query performance.

5. MySQL monitoring
A very important part of the MySQL database monitoring is performed by Nagios [5], a general-purpose monitoring program and de facto community standard. Monitoring is done via Nagios plugins, a set of custom-written probes executed to check the status of the monitored services. A dedicated set of probes has been written for monitoring MySQL [6]. Tasks performed by the Nagios plugins include (a sketch of one such probe is given at the end of this section):
- periodic verification that the MySQL server is alive and available;
- measuring the time required to perform a benchmark query, raising an alarm if it is too slow;
- checking the disk space available on the server, raising an alarm if it is too low;
- checking the status of the backup, raising an alarm if the last backup failed;
- performing backup validation and quality assurance: comparing the databases on the production and fail-over nodes and raising an alarm if their contents diverge, which can be a sign of a failed backup or other data corruption;
- checking the load on the database server, raising an alarm if it is too high;
- checking the number of MySQL threads and processes.

In addition to service monitoring with Nagios, a separate system has been deployed to monitor database performance using the Ganglia software [7], an open-source distributed monitoring system. Among other things it collects information on the system load of the servers, memory usage and available disk space; the performance data are presented on the operator's dashboard. MySQL-specific performance information (for example, the number of running connection threads and the average connection time) is additionally collected by mysqlstat, a free open-source monitoring tool.
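A minimal sketch of a benchmark-query probe in the Nagios plugin convention (exit code 0 = OK, 1 = WARNING, 2 = CRITICAL) is given below; the thresholds, credentials and benchmark query are illustrative assumptions, not those of the actual BNL probes [6].

    # Sketch of a Nagios-style probe: time a benchmark query and map the
    # result onto the standard Nagios exit codes. All settings illustrative.
    import sys
    import time
    import MySQLdb

    WARN, CRIT = 2.0, 10.0  # seconds

    try:
        start = time.time()
        conn = MySQLdb.connect(host="dbpro", user="nagios", passwd="***")
        cur = conn.cursor()
        cur.execute("SELECT COUNT(*) FROM PandaDB.jobsActive")  # benchmark
        cur.fetchone()
        elapsed = time.time() - start
    except MySQLdb.Error as err:
        print("CRITICAL: MySQL unavailable: %s" % err)
        sys.exit(2)

    if elapsed >= CRIT:
        print("CRITICAL: benchmark query took %.1f s" % elapsed)
        sys.exit(2)
    elif elapsed >= WARN:
        print("WARNING: benchmark query took %.1f s" % elapsed)
        sys.exit(1)
    print("OK: benchmark query took %.1f s" % elapsed)
    sys.exit(0)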

6. Conclusions
The MySQL PanDA database system is one of the most critical components of PanDA and has successfully delivered the functional and scaling performance that PanDA requires. It currently operates at a scale of half a million jobs per week, with much growth still to come. The archive catalogues keep track of almost 30,000,000 Monte Carlo production, reprocessing, validation, test and user analysis jobs, and of more than 150,000,000 files produced on the Grid during the last four years. Partitioning (the bi-monthly table design) gives the advantages of quick searching through the archive and of flexibility in terms of schema evolution. The system provides highly reliable, scalable, flexible and available MySQL database services. The multilevel incremental backup strategy designed and implemented for the PanDA databases allows quick and easy recovery in case of failure, minimises the probability of data loss, and can easily be adapted to support other MySQL services (GUMS/VOMS, LRC/LFC, user databases, etc.).

References
[1] http://www.mysql.com
[2] ATLAS Collaboration (Aad G et al.) 2008 JINST 3 S08003
[3] ATLAS Collaboration 2008 J. Phys.: Conf. Ser. 119 062036
[4] http://www.innodb.com/
[5] http://nagios.org
[6] Wlodek T, Gamboa C, Liu Z, Misawa S, Petkus R, Smith J, Throwe T, Wu Y and Yu D 2008 J. Phys.: Conf. Ser. 119 052031
[7] http://ganglia.info/