Phronesis, a diagnosis and recovery tool for system administrators

Journal of Physics: Conference Series (open access)
To cite this article: C Haen et al 2014 J. Phys.: Conf. Ser.

C Haen 1, V Barra 2, E Bonaccorsi 3 and N Neufeld 3
1 Univ. Blaise Pascal, Clermont-Ferrand cedex, France
2 LIMOS, UMR 6158 CNRS, Univ. Blaise Pascal, Clermont-Ferrand cedex, France
3 European Organization for Nuclear Research, CERN CH-1211, Genève 23, Switzerland
christophe.haen@cern.ch

Abstract. The LHCb experiment relies on the Online system, which includes a very large and heterogeneous computing cluster. Ensuring the proper behavior of the different tasks running on more than 2000 servers represents a huge workload for the small operator team and is a 24/7 task. At CHEP 2012, we presented a prototype of a framework that we designed to support the experts. The main objective is to provide them with steadily improving diagnosis and recovery solutions when a service misbehaves, without having to modify the original applications. Our framework is based on adapted principles of the Autonomic Computing model, on Reinforcement Learning algorithms, and on innovative concepts such as Shared Experience. While the CHEP 2012 submission showed the validity of our prototype on simulations, here we present an implementation with improved algorithms and manipulation tools, and report on the experience gained running it in the LHCb Online system.

1. Introduction
LHCb [1] is one of the four large experiments at the Large Hadron Collider at CERN. It relies on a large computing infrastructure [2] to (i) control the data acquisition system and the detector, and (ii) manage the data it produces. The team in charge of installing and administering this system comprises fewer than 10 people, three of whom work full time.

To help the system administrators reach their goal of high availability, we set out to provide them with software that proposes a diagnosis and recovery solution when problems occur, improves with experience, and acts as a knowledge base and problem-history database. The paper we published at CHEP 2012 [3] introduced the concepts used in our software, and their validity was demonstrated in several simulations. Since then, the algorithms have been improved, the code consolidated and manipulation tools developed. Further simulations were run to test the software's capabilities more thoroughly, and it has now been deployed on a much larger scale in the LHCb Online environment.

Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI. Published under licence by IOP Publishing Ltd.

2. LISA: LearnIng approach for System Administration
In [3], we presented methods that address problems similar to ours, namely expert systems [4] and autonomic computing principles such as the MAPE-K loop [5]. Building on these historical approaches and adding innovative concepts such as the Shared Experience principle, we define the methodology of our framework as follows:

- Linux systems represent the greatest share of the Online environment, so we decided to focus on them only. Diagnoses of network devices and Windows-based machines are not addressed.
- Because of the great variety of software running on the LHCb Online HLT farm, our solution needs to be as generic as possible. Since files and processes are the components of any application, we use them as the basic blocks of our diagnoses. Each type of problem that can affect such entities (e.g. wrong file permissions, a process running under the wrong user) is associated with a default recovery solution. Note that this method is equally valid on Windows servers, as it is generic enough.
- Perform no monitoring, but rather wait to be informed of problems by external sources.
- Existing implementations associate one MAPE-K loop instance with each system and rely on multi-agent theory for synchronization and cooperation. Our approach is to have a single loop for all the systems, which allows the software to spot the dependencies between them.
- By using Reinforcement Learning algorithms, we improve diagnostic speed and scalability, reducing the number of components that are checked before the faulty one is found.
- The Shared Experience principle consists of sharing experience between similar systems (such as two websites). It shortens the learning phase of the learning algorithms and reduces the description workload of the users.
- Using Convention over Configuration [6] helps reduce the configuration work of the software.
- Our software proposes a default recovery solution, including the full procedure needed for the fix to take effect, as well as information about previously encountered situations on the same problematic entity. However, the user has to perform the correction themselves.

3. Phronesis
Our implementation of the above methodology is called Phronesis. It is divided into several modules, described in this section.

3.1. Compiler
We defined a new configuration grammar that allows services to be described as a composition of files, processes and other services. This grammar is inspired by the object model: objects are mainly files, processes or services, and the inheritance concept is used to express the Shared Experience principle. The user can also define two types of rules:
- Dependency rule: states that one service needs another one to be fully functional.
- Recovery rule, or Trigger: lists what a given recovery action involves. For example, if the recovery action consists of changing the content of a file, a recovery rule could state that a process must be stopped before the file is changed, and another that it must be started again after the modification.
The compiler was developed in Python using the pyparsing library [7]. Python was chosen for its dynamic features, such as introspection and dynamic typing. The compiler reads the configuration files and produces an SQL script as output. One critical aspect of the compilation is not to lose the experience previously gained by the reinforcement learning algorithm. This is achieved using custom graph-matching algorithms between the configuration files and the current content of the database.
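The paper does not reproduce the configuration grammar or the database schema, so the following is only a minimal sketch under stated assumptions. It starts from a hypothetical, already-parsed representation (the `Service` class, the `resolve` and `compile_to_sql` functions, and the `service`, `file`, `process`, `dependency` and `trigger_rule` tables are all invented for illustration) and shows how service inheritance could be flattened and how services, Dependency rules and Triggers could be emitted as an SQL script. It uses only the Python standard library instead of pyparsing, and it does not attempt the graph matching that preserves previously learned experience.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Service:
    """Hypothetical parsed form of a service: a composition of files and processes."""
    name: str
    parent: Optional[str] = None  # inheritance, used here to model Shared Experience (assumption)
    files: List[str] = field(default_factory=list)
    processes: List[str] = field(default_factory=list)


def resolve(services: Dict[str, Service], name: str) -> Service:
    """Return the service with the components inherited from its ancestors merged in."""
    svc = services[name]
    if svc.parent is None:
        return svc
    base = resolve(services, svc.parent)
    return Service(
        name=svc.name,
        parent=svc.parent,
        files=base.files + [f for f in svc.files if f not in base.files],
        processes=base.processes + [p for p in svc.processes if p not in base.processes],
    )


def sql_str(value: str) -> str:
    """Quote a value as an SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def compile_to_sql(
    services: Dict[str, Service],
    dependencies: List[Tuple[str, str]],  # Dependency rules: (service, service it needs)
    triggers: List[Tuple[str, str]],  # Triggers: (file, process to restart around a change)
) -> str:
    """Emit an SQL script describing the services, Dependency rules and Triggers."""
    out: List[str] = []
    for name in sorted(services):
        svc = resolve(services, name)
        parent = sql_str(svc.parent) if svc.parent else "NULL"
        out.append(f"INSERT INTO service (name, parent) VALUES ({sql_str(name)}, {parent});")
        for path in svc.files:
            out.append(f"INSERT INTO file (service, path) VALUES ({sql_str(name)}, {sql_str(path)});")
        for cmd in svc.processes:
            out.append(f"INSERT INTO process (service, cmd) VALUES ({sql_str(name)}, {sql_str(cmd)});")
    for user, provider in dependencies:
        out.append(
            f"INSERT INTO dependency (service, requires) VALUES ({sql_str(user)}, {sql_str(provider)});"
        )
    for path, proc in triggers:
        # Changing the file requires stopping the process before and starting it after.
        for step, action in (("before", "stop"), ("after", "start")):
            out.append(
                "INSERT INTO trigger_rule (target, step, action, process) VALUES "
                f"({sql_str(path)}, '{step}', '{action}', {sql_str(proc)});"
            )
    return "\n".join(out)


# Usage with made-up services: "wiki" inherits the file and process of "website".
services = {
    "database": Service("database", processes=["mysqld"]),
    "website": Service("website", files=["/etc/httpd/conf/httpd.conf"], processes=["httpd"]),
    "wiki": Service("wiki", parent="website", files=["/var/www/wiki/config.php"]),
}
script = compile_to_sql(
    services,
    dependencies=[("wiki", "database")],
    triggers=[("/etc/httpd/conf/httpd.conf", "httpd")],
)
print(script)
```

Flattening inheritance at compile time keeps the generated rows simple; a real compiler would additionally have to match the new definitions against the existing database content instead of blindly inserting rows, so that learned counters survive recompilation.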

3.2. Remote Agent
The remote agent is a program that runs on every machine the user wants to supervise. Its only purpose is to answer queries from the Core (see 3.3). Its complexity lies purely at the technical level and consists of implementation details. Queries concern any attribute of files, processes or the general environment. The agent is developed in C++, using several Boost libraries [8].

3.3. Core
The Core module is the central part of the software: it contains all the algorithms used to diagnose problems and offer recovery solutions. The main algorithms are the following.

Sorting algorithm: when several problems are reported at the same time, this algorithm decides in which order they are analyzed. The order matters for performance reasons, but also because some problems cannot be solved before others are. This algorithm uses Dependency rules to establish the order.

Recovery algorithm: once the root cause of a problem is found, it can usually be fixed quite easily (e.g. repair a corrupted file, restart a process). For the changes to take effect, extra actions might be required; these actions are defined by Recovery rules. The difficulty comes from the fact that actions may be required before or after the fix is applied, so computing the full chain of events is a non-trivial task.

Reinforcement Learning algorithm: this algorithm optimizes the exploration path from a reported problem to its faulty component. The chosen method keeps track of the paths that were successful in previous cases: each path has an associated counter, which is incremented when the path leads to the faulty component. When a new problem is reported, these counters are used to choose the most appropriate path. There are two strategies: either sorting the counters in decreasing order, or making a weighted random choice.
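The two strategies can be illustrated with a minimal Python sketch. It is an illustration under assumptions, not the Phronesis implementation (which is written in C++): a path is reduced to a string label, counters start at a hypothetical initial value of 1 so that never-reinforced paths can still be drawn by the random strategy, and the names `PathCounters`, `reinforce`, `set_count`, `order_sorted` and `order_weighted` are invented for this example.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional


class PathCounters:
    """Hypothetical counters over exploration paths (paths are plain string labels)."""

    def __init__(self, initial: int = 1) -> None:
        # Every path starts with the same positive count, so unexplored paths keep a
        # non-zero chance of being drawn by the weighted random strategy.
        self.counts: Dict[str, int] = defaultdict(lambda: initial)

    def reinforce(self, path: str) -> None:
        """Increment the counter of a path that led to the faulty component."""
        self.counts[path] += 1

    def set_count(self, path: str, value: int) -> None:
        """Manual correction or a priori knowledge supplied by the user."""
        if value < 1:
            raise ValueError("counters must stay positive for the weighted strategy")
        self.counts[path] = value

    def order_sorted(self, paths: List[str]) -> List[str]:
        """Strategy 1: explore paths by decreasing counter (ties keep input order)."""
        return sorted(paths, key=lambda p: self.counts[p], reverse=True)

    def order_weighted(self, paths: List[str], rng: Optional[random.Random] = None) -> List[str]:
        """Strategy 2: draw paths one by one, each with probability proportional to its counter."""
        if rng is None:
            rng = random.Random()
        remaining = list(paths)
        order: List[str] = []
        while remaining:
            weights = [self.counts[p] for p in remaining]
            pick = rng.choices(remaining, weights=weights, k=1)[0]
            order.append(pick)
            remaining.remove(pick)
        return order


counters = PathCounters()
paths = ["httpd process", "httpd.conf file", "mysql service"]
for _ in range(3):
    counters.reinforce("mysql service")
counters.reinforce("httpd.conf file")
print(counters.order_sorted(paths))
# ['mysql service', 'httpd.conf file', 'httpd process']
print(counters.order_weighted(paths, random.Random(42)))
```

Here `set_count` stands for the case where the user corrects a wrongly reinforced path or supplies a priori knowledge. Under the Shared Experience principle, similar systems could simply share one `PathCounters` instance, so that experience gained on one of them benefits the others.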

Simulations (see 4.1) show that, on average, both strategies are equivalent. Although simple, this counter-based method has significant advantages. If a path is reinforced when it should not have been, the user can easily correct it. The user can also provide a priori knowledge. Finally, from a technical point of view, applying the Shared Experience principle to this method is straightforward.

Dependency algorithm: one of the most interesting features of our software is its ability to find dependencies between services based on previous experience. This allows it to infer new Dependency rules, and thus to provide better diagnoses.

The Core is implemented in C++ and uses Boost libraries. It can be run as a daemon, as an interactive program, or to perform a full check of all the services known to it.

3.4. Tools
There are two kinds of interaction between the software and the user:
- output communication, so that the user knows what the software is doing;
- input communication, for the user to report problems or give feedback.
This bidirectional communication is made possible by an Application Programming Interface (API). Output communication is based on the Observer pattern [9], while input messages are similar to Remote Procedure Calls. Based on the API, several ready-to-use user interfaces were developed:
- phrutils: a command-line tool.
- phrgui: a GUI based on the Qt framework [10], currently being prototyped.
- phrxml: for output communication only; it stores all the output in an XML-file-based ring buffer.
- phrsimu: an interface used by our simulation software to test the algorithms.
- phricinga: an interface that gathers data from Icinga [11], the monitoring software used at LHCb.
- phrweb: a web interface based on phrxml and the Django framework [12].

4. Results

4.1. Simulations
To test our algorithms, it was important to be able to simulate realistic situations. To achieve this, we developed a complete set of tools to produce Monte Carlo simulations. For this purpose, Phronesis needs to be compiled in a particular way, because the simulation tool tests the algorithms of the Core module rather than the code quality of the Agents: under normal operation, remote servers are queried to obtain information before it is processed; in simulation mode, the query is intercepted and a local Agent is instructed what to return. This allows us to test Phronesis on a single local machine. A separate program randomly generates problems based on user input, injects signals into the Core to mock the Agents' analysis, interacts with the Core to confirm or deny its diagnoses, and produces statistics about the behavior of Phronesis. This tool can reproduce almost any kind of environment. The various situations simulated validated the importance of Dependency rules as well as of the Shared Experience principle. They also showed that the two strategies for exploring the path to a faulty component, described in 3.3, are equivalent on average.

4.2. Real case application
Phronesis is now being deployed on the entire LHCb Online cluster. Note that it is not intended to replace any solution already in place, but to complement it. At the time of writing, a fair fraction of the LHCb Online system is already covered, and the diagnoses we had the opportunity to trigger proved useful. Systems under Phronesis supervision include the log aggregation cluster, the event filter software, the web services and the monitoring infrastructure.

Although there were only a small number of unexpected and unprovoked situations, Phronesis made several correct diagnoses and offered appropriate solutions. Several of these diagnoses were a direct consequence of the Convention over Configuration approach, because the root cause lay in elements that the user had not defined manually. Examples of diagnoses are:
- Full inodes on the log servers: the log servers store a large number of tiny files (with a median size of 100 kB) on a clustered file system. As a consequence, the pool of inodes was exhausted well before the actual storage space. The solution, correctly suggested by Phronesis, was to remove files. In fact, the problem was spotted before it actually occurred, thanks to the default threshold of 99% of used inodes. This was fortunate, because otherwise all new logs requiring a new file would have been silently lost.
- Incorrect mount options on a web service: one of the web services required a particular folder to be mounted with write access, which was not the case. Phronesis suggested remounting it with the appropriate option. Although correct, this would not have worked immediately, because an NFS server over which Phronesis had no control was not configured to accept it.
- Incorrect DIM [13] name server address: the file containing this information was corrupted.
- Various problems on MySQL servers: running out of disk space and errors in the configuration files were among the problems Phronesis diagnosed on the MySQL database servers.
- Various problems on the monitoring infrastructure: Phronesis correctly diagnosed several issues, such as mail alerts not being sent (traced to a process that was not running), out-of-date results (traced to a full disk), and checks not being executed because some servers were not running.

In some cases, Phronesis completely missed the root cause of a problem. We observed two types of failure:
- Errors due to situations not foreseen in the design, such as disk errors or cluster setups. When this did not require heavy modifications, the code was improved; other cases were left for future development.
- Errors due to incomplete configuration, such as missing information or an unsupervised service. The configuration was always updated to cover future occurrences of similar cases.

5. Outlook
There is still considerable room for improvement, both in the technical implementation and in functionality. This includes (i) an extension of the configuration grammar, which is unfortunately more verbose than we initially hoped, (ii) better native support for cluster systems, and (iii) dynamic constraints on the properties of files and processes. The plan is to bring more systems under the supervision of Phronesis and to cover the corner cases. We hope to release it as an open source solution that the community could adopt and develop further.

References
[1] Augusto A A et al. (LHCb) 2008 JINST 3 S08005
[2] Neufeld N (LHCb) 2003 Nucl. Phys. Proc. Suppl.
[3] Haen C, Barra V, Bonaccorsi E and Neufeld N 2012 Journal of Physics: Conference Series
[4] Ginsberg M 1993 Essentials of Artificial Intelligence (Morgan Kaufmann)
[5] IBM 2001 An architectural blueprint for autonomic computing
[6] Miller J 2009 Design for convention over configuration, Microsoft MSDN Magazine
[7] McGuire P Pyparsing website
[8] Boost-team 2013 Boost libraries
[9] Gamma E, Helm R, Johnson R and Vlissides J 1995 Design Patterns: Elements of Reusable Object-Oriented Software (Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc.)
[10] Qt-project 2013 Qt project
[11] Haen C, Bonaccorsi E and Neufeld N 2011 Distributed monitoring system based on Icinga, Proceedings of ICALEPCS2011
[12] Django Software Foundation 2013 Django website
[13] Gaspar C 1993 DIM website


More information

Testing an Open Source installation and server provisioning tool for the INFN CNAF Tier1 Storage system

Testing an Open Source installation and server provisioning tool for the INFN CNAF Tier1 Storage system Testing an Open Source installation and server provisioning tool for the INFN CNAF Tier1 Storage system M Pezzi 1, M Favaro 1, D Gregori 1, PP Ricci 1, V Sapunenko 1 1 INFN CNAF Viale Berti Pichat 6/2

More information

ilcdirac and Continuous Integration: Automated Testing for Distributed Computing

ilcdirac and Continuous Integration: Automated Testing for Distributed Computing Proceedings of the CERN-BINP Workshop for Young Scientists in e + e Colliders, Geneva, Switzerland, 22 25 August 2016, edited by V. Brancolini ans L. Linssen, CERN Proceedings, Vol. 1/2017, CERN-Proceedings-2017-001

More information

System level traffic shaping in disk servers with heterogeneous protocols

System level traffic shaping in disk servers with heterogeneous protocols Journal of Physics: Conference Series OPEN ACCESS System level traffic shaping in disk servers with heterogeneous protocols To cite this article: Eric Cano and Daniele Francesco Kruse 14 J. Phys.: Conf.

More information

Early experience with the Run 2 ATLAS analysis model

Early experience with the Run 2 ATLAS analysis model Early experience with the Run 2 ATLAS analysis model Argonne National Laboratory E-mail: cranshaw@anl.gov During the long shutdown of the LHC, the ATLAS collaboration redesigned its analysis model based

More information

A self-configuring control system for storage and computing departments at INFN-CNAF Tierl

A self-configuring control system for storage and computing departments at INFN-CNAF Tierl Journal of Physics: Conference Series PAPER OPEN ACCESS A self-configuring control system for storage and computing departments at INFN-CNAF Tierl To cite this article: Daniele Gregori et al 2015 J. Phys.:

More information

Data preservation for the HERA experiments at DESY using dcache technology

Data preservation for the HERA experiments at DESY using dcache technology Journal of Physics: Conference Series PAPER OPEN ACCESS Data preservation for the HERA experiments at DESY using dcache technology To cite this article: Dirk Krücker et al 2015 J. Phys.: Conf. Ser. 66

More information

Modular and scalable RESTful API to sustain STAR collaboration's record keeping

Modular and scalable RESTful API to sustain STAR collaboration's record keeping Journal of Physics: Conference Series PAPER OPEN ACCESS Modular and scalable RESTful API to sustain STAR collaboration's record keeping To cite this article: D Arkhipkin et al 2015 J. Phys.: Conf. Ser.

More information

ATLAS operations in the GridKa T1/T2 Cloud

ATLAS operations in the GridKa T1/T2 Cloud Journal of Physics: Conference Series ATLAS operations in the GridKa T1/T2 Cloud To cite this article: G Duckeck et al 2011 J. Phys.: Conf. Ser. 331 072047 View the article online for updates and enhancements.

More information

Tests of PROOF-on-Demand with ATLAS Prodsys2 and first experience with HTTP federation

Tests of PROOF-on-Demand with ATLAS Prodsys2 and first experience with HTTP federation Journal of Physics: Conference Series PAPER OPEN ACCESS Tests of PROOF-on-Demand with ATLAS Prodsys2 and first experience with HTTP federation To cite this article: R. Di Nardo et al 2015 J. Phys.: Conf.

More information

The ATLAS EventIndex: an event catalogue for experiments collecting large amounts of data

The ATLAS EventIndex: an event catalogue for experiments collecting large amounts of data The ATLAS EventIndex: an event catalogue for experiments collecting large amounts of data D. Barberis 1*, J. Cranshaw 2, G. Dimitrov 3, A. Favareto 1, Á. Fernández Casaní 4, S. González de la Hoz 4, J.

More information

Andrea Sciabà CERN, Switzerland

Andrea Sciabà CERN, Switzerland Frascati Physics Series Vol. VVVVVV (xxxx), pp. 000-000 XX Conference Location, Date-start - Date-end, Year THE LHC COMPUTING GRID Andrea Sciabà CERN, Switzerland Abstract The LHC experiments will start

More information

jspydb, an open source database-independent tool for data management

jspydb, an open source database-independent tool for data management Journal of Physics: Conference Series jspydb, an open source database-independent tool for data management To cite this article: Giuseppe Antonio Pierro et al 2011 J. Phys.: Conf. Ser. 331 042020 View

More information

Experience with PROOF-Lite in ATLAS data analysis

Experience with PROOF-Lite in ATLAS data analysis Journal of Physics: Conference Series Experience with PROOF-Lite in ATLAS data analysis To cite this article: S Y Panitkin et al 2011 J. Phys.: Conf. Ser. 331 072057 View the article online for updates

More information

LHCb Distributed Conditions Database

LHCb Distributed Conditions Database LHCb Distributed Conditions Database Marco Clemencic E-mail: marco.clemencic@cern.ch Abstract. The LHCb Conditions Database project provides the necessary tools to handle non-event time-varying data. The

More information

A Framework for Securing Databases from Intrusion Threats

A Framework for Securing Databases from Intrusion Threats A Framework for Securing Databases from Intrusion Threats R. Prince Jeyaseelan James Department of Computer Applications, Valliammai Engineering College Affiliated to Anna University, Chennai, India Email:

More information

Real-time Monitoring, Inventory and Change Tracking for. Track. Report. RESOLVE!

Real-time Monitoring, Inventory and Change Tracking for. Track. Report. RESOLVE! Real-time Monitoring, Inventory and Change Tracking for Track. Report. RESOLVE! Powerful Monitoring Tool for Full Visibility over Your Hyper-V Environment VirtualMetric provides the most comprehensive

More information

Striped Data Server for Scalable Parallel Data Analysis

Striped Data Server for Scalable Parallel Data Analysis Journal of Physics: Conference Series PAPER OPEN ACCESS Striped Data Server for Scalable Parallel Data Analysis To cite this article: Jin Chang et al 2018 J. Phys.: Conf. Ser. 1085 042035 View the article

More information

Improved Information Retrieval Performance on SQL Database Using Data Adapter

Improved Information Retrieval Performance on SQL Database Using Data Adapter IOP Conference Series: Materials Science and Engineering PAPER OPEN ACCESS Improved Information Retrieval Performance on SQL Database Using Data Adapter To cite this article: M Husni et al 2018 IOP Conf.

More information

AGIS: The ATLAS Grid Information System

AGIS: The ATLAS Grid Information System AGIS: The ATLAS Grid Information System Alexey Anisenkov 1, Sergey Belov 2, Alessandro Di Girolamo 3, Stavro Gayazov 1, Alexei Klimentov 4, Danila Oleynik 2, Alexander Senchenko 1 on behalf of the ATLAS

More information

Deploying enterprise applications on Dell Hybrid Cloud System for Microsoft Cloud Platform System Standard

Deploying enterprise applications on Dell Hybrid Cloud System for Microsoft Cloud Platform System Standard Deploying enterprise applications on Dell Hybrid Cloud System for Microsoft Cloud Platform System Standard Date 7-18-2016 Copyright This document is provided as-is. Information and views expressed in this

More information

Benchmarking the ATLAS software through the Kit Validation engine

Benchmarking the ATLAS software through the Kit Validation engine Benchmarking the ATLAS software through the Kit Validation engine Alessandro De Salvo (1), Franco Brasolin (2) (1) Istituto Nazionale di Fisica Nucleare, Sezione di Roma, (2) Istituto Nazionale di Fisica

More information

Verification and Diagnostics Framework in ATLAS Trigger/DAQ

Verification and Diagnostics Framework in ATLAS Trigger/DAQ Verification and Diagnostics Framework in ATLAS Trigger/DAQ M.Barczyk, D.Burckhart-Chromek, M.Caprini 1, J.Da Silva Conceicao, M.Dobson, J.Flammer, R.Jones, A.Kazarov 2,3, S.Kolos 2, D.Liko, L.Lucio, L.Mapelli,

More information

The virtual geometry model

The virtual geometry model Journal of Physics: Conference Series The virtual geometry model To cite this article: I Hivnáová and B Viren 2008 J. Phys.: Conf. Ser. 119 042016 View the article online for updates and enhancements.

More information

A first look at 100 Gbps LAN technologies, with an emphasis on future DAQ applications.

A first look at 100 Gbps LAN technologies, with an emphasis on future DAQ applications. 21st International Conference on Computing in High Energy and Nuclear Physics (CHEP21) IOP Publishing Journal of Physics: Conference Series 664 (21) 23 doi:1.188/1742-696/664//23 A first look at 1 Gbps

More information

Partial Acquisition Prashant Jain and Michael Kircher

Partial Acquisition Prashant Jain and Michael Kircher 1 Partial Acquisition Prashant Jain and Michael Kircher {Prashant.Jain,Michael.Kircher}@mchp.siemens.de Siemens AG, Corporate Technology Munich, Germany Partial Acquisition 2 Partial Acquisition The Partial

More information

First LHCb measurement with data from the LHC Run 2

First LHCb measurement with data from the LHC Run 2 IL NUOVO CIMENTO 40 C (2017) 35 DOI 10.1393/ncc/i2017-17035-4 Colloquia: IFAE 2016 First LHCb measurement with data from the LHC Run 2 L. Anderlini( 1 )ands. Amerio( 2 ) ( 1 ) INFN, Sezione di Firenze

More information

Stefan Koestner on behalf of the LHCb Online Group ( IEEE - Nuclear Science Symposium San Diego, Oct.

Stefan Koestner on behalf of the LHCb Online Group (  IEEE - Nuclear Science Symposium San Diego, Oct. Stefan Koestner on behalf of the LHCb Online Group (email: Stefan.Koestner@cern.ch) IEEE - Nuclear Science Symposium San Diego, Oct. 31 st 2006 Dedicated to B-physics : single arm forward spectrometer

More information

Efficiency Gains in Inbound Data Warehouse Feed Implementation

Efficiency Gains in Inbound Data Warehouse Feed Implementation Efficiency Gains in Inbound Data Warehouse Feed Implementation Simon Eligulashvili simon.e@gamma-sys.com Introduction The task of building a data warehouse with the objective of making it a long-term strategic

More information

Overview of ATLAS PanDA Workload Management

Overview of ATLAS PanDA Workload Management Overview of ATLAS PanDA Workload Management T. Maeno 1, K. De 2, T. Wenaus 1, P. Nilsson 2, G. A. Stewart 3, R. Walker 4, A. Stradling 2, J. Caballero 1, M. Potekhin 1, D. Smith 5, for The ATLAS Collaboration

More information

OnCommand Unified Manager

OnCommand Unified Manager OnCommand Unified Manager Operations Manager Administration Guide For Use with Core Package 5.2.1 NetApp, Inc. 495 East Java Drive Sunnyvale, CA 94089 U.S. Telephone: +1 (408) 822-6000 Fax: +1 (408) 822-4501

More information

Large scale commissioning and operational experience with tier-2 to tier-2 data transfer links in CMS

Large scale commissioning and operational experience with tier-2 to tier-2 data transfer links in CMS Journal of Physics: Conference Series Large scale commissioning and operational experience with tier-2 to tier-2 data transfer links in CMS To cite this article: J Letts and N Magini 2011 J. Phys.: Conf.

More information

THE ATLAS experiment comprises a significant number

THE ATLAS experiment comprises a significant number 386 IEEE TRANSACTIONS ON NUCLEAR SCIENCE, VOL. 55, NO. 1, FEBRUARY 2008 Access Control Design and Implementations in the ATLAS Experiment Marius Constantin Leahu, Member, IEEE, Marc Dobson, and Giuseppe

More information

Docker Container Manager: A Simple Toolkit for Isolated Work with Shared Computational, Storage, and Network Resources

Docker Container Manager: A Simple Toolkit for Isolated Work with Shared Computational, Storage, and Network Resources Journal of Physics: Conference Series PAPER OPEN ACCESS Docker Container Manager: A Simple Toolkit for Isolated Work with Shared Computational, Storage, and Network Resources To cite this article: S P

More information

Multiple variables data sets visualization in ROOT

Multiple variables data sets visualization in ROOT Journal of Physics: Conference Series Multiple variables data sets visualization in ROOT To cite this article: O Couet 2008 J. Phys.: Conf. Ser. 119 042007 View the article online for updates and enhancements.

More information

Popularity Prediction Tool for ATLAS Distributed Data Management

Popularity Prediction Tool for ATLAS Distributed Data Management Popularity Prediction Tool for ATLAS Distributed Data Management T Beermann 1,2, P Maettig 1, G Stewart 2, 3, M Lassnig 2, V Garonne 2, M Barisits 2, R Vigne 2, C Serfon 2, L Goossens 2, A Nairz 2 and

More information

Control and Monitoring of the Front-End Electronics in ALICE

Control and Monitoring of the Front-End Electronics in ALICE Control and Monitoring of the Front-End Electronics in ALICE Peter Chochula, Lennart Jirdén, André Augustinus CERN, 1211 Geneva 23, Switzerland Peter.Chochula@cern.ch Abstract This paper describes the

More information

Monitoring the software quality in FairRoot. Gesellschaft für Schwerionenforschung, Plankstrasse 1, Darmstadt, Germany

Monitoring the software quality in FairRoot. Gesellschaft für Schwerionenforschung, Plankstrasse 1, Darmstadt, Germany Gesellschaft für Schwerionenforschung, Plankstrasse 1, 64291 Darmstadt, Germany E-mail: f.uhlig@gsi.de Mohammad Al-Turany Gesellschaft für Schwerionenforschung, Plankstrasse 1, 64291 Darmstadt, Germany

More information

Design Patterns for Description-Driven Systems

Design Patterns for Description-Driven Systems Design Patterns for Description-Driven Systems N. Baker 3, A. Bazan 1, G. Chevenier 2, Z. Kovacs 3, T Le Flour 1, J-M Le Goff 4, R. McClatchey 3 & S Murray 1 1 LAPP, IN2P3, Annecy-le-Vieux, France 2 HEP

More information

CONTROL AND MONITORING OF ON-LINE TRIGGER ALGORITHMS USING GAUCHO

CONTROL AND MONITORING OF ON-LINE TRIGGER ALGORITHMS USING GAUCHO 10th ICALEPCS Int. Conf. on Accelerator & Large Expt. Physics Control Systems. Geneva, 10-14 Oct 2005, WE3A.5-6O (2005) CONTROL AND MONITORING OF ON-LINE TRIGGER ALGORITHMS USING GAUCHO E. van Herwijnen

More information

Online data storage service strategy for the CERN computer Centre G. Cancio, D. Duellmann, M. Lamanna, A. Pace CERN, Geneva, Switzerland

Online data storage service strategy for the CERN computer Centre G. Cancio, D. Duellmann, M. Lamanna, A. Pace CERN, Geneva, Switzerland Online data storage service strategy for the CERN computer Centre G. Cancio, D. Duellmann, M. Lamanna, A. Pace CERN, Geneva, Switzerland Abstract. The Data and Storage Services group at CERN is conducting

More information

Scientific Cluster Deployment and Recovery Using puppet to simplify cluster management

Scientific Cluster Deployment and Recovery Using puppet to simplify cluster management Journal of Physics: Conference Series Scientific Cluster Deployment and Recovery Using puppet to simplify cluster management To cite this article: Val Hendrix et al 2012 J. Phys.: Conf. Ser. 396 042027

More information

Suez: Job Control and User Interface for CLEO III

Suez: Job Control and User Interface for CLEO III Suez: Job Control and User Interface for CLEO III Martin Lohner, Christopher D. Jones, Paul Avery University of Florida, Gainesville Abstract. Suez standardizes the way CLEO III data is processed by providing

More information

Performance quality monitoring system for the Daya Bay reactor neutrino experiment

Performance quality monitoring system for the Daya Bay reactor neutrino experiment Journal of Physics: Conference Series OPEN ACCESS Performance quality monitoring system for the Daya Bay reactor neutrino experiment To cite this article: Y B Liu and the Daya Bay collaboration 2014 J.

More information

The Error Reporting in the ATLAS TDAQ System

The Error Reporting in the ATLAS TDAQ System The Error Reporting in the ATLAS TDAQ System Serguei Kolos University of California Irvine, USA E-mail: serguei.kolos@cern.ch Andrei Kazarov CERN, Switzerland, on leave from Petersburg Nuclear Physics

More information

SOFTWARE ENGINEERING. To discuss several different ways to implement software reuse. To describe the development of software product lines.

SOFTWARE ENGINEERING. To discuss several different ways to implement software reuse. To describe the development of software product lines. SOFTWARE ENGINEERING DESIGN WITH COMPONENTS Design with reuse designs and develops a system from reusable software. Reusing software allows achieving better products at low cost and time. LEARNING OBJECTIVES

More information

Evolution of Database Replication Technologies for WLCG

Evolution of Database Replication Technologies for WLCG Evolution of Database Replication Technologies for WLCG Zbigniew Baranowski, Lorena Lobato Pardavila, Marcin Blaszczyk, Gancho Dimitrov, Luca Canali European Organisation for Nuclear Research (CERN), CH-1211

More information

Database on Demand: insight how to build your own DBaaS

Database on Demand: insight how to build your own DBaaS Journal of Physics: Conference Series PAPER OPEN ACCESS Database on Demand: insight how to build your own DBaaS Related content - DataBase on Demand R Gaspar Aparicio, D Gomez, I Coterillo Coz et al. To

More information

Michael Böge, Jan Chrin

Michael Böge, Jan Chrin PAUL SCHERRER INSTITUT SLS-TME-TA-1999-0015 September, 1999 A CORBA Based Client- Model for Beam Dynamics Applications at the SLS Michael Böge, Jan Chrin Paul Scherrer Institut CH-5232 Villigen PSI Switzerland

More information

An Analysis of Storage Interface Usages at a Large, MultiExperiment Tier 1

An Analysis of Storage Interface Usages at a Large, MultiExperiment Tier 1 Journal of Physics: Conference Series PAPER OPEN ACCESS An Analysis of Storage Interface Usages at a Large, MultiExperiment Tier 1 Related content - Topical Review W W Symes - MAP Mission C. L. Bennett,

More information