
Data Management for the World's Largest Machine

Sigve Haug (1), Farid Ould-Saada (2), Katarina Pajchel (2), and Alexander L. Read (2)

(1) Laboratory for High Energy Physics, University of Bern, Sidlerstrasse 5, CH-3012 Bern, Switzerland
    sigve.haug@lhep.unibe.ch
(2) Department of Physics, University of Oslo, Postboks 1048 Blindern, NO-0316 Oslo, Norway
    {farid.ould-saada, katarina.pajchel, a.l.read}@fys.uio.no
    http://www.fys.uio.no/epf

In: B. Kågström et al. (Eds.): PARA 2006, LNCS 4699, pp. 480-488, 2007. © Springer-Verlag Berlin Heidelberg 2007

Abstract. The world's largest machine, the Large Hadron Collider, will have four detectors whose output is expected to answer fundamental questions about the universe. The ATLAS detector is expected to produce 3.2 PB of data per year, which will be distributed to storage elements all over the world. For 2008 the resource need is estimated to be 16.9 PB of tape, 25.4 PB of disk, and 50 MSI2k of CPU. Grids are used to simulate, access, and process the data. Sites in several European and non-European countries are connected with the Advanced Resource Connector (ARC) middleware of NorduGrid. In the first half of 2006 about 10^5 simulation jobs with 27 TB of distributed output, organized in some 10^5 files and 740 datasets, were performed on this grid. ARC's data management capabilities, the Globus Replica Location Service, and ATLAS software were combined to achieve a comprehensive distributed data management system.

1 Introduction

At the end of 2007 the Large Hadron Collider (LHC) in Geneva, often referred to as the world's largest machine, will start to operate [1]. Its four detectors aim to collect data which is expected to give some answers to fundamental questions about the universe, e.g. what is the origin of mass. The data acquisition system of one of these detectors, the ATLAS detector, will write the recorded information of the proton-proton collision events at a rate of 200 events per second [2]. Each event's information will require 1.6 MB of storage space [3]. Taking the operating time of the machine into account (see the short check below), this will yield 3.2 PB of recorded data per year. Simulated and reprocessed data come in addition. The estimated computing resource needs for 2008 are 16.9 PB of tape storage, 25.4 PB of disk storage, and 50.6 MSI2k of CPU.

The ATLAS experiment uses three grids to store, replicate, simulate, and process the data all over the planet: the LHC Computing Grid (LCG), the Open Science Grid (OSG), and NorduGrid [4,5,6].
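The 3.2 PB per year quoted above follows directly from the event rate and event size if one assumes the usual figure of roughly 10^7 seconds of LHC data taking per year (this operating time is an assumption of this check, not a number stated in the paper): 200 events/s × 1.6 MB/event = 320 MB/s, and 320 MB/s × 10^7 s ≈ 3.2 PB.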

Fig. 1. Geographical snapshot of sites connected with the ARC middleware (as of December 2005). Many sites are also organized into national and/or organizational grids, e.g. Swegrid and the Swiss ATLAS Grid.

Here we report on the recent experience with the present distributed simulation and data management system used by the ATLAS experiment on NorduGrid. A geographical map of the sites connected by NorduGrid's middleware, the Advanced Resource Connector (ARC), is shown in Figure 1. The network of sites which also have the necessary ATLAS software installed, and which are thus capable of running ATLAS computing tasks, will in the following be called the ATLAS ARC Grid.

First, a description of the distributed simulation and data management system is given. Second, a report on the system performance in the period from November 2005 to June 2006 is presented. Then future usage, limitations, and needed improvements are discussed. Finally, we recapitulate the performance of the ATLAS ARC Grid in this period and draw some conclusions.

2 The Simulation and Data Management System

The distributed simulation and data management system on the ATLAS ARC Grid can be divided into three main parts. First, there is the production database, which is used for the definition and tracking of the simulation tasks [7]. Second, there is the Supervisor-Executor instance, which pulls tasks from the production database and submits them to the ATLAS ARC Grid. Finally, there are the ATLAS data management databases, which collect the logical file names into datasets [8]. The Supervisor is common to all three grids. The Executor is unique to each grid and contains the code to submit, monitor, post-process and clean up the grid jobs. In the case of the ATLAS ARC Grid, this simple structure relies on the full ARC grid infrastructure, in particular on a Globus Replica Location Service (RLS) which maps logical to physical file names [9].

The production database is an Oracle instance where job definitions, job input locations and job output names are kept. Furthermore, the jobs' estimated resource needs, status, etc. are stored there. The Supervisor-Executor is a Python application which is run by a user whose grid certificate is accepted at all ATLAS ARC sites. The Supervisor communicates with the production database and passes simulation jobs to the Executor in XML format. The Executor then translates the job descriptions into ARC's extended resource specification language (XRSL). Job brokering is performed with attributes specified in the XRSL job description and with information gathered from the computing clusters by the ARC information system. In particular, clusters have to have the required ATLAS run time environment installed. This is an experiment-specific software package of about 5 GB which is frequently released.

When a suitable cluster is found, the job is submitted. The ARC grid manager on the front end of the cluster downloads the input files, submits the jobs to the local batch system, monitors them to their completion, and uploads the output of successful jobs. In this process the RLS is used to index both input and output files. The physical storage element (SE) for an output file is provided automatically by a storage service which obtains a list of potential SEs indexed by the RLS. Thus neither the grid job executing on the batch node nor the Executor does any data movement, and neither needs to know explicitly where the physical inputs come from or where the physical outputs are stored. When the Executor finds a job finished, it registers the metadata of the job output files, e.g. a globally unique identifier and the creation date, in the RLS. It sets the desired grid access control list (gacl) on the files and reports back to the Supervisor and the production database.

Finally, the production database is periodically queried for finished tasks. For these, the logical file names and their dataset affiliation are retrieved in order to register the available datasets, their file content, state and locations in the ATLAS dataset databases. Hence, datasets can subsequently be looked up for replication and analysis.
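To make the Executor's translation step more concrete, the following is a minimal Python sketch (Python being the language of the Supervisor-Executor) of how a job description received from the Supervisor could be turned into an XRSL string. It is not the actual Executor code: the dictionary layout, the run time environment string, the file names and the RLS URLs are invented for illustration, and only the general XRSL attribute syntax follows common ARC usage.

# Minimal sketch of turning a job description into an XRSL string.
# Illustrative only: the job dictionary, RTE name and file names are
# hypothetical; the attribute names follow common XRSL usage.

job = {
    "name": "csc.005001.simul._00042",        # hypothetical task/job name
    "executable": "run_atlas.sh",              # wrapper script shipped with the job
    "runtime_env": "APPS/HEP/ATLAS-11.0.42",   # required ATLAS run time environment (illustrative)
    "cpu_minutes": 1200,                       # requested CPU time (minutes by convention here)
    "inputs": {                                # logical name -> location known to the RLS
        "EVNT.pool.root": "rls://atlasrls.nordugrid.org/EVNT.005001._00042.pool.root",
    },
    "outputs": {                               # outputs to be indexed in the RLS by the grid manager
        "HITS.pool.root": "rls://atlasrls.nordugrid.org/HITS.005001._00042.pool.root",
    },
}

def to_xrsl(job):
    """Build an XRSL job description string from a simple job dictionary."""
    parts = [
        '(jobName="%s")' % job["name"],
        '(executable="%s")' % job["executable"],
        '(runTimeEnvironment="%s")' % job["runtime_env"],
        '(cpuTime="%d")' % job["cpu_minutes"],
        "(inputFiles=%s)" % " ".join(
            '("%s" "%s")' % (lfn, url) for lfn, url in job["inputs"].items()
        ),
        "(outputFiles=%s)" % " ".join(
            '("%s" "%s")' % (lfn, url) for lfn, url in job["outputs"].items()
        ),
        '(stdout="log.out")(stderr="log.err")',
    ]
    return "&" + "".join(parts)

if __name__ == "__main__":
    print(to_xrsl(job))

A string of this kind is what gets submitted to the chosen cluster; as described above, the grid manager on the front end then resolves the input location, stages the file locally, and after completion uploads and registers the output, so neither the Executor nor the batch job moves data itself.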

The dataset catalogs provide the logical file names and the indexing service (from among the more than 20 index servers for the three grids of which the ATLAS computing grid is composed) for the dataset to which a logical file is attached. The indexing service, i.e. the RLS on the ATLAS ARC Grid, provides the physical file location.

In short, the production on the ATLAS ARC Grid is by design a fully automatic and lightweight system which takes advantage of the inherent job brokering and data management capabilities of the ARC middleware (the RLS for indexing logical to physical file names and for storing metadata about files) and of the ATLAS distributed data management system (a set of catalogs allowing replication and analysis on a dataset basis). See References [10] and [11] for detailed descriptions of the ATLAS and ARC data management systems.

3 Recent System Performance on the ATLAS ARC Grid

The preparation for the ATLAS experiment relies on detailed simulations of the physics processes, from the proton-proton collision, via the particle propagation through the detector material, to the full reconstruction of the particles' tracks. To a large extent this has been achieved in carefully planned periods of operation, so-called Data Challenges. Many ARC sites have been providing resources for these large-scale production operations [12].

Table 1. ARC clusters which contributed to the ATLAS simulations in the period from November 2005 to June 2006. The number of jobs per site and the fraction of successful jobs are shown.

     Cluster                    Number of jobs   Efficiency
  1  ingrid.hpc2n.umu.se                  6596         0.94
  2  benedict.grid.aau.dk                 5838         0.88
  3  hive.unicc.chalmers.se              14211         0.84
  4  pikolit.ijs.si                      34106         0.83
  5  bluesmoke.nsc.liu.se                 9141         0.83
  6  hagrid.it.uu.se                      6654         0.81
  7  grid00.unige.ch                       624         0.79
  8  morpheus.dcgc.dk                     1329         0.76
  9  grid.uio.no                          2878         0.75
 10  lheppc10.unibe.ch                    3978         0.73
 11  hypatia.uio.no                       1542         0.70
 12  sigrid.lunarc.lu.se                 12038         0.70
 13  alice.grid.upjs.sk                      3         0.67
 14  norgrid.ntnu.no                        31         0.48
 15  grid01.unige.ch                       284         0.35
 16  norgrid.bccs.no                       286         0.35
 17  grid.tsl.uu.se                          6         0.00

At the present time the third Data Challenge, or the Computing System Commissioning (CSC), is entering a phase of more or less constant production. As part of this constant production, about 100 000 simulation jobs were run on ATLAS-enabled ARC sites in the period from mid-November 2005 to mid-June 2006, where the end date simply reflects the time of this report. Up to 17 clusters comprising about 1000 CPUs were used as a single resource for these jobs. In Table 1 the clusters and their executed job shares are listed. Depending on their size, access policy, and the competition with local users, the number of jobs per cluster varies. In this period six countries provided resources. The Slovenian cluster, pikolit.ijs.si, was the largest contributor, followed by the Swedish resources. The best clusters have efficiencies close to 90% (combined ATLAS and grid middleware efficiency). This number reflects what can be expected in a heterogeneous grid environment where not only different jobs and evolving software are in use, but where the operational efficiency of the numerous computing clusters and storage services is also a significant factor.

In Table 2 the number of output files and their integrated sizes are listed according to storage elements and locations. About 300 000 files with a total of 27 TB were produced and stored on disks at 11 sites in five different countries. This gives an average file size of roughly 90 MB. The integrated storage contribution per country is shown in Figure 2 (this distribution is not representative of the previous data challenges).

Table 2. ARC storage elements and their contributions to the ATLAS Computing System Commissioning. The third column gives the number of files stored by the ATLAS production in the period, the fourth the total space occupied by these files. The numbers were extracted from the Replica Location Service rls://atlasrls.nordugrid.org on 2006-06-13.

  Storage Element                   Location     Files      TB
  ingrid.hpc2n.umu.se               Umeaa         1217     0.2
  se1.hpc2n.umu.se                  Umeaa        14078     1.3
  ss2.hpc2n.umu.se                  Umeaa        70656     5.6
  ss1.hpc2n.umu.se                  Umeaa        74483     6.2
  hive-se2.unicc.chalmers.se        Goteborg     10412     0.8
  harry.hagrid.it.uu.se             Uppsala      38226     2.9
  hagrid.it.uu.se                   Uppsala      12620     1.6
  storage2.bluesmoke.nsc.liu.se     Linkoping     6254     0.6
  sigrid.lunarc.lu.se               Lund         14425     1.9
  swelanka1.it.uu.se                Sri Lanka        1   < 0.1
  grid.uio.no                       Oslo           856   < 0.1
  grid.ift.uib.no                   Bergen           1   < 0.1
  morpheus.dcgc.dk                  Aalborg        252   < 0.1
  benedict.grid.aau.dk              Aalborg       9426     1.3
  pikolit.ijs.si:2811               Slovenia     25094     2.0
  pikolit.ijs.si                    Slovenia     21239     2.7
  Total                                         299240    27.1
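Figure 2 below visualizes these numbers summed per country. Since only the figure caption survives in this text, the following short Python sketch shows how the per-country totals can be reproduced from Table 2. The mapping from the Location column to a country is an assumption of this sketch, and the "< 0.1" entries are approximated by 0.05 TB.

# Sum the Table 2 storage contributions per country (the quantity shown in
# Figure 2).  The (storage element, location, TB) triples are copied from
# Table 2; the location-to-country mapping is an assumption made here.

table2 = [
    ("ingrid.hpc2n.umu.se",           "Umeaa",     0.2),
    ("se1.hpc2n.umu.se",              "Umeaa",     1.3),
    ("ss2.hpc2n.umu.se",              "Umeaa",     5.6),
    ("ss1.hpc2n.umu.se",              "Umeaa",     6.2),
    ("hive-se2.unicc.chalmers.se",    "Goteborg",  0.8),
    ("harry.hagrid.it.uu.se",         "Uppsala",   2.9),
    ("hagrid.it.uu.se",               "Uppsala",   1.6),
    ("storage2.bluesmoke.nsc.liu.se", "Linkoping", 0.6),
    ("sigrid.lunarc.lu.se",           "Lund",      1.9),
    ("swelanka1.it.uu.se",            "Sri Lanka", 0.05),  # "< 0.1" in the table
    ("grid.uio.no",                   "Oslo",      0.05),
    ("grid.ift.uib.no",               "Bergen",    0.05),
    ("morpheus.dcgc.dk",              "Aalborg",   0.05),
    ("benedict.grid.aau.dk",          "Aalborg",   1.3),
    ("pikolit.ijs.si:2811",           "Slovenia",  2.0),
    ("pikolit.ijs.si",                "Slovenia",  2.7),
]

country_of = {  # assumed mapping from the Location column to a country
    "Umeaa": "Sweden", "Goteborg": "Sweden", "Uppsala": "Sweden",
    "Linkoping": "Sweden", "Lund": "Sweden",
    "Oslo": "Norway", "Bergen": "Norway", "Aalborg": "Denmark",
    "Slovenia": "Slovenia", "Sri Lanka": "Sri Lanka",
}

totals = {}
for _se, location, tb in table2:
    country = country_of[location]
    totals[country] = totals.get(country, 0.0) + tb

for country, tb in sorted(totals.items(), key=lambda kv: -kv[1]):
    print("%-10s %5.1f TB" % (country, tb))

The aggregation yields about 21.1 TB for Sweden and 4.7 TB for Slovenia, consistent with the caption of Figure 2.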

Fig. 2. TB per country. The graph visualizes the numbers in Table 2. In the period from November 2005 to June 2006, Sweden and Slovenia were the largest storage contributors to the ATLAS Computing System Commissioning. Only ARC storage is considered.

In the ATLAS production of simulated data (future data analysis will produce a different and more chaotic pattern), simulation is done in three steps, and the input and output sizes vary from step to step. In the first step the physics of the proton-proton collisions is simulated, the so-called event generation. These jobs have practically no input and an output of about 0.1 GB per job. In the second step the detector response to the particle interactions is simulated. These jobs use the output of the first step as input and produce about 1 GB of output per job. This output is in turn used as input for the last step, in which the reconstruction of the detector response is performed. A reconstruction job takes about 10 GB of input in 10 files and produces an output of typically 1 GB (a short numerical illustration follows below). In order to minimize the number of files, it is foreseen to increase the file sizes (from 1 to 10 GB) as network capacity, disk sizes and tape systems evolve.

The outputs are normally replicated to at least one other storage element in one of the other grids, and in the case of reconstruction outputs (the starting point of most physics analyses) to all the other large computing sites spread throughout the ATLAS grid. The output remains on the storage elements until a central ATLAS decision is made about deletion, most probably after several years.
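As a numerical illustration of the reconstruction step described above, the following sketch estimates the file counts for a batch of reconstruction jobs, using the per-job figures quoted in the text (about 10 GB of input in 10 files and about 1 GB of output per job). The batch size of 10 000 jobs is a hypothetical example; the point is how strongly the foreseen increase of the file size from 1 to 10 GB reduces the number of files to be cataloged.

# Back-of-the-envelope file counts for a hypothetical reconstruction batch,
# using the per-job figures from the text: ~10 GB of input in 10 files and
# ~1 GB of output per job.  Only the batch size is invented here.

n_jobs = 10_000
input_gb_per_job = 10.0     # simulated input per reconstruction job
output_gb_per_job = 1.0     # reconstructed output per job

for input_file_gb in (1.0, 10.0):   # current ~1 GB files vs foreseen 10 GB files
    n_input_files = int(n_jobs * input_gb_per_job / input_file_gb)
    print("%2.0f GB input files -> %6d input files (%5.1f TB of input)"
          % (input_file_gb, n_input_files, n_jobs * input_gb_per_job / 1000))

print("outputs: %d files, %.1f TB" % (n_jobs, n_jobs * output_gb_per_job / 1000))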

Table 3. ATLAS datasets on ARC storage elements as of 2006-06-13.

  Category    ARC   Total   ARC/Total   Description
  All         739    3171        0.23   CSC + CTB + MC
  CSC         489    2179        0.22   Computing System Commissioning
  CTB           7      86        0.08   Combined Test Beam Production
  MC          242     906        0.27   MC Production

Finally, the output files were logically collected into datasets, the objects of analysis and replication. The 300 000 ATLAS files produced in this period and stored on ARC storage elements belong to 739 datasets. The average number of files per dataset was thus roughly 400, the actual numbers ranging from 50 to 10 000. Table 3 shows the categories of datasets and their respective shares of the total numbers. The numbers in the ARC column were collected with the ATLAS DQ2 client, the numbers in the Total column with the PanDA monitor (http://gridui02.usatlas.bnl.gov:25880/server/pandamon/query). Since the ATLAS ARC Grid's contribution to the total ATLAS Grid production in the considered period is estimated to have been about 11 to 13% (according to a memorandum of understanding, the Nordic share of the ATLAS computing resources is 7.5%), the numbers indicate that shorter-than-average jobs, rather than long ones, were processed.

4 Perspective, Limitations and Improvements

The limitations of the system must be considered in the context of its desired capabilities. At the moment the system manages some 10^3 jobs per day, where each job typically needs less than a day to finish. The number of output files is about three times larger. In order to provide the ATLAS experiment with a significant production grid, the ATLAS ARC Grid should aim to cope with job numbers another order of magnitude larger. In this perspective the ATLAS ARC Grid has no fundamental scaling limitations. However, in order to meet this ambition several improvements are needed.

First, the available amount of resources must increase. The present operation almost exhausts the existing resources. And since the resources are shared and increasingly attractive to users, fair sharing of the resources between local and grid use, and between different grid users, needs to be implemented. At the moment local users always have implicit first priority, and grid users are often mapped to a single local account, so that they are effectively treated first-come first-served.

Second, the crucial Replica Location Service provides the desired functionality, with mapping from logical to physical file names, certificate authentication and bulk operations, and is expected to be able to handle the planned scaling-up of the system.

However, the lack of perfect stability is an important problem which remains to be solved. Meanwhile, the persons running the Supervisor-Executor instances should probably have some administration privileges, e.g. the possibility to restart the service.

Third, further development should aim at making the system independent of the databases for periods of some hours. Both the production database and the data management databases now and then have some hours of downtime. This should not cause problems other than delays in the database registrations.

Continuous improvements in the ARC middleware ease the operation. However, in the ATLAS ARC Grid there are many independent clusters which are in production mode and not dedicated to ATLAS. Thus it is impractical to negotiate frequent middleware upgrades on all of them. Hence, the future system should rely as much as possible on the present features.

5 Conclusions

As part of the preparations for the ATLAS experiment at the Large Hadron Collider, large amounts of data are simulated on grids. The ATLAS ARC Grid, the sites connected with NorduGrid's Advanced Resource Connector and having ATLAS software installed and configured for use by grid jobs, now continuously contributes to this global effort.

In the period from November 2005 to June 2006 about 300 000 output files were produced on the ATLAS ARC Grid. Up to 17 sites in six different countries were used as a single batch facility to run about 100 000 jobs. Compared to previous usage, another layer of organization was introduced in the data management system. This enabled the concept of datasets, i.e. conglomerations of files, which are used as the objects of data analysis and replication. The 27 TB of output was collected into 740 datasets, with the physical output distributed over eight significant sites in four countries.

Present experience shows that the system design can be expected to cope with the future load. Provided enough available resources, one person should be able to supervise about 10^4 jobs per day with a few GB of input and output data per job. The present implementation of the ATLAS ARC Grid lacks the ability to replicate ATLAS datasets to and from other grids via the ATLAS distributed data management tools [8], and there is no support for tape-based storage elements. These shortcomings will be addressed in the near future.

Acknowledgments. The indispensable work of the contributing resources' system administrators is highly appreciated.

References

1. The LHC Study Group: The Large Hadron Collider, Conceptual Design, CERN-AC-95-05 LHC (1995)
2. ATLAS Collaboration: Detector and Physics Performance Technical Design Report, CERN-LHCC-99-14 (1999)

3. ATLAS Collaboration: ATLAS Computing Technical Design Report, CERN-LHCC-2005-022 (2005)
4. Knobloch, J. (ed.): LHC Computing Grid - Technical Design Report, CERN-LHCC-2005-024 (2005)
5. Open Science Grid Homepage: http://www.opensciencegrid.org
6. NorduGrid Homepage: http://www.nordugrid.org
7. Goossens, L., et al.: ATLAS Production System in ATLAS Data Challenge 2, CHEP 2004, Interlaken, contribution no. 501
8. ATLAS Collaboration: ATLAS Computing Technical Design Report, CERN-LHCC-2005-022, p. 115 (2005)
9. Nielsen, J., et al.: Experiences with Data Indexing Services supported by the NorduGrid Middleware, CHEP 2004, Interlaken, contribution no. 253
10. Konstantinov, A., et al.: Data management services of NorduGrid, CERN-2005-002, vol. 2, p. 765 (2005)
11. Branco, M.: Don Quijote - Data Management for the ATLAS Automatic Production System, CERN-2005-002, p. 661 (2005)
12. NorduGrid Collaboration: Performance of the NorduGrid ARC and the Dulcinea Executor in ATLAS Data Challenge 2, CERN-2005-002, vol. 2, p. 1095 (2005)