The ATLAS Production System


The ATLAS MC and Data
Rodney Walker, Ludwig-Maximilians-Universität München
2nd Feb 2009 / DESY Computing Seminar

Outline
1. Monte Carlo Production
2. Data Reprocessing
3. Conditions Database Access

Grid activities: MC production, data reprocessing, group and user analysis, calibration.

MC Production for ATLAS
MC production has traditionally been a distributed activity (e.g. the ZEUS Funnel system); the difference now is the volume.
Plan to simulate 10% of the 200 Hz data-taking rate.
Geant4 simulation: ~10 minutes per event, 800 MB RAM; reconstruction: ~1 minute per event, 2-3 GB RAM.
Steady state: 20 [ev/s] x 600 [s/ev] = 12000 CPU cores.
Production is bursty due to the release cycle. Currently fills up to 25000 cores, but this includes T1 resources intended for reprocessing.
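The steady-state figure quoted above is simple arithmetic; the short sketch below just reproduces it from the rates and per-event times given on the slide (the variable names are mine, everything else is the slide's numbers).

# Back-of-the-envelope reproduction of the steady-state CPU estimate above.
data_rate_hz = 200          # ATLAS data-taking rate [events/s]
mc_fraction = 0.10          # simulate 10% of the data-taking rate
g4_time_s = 10 * 60         # Geant4: ~10 minutes of CPU per event

mc_rate = data_rate_hz * mc_fraction       # 20 events/s to be simulated
cores = mc_rate * g4_time_s                # events/s * CPU-seconds/event
print(f"steady-state simulation cores: {cores:.0f}")   # -> 12000

reco_time_s = 60            # reconstruction adds ~1 minute per event
print(f"with reconstruction: {mc_rate * (g4_time_s + reco_time_s):.0f}")  # -> 13200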

Computing Model
Expect to reprocess all data twice per year, applying new alignment, calibration and algorithm improvements; more often during commissioning.
Reprocessing has historically not been distributed: it was co-located with the RAW data and databases, re-using the experiment's primary reconstruction infrastructure.
ATLAS will reprocess at the T1s to maximise CPU and tape resources.
Reprocess while taking data, i.e. quicker and more often. This brings new challenges.

Distributed Challenges
Large (2 GB) input files, different for each job; compare ~50 MC jobs skipping through a single EVNT file.
Data is stored on tape at the T1s (also at T0, written once and essentially never read): an automated pre-stage is needed to keep the CPUs full, and T1 experience with non-archival tape access varies.
Cannot use the T0 database directly (round-trip time, load, scaling); conditions are distributed via Oracle Streams to the T1 Oracle RACs.
Unlike MC production, we have not been practising this for the past five years.

Evolution of ProdSys
The production system has always spanned three distinct grids/middleware stacks: OSG, LCG and NDGF. OSG is essentially LCG minus the gLite WMS; NDGF is very different.
Heterogeneity is handled with a central database of job definitions: each grid executes the jobs in its own way and updates the database.
Tested in the DC1, DC2 and CSC challenges/productions. OSG and LCG performed badly; NDGF worked well.

LCG problems
The EDG WMS was slow to submit and could not fill the CPUs; high failure rate.
Poor brokerage due to bad information: no data-job co-location, and job priorities were not followed.
No data placement: input data was not moved near free CPUs. Accessing data remotely increased failures, SE overload and confusion.
Pure Condor-G submission addresses the WMS issues but not data placement.
Tinkering around the edges did not improve it.

OSG solution
OSG performed slightly worse than LCG, which prompted a major rethink; OSG carries less grid baggage.
A data-centric production system was designed, factorising out the grid frailties.
It uses the new data handling system (DDM) to move data to the CPUs.
It uses the pull method: by the time a pilot job asks for a real job it has already survived the grid middleware. A nod to the LHCb DIRAC system.
Used very successfully for OSG production in 2007. Canada switched from LCG to PanDA production in late 2007, with immediate and impressive improvement.
This led to the proposal to use it everywhere, with the transition in early 2008.

PanDA System
[Architecture diagram: ProdSys and end-users submit jobs to the PanDA server over HTTPS; pilots running on the worker nodes at the sites (site A via gLite, site B via the Condor-G scheduler) pull jobs from the server over HTTPS and run them.]
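To make the pull method concrete, here is a minimal sketch of what a pilot does once it is running on a worker node. The endpoint URL and the JSON fields are hypothetical, not the real PanDA protocol; only the overall flow (pilot lands first, then asks the central server over HTTPS for real work) follows the diagram above.

"""Minimal pilot sketch: ask the server for a job, run it, report the exit code.
The URL and payload fields are made up for illustration."""
import json
import subprocess
import urllib.request

SERVER = "https://panda.example.org/server/getJob"   # hypothetical endpoint

def pull_job(site):
    """Ask the server for a job matching this site; an empty reply means no work."""
    payload = json.dumps({"site": site}).encode()
    req = urllib.request.Request(SERVER, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp) or None

def run_payload(job):
    """Run the job's transformation command and return its exit code."""
    return subprocess.call(job["command"], shell=True)

if __name__ == "__main__":
    job = pull_job("SITE_A")
    if job is None:
        print("no work available, pilot exits")
    else:
        print("payload exit code:", run_payload(job))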

Data placement
The T1 cloud model is imposed: input data is located at the T1, and output data must be copied back there for the job to finish successfully.
Jobs are assigned to a T2 and the input data is copied there before the job can run, so input is guaranteed to be local when the pilot picks up the job.
Output data is first stored on the local SRM and then transferred asynchronously to the T1; the job is not finished until the output arrives at the T1.
If the local SRM is broken, jobs fail and no more are assigned. The failover methods employed previously led to poor traceability.
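The lifecycle implied by these rules can be summarised as an ordered state machine. The state names below follow the slides (assigned, transferring, finished), but the enumeration as a whole is only an illustration, not the literal PanDA job-state table.

from enum import Enum, auto

class JobState(Enum):
    """Illustrative job lifecycle following the data-placement rules above."""
    DEFINED = auto()       # job defined in the central database by ProdSys
    ASSIGNED = auto()      # assigned to a T2; DDM copies input data from the T1
    ACTIVATED = auto()     # input is local, job can be picked up by a pilot
    RUNNING = auto()       # pilot executes the payload on a worker node
    TRANSFERRING = auto()  # output on the local SRM, asynchronous copy to the T1
    FINISHED = auto()      # output has arrived at the T1
    FAILED = auto()        # e.g. the local SRM is broken; no further assignment

# The happy path, in order:
HAPPY_PATH = [JobState.DEFINED, JobState.ASSIGNED, JobState.ACTIVATED,
              JobState.RUNNING, JobState.TRANSFERRING, JobState.FINISHED]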

Performance

Data Volume
MC data accumulates at the T1s at 1-2 TB per day, from each T1 and its T2s.
This requires timely addition of disk capacity; deletion campaigns have negligible impact.

Task Assignment to Clouds
Only clouds holding the necessary input data are considered.
The work still to be done in a cloud is calculated from the CPU time per job, in SI2k-days, summed over the assigned tasks: work = Σ_i (Njobs_i − Ndone_i) × cpu_i.
Only tasks with equal or higher priority are counted, i.e. high-priority task assignment is blind to lower-priority tasks.
The maximum work per cloud is based on MoU shares, e.g. for the US share of 20% the maximum is 20 kSI2k-days.
The CPU time per job is a rough guess and usually way off; a feedback mechanism is planned to reset it to the actual time taken by the first jobs.

Task Assignment to Clouds II
Among the clouds below their work threshold, choose the one with the lowest work/nrunning.
There are many ways to get this wrong, e.g. ignoring low-priority tasks in the threshold, or assigning in multiple parallel threads. A sketch of the assignment logic follows below.
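The sketch below puts the two slides together: the work calculation with the priority cut, the MoU-based threshold, and the lowest work-per-running-job choice. The Task/Cloud attributes and the total-capacity constant are illustrative values, not the real ProdSys configuration.

"""Sketch of the task-to-cloud assignment logic; numbers are illustrative."""
from dataclasses import dataclass
from typing import List, Optional, Set

TOTAL_WORK_SI2KDAYS = 100_000.0   # illustrative total; a 20% share -> 20 kSI2k-days

@dataclass
class Task:
    njobs: int            # jobs in the task
    ndone: int            # jobs already finished
    cpu_si2kdays: float   # estimated CPU per job (the "rough guess")
    priority: int

@dataclass
class Cloud:
    name: str
    mou_share: float      # e.g. 0.20 for a 20% MoU share
    nrunning: int         # currently running jobs
    assigned: List[Task]  # tasks already assigned to this cloud

def cloud_work(cloud: Cloud, priority: int) -> float:
    """work = sum over assigned tasks of (Njobs - Ndone) * cpu per job,
    counting only tasks of equal or higher priority."""
    return sum((t.njobs - t.ndone) * t.cpu_si2kdays
               for t in cloud.assigned if t.priority >= priority)

def choose_cloud(clouds: List[Cloud], task: Task,
                 clouds_with_input: Set[str]) -> Optional[Cloud]:
    """Consider only clouds holding the input data and below their MoU-based
    work threshold; among those, pick the lowest work per running job."""
    candidates = [c for c in clouds
                  if c.name in clouds_with_input
                  and cloud_work(c, task.priority) < c.mou_share * TOTAL_WORK_SI2KDAYS]
    if not candidates:
        return None
    return min(candidates,
               key=lambda c: cloud_work(c, task.priority) / max(c.nrunning, 1))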

Tuning
Jobs in the transferring state contribute to the cloud work, which prevents new task assignment; the plan is to subtract the contribution of transferring jobs from the work.

Reprocessing using ProdSys
Jobs to reprocess data are defined in the database, and tasks are assigned to a T1 based on data location; there is only one copy of the RAW data among the T1s.
Jobs run at the T1, but the data input works much as in the MC-at-T2 case.
Jobs remain in the assigned state while datasets of 10-20 files are recalled from tape.

Pre-staging RAW data
DDM uses srmBringOnline and srmLs.
We do not know the tape distribution of the files, only some assumptions about how they were written, i.e. time-ordered during data taking.
We must request a large number of files and hope the SRM and the native tape system can optimize, but no more than the read cache size, since files must stay on disk until processed.
SRM pinning does not work, so we simply issue BringOnline periodically.
No manual T1 work is needed, apart from recovering stuck/broken tapes.
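A minimal sketch of the periodic BringOnline loop described above. The two callables stand in for the actual SRM client operations (srmBringOnline and an srmLs-style locality check); they are passed in rather than implemented here, so the snippet shows only the batching and re-request logic.

import time

def prestage(surls, srm_bring_online, srm_is_online,
             batch_size=500, poll_interval_s=600):
    """Re-issue BringOnline in batches (no larger than the tape read cache)
    and poll until every file is disk-resident.  Since SRM pinning cannot be
    relied on, files that fall back to tape are simply requested again."""
    pending = list(surls)
    while pending:
        for i in range(0, len(pending), batch_size):
            srm_bring_online(pending[i:i + batch_size])
        time.sleep(poll_interval_s)
        pending = [s for s in pending if not srm_is_online(s)]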

Conditions Db Access
Calibration data, mostly for the muon system (20 MB), valid for one day.
Detector Control System (DCS) data: 1-2 GB per day, valid for a few seconds to a few hours.
Flat files for LAr calibration, 100 MB/day, with pointers from the database.
Athena accesses the conditions data before and during the first event, but can go back for more; 10 minutes of DCS data are taken, 1 minute before and 9 minutes after the first event timestamp.
Database resource usage is therefore large at the start of a job and then little or none.

Db Access Methods
ATLAS uses COOL and CORAL to abstract access to the conditions database.
Direct access: the COOL database is in an Oracle RAC. Hardware-dependent; works well for local, controlled access, but is vulnerable to chaotic access.
SQLite files: create an SQLite file for a run range and access it on the local file system. Used in MC production and proven to scale perfectly; good for well-defined use cases with a small range, but needs an automated procedure to create and distribute the files. The Christmas 2008 reprocessing exercise was done like this.
FroNTier: a web-based proxy to the Oracle RAC, developed and used by CMS. May offer the best of both worlds; under test for ATLAS, and suited for chaotic access from T2s, e.g. user analysis.
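As an illustration of the SQLite-file approach, the sketch below copies the conditions rows for a run range into a local file that jobs then open read-only. The table and column names and the source database are made up for the example; the real workflow goes through COOL/CORAL tooling rather than raw SQL.

"""Illustrative only: snapshot a run range into a local SQLite file."""
import sqlite3

def make_snapshot(source_db, target_file, run_min, run_max):
    src = sqlite3.connect(source_db)    # stand-in for the central conditions DB
    dst = sqlite3.connect(target_file)
    dst.execute("CREATE TABLE IF NOT EXISTS conditions "
                "(run INTEGER, channel INTEGER, payload BLOB)")
    rows = src.execute(
        "SELECT run, channel, payload FROM conditions WHERE run BETWEEN ? AND ?",
        (run_min, run_max))
    dst.executemany("INSERT INTO conditions VALUES (?, ?, ?)", rows)
    dst.commit()
    src.close()
    dst.close()

# Jobs then ship or download the small file and read it locally:
# conn = sqlite3.connect("conditions_snapshot.db")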

Frontier for Scaling Db Access
The Frontier client, integrated into CORAL, converts an SQL query into an HTTP URL; the Frontier server decodes it back to SQL and executes it on a nearby database, then compresses the result and encodes it into HTTP to send back.
Identical SQL gives an identical URL, so the result can be cached: a local caching web proxy (Squid) caches the query results. Squid must be configured to store URLs containing '?' and also larger objects.
This avoids the multiple round trips of an SQL query and protects the Oracle RAC from remote worker-node overload.
It may benefit further from caching thanks to the 10-minute DCS quantization.
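A toy illustration of why this caches well: the client turns the SQL text into a deterministic URL, so two jobs issuing the same query hit the same Squid cache entry. The server address and the URL encoding below (base64 of the query in a query-string parameter) are illustrative, not the real Frontier wire format.

import base64
from urllib.parse import urlencode

FRONTIER_SERVER = "http://frontier.example.org:8000/Frontier"   # hypothetical server

def query_url(sql):
    # Deterministic encoding: the same SQL text always produces the same URL,
    # so a Squid proxy in front of the server can serve repeats from its cache.
    encoded = base64.urlsafe_b64encode(sql.encode()).decode()
    return FRONTIER_SERVER + "?" + urlencode({"encoding": "BLOB", "p1": encoded})

example = "SELECT payload FROM conditions WHERE run BETWEEN 90000 AND 90500"
print(query_url(example))
print(query_url(example) == query_url(example))   # True -> identical URL, cache hit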

Frontier for Scaling Db Access
[Scaling plot from TRIUMF: average time per job (s) versus number of concurrent jobs (0-450), comparing direct T1 RAC access, Frontier, and Frontier with cached results; the 10-minute quanta mean bunches of 24 jobs use the same conditions data.]

FroNTier Deployment
Assuming the tests go well: FroNTier servers are required at each T1, with Squid in accelerator mode, and a Squid caching proxy at the T2s. CMS requires this too, which suggests a new EGEE service node type.
A testbed exists in DE (FZK and LMU), testing with ID alignment and muon calibration. There is a proposal to reprocess also at a T2 (DESY-HH). Should early adopters install Squid already?

The MC production system scales to meet the ATLAS requirements. With the addition of pre-staging, the same system meets the reprocessing needs too.
Outlook:
The proliferation of small files overloads DDM, FTS, SRM and LFC; output must be merged at the T2 before upload to the T1.
Pre-stage testing and tuning; handling of lost files and broken tapes.
Enabling chaotic (non-T0/T1) access.