Computing
DOE Program Review, SLAC
Breakout Session, 3 June 2004
Rainer Bartoldus, BaBar Deputy Computing Coordinator

Outline
- The New Computing Model (CM2): new Kanga/ROOT event store, new Analysis Model, new Micro/Mini, new Bookkeeping
- Overview of Production Activities: Online, Calibration, Reconstruction, Skimming
- Distributed Computing at Tier A Sites: IN2P3, RAL, INFN, GridKa, SLAC
- Production Summary

Computing Model 2 (CM2)
- Builds on the experience from the original Computing Model: we have learned what works and what doesn't
- Addresses issues that are becoming more important with increasing data volume
- Result of a thorough six-month study by Computing Model Working Group 2
- Decision to implement taken in December 2002
- Implementation launched (and completed!) in 2003

CM2 Kanga/ROOT: New Event Store
- Simple, ROOT-based file format
- Minimal size overhead to keep disk space costs low
- Used everywhere from Tier-A/C to workstation/laptop
- Written directly from Production (ER and SP)
- Support for multiple data components (tag, aod, esd, tru, cnd)
- Supports the requirements of the new analysis model
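
To make the component layout concrete, here is a minimal sketch, not BaBar code, of how an analysis job might read tag-level quantities from such a ROOT-based event store. The file name, the "Tag" tree and the "nTracks" branch are hypothetical illustrations, not the actual Kanga/CM2 schema.

```cpp
// Sketch of reading one data component from a ROOT-based event store.
// Assumes (hypothetically) one TTree per component, so an analysis can
// open only the pieces it needs (e.g. the tag before touching aod/esd).
#include "TFile.h"
#include "TTree.h"
#include <iostream>
#include <memory>

int main() {
    std::unique_ptr<TFile> file(TFile::Open("cm2_events.root", "READ"));
    if (!file || file->IsZombie()) {
        std::cerr << "could not open event store file\n";
        return 1;
    }

    auto* tag = dynamic_cast<TTree*>(file->Get("Tag"));  // hypothetical tree name
    if (!tag) { std::cerr << "no Tag tree\n"; return 1; }

    Int_t nTracks = 0;                              // hypothetical tag-level quantity
    tag->SetBranchAddress("nTracks", &nTracks);

    for (Long64_t i = 0; i < tag->GetEntries(); ++i) {
        tag->GetEntry(i);
        if (nTracks >= 3)                           // cheap tag-level preselection
            std::cout << "event " << i << " passes\n";
    }
    return 0;
}
```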

New Analysis Model
- Storage of composite candidate lists
- User-defined combinatorics
- Any user-calculated quantities associated with an event or candidate
- Centralized skim production: an opportunity for new ideas four times a year
- Provision for easy/fast access to the event store
- Option of interactive access from the ROOT prompt
- Addresses the main reasons for AWG ntuple production
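
As an illustration of what composite candidate lists with user-calculated quantities could look like as a data structure, here is a minimal sketch with invented types; it is not the BaBar analysis framework, just the general idea of storing composites and per-candidate user data instead of private ntuples.

```cpp
// Hypothetical composite candidate: four-momentum, daughters, and a map of
// user-calculated quantities that travels with the candidate.
#include <map>
#include <string>
#include <vector>
#include <iostream>

struct Candidate {
    double px, py, pz, e;                          // four-momentum
    std::vector<Candidate> daughters;              // user-defined combinatorics
    std::map<std::string, double> userData;        // user-calculated quantities
};

// Combine two daughters into a composite and attach a user quantity.
Candidate combine(const Candidate& a, const Candidate& b) {
    Candidate c{a.px + b.px, a.py + b.py, a.pz + b.pz, a.e + b.e};
    c.daughters = {a, b};
    c.userData["myQuantity"] = c.e - a.e;          // made-up example quantity
    return c;
}

int main() {
    Candidate kaon{0.3, 0.1, 0.2, 0.6};
    Candidate pion{0.2, -0.1, 0.1, 0.3};
    Candidate d0 = combine(kaon, pion);
    std::cout << "composite E=" << d0.e
              << " myQuantity=" << d0.userData["myQuantity"] << "\n";
}
```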

Mini and Micro
- Introduced the Mini (DST) format:
  - Highly compact output of event reconstruction
  - Contains high-level objects as well as hit information
  - Allows redoing track fitting, vertexing, etc.
  - Very powerful source for analysis and diagnostics
- Changed to the new Micro (DST) format:
  - Subset of the Mini (a "reduced Mini")
  - Restricted to analysis-level objects, including composites
  - Behaves like the Mini in cache mode

Overview of BaBar Production
- Online
- Prompt Calibration
- Event Reconstruction
- Simulation Production
- Skimming

Online Computing
- Online Event Processing / Level 3 Farm:
  - Recording data with consistently high efficiency and minimal deadtime (1-2% at peak luminosity)
  - Reached 300 Hz logging rates out of the Level 3 trigger (3 times the design value)
- Logging Manager (LM):
  - Single server connected to ~30 Level 3 farm nodes (serializing the data streams)
  - Capability of 500 Hz at 30 kB per event is the current limit (seen in L1 passthrough running)
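
For scale, the quoted limit corresponds to a sustained output bandwidth of roughly

```latex
\[ 500\,\mathrm{Hz} \times 30\,\mathrm{kB/event} = 15\,\mathrm{MB/s}. \]
```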

Online Computing (cont.)
- Logging Manager Upgrade
- Basic idea:
  - Log data in parallel streams to local farm node disks
  - Harvest the data asynchronously through one or more servers, merging them into a single event stream
- Goals:
  - Eliminate dead time due to the LM
  - Accommodate logging rates of 50 kB x 500 Hz sustained, and 50 kB x 5 kHz peak (scalable)
- Ready to be deployed this month!
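
The goal rates correspond to roughly 25 MB/s sustained (50 kB x 500 Hz) and 250 MB/s peak (50 kB x 5 kHz). To illustrate the harvest step, here is a minimal sketch, with invented types and an assumed timestamp ordering key, of merging per-node logs back into a single ordered event stream; it is not the actual Logging Manager code.

```cpp
// K-way merge of per-node event logs into one ordered stream, the kind of
// asynchronous "harvest and merge" step described above. Event layout and
// the ordering key are assumptions for illustration only.
#include <cstdint>
#include <functional>
#include <iostream>
#include <queue>
#include <tuple>
#include <vector>

struct Event {
    uint64_t timestamp;   // assumed ordering key; could equally be (run, event)
    int      node;        // which farm node logged it (payload omitted)
};

using Stream = std::vector<Event>;   // stand-in for a file on a node-local disk

std::vector<Event> mergeStreams(const std::vector<Stream>& streams) {
    using Item = std::tuple<uint64_t, size_t, size_t>;   // (timestamp, stream, index)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;

    for (size_t s = 0; s < streams.size(); ++s)
        if (!streams[s].empty())
            heap.emplace(streams[s][0].timestamp, s, 0);

    std::vector<Event> merged;
    while (!heap.empty()) {
        Item top = heap.top();
        heap.pop();
        size_t s = std::get<1>(top), i = std::get<2>(top);
        merged.push_back(streams[s][i]);
        if (i + 1 < streams[s].size())
            heap.emplace(streams[s][i + 1].timestamp, s, i + 1);
    }
    return merged;   // single event stream, ordered across all nodes
}

int main() {
    std::vector<Stream> streams = {
        {{10, 0}, {40, 0}}, {{20, 1}, {30, 1}}, {{15, 2}}
    };
    for (const Event& e : mergeStreams(streams))
        std::cout << "t=" << e.timestamp << " node=" << e.node << "\n";
}
```

Because the merge only ever inspects the head of each stream, it can run while the farm nodes keep logging, which matches the goal of taking the single serializing server out of the dead-time path.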

Prompt Calibration (PC)
- Tracks changes of the detector conditions in time
- Runs off 5 Hz of calibration events selected by Level 3: Bhabha, radiative Bhabha, mu-pair, and hadronic events
- Extracts calibration constants for each run, e.g., beam spot position, SVT timing, alignment, etc.
- Writes to the Conditions Database
- Rolling calibrations: slowly varying conditions are propagated from one run to the next
- Only stage that needs to process runs sequentially
- Has to keep up with the IR2 logging rate
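
A minimal sketch of the rolling-calibration idea, using invented type and function names rather than the BaBar code: runs are processed strictly in order, and the slowly varying constants extracted from one run seed the calibration of the next before being written out.

```cpp
// Rolling calibration sketch: the only stage that must see runs sequentially,
// because each run's constants start from those of the previous run.
#include <iostream>
#include <vector>

struct Conditions {
    double beamSpotX = 0.0;          // examples of slowly varying constants
    double beamSpotY = 0.0;
    double svtTimingOffset = 0.0;
};

// Stand-in for calibrating one run from ~5 Hz of selected calibration events.
Conditions calibrateRun(int run, const Conditions& previous) {
    Conditions updated = previous;              // rolling: start from the last run's values
    updated.beamSpotX += 0.0001 * (run - 1000); // fake drift, illustration only
    return updated;
}

void writeToConditionsDb(int run, const Conditions& c) {
    std::cout << "run " << run << ": beamSpotX=" << c.beamSpotX << "\n";
}

int main() {
    Conditions current;                      // initial constants
    std::vector<int> runs = {1001, 1002, 1003};
    for (int run : runs) {                   // strictly sequential over runs
        current = calibrateRun(run, current);
        writeToConditionsDb(run, current);
    }
}
```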

PC Performance
- PC farms are keeping up with data taking
- Current farms consist of 16 x dual 1.4 GHz PIII CPUs
- Typically 1 farm for new data, 1-3 for reprocessing
- Events are read from the xtc file and fanned out to the nodes
- Used to be capable of 600 pb^-1 per day; a recently added new server achieves up to 1000 pb^-1 per day
- Safe again, but we still do not want the PC farm to have to scale with luminosity
- Developed a fast filter to generate a small calibration xtc file; this will keep the PC farms ahead of high luminosities

Event Reconstruction (ER)
- Performs full reconstruction of the raw event data
- Applies conditions data from the PC pass; only needs conditions from the current run
- Completely parallelizable at run granularity, which allows scaling the ER farms to higher luminosities
- Writes into the (new) event store
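
In contrast to the sequential PC pass, reconstruction only depends on the conditions of the run being processed, so whole runs can be farmed out independently. A minimal sketch of that run-granularity parallelism, with invented names (not the production ER system):

```cpp
// Run-level parallelism: each run is reconstructed as an independent job,
// in any order, using only that run's conditions snapshot.
#include <future>
#include <iostream>
#include <vector>

struct RunResult { int run; long eventsProcessed; };

// Stand-in for fully reconstructing one run with its own conditions.
RunResult reconstructRun(int run) {
    return {run, 100000L + run};   // fake event count, illustration only
}

int main() {
    std::vector<int> runs = {2001, 2002, 2003, 2004};
    std::vector<std::future<RunResult>> jobs;
    for (int run : runs)           // one independent job per run
        jobs.push_back(std::async(std::launch::async, reconstructRun, run));
    for (auto& job : jobs) {
        RunResult r = job.get();
        std::cout << "run " << r.run << ": " << r.eventsProcessed << " events\n";
    }
}
```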

ER Performance
- ER farms are doing processing and reprocessing
- Currently 4 farms of ~32 x dual 1.4 GHz PIII CPUs, plus 2 farms of dual 2.7 GHz PIV
- Each farm is capable of 150 pb^-1 (220 pb^-1) per day
- Currently 3 farms for new data, 2 to be used for skimming, 1 for reprocessing
- Can keep up with new data, and at the same time allows reprocessing the early fraction of the Run with optimized calibrations

Simulation Production (SP)
- Distributed over 27 production sites, mostly Tier-C (universities)
- Little dependence on any single site, giving stable production rates
- Between 10 and 600 CPUs per site; a total of ~2000 CPUs available for SP
- Resources to produce 60 M events/week
- SP5 for Run 1-3 data (12-series) is ramping down
- SP6 for Run 4 (14-series) has ramped up

SP5
- Generic BB production complete: 2.2 billion events available
- SLAC and INFN continue to produce signal MC at 3 M events/week

SP6 Target and Status
- Generic BB: 3x luminosity, 3.2 M / fb^-1
- udsc + tau: 1x luminosity, 4.3 M / fb^-1
- Signal (replicated): 3.3 M / fb^-1
- 1 B events for Run 4; 544 M events done
- All signal + generics up to May 1 done by June 30

BaBar Tier A Sites
- The pillars of the distributed model
- Originally, IN2P3 in Lyon, France, as a replica of SLAC, taking a considerable share of analysis users
- Then RAL, UK, entered as an alternative analysis site for the (initial) Kanga data
- In 2002, INFN Padova, Italy, joined as the first Tier A to take on a production task (reprocessing of Run 1-3 data); this year also Bologna (CNAF) for analysis
- The latest addition is GridKa, Karlsruhe, Germany, participating in skimming and soon also analysis

IN2P3
- Analysis:
  - Seeing a continuously high load of BaBar analysis jobs
  - About 200 jobs running in parallel on average during the last month; the maximum is > 400 jobs
- Simulation Production:
  - MC production has now switched from SP5 (Run 1-3) to SP6 (Run 4)
- Objectivity:
  - Mini + Truth were imported for the SP5 conversion into CM2 format

IN2P3 (cont.)
- CM2 Data:
  - Import completed through SRB (= Storage Request Broker, a Grid tool!)
  - AllEvents plus all 41 allocated skims are now available at CCIN2P3 for Runs 1, 2 and 3
- Future:
  - More disks expected
  - Preparing to switch from RH 7.2 to RHEL 3 (but BaBar is not alone at CCIN2P3)

RAL: Analysis Resources
- 304 x dual 1.4 GHz PIII CPUs; queues full, with thousands of jobs waiting most of the time
- 60-70 new dual 2.8 GHz PIVs to come in June/July
- 49 TB of disk space available
- All 10- and 12-series classic Kanga data
- In the process of replacing Streams with Pointer Skims, which is freeing up considerable space for CM2 data
- 30-40 TB of new disk coming online in June/July

RAL: Batch Usage
- [Plot: one month of batch usage on the 304 CPUs, showing BaBar queued and BaBar running jobs; annotation: air conditioning maintenance, April 6-7]
- Fairshare algorithm prevents pending jobs from taking over the queue

RAL: CM2 Data
- CM2 data availability:
  - ARTF (Analysis Resources Task Force) specified: Tau11, Tau1N and Tau33
  - A quick unofficial survey of users added: BCCC03a3body, BCCC3body, BCCKs3body, BCPi0Ks3body, and BFourBody
  - Will feed back to ARTF for a final decision
- First CM2 data made available last month; more to be added as space is freed up
- xrootd service for CM2 access starting soon

INFN: Event Reconstruction
- Imported 1.4 TB on the best day, often 1.2 TB/day
- 60 fb^-1 done in ER (same as IR2), but 110 fb^-1 processed: the early fraction of a new Run always needs to be redone
- Run 4 (re-)processing:
  - Needed to converge on the reconstruction release and conditions (e.g. SVT local alignment)
  - Officially started on Feb 9th

INFN: Reconstruction
- Control-system development:
  - Developed while in production (now very stable and feature-rich)
  - New CM2 tools for merging and checking collections
  - Lots of new states/checks, and a new post-processing state machine
- PR (= PC+ER) managers:
  - The system where managers look after farms on both sides of the Atlantic is working well for BaBar
  - Proved very helpful in raising efficiency

INFN: Analysis
- CNAF (Bologna):
  - Former Tier-B of Roma moved to CNAF at the end of 2003
  - CNAF is also a Tier-1 for the LHC experiments
- Hardware resources:
  - 2 front-end servers
  - 30 dual PIII clients and 21 dual PIV clients
  - 6 TB of disk for scratch, AWG and conditions; 29 TB for the event store
- Major upgrade this summer: 88 new clients plus ~70 TB of disk

GridKa Activities
- Two main tasks in the past: Skimming and Simulation Production (SP5)
- In the future, in addition: Analysis
- Ramped down SP during skimming and ramped it up again now

GridKa: CM2 Skimming
- Production output, January to early April:
  - GridKa skimmed 1439 collections of Run 3
  - Produced 9 merges of 110 collections each
  - Skimming/merging used 150 CPUs (sometimes 200 CPUs), corresponding to the BaBar share at GridKa
  - Used > 5 TB of disk space for skimming/merging
- At present: skimming Run 4 data until Padova is ramped up

SLAC: CM2 Production
- Run 1-3 data:
  - 1.731 x 10^9 events converted to CM2, skimmed, merged and put in the bookkeeping; this is all the available data
- SP5 Monte Carlo:
  - 1.38 x 10^9 events converted and merged (target)
  - 147 x 10^6 BBbar events skimmed; skimming is ramping up now that SP5 conversion is winding down (can do 25-30 M a day, signal 5x faster)
  - 94.1 x 10^6 events merged

CM2 Production (cont.)
- Run 4 data:
  - ~83 fb^-1 recorded (as of today)
  - 62.9 fb^-1 in the Green Circle dataset (up to May 1) processed, skimmed, merged and put in the bookkeeping
  - This completes the Green Circle dataset in CM2!
- SP6 Monte Carlo:
  - 544 x 10^6 events generated
  - 131 x 10^6 events skimmed
  - 85 x 10^6 events merged

Summary
- Within a year's time, BaBar has reinvented the way it does computing
- Computing Model 2 is in place and has successfully finished its first production cycle for ICHEP04
- SLAC and the 4 Tier A centers in Europe are turning around data in record time
- Over the past year, BaBar has become a more distributed experiment than ever before
- We have to keep our resources at the cutting edge if we want to master the hoped-for explosion in data volume that PEP-II promises