CMS Computing Model with Focus on German Tier1 Activities

Seminar on Data Processing in High Energy Physics (DVSEM), DESY Hamburg, 24.11.2008

Overview
- The Large Hadron Collider
- The Compact Muon Solenoid
- CMS Computing Model
- CMS in Germany
- CMS Workflow & Data Transfers
- CMS Service Challenges
- Monitoring
- HappyFace Project

Large Hadron Collider
Proton-proton collider:
- Circumference: 27 km
- Beam energy: 7 TeV
- Depth below surface: 100 m
- Operating temperature: -271 °C
- Energy use: 1 TWh/a
4 large experiments:
- CMS (general purpose)
- ATLAS (general purpose)
- LHCb (b-quark physics)
- ALICE (lead-ion collisions)
2 smaller experiments: LHCf and TOTEM

Compact Muon Solenoid
Detector components (labels in the figure): silicon tracker, superconducting solenoid, electromagnetic calorimetry, hadronic calorimetry, forward calorimetry, magnetic return yoke, muon chambers.
Technical details:
- Total weight: 12 500 t
- Diameter: 15 m
- Total length: 21.5 m
- Magnetic field: 4 T in the solenoid, 2 T in the return yoke
- Readout channels, e.g. tracker: 10 million
- Collision rate: 40 MHz

Pictures of CMS
[Photographs of the detector.]

CMS Collaboration
- 38 nations
- 182 institutions
- > 2000 scientists & engineers

Physics Motivation
- Test of the Standard Model at the TeV energy scale
- Search for the Higgs boson
- Physics beyond the Standard Model (e.g. SUSY, extra dimensions, ...)
In each proton-proton collision, more than 1000 particles are created. Their decay products allow the underlying physics processes of the collision to be inferred.

Trigger and Event Rate
- Collision rate: 40 MHz, event size: 1.5 MB, i.e. 60 TB/sec coming off the detector
- Level-1 trigger: reduction with ASICs (hardware), leaving 150 GB/sec
- High-level trigger: data reduction in software, leaving 225 MB/sec
- Recorded events: 150 per second, written to tape & HDD storage for offline analysis
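
The quoted rates are mutually consistent, as a quick Python check shows (the roughly 100 kHz Level-1 output rate is not stated on the slide; it is inferred here from 150 GB/sec at 1.5 MB per event):

```python
# Back-of-envelope check of the data rates quoted on this slide.
# Collision rate, event size and the 150 Hz recording rate are taken from
# the slide; the ~100 kHz Level-1 output rate is implied by 150 GB/s / 1.5 MB.

event_size_mb     = 1.5      # MB per event
collision_rate_hz = 40e6     # 40 MHz bunch-crossing rate
level1_rate_hz    = 100e3    # implied Level-1 accept rate
recorded_rate_hz  = 150      # events written for offline analysis

print(f"off the detector : {collision_rate_hz * event_size_mb / 1e6:.0f} TB/s")  # 60 TB/s
print(f"after Level-1    : {level1_rate_hz * event_size_mb / 1e3:.0f} GB/s")     # 150 GB/s
print(f"after the HLT    : {recorded_rate_hz * event_size_mb:.0f} MB/s")         # 225 MB/s
```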

Expected Resources
[Charts: projected CMS requirements for 2007-2010 for CPU (MSI2k), disk space (TB) and mass-storage (MSS) space (TB).]

CMS Computing Model
- The LHC experiments have decided to use distributed computing and storage resources: the Worldwide LHC Computing Grid (WLCG).
- Grid services are based on the gLite middleware.
- Computing centres are arranged in a four-tiered hierarchical structure.
- Availability and resources are regulated by MoUs (e.g. downtime per year, response time, etc.).

CMS Tier Structure
[Diagram of the tiered structure of the CMS computing centres.]

WLCG Resources
[Map taken from the WLCG Real Time Monitor: http://gridportal.hep.ph.ic.ac.uk/rtm/]

Main Tier Responsibilities
Tier0:
- Storage of RAW detector data.
- First reconstruction of physics objects after data-taking.
Tier1:
- Hosts one dedicated copy of RAW and reconstructed data outside the Tier0.
- Re-processing of stored data.
- Skimming (creation of small sub-samples of the data).
Tier2:
- Monte Carlo production/simulation.
- Calibration activities.
- Resources for physics-group analyses.
Tier3:
- Individual user analyses.
- Interactive logins (for development & debugging).

German CMS Members
- Uni Hamburg
- DESY Hamburg/Zeuthen
- RWTH Aachen
- Forschungszentrum Karlsruhe
- Uni Karlsruhe
- FSP-CMS (BMBF)

Tier1 GridKa
- Located at Forschungszentrum Karlsruhe (KIT, Campus Nord)
- Part of and operated by the Steinbuch Centre for Computing (SCC, KIT)
- Multi-VO tier centre (supports 8 experiments)
- 4 LHC experiments: CMS, ATLAS, ALICE, LHCb
- 4 non-LHC experiments: CDF, D0, BaBar, COMPASS

GridKa Resources (2008)
Network:
- 10 Gbit link between CERN and GridKa
- 3 x 10 Gbit links to 3 European Tier1s
- 10 Gbit link to DFN/X-Win (e.g. Tier2 connections)
CMS CPU: ~800 cores, 2 GB RAM per core (plus D-Grid resources: ~500 cores, 2 GB RAM per core)
Storage (based on dCache, developed at DESY and FNAL):
- CMS disk: ~650 TB, plus D-Grid: ~40 TB
- CMS tape: ~900 TB

Tier2/3 and NAF Resources
Besides the Tier1 resources, considerable CPU and disk capacities are available in Germany (August 2008):
Tier2 (German CMS Tier2 federation):
- DESY: 400 cores, 185 TB disk
- RWTH Aachen: 360 cores, 99 TB disk
Tier3:
- RWTH Aachen: 850 cores, 165 TB disk
- Uni Karlsruhe: 500 cores, 150 TB disk
- Uni Hamburg: 15 TB disk
NAF (National Analysis Facility):
- DESY: 245 cores, 32 TB disk
D-Grid resources, partially usable for FSP-CMS:
- RWTH Aachen: 600 cores, 231 TB disk
- GridKa: 500 cores, 40 TB disk

CMS Workflow
[Diagram: workflow steps 1-3, connected by data transfers.]

PhEDEx Data Transfers
Physics Experiment Data Export, the CMS data placement tool:
- Various agents/daemons run at the tier sites (depending on the tier level): upload, download, MSS stage-in/out, DB, ...
- Provides automatic WAN data transfers, based on subscriptions
- Automatic load balancing
- File transfer routing topology (FTS)
- Automatic bookkeeping (database entries, logging)
- Consistency checks (DB vs. filesystem)
- File integrity checks
A minimal sketch of the subscription-driven transfer logic is shown below.
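
To make the subscription idea concrete, the following toy sketch mimics the basic loop of a site transfer agent: look up the subscriptions for the site, find missing blocks, pick a source replica and record the copy. It is not PhEDEx code; all names and the replica catalogue are invented for illustration.

```python
# Toy sketch of subscription-driven data placement (not actual PhEDEx code).
from dataclasses import dataclass

@dataclass
class Subscription:
    dataset: str       # e.g. "/Cosmics/Run2008/RECO"  (hypothetical name)
    destination: str   # e.g. "T1_DE_FZK"

# toy bookkeeping "database": dataset -> block -> sites holding a replica
replicas = {
    "/Cosmics/Run2008/RECO": {
        "block#1": {"T0_CH_CERN"},
        "block#2": {"T0_CH_CERN", "T1_DE_FZK"},
    }
}

def missing_blocks(dataset, site):
    return [b for b, sites in replicas[dataset].items() if site not in sites]

def pick_source(dataset, block):
    # real PhEDEx load-balances over a routing topology; here: any replica
    return next(iter(replicas[dataset][block]))

def run_agent(subscriptions, site):
    for sub in (s for s in subscriptions if s.destination == site):
        for block in missing_blocks(sub.dataset, site):
            source = pick_source(sub.dataset, block)
            # a real agent hands the copy to FTS and verifies file integrity
            print(f"transfer {block}: {source} -> {site}")
            replicas[sub.dataset][block].add(site)   # bookkeeping

run_agent([Subscription("/Cosmics/Run2008/RECO", "T1_DE_FZK")], "T1_DE_FZK")
```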

Other Involved Components
- Monte Carlo Production Agent (ProdAgent)
- CMS Remote Analysis Builder (CRAB)
- Dataset Bookkeeping System (DBS): relates datasets/blocks to their logical file names (LFNs)
- Data Location Service (DLS): relates datasets/blocks to the sites hosting them
- Trivial File Catalog (TFC): relates an LFN to the local (physical) file name at a site
- Grid middleware (gLite): SE, CE, RB, UI, ...
A very complex system that needs intense testing, debugging and optimisation. A simplified example of the LFN-to-local-name mapping is shown below.
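
The last step in that chain, turning an LFN into a site-local file name, is done with regular-expression rules from the site's TFC. The Python sketch below only illustrates the principle; the rule, protocol name and storage path are invented, not GridKa's actual catalogue.

```python
# Illustration of an LFN -> local-PFN mapping with a TFC-style rule
# (the real TFC is a per-site XML file; this rule is a made-up example).
import re

# (protocol, lfn-match pattern, result template)
tfc_rules = [
    ("direct", r"/+store/(.*)", r"/pnfs/example-t1.de/cms/store/\1"),
]

def lfn_to_pfn(lfn, protocol="direct"):
    for proto, pattern, result in tfc_rules:
        if proto == protocol and re.match(pattern, lfn):
            return re.sub(pattern, result, lfn)
    raise LookupError(f"no TFC rule matches {lfn!r}")

lfn = "/store/data/Run2008/Cosmics/RECO/file.root"   # hypothetical LFN from DBS
print(lfn_to_pfn(lfn))
# /pnfs/example-t1.de/cms/store/data/Run2008/Cosmics/RECO/file.root
```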

WLCG Job Workflow
[Diagram of the gLite job flow: a job is submitted from the User Interface to the gLite WMS (WMProxy, Workload Manager, Task Queue, Match Maker with the Information Supermarket, Job Controller/CondorG, RB), matched against the Information Service/System, executed on a Computing Element, and given grid-enabled data transfer & access to a Storage Element.] A minimal job description example is sketched below.
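
From the user's side, such a job is described in a small JDL file and handed from the UI to the WMS. The sketch below writes a minimal JDL; the wrapper script, sandbox files and the requirement expression are made-up examples, and in everyday CMS analysis CRAB generates and submits jobs of this kind automatically.

```python
# Write a minimal gLite JDL description; the file names and the requirement
# shown here are invented examples.
jdl = """\
Executable    = "run_cmssw.sh";
Arguments     = "config.py";
StdOutput     = "job.out";
StdError      = "job.err";
InputSandbox  = {"run_cmssw.sh", "config.py"};
OutputSandbox = {"job.out", "job.err"};
Requirements  = other.GlueCEPolicyMaxWallClockTime > 720;
"""

with open("analysis.jdl", "w") as f:
    f.write(jdl)

# submitted from a gLite User Interface, e.g.:
#   glite-wms-job-submit -a analysis.jdl
#   glite-wms-job-status <job id returned by the submit command>
print("wrote analysis.jdl")
```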

CMS Service Challenges - 1
Computing, Software and Analysis (CSA) challenges: test the readiness for data-taking.
- CSA06, CSA07 and CSA08 were performed to test the computing and network infrastructure and the computing resources at 25%, 50% and 100% of the level required for LHC start-up.
The whole CMS workflow was tested:
- Production/simulation
- Prompt reconstruction at the Tier0
- Re-reconstruction at the Tier1s
- Data distribution to Tier1s and Tier2s
- Data analyses

CMS Service Challenges - 2
- Cumulative data volume transferred during CSA08: > 2 PetaByte in 4 weeks.
- Transfer rate from the Tier0 to the Tier1s: < 600 MB/sec; this goal was only partially achieved, with problems in the storage systems (CASTOR at CERN).
- The problems were identified and fixed afterwards.
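
For scale, a quick average of the cumulative volume (presumably summed over all transfer links, not only Tier0 to Tier1):

```python
# 2 PB in 4 weeks corresponds to an average aggregate rate of roughly 830 MB/s.
volume_mb = 2e9                  # > 2 PB, in MB
seconds   = 4 * 7 * 24 * 3600    # 4 weeks
print(f"average aggregate rate: {volume_mb / seconds:.0f} MB/s")
```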

CMS Service Challenges - 3
[Plot: CMS grid-job statistics, May to August 2008.]
The German centres GridKa, DESY and Aachen performed extremely well compared to the other CMS tier sites.

Results of CMS Challenges
- Tier-0 reco rate: 2008 goal 150-300 Hz (achieved); 2007 goal 100 Hz (only at bursts); 2006 goal 50 Hz (achieved)
- Tier-0 to Tier-1 transfer rate: 2008 goal 600 MB/sec (achieved partially); 2007 goal 300 MB/sec (only at bursts); 2006 goal 150 MB/sec (achieved)
- Tier-1 to Tier-2 transfer rate: 2008 goal 50-500 MB/sec (achieved); 2007 goal 20-200 MB/sec (achieved partially); 2006 goal 10-100 MB/sec (achieved)
- Tier-1 to Tier-1 transfer rate: 2008 goal 100 MB/sec (achieved); 2007 goal 50 MB/sec (achieved partially); 2006: N/A
- Tier-1 job submission: 2008 goal 50 000 jobs/day (achieved); 2007 goal 25 000 jobs/day (achieved); 2006 goal 12 000 jobs/day (reached 3 000 jobs/day)
- Tier-2 job submission: 2008 goal 150 000 jobs/day (achieved); 2007 goal 75 000 jobs/day; 2006 goal 20 000 jobs/day (reached 48 000 jobs/day)
- Monte Carlo simulation: 2008 goal 1.5 x 10^9 events/year (achieved); 2007 goal 50 x 10^6 events/month (achieved); 2006: N/A

First Real CMS Data
[Plot: daily data imports to GridKa of cosmic-muon samples, at the level of 10-20 TB/day.]
- Stable running over long periods of time.
- First real beam event: LHC beam hitting the collimators near CMS (10 September 2008).
- GridKa was the first CMS tier centre outside CERN with real beam data.

Monitoring - Overview
- Distributed grid resources are complex systems. To provide a stable and reliable service, monitoring of the different services and resources is indispensable.
- Use monitoring to discover failures before the actual services are affected.
Different types of monitoring:
- Central monitoring of all tier centres: status of the high-level grid services and their interconnection, availability of the physical resources of the tier centres.
- Central monitoring of experiment-specific components
- Local monitoring at the different computing sites
- Local monitoring of experiment-specific components
Many different monitoring instances for different purposes!

Grid Monitoring
Central WLCG monitoring: Service Availability Monitoring (SAM)
- Standardised grid jobs, testing the basic grid functionality of a tier centre.
- Includes all grid services of a site.
- Centrally sent to the tiers (once per hour).
- Output available on a web page with history.
- Once a critical SAM test has failed at a site, no user jobs will be sent to it anymore.

Experiment Monitoring
Besides the basic grid functionality, experiment-specific components and functionalities are tested. The access point to the results is the CMS ARDA Dashboard: http://arda-dashboard.cern.ch/cms/
- SAM: additional tests of dedicated functionalities of the tier centres requested by CMS.
- Job summary: type of each job and its exit code; history per centre and activity.
- PhEDEx data transfers: transfer quality for CMS datasets; collects error messages in case of failures; gives an overview of the transferred data samples available at the different sites. (A simple illustration of the per-link transfer-quality metric is shown below.)
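
The transfer quality reported per link is essentially the fraction of successful transfer attempts on that link; a minimal illustration with invented toy data:

```python
# Per-link transfer quality = successful attempts / all attempts (toy data).
from collections import defaultdict

attempts = [  # (source, destination, succeeded)
    ("T0_CH_CERN", "T1_DE_FZK", True),
    ("T0_CH_CERN", "T1_DE_FZK", True),
    ("T0_CH_CERN", "T1_DE_FZK", False),
    ("T1_DE_FZK", "T2_DE_DESY", True),
]

totals, good = defaultdict(int), defaultdict(int)
for src, dst, ok in attempts:
    totals[(src, dst)] += 1
    good[(src, dst)] += ok

for (src, dst), n in totals.items():
    print(f"{src} -> {dst}: quality {good[(src, dst)] / n:.2f}")
```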

Local Site Monitoring
Each computing centre has its own monitoring infrastructure (Nagios, Ganglia, ...):
- Offers monitoring for hardware, services and network connections.
- Adapted to local particularities.
- Automated notification in case of failures by email, instant message, SMS, ...
Experiment-specific local monitoring:
- Is the local software installation accessible?
- Are all required services properly configured?

USCHI - Ultra Sophisticated CMS Hardware Information System
- Provides specific tests of the local infrastructure of a site, mainly driven by previously observed problems.
- Designed in a modular way to ease the implementation of new tests.
Currently, the following tests are running locally at GridKa:
- Availability of the CMS production software area
- Sufficient free inodes in the file system
- Basic grid functionality (e.g. user-interface commands)
- FTS transfer channels (are they enabled, working, etc.)
- Remaining space in the disk-only area
- Status of the dCache tape-write buffers
- Functionality of basic VOBox commands
- PhEDEx grid proxy validity
The output of the tests is provided in XML format (a hypothetical example is sketched below). Developed in Karlsruhe by A. Scheurer and A. Trunov.
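
The slides do not show the actual USCHI output schema, so the element and attribute names below are invented; the sketch only illustrates the kind of machine-readable XML report such a modular test suite can emit:

```python
# Emit a hypothetical XML test report (invented schema, illustrative only).
import xml.etree.ElementTree as ET

report = ET.Element("uschi_report", site="GridKa", timestamp="2008-11-24T12:00:00")
for name, status, detail in [
    ("cms_software_area", "ok",      "CMSSW releases readable"),
    ("free_inodes",       "warning", "92% of inodes in use"),
    ("phedex_proxy",      "ok",      "proxy valid for another 41 h"),
]:
    test = ET.SubElement(report, "test", name=name, status=status)
    test.text = detail

print(ET.tostring(report, encoding="unicode"))
```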

Multi-Monitoring Drawbacks
The current monitoring infrastructure is characterised by a multitude of different sources that have to be queried for information about one single site. Its limitations:
Information is not clearly arranged:
- Too much information not related to the site in question.
- Different site designs and history plots / formats / technologies.
- Nearly impossible to correlate information from different sources.
Uncomfortable to use:
- Many different websites must be visited to get all needed information about one site.
- Different web interfaces require parameters (time range, site name, ...).
- Downloading / updating plots often takes a long time (> 1 minute!).
This increases the administrative effort and slows down the identification of errors. A meta-monitoring system can help to improve this situation!

Ideas for Meta-Monitoring
Only one website to access all relevant information for one site:
- Simple warning system (with symbols)
- Single point of access for all relevant information (and its history)
- Caching of important external plots for fast access
Backend for implementing one's own tests:
- Easy implementation of new tests
- Self-defined rules for service status and the triggering of warnings/alerts
- Correlation of data from different sources
Fast, simple and up to date:
- All monitoring information should be renewed every 15 minutes
- Short access / loading times of the web page
- Easy to install and use, no complex database
- Access to every test result in fewer than 3 mouse clicks
- Links to the raw information at the data source
The meta-monitoring system of the HappyFace Project offers this functionality (a minimal sketch of the rule-based module idea is shown below).
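
To make the "self-defined rules" idea concrete, here is a minimal rule-based module sketch. It is not the actual HappyFace code; the class name, thresholds, dummy value and URL are invented.

```python
# Minimal sketch of a rule-based meta-monitoring module (not HappyFace code).
class TransferQualityModule:
    """Rates the PhEDEx transfer quality of one site with a simple rule."""

    def __init__(self, source_url, warn=0.9, fail=0.5):
        self.source_url = source_url   # external monitoring source (hypothetical URL)
        self.warn, self.fail = warn, fail

    def fetch(self):
        # a real module would download and parse the source page/plot here,
        # caching it locally; this sketch just returns a dummy value
        return 0.87

    def status(self):
        quality = self.fetch()
        if quality >= self.warn:
            return "happy", quality
        if quality >= self.fail:
            return "neutral", quality
        return "unhappy", quality

module = TransferQualityModule("https://example.invalid/phedex/quality?site=T1_DE_FZK")
print(module.status())   # ('neutral', 0.87)
```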

HappyFace Project - 1
[Screenshot of the HappyFace web interface:] http://ekpganglia.physik.uni-karlsruhe.de/~happyface/happyface/

HappyFace Project - 2
Our HappyFace instance is currently monitoring the following quantities at GridKa:
- Local CMS infrastructure: the USCHI test set.
- Incoming/outgoing PhEDEx transfers: general transfer quality; classification of error types; classification of destination and source errors.
- Status of the WLCG production environment: queries the SAM test results and displays any failed tests, including a hyperlink to the error report.
- CMS Dashboard: searches the Job Summary service for failed jobs and prints statistics of the exit codes.
- CMS fair share: displays the fraction of running CMS jobs.
- Caching of all plots necessary for monitoring.

HappyFace Project - 3
Instances of HappyFace are currently used by:
- Uni Karlsruhe
- RWTH Aachen
- Uni Hamburg
- Uni Göttingen
Code and documentation are available at: https://ekptrac.physik.uni-karlsruhe.de/happyface
Developed in Karlsruhe by V. Buege, V. Mauch and A. Trunov.
Future plans:
- Platform to share HappyFace modules
- Standardised XML output from all HappyFace instances
Contact: Viktor Mauch: mauch@ekp.uni-karlsruhe.de, Volker Buege: volker.buege@cern.ch

Summary
- CMS uses grid technology for data access and processing.
- Multiple service challenges have proven the readiness for first data.
- PhEDEx is used for reliable large-scale data management.
- Experience has shown that monitoring is indispensable for the operation of such a heterogeneous and complex system.
- A multitude of monitoring services is available, but they are uncomfortable to use.
- The meta-monitoring HappyFace Project is very useful for the daily administration of services, e.g. at GridKa.

Backup

GridKa Resource Requests
[Table of requested resources not transcribed.] *Numbers for 2013 uncertain. **Numbers for 2014 unofficial.