Service Availability Monitor tests for ATLAS

Service Availability Monitor tests for ATLAS: Current Status and Work in Progress
Alessandro Di Girolamo, CERN IT/GS
14 May 2008

Critical Tests: Current Status
- ATLAS-specific tests are now running together with the standard OPS tests; all tests use ATLAS credentials.
- Sites and endpoints definition: the intersection between GOCDB and TiersOfATLAS (the ATLAS-specific site configuration file implementing the Cloud Model).
- Different services and endpoints need to be tested with different VOMS credentials.
- ATLAS endpoints and paths must be tested explicitly.
- The LFC of the Cloud (residing at the Tier-1) is used.
- FCR: no banning if sites fail the Critical Tests.
- Only Clouds using LFC are tested with the ATLAS-specific tests.
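
The tested sites and endpoints are, conceptually, the overlap of the two catalogues named above. A minimal sketch, assuming illustrative site lists in place of real GOCDB and TiersOfATLAS queries:

    # Minimal sketch (illustrative data only): the sites to test are the
    # intersection of the sites registered in GOCDB and those listed in
    # TiersOfATLAS; neither list below comes from the real services.
    gocdb_sites = {"CERN-PROD", "INFN-T1", "RAL-LCG2", "SOME-OPS-ONLY-SITE"}
    toa_sites = {"CERN-PROD", "INFN-T1", "RAL-LCG2", "SOME-ATLAS-ONLY-SITE"}

    sites_to_test = sorted(gocdb_sites & toa_sites)
    print(sites_to_test)  # only sites known to both systems are probed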

Critical Tests: Current Status - Storage Element
SE & SRM (run centrally from the SAM UI):
- SE-ATLAS-lcg-cr: copy a file from the SAM UI to the endpoint and register it (in the Cloud LFC). For the Tier-1s both the Disk and Tape areas are tested.
- SE-ATLAS-lcg-cp: copy the file back from the SE to the UI and verify the integrity of the copied file.
- SE-ATLAS-lcg-del: delete the file from the storage and from the LFC.
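
A minimal sketch of this test sequence, assuming the lcg-utils clients are installed, a valid ATLAS VOMS proxy is available and LFC_HOST points to the Cloud LFC; the SE endpoint and LFN path are illustrative placeholders, not the real test values:

    # Sketch of SE-ATLAS-lcg-cr / lcg-cp / lcg-del (placeholder endpoint and LFN).
    import hashlib
    import subprocess
    import uuid

    se = "srm.example-t1.ch"                                  # hypothetical SE endpoint
    lfn = "lfn:/grid/atlas/sam/testfile-%s" % uuid.uuid4()    # hypothetical LFN
    local = "/tmp/sam-se-test"
    copy_back = "/tmp/sam-se-test.back"

    def md5(path):
        with open(path, "rb") as f:
            return hashlib.md5(f.read()).hexdigest()

    with open(local, "wb") as f:                              # small test payload
        f.write(b"SAM SE test payload\n")

    # SE-ATLAS-lcg-cr: copy the file to the SE and register it in the Cloud LFC
    subprocess.check_call(["lcg-cr", "--vo", "atlas", "-d", se, "-l", lfn,
                           "file://" + local])

    # SE-ATLAS-lcg-cp: copy the file back to the UI and verify its integrity
    subprocess.check_call(["lcg-cp", "--vo", "atlas", lfn, "file://" + copy_back])
    assert md5(local) == md5(copy_back), "copied file differs from the original"

    # SE-ATLAS-lcg-del: remove the replica and the LFC entry
    subprocess.check_call(["lcg-del", "--vo", "atlas", "-a", lfn])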

Critical Tests: Current Status - Computing Element
CE (jobs submitted to the CE), on all ATLAS CEs that are in production and certified (taken from the BDII):
- Part of the OPS suite keeps running: Job Submission, Certification Authority version, VO software directory.
- ATLAS-specific test, ATLAS-vo-lcgTag: check VO tag management (lcg-tags).
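
A minimal sketch of a VO tag check, assuming the lcg-tags client is available on the UI; the CE hostname is an illustrative placeholder:

    # Sketch of ATLAS-vo-lcgTag: list the ATLAS VO tags published by a CE.
    import subprocess

    ce = "ce01.example-site.org"      # hypothetical CE under test
    tags = subprocess.check_output(
        ["lcg-tags", "--ce", ce, "--vo", "atlas", "--list"], text=True)

    # A real test would verify that the expected ATLAS software release tags
    # appear in the published list; here we simply print them.
    print(tags)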

Critical Tests: Current Status - LFC & FTS
LFC:
- lfc ls: list the entries in /grid/atlas.
- lfc wf: create an entry in the LFC.
FTS:
- List the FTS channels: glite-transfer-channel-list.
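
A minimal sketch of the LFC and FTS probes, assuming the LFC and FTS client tools are installed; the service endpoints are illustrative placeholders:

    # Sketch of the LFC listing and FTS channel listing probes.
    import os
    import subprocess

    os.environ["LFC_HOST"] = "lfc.example-t1.ch"   # Cloud LFC (placeholder)

    # LFC: list the entries under the ATLAS namespace
    subprocess.check_call(["lfc-ls", "/grid/atlas"])

    # FTS: list the transfer channels configured on the Tier-1 FTS (placeholder URL)
    fts = "https://fts.example-t1.ch:8443/glite-data-transfer-fts/services/FileTransfer"
    subprocess.check_call(["glite-transfer-channel-list", "-s", fts])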

Other SAM Tests
Many more tests are launched, e.g.:
- ATLAS-lcg-versions: check the version of lcg-utils running on the WNs.
- ATLAS-swdirspace: check the size of the ATLAS software installation area.
- Ganga Robot, only for the ATLAS Tier-1s and Tier-2s (from the ToA): compile and execute a real analysis job based on a sample dataset.
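
A minimal sketch of the two WN-side checks, meant to run inside a job on the worker node; it assumes the lcg-utils commands accept a --version flag and that the standard VO_ATLAS_SW_DIR variable is defined in the WN environment (the fallback path is a placeholder):

    # Sketch of ATLAS-lcg-versions and ATLAS-swdirspace on a worker node.
    import os
    import subprocess

    # ATLAS-lcg-versions: report the lcg-utils version installed on the WN
    subprocess.call(["lcg-cp", "--version"])       # assumed flag, see lead-in

    # ATLAS-swdirspace: report how much space the ATLAS software area uses
    sw_dir = os.environ.get("VO_ATLAS_SW_DIR", "/opt/exp_soft/atlas")  # placeholder fallback
    subprocess.call(["du", "-sh", sw_dir])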

Work in Progress
- Publication of the SAM SE/SRM ATLAS critical results on the Dashboard.

Work in Progress
Desiderata for the SAM developers:
- Account for the intrinsic differences between Tier-0, Tier-1 and Tier-2 sites.
- Increase the site granularity in the SAM DB so that results are not mixed.
- More flexibility in setting the critical tests.
- SE: SRM2 tests for each space token of each endpoint.
- CE: tests already developed, to be integrated in the framework.
- Increase the GangaRobot granularity.
- Retrieve Panda production system information.
- CVS and documentation.

Work in Progress
New GUI for the SAM tests for ATLAS:
- A general solution for VO-specific SAM displays, i.e. immediately usable by the other VOs (extending and improving an early prototype developed for CMS).

Work in Progress
New GUI for the SAM tests for ATLAS: Cloud topology view.

How Do We Move On
- Improve communication between ATLAS and the sites.
- Initially concentrate on a minimal list of Critical Tests; sites need to be proactive.
- Comprehensive tests (such as the GangaRobot) remain initially under ATLAS responsibility, followed up by ATLAS Distributed Computing.
- As usual, monitoring alone is not enough.
- Within the WLCG monitoring framework, ATLAS is coherently using (and developing) tools, e.g. the new SAM visualization.

Backup slides

Site Availability: T0/T1
Site Services Availability, for each service type X = CE, SE, SRM:
- Down: all services of type X at the site are Down.
- Ok: all services of type X are Ok.
- Degraded: some services of type X are Ok and others are Down.
Site BDII: Ok or Down, taken from the status of the site BDII instance.
Site Availability: the AND of the individual Site Services Availabilities.
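
A minimal sketch of this aggregation logic; the service statuses in the example are illustrative inputs, not real monitoring data:

    # Sketch of the T0/T1 site availability aggregation described above.
    def service_type_status(statuses):
        """Aggregate the statuses of all services of one type at a site."""
        if all(s == "Ok" for s in statuses):
            return "Ok"
        if all(s == "Down" for s in statuses):
            return "Down"
        return "Degraded"

    def site_availability(services, bdii_status):
        """AND of the per-type availabilities and the site BDII status."""
        per_type = {x: service_type_status(s) for x, s in services.items()}
        site_ok = bdii_status == "Ok" and all(v == "Ok" for v in per_type.values())
        return ("Ok" if site_ok else "Down"), per_type

    # Example: two CEs (one down), one SE, one SRM, site BDII up
    services = {"CE": ["Ok", "Down"], "SE": ["Ok"], "SRM": ["Ok"]}
    print(site_availability(services, "Ok"))  # -> ('Down', {'CE': 'Degraded', ...})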

Site Availability: one example

SAM results on the CCRC08 Gridmap