Monitoring tools in EGEE

Similar documents
Grid services. Enabling Grids for E-sciencE. Dusan Vudragovic Scientific Computing Laboratory Institute of Physics Belgrade, Serbia

Grid Interoperation and Regional Collaboration

Deploying virtualisation in a production grid

On the employment of LCG GRID middleware

Service Availability Monitor tests for ATLAS

Operating the Distributed NDGF Tier-1

Service Level Agreement Metrics

Failover procedure for Grid core services

Andrea Sciabà CERN, Switzerland

EGEE and Interoperation

R-GMA (Relational Grid Monitoring Architecture) for monitoring applications

COD DECH giving feedback on their initial shifts

The Grid: Processing the Data from the World s Largest Scientific Machine

glite Grid Services Overview

30 Nov Dec Advanced School in High Performance and GRID Computing Concepts and Applications, ICTP, Trieste, Italy

The glite middleware. Presented by John White EGEE-II JRA1 Dep. Manager On behalf of JRA1 Enabling Grids for E-sciencE

LCG-2 and glite Architecture and components

Overview of HEP software & LCG from the openlab perspective

The Grid. Processing the Data from the World s Largest Scientific Machine II Brazilian LHC Computing Workshop

g-eclipse A Framework for Accessing Grid Infrastructures Nicholas Loulloudes Trainer, University of Cyprus (loulloudes.n_at_cs.ucy.ac.

The glite middleware. Ariel Garcia KIT

AMGA metadata catalogue system

I Tier-3 di CMS-Italia: stato e prospettive. Hassen Riahi Claudio Grandi Workshop CCR GRID 2011

FREE SCIENTIFIC COMPUTING

Ivane Javakhishvili Tbilisi State University High Energy Physics Institute HEPI TSU

LCG Installation LCFGng

Data Grid Infrastructure for YBJ-ARGO Cosmic-Ray Project

Status of KISTI Tier2 Center for ALICE

Architecture Proposal

The LHC Computing Grid

Security Service Challenge & Security Monitoring

Towards sustainability: An interoperability outline for a Regional ARC based infrastructure in the WLCG and EGEE infrastructures

XRAY Grid TO BE OR NOT TO BE?

Introduction Data Management Jan Just Keijser Nikhef Grid Tutorial, November 2008

The EGEE-III Project Towards Sustainable e-infrastructures

Grid Infrastructure For Collaborative High Performance Scientific Computing

ALHAD G. APTE, BARC 2nd GARUDA PARTNERS MEET ON 15th & 16th SEPT. 2006

Features and Future. Frédéric Hemmer - CERN Deputy Head of IT Department. Enabling Grids for E-sciencE. BEGrid seminar Brussels, October 27, 2006

Regional SEE-GRID-SCI Training for Site Administrators Institute of Physics Belgrade March 5-6, 2009

Grids and Security. Ian Neilson Grid Deployment Group CERN. TF-CSIRT London 27 Jan

Pan-European Grid einfrastructure for LHC Experiments at CERN - SCL's Activities in EGEE

Evolution of SAM in an enhanced model for Monitoring WLCG services

EGEE (JRA4) Loukik Kudarimoti DANTE. RIPE 51, Amsterdam, October 12 th, 2005 Enabling Grids for E-sciencE.

HEP Grid Activities in China

Middleware-Tests with our Xen-based Testcluster

Access the power of Grid with Eclipse

Forschungszentrum Karlsruhe in der Helmholtz-Gemeinschaft. Presented by Manfred Alef Contributions of Jos van Wezel, Andreas Heiss

E UFORIA G RID I NFRASTRUCTURE S TATUS R EPORT

Argus: The Simplified Policy Language

Implementing GRID interoperability

The impact and adoption of GLUE 2.0 in the LCG/EGEE production Grid

ARC integration for CMS

ATLAS Tier-3 UniGe

EGEE II - Network Service Level Agreement (SLA) Implementation

Improving Grid User's Privacy with glite Pseudonymity Service

The EU DataGrid Testbed

Grid Scheduling Architectures with Globus

The LCG 3D Project. Maria Girone, CERN. The 23rd Open Grid Forum - OGF23 4th June 2008, Barcelona. CERN IT Department CH-1211 Genève 23 Switzerland

glite Middleware Usage

The INFN Tier1. 1. INFN-CNAF, Italy

SLCS and VASH Service Interoperability of Shibboleth and glite

MyProxy Server Installation

Geographical failover for the EGEE-WLCG Grid collaboration tools. CHEP 2007 Victoria, Canada, 2-7 September. Enabling Grids for E-sciencE

Recent Evolutions of GridICE: a Monitoring Tool for Grid Systems

Setup Desktop Grids and Bridges. Tutorial. Robert Lovas, MTA SZTAKI

The LHC Computing Grid

Gergely Sipos MTA SZTAKI

Advanced Job Submission on the Grid

DESY. Andreas Gellrich DESY DESY,

EUROPEAN MIDDLEWARE INITIATIVE

Grid Authentication and Authorisation Issues. Ákos Frohner at CERN

A scalable storage element and its usage in HEP

CMS Tier-2 Program for user Analysis Computing on the Open Science Grid Frank Würthwein UCSD Goals & Status

Grid Operation at Tokyo Tier-2 Centre for ATLAS

Monitoring System for the GRID Monte Carlo Mass Production in the H1 Experiment at DESY

ELFms industrialisation plans

The Wuppertal Tier-2 Center and recent software developments on Job Monitoring for ATLAS

Scientific data processing at global scale The LHC Computing Grid. fabio hernandez

MSG: An Overview of a Messaging System for the Grid

ARDA status. Massimo Lamanna / CERN. LHCC referees meeting, 28 June

Enabling Grids for E-sciencE. A centralized administration of the Grid infrastructure using Cfengine. Tomáš Kouba Varna, NEC2009.

Enabling Grids for E-sciencE. EGEE security pitch. Olle Mulmo. EGEE Chief Security Architect KTH, Sweden. INFSO-RI

The European DataGRID Production Testbed

Update on AbuseHelper

Interconnect EGEE and CNGRID e-infrastructures

Introduction to Grid Infrastructures

Outline. ASP 2012 Grid School

Summer Student Work: Accounting on Grid-Computing

Philippe Charpentier PH Department CERN, Geneva

Grid technologies, solutions and concepts in the synchrotron Elettra

Installation of CMSSW in the Grid DESY Computing Seminar May 17th, 2010 Wolf Behrenhoff, Christoph Wissing

Bob Jones. EGEE and glite are registered trademarks. egee EGEE-III INFSO-RI

The PanDA System in the ATLAS Experiment

EELA Operations: A standalone regional dashboard implementation

How to use the Grid for my e-science

where the Web was born Experience of Adding New Architectures to the LCG Production Environment

EUROPEAN MIDDLEWARE INITIATIVE

PERFORMABILITY ASPECTS OF THE ATLAS VO; USING LMBENCH SUITE

QosCosGrid Middleware

Distributed Monte Carlo Production for

Transcription:

Monitoring tools in EGEE Piotr Nyczyk CERN IT/GD Joint OSG and EGEE Operations Workshop - 3 Abingdon, 27-29 September 2005 www.eu-egee.org

Kaleidoscope of monitoring tools Monitoring for operations Covered areas Missing parts Conclusions Outline 2

Kaleidoscope of monitoring tools GStat - information system monitor with sanity checks Site Functional Tests - simulation of a real job with most painful features Certificate lifetime monitor - host certificates of service machines GOC Job Monitor - cross-check of job submission RBs CEs GoogleMaps monitor GridIce - fabric level monitoring CIC-on-duty Dashboard - integrated view for operations User job monitoring - set of tools: job status changes, WN statistics, stdout access GridFTP log monitoring - throughput between SEs GOC DB - not a monitoring tool, but gives static information and is a source for monitoring tools 3

GStat - services 4

SFT - VO specific tests 5

GoogleMaps 6

GridIce Sensor New sensor to measure a set of summary information about the resources that a site offers to the Grid the sensor runs only in the CE Metrics: Real number of WNs managed by an LRMS that are accessible by Grid queues List of nodes that are DOWN Real number of JobSlots managed by an LRMS (a JobSlot that is shared among different queues will be counted once Average site CPU & Memory usage Refactored sensor for Job Monitoring (much more light to run) Server: new flash-based dynamic map with different zoom level glite integration (support for mixed testbed) Improvements in privacy and security 7

CIC dashboard - integration 8

GridFTP log monitor 9

Monitoring for operations What we are using monitoring for? to spot the problems (or potential problems) that require actions (CIC-onduty, ROCs) to measure things and give motivation for improvements How do we do this? NOT by looking at them at the same time for the whole day!!! In CIC-on-duty: CIC-on-duty Dashboard - SFT and GStat + ticketing system + communication tool, other tools in special cases FCR - removes from BDII sites that fail critical tests Other cases:... anyway we need something that gives integrated, correlated and processed view How to integrate the monitoring information? idea of a common bus - R-GMA But! R-GMA is not enough without common data schema which allows for synthesis First steps taken: SFT and GStat are publishing results to R-GMA using the same data schema for summary information 10

Covered areas In day-to-day operations: essential functionality of computing resources (+ client software on WNs) - SFT consistency of the information system - GStat validity of host certificates on the service nodes - Cert. Monitor partially RBs (just some of them) - GOC Job Monitor indirectly other central services Other: fabric level information: CPU loads, running processes, etc. - GridIce amount of data flowing in the grid - GridFTP log monitoring job monitoring - useful for end users VO specific monitoring: SFT as a basic framework for VO specific tests thanks to FCR and customized SFT VO can select sites using results from official dteam results but also VO specific tests 11

Missing parts No integrated alarm system suggestion: work on integration with R-GMA, use rule-based summary generator and something like CIC-on-duty dashboard, in future notification (email, sms) No direct monitoring of central services: RB, BDII, RLS, LFC, FTS... observed by many people as for example SFT failure (all sites red or orange) monitoring tools have to be written BUT! keeping in mind integration with alarm system No real monitoring of storage resources SFT is testing SEs indirectly only and not all of them are covered GridFTP log monitor: information on quantity and not quality How do we want to monitor the monitoring tools? :-) 12

Conclusions Some monitoring tools still need to be written - a lot of development and integration work We definitely need a common data schema for monitoring information to integrate But we already have covered a lot of areas - time for integration and alarm system 13