Bookkeeping and submission tools prototype
L. Tomassetti, on behalf of the distributed computing group

Outline
- General overview
- Bookkeeping database
- Submission tools (for simulation productions)
- Framework design and Grid services
- Latest developments, future work
- Planning

Overview
The production tools consist of:
- a bookkeeping database
- a set of submission tools to the Grid
- a Web Portal to manage and monitor submissions
Development started in 2009.

Overview
- Initial version devoted to FullSim: just one database table and a simple submission script to LSF @CNAF (Summer 2009)
- FastSim then included and an initial Web interface provided, LSF @CNAF only (Winter 2009)
- First tagged version in Spring 2010, with FastSim submissions to the Grid
- Requests and shift interface added in Summer 2010

Overview
The underlying idea has been to minimize the platform requirements in terms of:
- hardware and human resources,
- custom developments,
- Grid service customization and configuration.
Keywords: standard and simple.
The prototype framework is aimed at supporting the Detector TDR activities.

Overview
Requirements:
- superbvo.org VO enabled and software installed at sites
- input files O(10 GB), output files O(1 GB), RAM O(1 GB)
- 1 Grid UI, 1 framework head node (the same machine at present)
- 1 SE to store the results (central data repository)

Bookkeeping database
- Job description metadata (job parameters, ...)
- Status and system metadata
- Back-end for all the framework sub-services
- Jobs running on remote sites communicate with the database via a RESTful interface (GSI proxy + SSL); a sketch follows below
- Database schema partially configurable
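As an illustration only (not part of the slides), a minimal Python sketch of such a job-side status update over the RESTful interface: the payload field names are hypothetical, the endpoint is the one shown later in the curl example, and the Grid proxy file is used as both client certificate and key.

import os
import requests  # assumed to be available alongside the job script

def update_job_status(job_id, status):
    # Send a bookkeeping update, authenticating with the GSI proxy over SSL.
    proxy = os.environ["X509_USER_PROXY"]  # the proxy file contains both certificate and key
    url = "https://bbr-serv09.cr.cnaf.infn.it:8080/restcert1/index.php"
    payload = {"job_id": job_id, "status": status}  # hypothetical field names
    resp = requests.post(url, data=payload,
                         cert=proxy,  # client authentication with the user proxy
                         verify="/etc/grid-security/certificates")  # standard CA directory on Grid nodes
    resp.raise_for_status()
    return resp.text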

Bookkeeping database schema (head node)
- Common base schema (Job, Input, Output, Site, ... tables)
- Web-interface related schema (job parameters, environment variables, ...)
- Customized bookkeeping schema (Production, Request, ...)
- VO-specific schemas

Bookkeeping database
- At present a relational data model is in use: MySQL 5.0.77 (SL5 default); an upgrade test to 5.5.10 is planned in the next weeks
- Evaluation of NoSQL databases for specific purposes is in progress (no plans to abandon the relational model)
- RDBMS-agnostic design of both schema and interface (tests with PostgreSQL in progress); see the sketch below
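A minimal sketch (not from the slides) of what an RDBMS-agnostic interface can look like with the Python DB-API, so that the MySQL and PostgreSQL drivers are interchangeable; the connection parameters and database name are hypothetical, the Job table comes from the common base schema.

def get_connection(backend, **params):
    # Return a DB-API connection for the configured backend, keeping the
    # rest of the framework independent of the underlying RDBMS.
    if backend == "mysql":
        import MySQLdb      # driver for the MySQL 5.x server in use
        return MySQLdb.connect(**params)
    if backend == "postgresql":
        import psycopg2     # driver used for the PostgreSQL tests
        return psycopg2.connect(**params)
    raise ValueError("unsupported backend: %s" % backend)

conn = get_connection("mysql", host="localhost", user="bookkeeping",
                      passwd="secret", db="superb_prod")
cur = conn.cursor()
cur.execute("SELECT status, COUNT(*) FROM Job GROUP BY status")  # same SQL on both backends
print(cur.fetchall())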

Grid services
- Workload Management System (WMS): job brokering service and middleware interoperability element
- Virtual Organization Membership Service (VOMS): authentication, authorization and accounting system
- LHC Computing Grid File Catalog (LFC, LCG-Utils): file metadata catalog and data-handling solution
- GANGA: job management system (submissions)
- Storage Resource Manager (SRM v2.2): data access protocol/layer

Grid services configuration
The single head node requires: gLite UI, framework installation (GANGA, MySQL, Apache)
- LFC server: VO-shared, standard, per-site name space
- VOMS: default configuration (Production/Software manager, standard user)
- WMS: CE-queue filter and middleware interoperability
- LCG-Utils: input (site SE to WN) and output (WN to repository) transfers

Production framework
[Architecture diagram: the central site hosts the Web Interface (job and metadata management tool for submissions), the bookkeeping database (database system storing the distributed production metadata), the UI, GANGA and the central SE with SRM; central EGI services provide WMS, VOMS and LFC; remote sites interact through job submission, the RESTful interface for metadata updates, and output transfers.]

Job structure
- A job for the Grid is a script (Bash or Python); a schematic skeleton follows below
- It takes care of environment preparation, database communications, file transfers, failover conditions, ...
- The VO executable is launched within the script:
  - FastSim (PacProduction) only at present
  - FullSim (Bruno) to be included
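A schematic Python skeleton (an assumption, not the actual production script) illustrating the sequence of responsibilities of such a wrapper; the environment variable, file names and command arguments are placeholders.

#!/usr/bin/env python
# Schematic Grid job wrapper; the real script also handles logging and failover.
import os
import subprocess
import sys

def update_status(job_id, status):
    # Placeholder for the RESTful bookkeeping update (see the curl example on a later slide).
    print("bookkeeping: job %s -> %s" % (job_id, status))

def run(cmd):
    # Run a shell command on the worker node and return its exit code.
    print("running: %s" % cmd)
    return subprocess.call(cmd, shell=True)

def main():
    job_id = os.environ.get("PROD_JOB_ID", "unknown")    # hypothetical id set at submission time
    update_status(job_id, "running")
    run("lcg-cp <input LFN> file:input.root")            # fetch input from the site SE (placeholder LFN)
    rc = run("PacProduction <parameters>")               # VO executable (FastSim) launched within the script
    run("lcg-cr file:output.root -d <destination SE>")   # first step of the output failover chain
    update_status(job_id, "done" if rc == 0 else "failed")
    sys.exit(rc)

if __name__ == "__main__":
    main()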

Job workflow
- Pre-production: input files and software transferred to the site SEs
- Job preparation with the framework Web Interface; launch of the submission script from the UI, or AUTOMATIC from the Portal (NEW FEATURE)
- GANGA bulk submission via the WMS to the site CEs (see the GANGA sketch below)
- On the remote-site WN: job status and WCT updates, output registration and log transfer via the REST interface to the bookkeeping DB; input file read from the site SE (LFC); output file(s) transferred to the central-site SE (central data repository), with the corresponding metadata update
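A minimal GANGA (Python GPI) sketch of such a bulk submission through the WMS backend, shown only as an illustration; the wrapper script name and the argument list are hypothetical.

# To be run inside a GANGA session; Job, Executable, File, ArgSplitter and LCG are GANGA GPI classes.
j = Job(name="fastsim-production")
j.application = Executable(exe=File("job_wrapper.py"))          # the job script described earlier
j.splitter = ArgSplitter(args=[[str(i)] for i in range(100)])   # one subjob per production job
j.backend = LCG()                                               # submission through the gLite WMS
# j.backend.requirements can be used to restrict the CE queues selected by the WMS
j.submit()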

Job script
- Failover chain on output transfers: lcg-cr, then lcg-cp [rf], then lcg-cp --nobdii [rf], then globus-url-copy [rf]; a sketch of the chain follows below
- RESTful communication: client-side curl, server-side PHP layer; proxy authentication using mod_ssl and mod_gridsite
  curl -v --cert $X509_USER_PROXY --key $X509_USER_PROXY --capath /etc/grid-security/certificates https://bbr-serv09.cr.cnaf.infn.it:8080/restcert1/index.php
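A sketch (an assumption, not the production code) of how this failover chain can be expressed, trying each transfer command in turn until one succeeds; the SURL, GridFTP URL and LFN arguments are hypothetical.

import subprocess

def transfer_output(local_file, dest_surl, dest_gsiftp, lfn):
    # The '[rf]' steps on the slide presumably also register the replica in LFC afterwards.
    attempts = [
        ["lcg-cr", "-d", dest_surl, "-l", lfn, "file:%s" % local_file],  # copy + catalogue registration
        ["lcg-cp", "file:%s" % local_file, dest_surl],                   # plain copy
        ["lcg-cp", "--nobdii", "file:%s" % local_file, dest_surl],       # plain copy without BDII lookups
        ["globus-url-copy", "file://%s" % local_file, dest_gsiftp],      # last resort: raw GridFTP transfer
    ]
    for cmd in attempts:
        if subprocess.call(cmd) == 0:   # stop at the first transfer that succeeds
            return True
    return False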

Web Portal
- Production initialization: setup for a new production, bookkeeping database initialization
- Monitor system: parametric search engine on job metadata, logfile analysis sub-system, job status, job analysis tool, ...
- Report generator: production report generator with summaries on parameters, site usage, status graphs, ...
- Submission interface for shift takers (parametric): automatic submission to all the available sites, automatic request selection based on progress
- Submission interface for experts (custom): fine-grained selection of job submission parameters
- Request definition: interface for specifying new production requests (set of job parameters)
- elog system

Web Portal (screenshot)

Submissions
- Web UI login with the same credentials as the SVN repository (SuperB LDAP)
- Full and Fast productions managed with the customized framework; per-request target site or automatic site selection
- Set of users with valid X.509 certificates and grid proxies
- Automatic preparation of job scripts, configuration files and bookkeeping entries
- Manual submission (AUTOMATIC submission via MyProxy)
- Sites: CNAF [batch system] (the INFN Tier-1 site hosts the head node), IN2P3, RAL, PISA, QMUL, LNL, BARI, CALTECH, SLAC, GRIF, VICTORIA, TORINO, CAGLIARI, RALPP, NAPOLI

Automatic Submission
- Issue: the proxy certificate must be personal; Apache (the httpd user) should securely obtain a user proxy of the submitting user
- Solution: the MyProxy Delegation feature, in addition to the Renewal already in use
- Each user (VOMS Production Manager role) must store his proxy into the MyProxy server in two ways (until the personal certificate expires):
  - with a username (== LDAP uid), without password, retrievable by bbr-serv09 only (apache user) [delegation]
  - with the DN, without password, retrievable by his own (expiring) proxy [renewal]
- The apache user can retrieve the proxy stored in MyProxy by using the submitter's LDAP uid and then execute the submission scripts using the retrieved user proxy; the user proxy is then destroyed
Idea discussed with F. Giacomini and V. Ciaschini (CNAF).

Automatic Submission
- Production Manager, once, stores two proxies in the MyProxy server:
  myproxy-init -l LDAP_UID -d -a -n -t 24 -c 0 -R /C=IT/O=INFN/OU=Host/L=CNAF/CN=bbr-serv09.cr.cnaf.infn.it   [user proxy with username, delegation]
  myproxy-init -d -n -v -t 24 -c 0   [user proxy with DN, renewal]
- WebUI Portal (bbr-serv09.cr.cnaf.infn.it): LDAP login, job preparation, job submission
- Before the GANGA submission of jobs to the Grid, the portal retrieves the user proxy (with renewal):
  myproxy-logon -l LDAP_UID -n --voms superbvo.org:/superbvo.org/Role=ProductionManager -t 24
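A sketch of the portal-side sequence under the assumptions above (an illustration, not the production code): retrieve the delegated proxy with the submitter's LDAP uid, run the submission with it, then destroy it.

import os
import subprocess
import tempfile

def automatic_submission(ldap_uid, submit_cmd):
    # Hypothetical helper run on bbr-serv09 as the apache user.
    fd, proxy_file = tempfile.mkstemp(prefix="x509up_")
    os.close(fd)
    # Retrieve the delegated user proxy with the VOMS Production Manager role (command from the slide).
    rc = subprocess.call(["myproxy-logon", "-l", ldap_uid, "-n",
                          "--voms", "superbvo.org:/superbvo.org/Role=ProductionManager",
                          "-t", "24", "-o", proxy_file])
    if rc != 0:
        os.remove(proxy_file)
        raise RuntimeError("proxy retrieval failed for %s" % ldap_uid)
    try:
        env = dict(os.environ, X509_USER_PROXY=proxy_file)
        subprocess.check_call(submit_cmd, env=env)   # e.g. the GANGA submission script
    finally:
        os.remove(proxy_file)                        # the user proxy is then destroyed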

Remote sites
- 15 (+5) sites, two flavors (EGI, OSG)
- Tier-1: INFN-T1 (Bologna, Italy), IN2P3-CC (Lyon, France), RAL-LCG2 (Oxford, UK)
- Tier-2: UKI-LT2-QMUL (London, UK), UKI-SOUTHGRID-RALPP (London, UK), UKI-SOUTHGRID-OX-HEP (Oxford, UK), GRIF (Orsay, France), SLAC (Stanford, USA), CIT-CMS-T2B / CIT-HEP-CE (Caltech, USA), OSC (Ohio, USA), VICTORIA-LCG2 (Victoria, Canada), INFN-BARI (Bari, Italy), INFN-CAGLIARI (Cagliari, Italy), INFN-FERRARA (Ferrara, Italy), INFN-LNL-2 (Legnaro, Italy), INFN-MILANO (Milano, Italy), INFN-NAPOLI (Napoli, Italy), INFN-PISA (Pisa, Italy), INFN-TORINO (Torino, Italy), PL-GRID (Poland)
- More sites being enabled: Ohio OSC, Oxford Tier-2, INFN-Ferrara, INFN-Milano, PL-Grid, ...

Results
- September 2010: large production cycle, > 11 × 10^9 events (about 180 kjobs in 4 weeks)
- Peak rate of 7000 simultaneous jobs running on the Grid (3.5k on average)
- About 8% failure rate: executable errors (0.5%), site misconfiguration (2.0%), proxy expiration (4.0%), central-site overload during file transfers (1.5%)
- About 195 years of WCT

Latest developments
- Automatic submission (to the Grid)
- Bookkeeping database tuning: use of triggers to speed up statistics calculations
- Use of a template engine for code/layout separation (in progress)
- jQuery upgrade, DataTables plug-in, updated Google API for charting
- Python script, v2 and v3 (testing phase)
- Target site per request (replica at CNAF)

Future work
- Bookkeeping database replication (beginning with a master/slave configuration)
- Monitor integration with the LB service (through direct queries), both on demand and with a daemon
- Integration of packaged FullSim
- Enabling the FullSim request interface and its submission to the Grid

Planning
- Do we have new feature requests for the production tools?
- Next FastSim productions ...
- Next FullSim productions ... [do we really need Grid exploitation at present?]

Conclusions
- The SuperB productions proved that the prototype framework is reliable and efficient
- Customization of the Web interface, the bookkeeping database and the site requirements has been the key point in achieving this goal

Collaboration
A. Fella, E. Luppi, M. Manzali, L. Tomassetti, M. Favaro, G. Fontana, C. Luzzi, M. Ronzano, L. Vettorello (students from Ferrara), E. Vianello

Thank You