ARDA status. Massimo Lamanna / CERN. LHCC referees meeting, 28 June 2004

LHCC referees meeting, 28 June 2004
ARDA status
http://cern.ch/arda
Massimo Lamanna / CERN
www.eu-egee.org  cern.ch/lcg
EGEE is a project funded by the European Union under contract IST-2003-508833

Slide 2: Contents
- The project in a nutshell
- Highlights from the 4 experiment prototypes
- ARDA-related workshops
- Conclusions

Slide 3: ARDA in a nutshell
- ARDA is an LCG project whose main activity is to enable LHC analysis on the grid
- ARDA contributes coherently to EGEE NA4 (using the entire CERN NA4-HEP resource)
  - Use the grid software as it matures (EGEE project)
  - ARDA should be the key player in the evolution from LCG2 to the EGEE infrastructure
  - Provide early and continuous feedback (guarantee the software is what the experiments expect/need)
- Use the last years' experience and components, both from Grid projects (LCG, VDT, EDG) and from experiment middleware/tools (AliEn, Dirac, GAE, Octopus, Ganga, Dial, ...)
  - Help in adapting/interfacing (direct help within the experiments)
- Every experiment has different implementations of the standard services, but:
  - Used mainly in production environments
  - Few expert users
  - Coordinated update and read actions
- ARDA: interface with the EGEE middleware
  - Verify (and help evolve) such components towards analysis environments
  - Many users (robustness might be an issue)
  - Concurrent read actions (performance will be more and more an issue)
- One prototype per experiment
  - A Common Application Layer might emerge in the future
  - The ARDA emphasis is to enable each experiment to do its job
- Provide a forum for discussion
  - Comparison of results/experience/ideas
  - Interaction with other projects
- The experiment interfaces agree with the ARDA project leader on the work plan and coordinate the activity on the experiment side (users):
  - Piergiorgio Cerello (ALICE)
  - David Adams (ATLAS)
  - Lucia Silvestris (CMS)
  - Ulrik Egede (LHCb)

Slide 4: LHCb
- The LHCb system within ARDA uses GANGA as its main component
- The LHCb/GANGA plan: enable physicists (via GANGA) to analyse the data being produced during 2004 for their studies
  - It naturally matches the ARDA mandate
  - Deploy the prototype where the LHCb data will be (CERN, RAL, ...)
  - At the beginning, the emphasis is on validating the tool, focusing on usability and on validation of the splitting and merging functionality for user jobs (see the sketch after this list)
- DIRAC (LHCb production grid): convergence with GANGA / components / experience
- Grid activity:
  - Use of the Glite testbed (since May 18th)
  - Test jobs from GANGA to Glite
- Other contributions:
  - GANGA interface to Condor (job submission) and Condor DAGMan for splitting/merging and error recovery
  - GANGA release management and software process
- LHCb metadata catalogue tests
  - Performance tests
  - Collaborators in Taiwan (ARDA + local DB know-how on Oracle)
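As an illustration of the splitting and merging pattern that GANGA is meant to validate for user jobs, here is a minimal, hypothetical Python sketch; the function names (split_job, run_subjob, merge_outputs) and file names are invented for illustration and are not the GANGA API:

```python
# Hypothetical sketch of the split/run/merge pattern GANGA validates for user jobs.
# None of these names are the real GANGA API; they only illustrate the workflow.

def split_job(input_files, files_per_subjob=10):
    """Split one analysis job into independent subjobs over its input files."""
    return [input_files[i:i + files_per_subjob]
            for i in range(0, len(input_files), files_per_subjob)]

def run_subjob(file_chunk):
    """Stand-in for submitting a subjob to a backend (Condor, DIRAC, Glite, ...)."""
    return {"files": file_chunk, "entries": len(file_chunk)}  # dummy per-subjob result

def merge_outputs(results):
    """Merge the per-subjob outputs back into a single user-level result."""
    return sum(r["entries"] for r in results)

if __name__ == "__main__":
    lfns = [f"lfn:/lhcb/2004/data_{i:04d}.root" for i in range(100)]
    subjobs = split_job(lfns, files_per_subjob=25)
    merged = merge_outputs([run_subjob(chunk) for chunk in subjobs])
    print(f"{len(subjobs)} subjobs merged; total entries = {merged}")
```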

Slide 5: CMS
- The CMS system within ARDA is still under discussion
- Providing easy access to (and possibly sharing of) data for CMS users is a key issue (data management):
  - RefDB is the bookkeeping engine used to plan and steer the production across its different phases (simulation, reconstruction, and to some degree into the analysis phase); this service is under test
  - It contained all the necessary information except the file physical locations (RLS) and the information related to the transfer management system (TMDB)
  - The actual mechanism to provide these data to analysis users is under discussion (a toy sketch follows this list)
  - Performance measurements are underway (same philosophy as for the LHCb metadata catalogue measurements)
- Exploratory/preparatory activity:
  - ORCA job submission to Glite
  - Glite file catalog
[Diagram: RefDB in CMS DC04 - reconstruction instructions flow from RefDB via McRunjob to the T0 worker nodes; reconstructed data go from tapes and the GDB castor pool to the export buffers; summaries of successful jobs go back to RefDB; the transfer agent checks what has arrived and updates RLS and TMDB]
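A hypothetical sketch of what joining the bookkeeping information with physical file locations could look like for an analysis user; the in-memory dictionaries and the files_for_analysis helper are invented stand-ins, not the real RefDB/RLS interfaces:

```python
# Illustrative only: stand-in lookups joining bookkeeping (dataset -> GUIDs)
# with a replica catalogue (GUID -> physical replicas). Not the real RefDB/RLS APIs.

BOOKKEEPING = {"bt03_ttbb": ["guid-0001", "guid-0002"]}            # dataset -> logical files
REPLICAS = {"guid-0001": ["srm://castor.cern.ch/cms/0001"],
            "guid-0002": ["srm://castor.cern.ch/cms/0002",
                          "srm://se.example.org/cms/0002"]}

def files_for_analysis(dataset):
    """Yield (guid, replica list) pairs an analysis job could actually open."""
    for guid in BOOKKEEPING.get(dataset, []):
        yield guid, REPLICAS.get(guid, [])

if __name__ == "__main__":
    for guid, pfns in files_for_analysis("bt03_ttbb"):
        print(guid, "->", pfns or "no replica registered")
```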

Slide 6: ATLAS
- The ATLAS system within ARDA has been agreed
- ATLAS has a complex strategy for distributed analysis, addressing different areas with specific projects (www.usatlas.bnl.gov/ada)
- The starting point is the DIAL analysis model (high-level web services)
- The AMI metadata catalog is a key component
  - Robustness and performance tests from ARDA
  - Very good relationship with the ATLAS Grenoble group
  - Discussions on technology (EGEE JRA1 in the loop)
- In the start-up phase, ARDA provided help in developing ATLAS production tools
- Submission to Glite (via a simplified DIAL system) is now possible
- First skeleton of the high-level services
- AMI tests

Slide 7: ALICE
- Strategy: ALICE/ARDA will evolve the existing ALICE analysis system (demonstrated at SuperComputing 2003)
- Where to improve:
  - Strong requirements on networking (inbound connectivity)
  - Heavily connected with the middleware services
  - Inflexible configuration
  - No chance to use PROOF on federated grids like LCG in AliEn
  - User library distribution
- Activity on PROOF:
  - Robustness and error recovery
- Grid activity:
  - First contact with the Glite testbed
  - C++ access library on Glite
[Diagram: a user session connects to a PROOF master server, which drives PROOF slaves at sites A, B and C]
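For context, this is roughly how a user session attaches to a PROOF master and processes a chain; a minimal PyROOT sketch assuming a ROOT installation with PROOF support, and with a hypothetical master host, file names and selector name:

```python
# Minimal PyROOT sketch: attach to a PROOF master and process a chain on its slaves.
# "proof-master.example.org", the file URLs, tree and selector names are placeholders.
import ROOT

proof = ROOT.TProof.Open("proof-master.example.org")     # user session -> PROOF master
chain = ROOT.TChain("esdTree")                            # hypothetical tree name
chain.Add("root://se.example.org//alice/sim/file1.root")
chain.Add("root://se.example.org//alice/sim/file2.root")
chain.SetProof()                                          # route processing through PROOF
chain.Process("MySelector.C+")                            # selector shipped to the slaves
```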

Slide 8: PROOF on the Grid - status report (cf. A. Peters' presentation at the workshop)
[Diagram of the demo setup: PROOF slave servers (proofd/rootd) at sites A and B sit behind optional site gateways/forward proxies, so only outgoing connectivity is needed; a slave registration/booking DB and Grid service interfaces (Grid access control service, Grid/ROOT authentication, Grid file/metadata catalogue); the client issues a booking request with logical file names and retrieves the list of logical files (LFN + MSN); the PROOF setup is Grid-middleware independent, with slave ports mirrored on the master host during master setup (TGrid UI/queue UI); a standard PROOF session then runs between the PROOF client and the PROOF master]

Slide 9: ARDA-related workshops
- 1st ARDA workshop (January 2004 at CERN; open)
- 2nd ARDA workshop, "The first 30 days of the EGEE middleware" (June 21-23 at CERN; by invitation)
  - Closed meeting, focused on the experiments and on EGEE JRA1 (Glite), more than on the middleware itself
  - Lots of time for technical discussion (never enough)
  - EGEE/LCG operations: a new ingredient!
- NA4 meeting mid July
  - NA4/JRA1 and NA4/SA1 sessions organised by M. Lamanna and F. Harris
- 3rd ARDA workshop (September 2004; open)
  - The ARDA end-to-end prototypes

Slide 10: ARDA Workshop Programme
http://agenda.cern.ch/fullagenda.php?ida=a042197&stylesheet=tools/printable&dl=&dd=

Slide 11: "The first 30 days of the EGEE middleware" ARDA workshop
- Effectively, this is the 2nd workshop (after the January 04 workshop)
- Given the new situation:
  - Glite middleware becoming available
  - LCG ARDA project started
  - Experience + need for technical discussions
- New format:
  - Small (30 participants vs 150 in January); to keep it small, by invitation only
  - ARDA team + experiment interfaces
  - EGEE Glite team (selected persons)
  - Experiments' technical key persons
  - Technology experts
  - NA4/EGEE links (4 persons)
  - EGEE PTF chair
- Info on the web: http://lcg.web.cern.ch/LCG/peb/arda/LCG_ARDA_Workshops.htm

Slide 12: Prototype info (F. Hemmer)
- A first prototype middleware on a testbed at CERN and Wisconsin was delivered to ARDA on May 18, 2004 and to NA4/Biomed on June 15, 2004
  - Being integrated in SCM
  - Being used by the testing cluster
  - Prototype GAS service
  - Using integration tools
- Significant contribution from the University of Wisconsin-Madison:
  - Adapting the Grid Manager for interfacing to PBS/LSF
  - Supporting and debugging the prototype
  - Contributing to the overall design
  - Interfacing with Globus and ISI
- DJRA1.2: preliminary work performed in the MW working document

Slide 13: Prototype info (F. Hemmer) - status of the Glite prototype
- Initial prototype middleware on a testbed consisting of:
  - AliEn shell
  - Job submission: AliEn CE -> Condor-G -> blahp -> PBS/Condor, via the Globus gatekeeper
  - Data management:
    - File catalog service factored out of AliEn, with a Web Service interface; WSDL still to be done
    - Castor and dCache SEs with SRM; gridftp for transfers
    - AliEn FTD
    - Aiod/GFAL investigations
  - RLS (EDG): Perl RLS SOAP interface for file catalog integration; not used yet
  - Security: VOMS for certificate handling / SE gridmap files (NIKHEF); MyProxy for certificate delegation in GAS
  - GAS (Grid Access Service): prototype with a few file cataloging/RLS functions
  - R-GMA with the new API; not used yet
- Being integrated in SCM

Slide 14: Prototype info (F. Hemmer) - next set of components to be added or changed
- Workload Management: initial prototype WMS components supporting job submission and control, the handling of data requirements via RLS and POOL catalog queries, and the ability for CEs to request jobs, all while keeping LCG-2 compatibility
- Information Services: R-GMA with the new API; redesign of the Service/ServiceStatus tables and of the publishing mechanism
- SE: finish the File I/O design, integrate AIO and GFAL with the security libraries, first prototype I/O system
- File Transfer Service: integrate Condor Stork soon (needs to implement the SRM interface)
- File Catalog: WSDL interface, clients in other languages
- Replica Catalog: re-factored RLS, integrated with the File Access Service
- Metadata Catalog: initial implementation ready to be integrated in two weeks
- Grid Access Service: security model

Slide 15: ARDA and the prototype (D. Feichtinger presentation)
- May 18: first presentation of the prototype to ARDA members
- June 2: obtaining certificates from the CERN CA and registering for the EGEE VO; most ARDA members cannot log in due to a registration/login-name issue of an organizational nature
- June 9: username problems solved for ARDA members; learning to use the system
- Actual testing took place during ~2 weeks

Slide 16: Testbed
- More resources (boxes and sites) absolutely needed
- Strategic sites (key Tier1s for the experiments) should start planning to join
- Issues:
  - Synchronise with the preparation work in EGEE SA1 (Operations)
  - Licensing scheme!
[Testbed layout (source: http://egee-jra1.web.cern.ch/egee-jra1/prototype/testbed.htm): VOMS/myProxy (lxn5210), GAS (lxn5216), database/proxy/ldap (lxn5220), core services (lxn5219), CE (lxn5210), WN (lxn5211), SE dCache (lxn5208), SE Castor (lxn5209)]

Slide 17: Glite: tests done
- GAS was only cursorily tested due to its currently limited functionality (simple catalogue interactions)
- Trivial tests exercising the basic catalogue and job execution functionality
- A few more complex tests using applications installed via AFS, as a workaround for the not yet available package management
- Some first mass-registration and metadata tests, following the same procedure as used for the ATLAS AMI and LHCb metadata catalogues (a sketch of such a test follows this list)
  - Writing a large number of files (~100 000) shows that the storage time per file increases
- First simple GANGA plugin for job submission and monitoring of job status
- ARDA group members associated with the experiments are trying to get more complicated, experiment-related applications to run (the target is four end-to-end prototypes):
  - ATHENA jobs via DIAL
  - Work in progress for ORCA and DaVinci
- Developing a new C++ API and plugin for ROOT which interacts efficiently with the present system (commands are executed via a service situated close to the core system; all transfers use a single custom-encoded SOAP string)
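A minimal sketch of the kind of mass-registration timing test described above, assuming a hypothetical register_file() catalogue call (here a dummy that merely simulates slowdown); it reports how the per-file storage time evolves as the catalogue fills up:

```python
# Sketch of a mass-registration timing test against a (hypothetical) catalogue client.
# register_file() is a stand-in for the real catalogue call being benchmarked.
import time

def register_file(lfn, _store=[]):
    """Dummy stand-in: pretend registration slows down as the store grows."""
    _store.append(lfn)
    time.sleep(len(_store) * 1e-7)

def mass_registration_test(n_files=10_000, batch=1_000):
    """Register n_files entries and report the mean time per file for each batch."""
    for start in range(0, n_files, batch):
        t0 = time.perf_counter()
        for i in range(start, start + batch):
            register_file(f"lfn:/arda/test/file_{i:06d}")
        dt = time.perf_counter() - t0
        print(f"files {start:6d}-{start + batch - 1:6d}: {dt / batch * 1e3:.3f} ms/file")

if __name__ == "__main__":
    mass_registration_test()
```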

Slide 18: Glite: general problems (1)
- Since the newest update was installed only last Friday, some issues may already have been resolved
- Stability of the services
- Tutorial documentation is good, but does not provide much help beyond basic usage
- Documentation does not cover the commands extensively (options to commands are missing, as is the syntax for some arguments)
- High initial latency for job execution even if the queue is empty (up to 10 min)

Slide 19: Glite: general problems (2)
- Minor issues:
  - The online help behaviour of commands is not useful for unary commands; there should be a standard help flag like -h
  - Some messages returned to the user are cryptic and have no obvious connection with the action (they read more like debug messages)
  - Easier retrieval of a job's output sandbox would be welcome
- Question: should users use the "debug #" command to get a better error description when they submit bugs?

Slide 20: Infrastructure/prototype issues
- First contact with Glite positive
- Prototype approach positive
  - Incremental changes, fast feedback cycle
- Workshop useful to start iterating on priorities
  - See Derek's presentation
- Wish list on the current prototype installation:
  - More resources! More sites!
  - Improve the infrastructure: procedures should be improved; service stability
  - Move to a pre-production service as soon as possible

Slide 21: GAS and shell
- Useful concepts/tools
- GAS as a template service for the experiments to provide their orchestration layer
  - Individual services accessible as such; they should expose atomic operations
  - A custom GAS can be provided by the experiment
  - An API is needed
- Shell extension preferred to a dedicated shell emulation (see the sketch below)
  - Cf. the ALICE/ARDA demo presentation
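To illustrate the "shell extension rather than shell emulation" preference: instead of a dedicated interactive grid shell, each atomic operation is exposed as an ordinary command the user can combine with normal shell tools (pipes, scripts). A hypothetical Python sketch; the GasClient class and its methods are invented placeholders, not a real GAS API:

```python
#!/usr/bin/env python
# Hypothetical sketch: expose atomic GAS operations as ordinary shell commands
# ("grid ls", "grid submit", ...) instead of a dedicated shell emulation.
# GasClient and its methods are invented placeholders, not a real API.
import sys

class GasClient:
    def ls(self, path):            # atomic catalogue listing
        return [f"{path}/file_{i}" for i in range(3)]
    def submit(self, jdl_file):    # atomic job submission
        return "job-0042"

def main(argv=sys.argv[1:]):
    gas, cmd = GasClient(), argv[0] if argv else "help"
    if cmd == "ls" and len(argv) > 1:
        print("\n".join(gas.ls(argv[1])))
    elif cmd == "submit" and len(argv) > 1:
        print(gas.submit(argv[1]))
    else:
        print("usage: grid ls <path> | grid submit <jdl>")

if __name__ == "__main__":
    main()
```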

Slide 22: General/architectural concerns
- Next month's homework: a list of the LHC experiments' priorities, put together by ARDA
- Connections with the architecture team
  - Strong need to see and interact at the architectural level
  - At least agree on a mailing list
- Priority is on developing the new system functionality
  - Whenever possible, backward compatibility is a nice feature
  - Investment on the experiment side
- Long-standing issues:
  - Outbound connectivity: the functionality is needed; a mechanism to provide it is needed (not necessarily direct connectivity)
  - Identity of user jobs running at a remote site: gridmapfile / setuid is an implementation; what is really needed is traceability

Slide 23: Database related
- Pervasive infrastructure and/or functionality:
  - File catalogue
  - Metadata catalogue
  - Other catalogues (cf. ATLAS talk)
- Plan for the infrastructure
  - Probably we are already late; hopefully not too late
- Different views on how to achieve:
  - Load balancing / remote access / replication
  - Disconnected operations
  - Level of consistency
- Uniform access
  - GSI authentication
  - VO role mapping (authorisation)

Slide 24: File catalog related
- Experimenting in ARDA has started
- It looks realistic that the file catalog interface can be validated via independent experience
  - Cf. Andrei Tsaregorodtsev's LHCb talk
- LFN vs (GU)ID discussion: not conclusive, but interesting
  - The bulk of an experiment's data store might be built with files that have a GUID (and metadata) but no LFN
  - This is POOL/ROOT compatible (the GUID is used in navigation, not the LFN)
  - Needs further exploration!
- Working decision: the bulk of the experiment files are WORM (write once, read many); a toy sketch follows this list
  - Other, experiment-dependent mechanisms are needed to cover modifiable files
  - N.B. modifiable file = same GUID but different content
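As a toy illustration of the GUID-centric, WORM working decision discussed above (a purely hypothetical data model, not the Glite or experiment catalogue schema):

```python
# Toy model of a GUID-keyed catalogue where the LFN is optional and entries are WORM.
# Purely illustrative; not the Glite/experiment catalogue schema.
import uuid

class FileCatalog:
    def __init__(self):
        self._by_guid = {}      # guid -> record
        self._by_lfn = {}       # optional lfn -> guid

    def register(self, replicas, metadata, lfn=None):
        guid = str(uuid.uuid4())
        self._by_guid[guid] = {"replicas": list(replicas), "metadata": dict(metadata)}
        if lfn is not None:     # an LFN is allowed, but not required
            self._by_lfn[lfn] = guid
        return guid

    def lookup(self, guid):
        return self._by_guid[guid]          # navigation by GUID (POOL/ROOT style)

    def update_content(self, guid):
        # WORM: the same GUID must never point to different content.
        raise PermissionError("files are write-once; register a new GUID instead")

if __name__ == "__main__":
    cat = FileCatalog()
    g = cat.register(["srm://se.example.org/data/0001"], {"run": 1234})
    print(g, cat.lookup(g))
```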

Slide 25: Metadata database related
- Very many parallel initiatives!
  - In the LHC experiments
  - Across the experiments (GridPP MDC)
  - In the middleware providers' community (LCG, OGSA-DAI, Glite, ...)
- Experiments are interested in having input on technology
  - No clear plan of convergence
- Are web services the way to go? Or is it just a database problem?
  - Any GUID-value pair system could do it; maybe GUID-GUID pairs
- Anatomy vs. physiology:
  - The client API does not map 1:1 onto WSDL
  - Access method != data transfer protocol
  - This could trigger requests for specific client bindings
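A tiny sketch of the "any GUID-value pair system could do it" observation, and of why the client API need not map 1:1 onto the service interface (a bulk query on the client side may fan out to several calls on the service side). All names are hypothetical:

```python
# Hypothetical GUID -> {key: value} metadata store with a thin client API.
# The client-side bulk query does not map 1:1 onto the (imagined) remote calls.

class MetadataService:                      # stands in for the remote web service
    def __init__(self):
        self._db = {}                       # guid -> {key: value}
    def set(self, guid, key, value):
        self._db.setdefault(guid, {})[key] = value
    def get(self, guid, key):
        return self._db.get(guid, {}).get(key)

class MetadataClient:                       # client API: richer than the service interface
    def __init__(self, service):
        self._svc = service
    def query(self, guids, key, value):
        """Bulk selection implemented as many single get() calls on the service."""
        return [g for g in guids if self._svc.get(g, key) == value]

if __name__ == "__main__":
    svc, guids = MetadataService(), ["g1", "g2", "g3"]
    for g in guids:
        svc.set(g, "run", 1234 if g != "g2" else 5678)
    print(MetadataClient(svc).query(guids, "run", 1234))   # ['g1', 'g3']
```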

Slide 26: Data management related
- Experimenting in ARDA has started (not yet with all components)
- Data management services are needed
  - Data transfer services: needed!
  - ML: can Glite provide the TM(DB) service?
- Reliable file transfer is a key ingredient (a sketch follows this list)
  - This functionality was implemented to fill a gap in DC04
  - CMS is entering a continuous-production mode: room for constructive collaboration
- Misuse protection (cf. ALICE talk)
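A minimal sketch of what "reliable file transfer" means in practice: retries with back-off plus a post-transfer integrity check. copy_file() and checksum_matches() are invented stand-ins, not a Glite or TMDB interface:

```python
# Sketch of a reliable-transfer loop: retry with back-off and verify after each copy.
# copy_file() and checksum_matches() are invented stand-ins for real transfer tools.
import time

def copy_file(source, destination):
    """Placeholder for the actual transfer (e.g. a gridftp copy)."""
    return True

def checksum_matches(source, destination):
    """Placeholder for an end-to-end integrity check."""
    return True

def reliable_transfer(source, destination, max_attempts=5, backoff_s=2.0):
    for attempt in range(1, max_attempts + 1):
        if copy_file(source, destination) and checksum_matches(source, destination):
            return True
        time.sleep(backoff_s * attempt)     # simple linear back-off before retrying
    return False

if __name__ == "__main__":
    ok = reliable_transfer("srm://cern.example.org/cms/run1234.root",
                           "srm://remote.example.org/cms/run1234.root")
    print("transfer ok" if ok else "transfer failed after retries")
```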

Slide 27: Application software installation (package management)
- All experiments have/use one or more solutions
- Interest in a common solution is high, but the priority is not very clear
- A service to manage the local installation; some features (see the sketch below):
  - Lazy installation: the job expresses what it needs, and the missing part is installed
  - Can be triggered for all resources
  - Ability to overwrite the default installation for one experiment
  - Installed software is a resource: it should be advertised, so it can be used in the matchmaking process
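A small sketch of the lazy-installation idea: the job declares the packages it needs, the worker node installs only what is missing, and the installed set is published so it could be used in matchmaking. All package names and functions are hypothetical:

```python
# Sketch of lazy package installation on a worker node (hypothetical names).
# The job declares its needs; only the missing packages are installed; the
# installed set is "advertised" so matchmaking could take it into account.

installed = {"ROOT-3.10"}                   # software already present on this node

def install(package):
    print(f"installing {package}")
    installed.add(package)

def ensure_packages(required):
    """Lazy installation: install only the packages the job still needs."""
    for package in sorted(required):
        if package not in installed:
            install(package)

def advertise():
    """What the site would publish for matchmaking."""
    return sorted(installed)

if __name__ == "__main__":
    ensure_packages({"ROOT-3.10", "ORCA-8.0.1", "POOL-1.6"})   # declared by the job
    print("advertised software:", advertise())
```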

Slide 28: Conclusions
- Up and running since April 1st (and actually before that, preparing the ground for the experiment prototypes)
  - Definition of the detailed programme of work
  - Contributions in the experiment-specific domain
- 3 out of 4 prototype activities started
  - The CMS prototype definition is late by 1 month (preliminary activity is going on)
- Playing with the Glite middleware for 30 days
- Fruitful workshop on 21-23 June
- Stay tuned!