The ARDA project: Grid analysis prototypes of the LHC experiments

RAL, 13 May 2004 (http://cern.ch/arda)
The ARDA project: Grid analysis prototypes of the LHC experiments
Massimo Lamanna, ARDA Project Leader (Massimo.Lamanna@cern.ch)
www.eu-egee.org / cern.ch/lcg
EGEE is a project funded by the European Union under contract IST-2003-508833

Contents
- ARDA project: mandate and organisation
- ARDA activities during 2004
  - General pattern
  - LHCb, CMS, ATLAS, ALICE
- Conclusions and outlook

ARDA working group recommendations: our starting point
- New service decomposition
- Strong influence of the AliEn system, the Grid system developed by the ALICE experiment and used by a wide scientific community (not only HEP): the role of experience and existing technology
- Web service framework
- EGEE middleware: interfacing to existing middleware to enable its use in the experiment frameworks
- Early deployment of (a series of) prototypes to ensure functionality and coherence
These recommendations are the starting point of the ARDA project.

EGEE and LCG
- Strong links already established between EDG and LCG; this will continue in the scope of EGEE
- The core infrastructure of the LCG and EGEE grids will be operated as a single service and will grow out of the LCG service
  - LCG includes many US and Asia partners; EGEE includes other sciences
  - A substantial part of the infrastructure is common to both
- Parallel production lines as well:
  - LCG-2: 2004 data challenges
  - Pre-production prototype with the EGEE middleware: the ARDA playground for the LHC experiments
(Timeline diagram: VDT/EDG feeding LCG-1 and LCG-2, the EGEE web-service middleware feeding EGEE-1 and EGEE-2, with ARDA bridging the two lines)

End-to-end prototypes: why?
- Provide fast feedback to the EGEE middleware development team
  - Avoid uncoordinated evolution of the middleware
  - Keep coherence between users' expectations and the final product
- Experiments ready to benefit from the new middleware as soon as possible
  - Frequent snapshots of the middleware available
  - Expose the experiments (and the community in charge of deployment) to the current evolution of the whole system
- Experiment systems are very complex and still evolving: move forward towards new-generation real systems (analysis!)
- Prototypes should be exercised with realistic workloads and conditions
  - No academic exercises or synthetic demonstrations
  - LHC experiment users absolutely required here (EGEE pilot application)
- A lot of work (experience and useful software) is embedded in the current experiment data challenges: a concrete starting point
  - Adapt/complete/refactor the existing: we do not need another system!

End-to-end prototypes: how?
- The initial prototype will have a reduced scope
  - Component selection for the first prototype: experiment components not used in the first prototype are not ruled out (and the ones used/selected might be replaced later on)
  - Not all use cases/operation modes will be supported
- Every experiment has a production system (with multiple backends, like PBS, LCG, G2003, NorduGrid, ...); we focus on end-user analysis on an EGEE-middleware-based infrastructure
- Adapt/complete/refactor the existing experiment (sub)system: a collaborative effort, not a parallel development
- Attract and involve users: many users are absolutely required
- Informal use cases are still being defined, e.g. (see the sketch below):
  - A physicist selects a data sample (from current data challenges)
  - With an example/template as a starting point, (s)he prepares a job to scan the data
  - The job is split into sub-jobs and dispatched to the Grid; some error recovery is performed automatically; the results are merged back into a single output
  - The output (histograms, ntuples) is returned together with simple information on the job-end status
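
The informal use case maps onto a simple split/dispatch/recover/merge loop. The following toy Python sketch only illustrates that pattern; run_subjob(), the file names and the retry counts are hypothetical stand-ins, not the ARDA or experiment interfaces.

```python
import random

# Toy stand-in for a real Grid submission of one sub-job.
def run_subjob(files):
    if random.random() < 0.2:                  # simulate an occasional Grid failure
        raise RuntimeError("sub-job failed")
    return {"n_events": 1000 * len(files)}     # pretend output (e.g. a histogram summary)

def analyse(dataset, n_subjobs=4, max_retries=2):
    chunks = [dataset[i::n_subjobs] for i in range(n_subjobs)]   # split the data sample
    outputs, report = [], []
    for i, chunk in enumerate(chunks):
        for attempt in range(1 + max_retries):                   # automatic error recovery
            try:
                outputs.append(run_subjob(chunk))
                report.append((i, "done", attempt))
                break
            except RuntimeError:
                continue
        else:
            report.append((i, "failed", max_retries))
    merged = sum(o["n_events"] for o in outputs)                 # merge back into a single output
    return merged, report                                        # result plus job-end status summary

if __name__ == "__main__":
    files = ["dc04_file_%d.root" % n for n in range(16)]
    print(analyse(files))
```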

ARDA @ Regional Centres
- Deployability is a key factor of middleware success
- A few Regional Centres will have the responsibility of providing early installations for ARDA
  - Understand deployability issues
  - Extend the ARDA test bed: the ARDA test bed will be the next step after the most complex EGEE middleware test bed
  - Stress and performance tests could ideally be located outside CERN; this applies to experiment-specific components (e.g. a metadata catalogue)
  - Leverage Regional Centre local know-how: database technologies, web services
- Pilot sites might enlarge the available resources and give fundamental feedback on deployability, complementing the EGEE SA1 activity (EGEE/LCG operations)
  - Running ARDA pilot installations
  - Experiment data available where the experiment prototype is deployed

Coordination and forum activities
- The coordination activities flow naturally from the fact that ARDA will be open to providing demonstration benches
  - Since it is neither necessary nor possible that all projects be hosted inside the ARDA experiment prototypes, some coordination is needed to ensure that new technologies can be exposed to the relevant community, through a transparent process
  - ARDA should organise a set of regular meetings (one per quarter?) to discuss results, problems and new/alternative solutions, and possibly agree on a coherent programme of work
  - The ARDA project leader organises this activity, which will be truly distributed and led by the active partners
- ARDA is embedded in EGEE NA4, namely NA4-HEP
- Special relation with the LCG GAG, the LCG forum for Grid requirements and use cases
  - Experiment representatives coincide with the EGEE NA4 experiment representatives
  - ARDA will channel this information to the appropriate recipients
- ARDA workshop (January 2004 at CERN; open; over 150 participants)
- ARDA workshop (June 21-23 at CERN; by invitation): "The first 30 days of EGEE middleware"
- NA4 meeting in mid July (NA4/JRA1 and NA4/SA1 sessions foreseen; organised by M. Lamanna and F. Harris)
- ARDA workshop (September 2004?; open)

People
- ARDA team: Massimo Lamanna, Birger Koblitz, Andrey Demichev, Viktor Pose (Russia), Dietrich Liko, Frederik Orellana, Wei-Long Ueng, Tao-Sheng Chen (Taiwan), Derek Feichtinger, Andreas Peters, Julia Andreeva, Juha Herrala, Andrew Maier, Kuba Moscicki (grouped by experiment, ALICE / ATLAS / CMS / LHCb, in the original slide layout)
- Experiment interfaces: Piergiorgio Cerello (ALICE), David Adams (ATLAS), Lucia Silvestris (CMS), Ulrik Egede (LHCb)

Example of activity
- Existing system as starting point: every experiment has different implementations of the standard services
  - Used mainly in production environments: few expert users, coordinated update and read actions
- ARDA: interface with the EGEE middleware; verify (and help evolve) such components for analysis environments
  - Many users (robustness), concurrent read actions (performance)
- One prototype per experiment; a common application layer might emerge in the future
- ARDA's emphasis is to enable each experiment to do its job

Milestones (one set per experiment x):
  1.x.1  May 2004        E2E prototype definition agreed with experiment x (due very soon)
  1.x.2  September 2004  E2E prototype for experiment x using basic EGEE middleware
  1.x.3  November 2004   E2E prototype for experiment x with improved functionality
  1.x    December 2004   E2E prototype for experiment x, capable of analysis
  2.x    December 2005   E2E prototype for experiment x, capable of analysis and production (already started)

LHCb
- The LHCb system within ARDA uses GANGA as its principal component (see next slide)
- The LHCb/GANGA plan: enable physicists (via GANGA) to analyse the data being produced during 2004 for their studies
  - This naturally matches the ARDA mandate: having the prototype where the LHCb data will be is the key
- At the beginning the emphasis will be on validating the tool, focusing on usability and on the splitting and merging functionality for user jobs
- The DIRAC system (the LHCb grid system, used mainly in production so far) could be a useful playground to understand the detailed behaviour of some components, like the file catalogue

GANGA: Gaudi/Athena and Grid Alliance
- Gaudi/Athena are the LHCb/ATLAS frameworks (Athena uses Gaudi as a foundation)
- A single desktop for a variety of tasks
  - Helps configuring and submitting analysis jobs
  - Keeps track of what users have done, hiding all technicalities completely (Resource Broker, LSF, PBS, DIRAC, Condor)
  - Job registry stored locally or in the roaming profile
  - Automates the configure/submit/monitor procedures
  - Provides a palette of possible choices and specialised plug-ins (pre-defined application configurations, batch/grid systems, etc.)
- A friendly user interface (CLI/GUI) is essential
  - GUI wizard interface: helps users explore new capabilities and browse the job registry
  - Scripting/command-line interface: automates frequent tasks; a Python shell is embedded into the Ganga GUI
(Architecture diagram: the GANGA internal model connects the user interfaces (GUI/CLI), the GAUDI program (job options, algorithms, histograms, monitoring results) and the collective and resource Grid services: bookkeeping service, workload manager, profile service, file catalogue, storage and computing elements)
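
As an illustration of the "single desktop" idea, here is a minimal sketch of the kind of Python session Ganga aims at. The import style and the class names (Job, Executable, LSF) follow later public Ganga releases and are an assumption here, not the 2004 interface.

```python
# Hedged sketch: inside the ganga shell these objects are available directly;
# the package-style import below follows later public releases.
from ganga import Job, Executable, LSF

j = Job(name="toy-analysis")
j.application = Executable(exe="/bin/echo", args=["running my analysis"])
j.backend = LSF()      # swapping in another plug-in (DIRAC, Condor, a Resource Broker, ...)
                       # changes where the job runs, not how the user configures it
j.submit()             # configuration, submission and monitoring handled by Ganga
print(j.status)        # the job is tracked in the job registry
```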

ARDA contribution to Ganga
- Integration with the EGEE middleware
  - While waiting for the EGEE middleware, we developed an interface to Condor
  - Use of Condor DAGMan for splitting/merging and error-recovery capability (see the sketch below)
- Design and development: command-line interface, future evolution of Ganga
- Release management, software process and integration: testing, tagging policies, etc.
- Infrastructure: installation, packaging, etc.
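
The split/merge and error-recovery pattern built on Condor DAGMan can be pictured as follows. This is a hedged sketch with illustrative file names and sub-job counts; it relies only on the standard JOB, PARENT/CHILD and RETRY DAGMan keywords.

```python
# Generate a small DAGMan description: N analysis sub-jobs, each retried on
# failure, followed by a single merge job.  File names are illustrative.
N_SUBJOBS = 4

lines = []
for i in range(N_SUBJOBS):
    lines.append("JOB analyse%d analyse%d.sub" % (i, i))   # one Condor submit file per sub-job
    lines.append("RETRY analyse%d 3" % i)                  # error recovery: up to 3 resubmissions
lines.append("JOB merge merge.sub")                        # merging step runs after all sub-jobs
lines.append("PARENT %s CHILD merge"
             % " ".join("analyse%d" % i for i in range(N_SUBJOBS)))

with open("analysis.dag", "w") as dag:
    dag.write("\n".join(lines) + "\n")
# Submit with: condor_submit_dag analysis.dag
```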

LHCb metadata catalogue
- Used in production (for large productions)
- Web-service layer being developed (main developers in the UK); Oracle backend
- ARDA contributes testing focused on the analysis usage:
  - Robustness
  - Performance under high concurrency (read mode): measured network rate vs. number of concurrent clients (a sketch of such a test follows below)
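
A concurrency measurement of this kind can be approximated with a few lines of Python. The sketch below is illustrative only: the endpoint URL and the getJobs() method are hypothetical, not the real LHCb bookkeeping interface.

```python
import threading
import time
import xmlrpc.client

# Hypothetical read-only load test: N_CLIENTS virtual users query the bookkeeping
# web service concurrently and the aggregate data rate is reported.
ENDPOINT = "http://bookkeeping.example.org:8080/RPC2"   # hypothetical service URL
N_CLIENTS, N_QUERIES = 20, 50

bytes_read = 0
lock = threading.Lock()

def client_loop():
    global bytes_read
    proxy = xmlrpc.client.ServerProxy(ENDPOINT)
    for _ in range(N_QUERIES):
        reply = proxy.getJobs("DC04")                   # hypothetical query method
        with lock:
            bytes_read += len(str(reply))

threads = [threading.Thread(target=client_loop) for _ in range(N_CLIENTS)]
start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start
print("%d clients: %.2f MB/s aggregate" % (N_CLIENTS, bytes_read / elapsed / 1e6))
```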

CERN/Taiwan tests
- Clone the bookkeeping database in Taiwan, install the web-service layer, and run performance tests against both the CERN and Taiwan instances
- Bookkeeping server (Oracle DB, web and XML-RPC service) performance tests: CPU load, network send/receive, process time, database I/O
- Client host performance tests: CPU load, network send/receive, process time
- Virtual users drive the load; a network monitor records the rates

CMS
- The CMS system within ARDA is still under discussion
- Providing easy access to (and possibly sharing of) data for the CMS users is a key issue
- RefDB is the bookkeeping engine used to plan and steer the production across its different phases (simulation, reconstruction, and to some degree the analysis phase)
  - It contains all the necessary information except the physical file locations (in the RLS) and the information related to the transfer management system (TMDB)
  - The actual mechanism to provide these data to analysis users is under discussion
- Performance measurements are underway (same philosophy as for the LHCb metadata catalogue measurements)
(Diagram: RefDB in CMS DC04. RefDB sends reconstruction instructions to McRunjob; reconstruction jobs run on the T0 worker nodes and write reconstructed data to the GDB CASTOR pool, tapes and export buffers; summaries of successful jobs go back to RefDB; a transfer agent checks what has arrived, updates the RLS and updates the TMDB)

ATLAS
- The ATLAS system within ARDA has been agreed
- ATLAS has a complex strategy for distributed analysis, addressing different areas with specific projects (fast response, user-driven analysis, massive production, etc.; see http://www.usatlas.bnl.gov/ada/)
- The starting point is the DIAL system
- The AMI metadata catalogue is a key component
  - MySQL as a backend
  - Genuine web-service implementation
  - Robustness and performance tests from ARDA
- In the start-up phase, ARDA provided some help in developing the ATLAS production tools (being finalised)

What is DIAL?
- DIAL connects interactive analysis (e.g. ROOT, JAS, ...) to distributed processing running the data-specific application
(Diagram: the interactive analysis session hands a dataset and a job to a scheduler, which drives the distributed processing)

AMI studies in ARDA
- AMI is the ATLAS metadata catalogue; it contains file metadata (simulation/reconstruction version) but not physical file names
- Many problems are still open:
  - Large network traffic overhead due to schema-independent tables
  - A SOAP proxy is supposed to provide database access
    - Web services are stateless (no automatic handles for the concept of session, transaction, etc.): 1 query = 1 (full) response
    - Large queries might crash the server (one possible client-side mitigation is sketched below)
    - Should the proxy re-implement all database functionality?
- Good collaboration in place with ATLAS Grenoble
- Behaviour studied using many concurrent clients
(Diagram: many users querying the MySQL metadata database through the SOAP proxy)
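
One client-side mitigation for the "large query crashes the server" problem is to page the query explicitly, so each stateless call returns a bounded chunk. The sketch below is illustrative only; the proxy object and its select() method are hypothetical, not the real AMI interface.

```python
# Hedged sketch: page a large catalogue query through a stateless proxy so that
# no single call returns the full result set.  proxy.select() is hypothetical.
def paged_query(proxy, query, page_size=500):
    offset = 0
    while True:
        rows = proxy.select(query, limit=page_size, offset=offset)
        if not rows:
            return
        for row in rows:
            yield row
        offset += page_size

# Usage (with any proxy exposing such a select() call):
#   for row in paged_query(ami_proxy, "dataset LIKE 'dc2%'"):
#       process(row)
```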

ALICE: Grid-enabled PROOF (SuperComputing 2003 demo)
- Strategy: ALICE/ARDA will evolve the analysis system presented by ALICE at SuperComputing 2003, using the new EGEE middleware (at SC2003, AliEn was used)
- Activity on PROOF: robustness, error recovery
(Diagram: a user session talks to the PROOF master server, which reaches the PROOF slaves at sites A, B and C through TcpRouter services)

ALICE-ARDA prototype improvements
- At SC2003 the setup was heavily tied to the middleware services:
  - Somewhat inflexible configuration
  - No chance to use PROOF on federated grids like LCG in AliEn
  - The TcpRouter service needs incoming connectivity at each site
  - Libraries cannot be distributed using the standard rootd functionality
- Improvement ideas:
  - Distribute another daemon with ROOT which replaces the need for a TcpRouter service
  - Connect each slave proofd/rootd via this daemon to two central proofd/rootd master multiplexer daemons, which run together with the PROOF master
  - Use Grid functionality for daemon start-up and booking policies through a plug-in interface from ROOT
  - Put PROOF/ROOT on top of the Grid services
  - Improve dynamic configuration and error recovery

ALICE-ARDA improved system
- The remote PROOF slaves look like local PROOF slaves on the master machine
- The booking service is usable on local clusters as well
(Diagram: PROOF slave servers connect through proxy proofd/rootd daemons and the Grid booking services to the PROOF master; a rough sketch of the outbound-relay idea follows below)
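
The "proxy proofd/rootd" idea boils down to the slave side opening outbound connections towards a multiplexer next to the PROOF master, so that sites need no incoming connectivity. The following rough sketch (modern asyncio, hypothetical host names and ports, no PROOF framing) only illustrates that relay pattern; the real implementation lives inside ROOT/PROOF.

```python
import asyncio

# Hypothetical endpoints: the central multiplexer next to the PROOF master, and
# the local proofd/rootd on the slave node.
MASTER_HOST, MASTER_PORT = "proof-master.example.org", 9000
LOCAL_HOST, LOCAL_PORT = "127.0.0.1", 1093

async def pipe(reader, writer):
    # Copy bytes in one direction until the peer closes the connection.
    try:
        while True:
            data = await reader.read(65536)
            if not data:
                break
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()

async def main():
    # Outbound connection to the multiplexer (no incoming connectivity needed at
    # the site) plus a local connection to the slave's proofd/rootd.
    up_reader, up_writer = await asyncio.open_connection(MASTER_HOST, MASTER_PORT)
    local_reader, local_writer = await asyncio.open_connection(LOCAL_HOST, LOCAL_PORT)
    await asyncio.gather(pipe(up_reader, local_writer), pipe(local_reader, up_writer))

asyncio.run(main())
```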

Conclusions and outlook
- ARDA is starting
  - Main tool: experiment prototypes for analysis
  - A detailed project plan is being prepared
- Good feedback from the LHC experiments
- Good collaboration with EGEE NA4
- Good collaboration with the Regional Centres; more help needed
- Looking forward to contributing to the success of EGEE, helping the EGEE middleware to deliver a fully functional solution
- ARDA main focus: collaborate with the LHC experiments to set up the end-to-end prototypes
  - Aggressive schedule: the first milestone for the end-to-end prototypes is December 2004