Prototypes of a Computational Grid for the Planck Satellite

ASTRONOMICAL DATA ANALYSIS SOFTWARE AND SYSTEMS XIV
ASP Conference Series, Vol. 347, 2005
P. L. Shopbell, M. C. Britton, and R. Ebert, eds.

Giuliano Taffoni, Giuliano Castelli, Riccardo Smareglia, Claudio Vuerli, Andrea Zacchei, and Fabio Pasian (National Institute for Astrophysics, OATs, Trieste, Italy)
Davide Maino (University of Milano, Milan, Italy)
Giancarlo Degasperis (University of Rome Tor Vergata, Rome, Italy)
Salim G. Ansari and Jan Tauber (European Space Agency, ESRIN, Holland)
Thomas Ensslin (Max Planck Institute for Astrophysics, Garching, Germany)
Roberto Barbera (National Institute for Nuclear Physics, Catania, Italy)

Abstract. A prototype of a computational Grid has been designed to assess the possibility of developing a pipeline setup for processing Planck satellite data. The amount of data collected by the satellite during its sky surveys requires extremely high computational power, both for reduction and for analysis. For this reason, a Grid environment is an interesting layout to consider when processing those data.

1. Introduction

The ESA Planck satellite mission will fly in 2007. The experiment aims to map the microwave sky, performing at least two complete sky surveys with an unprecedented combination of sky and frequency coverage, accuracy, stability and sensitivity (Tauber 2000). Planck is composed of a number of microwave and sub-millimeter detectors, grouped into a High Frequency Instrument (HFI) and a Low Frequency Instrument (LFI) (Pasian 2002), and covers a frequency range from 30 up to 850 GHz. All levels of data processing are assigned to the two Data Processing Centers (DPCs): one for the LFI, centralized at the OATs in Trieste, Italy, and one for the HFI, distributed between Paris, France, and Cambridge, UK. Both DPCs share a site producing an Early Release Compact Source Catalog (IPAC, Pasadena, USA) and a site gathering and documenting the final results of the mission, located at MPA in Garching, Germany.

The amount of data produced by the whole mission and by the necessary post-processing poses a challenge both in terms of storage and of computational needs (Bond et al. 1999). For example, the LFI DPC alone is in charge of processing 100 GB of data. PlanckGrid is a project whose main goal is to verify the possibility of using Grid technology to process Planck satellite data (Smareglia et al. 2004). The project is exploring the scientific and technical problems that must be solved to develop GRID data reduction applications and make them available to the Planck community.

In this paper we describe the prototype of a specialized environment, based on GRID middleware, to support Planck applications (see Figure 1). This environment must guarantee: retrieval of data from storage located outside the GRID via the http or ftp protocols; distribution of the Planck software (LevelS, Level1 and Level2) and libraries; and storage and replication of raw and reduced data under a secure access policy. The project coordinates two main initiatives: a joint ESA and INAF-OATs collaboration, and INAF/GILDA (a test-bed Grid infrastructure set up to host test-bed applications that at a later stage will be proposed as test-beds for EGEE).

Figure 1. Structure of the Planck@Grid application deployment.

2. The GRID Environment

GRID computing enables the virtualization of distributed computing and data resources, such as processing and storage capacity, to create a single system image, granting users and applications seamless access to vast IT capabilities. The software that underlies the fundamental Grid services, such as information services, resource discovery and monitoring, job submission and management, brokering, data management and resource management, constitutes the GRID middleware. The middleware builds upon a number of open source solutions such as the Globus Toolkit 2 (GT2, Foster & Kesselman 1997) and the EDG libraries (Ghiselli 2002).

In the case of the ESA-INAF collaboration, the Grid middleware is based on GT2, and ESA supplied a proprietary workflow tool, GridAssist (http://tphon.dutchspace.nl/grease/public). This workflow tool acts as a resource broker for the computational and storage resources. It has already been well tested on the GAIA Grid (Ansari 2004).

In the case of the INAF-GILDA collaboration, the production GRID is supplied by the Istituto Nazionale di Fisica Nucleare (INFN) of Catania. The Grid INFN Laboratory for Dissemination Activities (GILDA, https://gilda.ct.infn.it) was developed as part of the Italian INFN Grid project and of the Enabling Grids for E-science in Europe (EGEE, http://egee-intranet.web.cern.ch/) project, as a testbed for EGEE applications. The EGEE middleware, based on LCG (Robertson 2001), provides some basic services: a User Interface (UI), a fully equipped job submission environment and a data Replica Manager (RM, Kunszt et al. 2003). The UI is a computer of the GRID system from which users submit jobs, access databases and store/replicate files. Storage Elements and Worker Nodes are the GRID resources in charge of data management and computing. Both parallel and scalar computing are supplied by the system.
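
To make these services concrete, a minimal submission from the UI can be sketched with the standard LCG-2/EDG command-line tools; the wrapper script, parameter file and VO name below are illustrative placeholders, not the actual Planck submission setup. The job is described by a JDL file (Pacini 2003), for example an illustrative levels.jdl:

    Executable          = "run_levels.sh";
    Arguments           = "params_70GHz.par";
    StdOutput           = "levels.out";
    StdError            = "levels.err";
    InputSandbox        = {"run_levels.sh", "params_70GHz.par"};
    OutputSandbox       = {"levels.out", "levels.err"};
    VirtualOrganisation = "planck";

which is then handled from the UI as follows (assuming the tools are configured for the user's VO):

    grid-proxy-init                                 # authenticate with the user's Grid certificate
    edg-job-submit --vo planck -o jobid levels.jdl  # the Resource Broker selects a Computing Element
    edg-job-status -i jobid                         # follow the job state
    edg-job-get-output -i jobid --dir ./output      # retrieve the OutputSandbox once the job is Done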

3. Planck@Grid

One of the primary issues for the DPCs is to define, design and run a complete simulation of the Planck mission in order to test the data analysis pipelines. The simulation software must mimic the Planck observing procedure, and any source of systematic effects related to the detectors, in a realistic way. As a first test for the PlanckGrid project we concentrated on the simulation software (SW).

The ESA/INAF project set up a GRID of three sites (ESTEC, OATs and ESRIN), managed by GT2 and by GridAssist, which also acts as the application environment used to run the Planck applications. In the INAF/GILDA GRID, a workflow is built upon the EGEE middleware. Interaction with the GRID middleware is required for: authentication, data movement, Planck SW distribution, resource selection, and computing (see Figure 2).

Figure 2. Structure of the Planck simulation workflow. The application environment checks a metadata repository to see whether a simulation has already been performed, using an XML description of the cosmological and instrumental parameters. If the data exist, the application downloads and/or reduces them; otherwise a new simulation is started. The simulated data are stored on the Grid. The application then takes the metadata that describe the simulated data and stores them in the metacatalog repository for future download or post-processing.
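
The decision logic of Figure 2 can be sketched as one of the Linux shell wrappers used to interface the pipeline with the Grid services; query_metadata and register_metadata stand for hypothetical clients of the metadata repository web service, and the VO name, Storage Element and logical file names are placeholders.

    #!/bin/sh
    # Sketch of the Figure 2 workflow (illustrative names, hypothetical metadata clients).
    PARAMS=cosmo_and_instrument_params.xml

    if LFN=$(./query_metadata "$PARAMS"); then
        # A simulation with these parameters already exists: fetch it from the Grid.
        lcg-cp --vo planck "lfn:$LFN" "file:$PWD/simulated_tod.fits"
    else
        # No match: submit a new LevelS run described by a JDL file (cf. the sketch above),
        # then store the product on a Storage Element and register its description.
        edg-job-submit --vo planck -o jobid levels.jdl
        # ... wait for completion and retrieve the output ...
        lcg-cr --vo planck -d se.example.org -l lfn:planck-sim-70ghz-001 \
               "file:$PWD/simulated_tod.fits"
        ./register_metadata "$PARAMS" lfn:planck-sim-70ghz-001
    fi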

3.1. Working Testbed

We successfully ported the Planck mission simulation SW (LevelS) to the LCG/EGEE Grid. The SW is supported by a set of Linux shell scripts that interface the simulation pipeline with the Grid services. We use the GRID Job Description Language (JDL, Pacini 2003) to submit the numerical calculations and to access the RM data service. Our prototype distributes the LevelS SW on the Grid (via the RM), selects the available resources, runs the pipeline, stores the simulated data and grants data access to the Planck users (all the users joining the Planck Virtual Organization, VO). A metadata schema is used to describe the output (parameters, date, size, etc.) and to ease data recovery and post-processing. An example of the simulation results is shown in Figure 3.

Figure 3. A map of the sky simulated via Grid for a 70 GHz LFI channel.

We also tested the reduction SW on the simulated data. As an example, we used the destriping procedure described by Maino et al. (2002). Our input files are the simulated Time Ordered Data distributed on the Grid. We designed a pipeline that interacts with the RM to locate the raw data, identifies the computing and storage resources suitable for its needs (using the GRID Resource Broker), and finally processes the raw data (using the GRID job submission tools). The output is stored on the Grid and registered in the metadata repository, and it can be accessed by the Planck users (VO).
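
How such a destriping job can be made data-aware may be sketched as follows (this is an illustration, not the production JDL): by declaring its input Time Ordered Data, the job lets the Resource Broker query the Replica Manager and select a Computing Element close to a Storage Element holding a replica. The wrapper script and logical file name are illustrative:

    Executable          = "run_destriping.sh";
    Arguments           = "lfn:planck-sim-70ghz-001";
    InputSandbox        = {"run_destriping.sh"};
    StdOutput           = "destripe.out";
    StdError            = "destripe.err";
    OutputSandbox       = {"destripe.out", "destripe.err"};
    VirtualOrganisation = "planck";
    InputData           = {"lfn:planck-sim-70ghz-001"};
    DataAccessProtocol  = {"gsiftp"};

The job itself is then submitted from the UI with edg-job-submit, exactly as in the sketch of Section 2.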

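The "store and register" step mentioned above can be sketched under the same assumptions: lcg-cr copies the destriped map to a Storage Element and registers a logical file name for the Planck VO, while a small XML descriptor (an illustrative layout, not the final Planck metadata semantics discussed in Section 4) is pushed to the metadata repository through a hypothetical register_metadata client. An illustrative descriptor, map_70GHz.meta.xml:

    <simulation channel="LFI-70GHz" date="2004-10-12">
      <parameters file="cosmo_and_instrument_params.xml"/>
      <product lfn="lfn:planck-map-70ghz-001" type="destriped-map" size_mb="48"/>
    </simulation>

is registered together with the data:

    lcg-cr --vo planck -d se.example.org \
           -l lfn:planck-map-70ghz-001 "file:$PWD/map_70GHz.fits"
    ./register_metadata map_70GHz.meta.xml    # hypothetical metadata-repository client

Under these assumptions, any member of the Planck VO can then list the replicas of the registered map with lcg-lr and retrieve it with lcg-cp.
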
4. Conclusions and Future Work

We successfully ran a simulation of the Planck mission for the LFI and stored the simulated mission data on the GRID. These data are described by a set of XML files. The metadata description is still at a prototype stage, and more work is required to identify the final semantics. The metadata description of the simulation files is stored on the GRID, and the data files are available to all the Planck users together with their XML description.

We plan to port the whole simulation architecture to the EGEE GRID in order to simulate the whole mission for the HFI as well. This requires deploying a stable application-specific layer and defining the metadata description. We also plan to run simulations for different values of the (cosmological and instrumental) parameters and to test the reduction of the simulated raw data on the GRID. This implies porting the Level2 reduction software to the GRID as well and extending the metadata semantics to the reduced data.

Acknowledgments. This work was carried out with the financial support of the Italian Government and in particular of MIUR.

References

Ansari, S. G. 2004, in ASP Conf. Ser., Vol. 347, ADASS XIV, ed. P. L. Shopbell, M. C. Britton, & R. Ebert (San Francisco: ASP), 429

Bond, J. R., Crittenden, R. G., Jaffe, A. H., & Knox, L. 1999, Computing in Science & Engineering, 1, 21

Foster, I., & Kesselman, C. 1997, Intl. J. Supercomputer Applications, 11, 115

Ghiselli, A. 2002, in TERENA Networking Conference, 503

Kunszt, P., Laure, E., Stockinger, H., & Stockinger, K. 2003, in Lecture Notes in Computer Science, Vol. 3019, ed. R. Wyrzykowski et al. (Heidelberg: Springer-Verlag), 848

Maino, D., Burigana, C., Górski, K. M., Mandolesi, N., & Bersanelli, M. 2002, A&A, 387, 356

Pacini, F. 2003, DataGrid-01-TEN-0142-0_2

Pasian, F. 2002, MmSAI, 74, 502

Robertson, L. 2001, CERN/2379/rev

Smareglia, R., Pasian, F., Vuerli, C., & Zacchei, A. 2004, in ASP Conf. Ser., Vol. 314, ADASS XIII, ed. F. Ochsenbein, M. Allen, & D. Egret (San Francisco: ASP), 674

Tauber, J. A. 2000, in IAU Symposium 204, 40