THE EUCLID ARCHIVE SYSTEM: A DATA-CENTRIC APPROACH TO BIG DATA

Rees Williams, on behalf of A.N. Belikov, D. Boxhoorn, B. Dröge, J. McFarland, A. Tsyganov, E.A. Valentijn (University of Groningen, Groningen, The Netherlands) and B. Altieri, G. Buenadicha, S. Nieto, J. Salgado, P. de Teodoro (European Space Astronomy Centre, European Space Agency, Spain)

Euclid (Slide 2)
- ESA mission to map dark matter & dark energy
- Launch on a Soyuz in 2020
- Wide-field survey, 1.2 m telescope
- 3 optical & IR instruments
- 15 countries, 130 institutes
- 1300 consortium members, 700 scientists

Data Processing Challenge (Slide 3)
Unprecedented data volumes for an astronomy satellite mission:
- 1 petabyte of images from Euclid
- 10-100 petabytes of images from ground-based observatories
- Massive simulation effort required
Key features:
- Very strict requirements on quality and calibration
- Heavy (re)processing needed from raw data to science products; the mission could generate 25 PB/year
- 10 billion objects in the catalogue
- Distributed nature of the collaboration: 9 national Science Data Centres (SDCs), distributed data processing and storage

Euclid Archive System (Slide 4)
- A data-centric approach has been adopted
- Builds on experience with the Astro-WISE information system, used by
  o OmegaCAM, MUSE, the LOFAR LTA
- Builds on experience with earlier ESA archives
- Traceability and data lineage are strong requirements
- Central role for the Euclid Archive System (EAS)
  o the EAS acts as the interface between all ground segment components
  o EAS data are distributed across the national SDCs
  o EAS metadata are stored centrally
  o the EAS metadata contain all information except the images
  o the EAS stores the dependencies of data products
- Unnecessary reprocessing is avoided
- Data provenance is ensured (a minimal dependency-tracking sketch follows below)
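
The dependency bookkeeping described above can be pictured with a small sketch. This is illustrative only and not the actual EAS implementation: the product names, the Product class and the needs_reprocessing helper are invented for the example, but they show how stored dependencies support a lineage check that prevents unnecessary reprocessing.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Product:
    """A hypothetical data-product entry: metadata only, the pixels live in the DSS."""
    name: str
    version: int
    inputs: List[str] = field(default_factory=list)  # names of the products it was made from

# A toy metadata store: every product records the products it depends on.
archive: Dict[str, Product] = {
    "raw_frame_001":   Product("raw_frame_001", 1),
    "calib_frame_001": Product("calib_frame_001", 1, inputs=["raw_frame_001"]),
    "catalog_tile_42": Product("catalog_tile_42", 1, inputs=["calib_frame_001"]),
}

def lineage(name: str) -> List[str]:
    """Walk the stored dependencies to recover the full provenance of a product."""
    chain = [name]
    for parent in archive[name].inputs:
        chain.extend(lineage(parent))
    return chain

def needs_reprocessing(name: str, versions_used: Dict[str, int]) -> bool:
    """A product is stale only if one of its recorded inputs now has a newer version."""
    return any(archive[p].version > versions_used.get(p, 0) for p in archive[name].inputs)

print(lineage("catalog_tile_42"))   # ['catalog_tile_42', 'calib_frame_001', 'raw_frame_001']
print(needs_reprocessing("catalog_tile_42", {"calib_frame_001": 1}))  # False: inputs unchanged
```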

Euclid Common Data Model (Slide 5)
- The Euclid Data Model is defined in XSD
- Python classes, the database schema, and the user interface and services are derived from it (an illustrative class sketch follows below)
- The detailed ECDM objects are implemented in full in the EAS, providing lineage & traceability
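
As an illustration of what "Python classes derived from an XSD data model" can look like in practice, the fragment below hand-writes one class mirroring a hypothetical ECDM complex type. In the real system such classes are generated from the XSD schema rather than written by hand, and the type and element names used here are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional
import xml.etree.ElementTree as ET

@dataclass
class CalibratedFrame:
    """Python mirror of a hypothetical ECDM complexType 'CalibratedFrame'."""
    product_id: str
    instrument: str
    exposure_time: float             # seconds
    data_file: str                   # logical file name resolved by the DSS
    parent_id: Optional[str] = None  # lineage: the raw frame this product was made from

    @classmethod
    def from_xml(cls, xml_text: str) -> "CalibratedFrame":
        """Parse one XML instance document that would validate against the (hypothetical) XSD."""
        root = ET.fromstring(xml_text)
        return cls(
            product_id=root.findtext("ProductId"),
            instrument=root.findtext("Instrument"),
            exposure_time=float(root.findtext("ExposureTime")),
            data_file=root.findtext("DataFile"),
            parent_id=root.findtext("ParentId"),
        )

example = """<CalibratedFrame>
  <ProductId>EUC_VIS_00042</ProductId>
  <Instrument>VIS</Instrument>
  <ExposureTime>565.0</ExposureTime>
  <DataFile>EUC_VIS_00042.fits</DataFile>
  <ParentId>EUC_VIS_RAW_00042</ParentId>
</CalibratedFrame>"""

print(CalibratedFrame.from_xml(example))
```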

EAS General Architecture (Slide 6: architecture diagram)
The EAS comprises three components: the Data Processing System (EAS-DPS), the Science Archive System (EAS-SAS) and the Distributed Storage System (EAS-DSS). Data files are held at the computing facilities of the SOC and of the national SDCs (SDC-FR, SDC-ES, SDC-NL, SDC-DE, SDC-UK, SDC-IT, SDC-CH, SDC-FI).

(Slide 7: system diagram) The Euclid Data Model in XSD underpins the Data Processing System, its Metadata Access Layer and the Science Archive System, each holding metadata. Services include COORS, the Consortium Processing Service, the Metadata Ingest Service and the Consortium User Service, serving EC users and public users. At each SDC (SDC-X, SDC-Y, SDC-Z), the IAL and HPC facilities reach the data through a local DSS server and its Storage Node Interface to the storage nodes.

Euclid DSS Servers (Slide 8)
- The file locations stored in the metadata provide a global file system
- A DSS server installed at each SDC gives the other SDCs access to the data held at that SDC
- No constraints on the storage or file system used on the nodes
- The DSS server supports access to the data in the SDCs using different protocols
  o sftp, https, gridftp, the Astro-WISE dataserver, iRODS
- Simple interface: store, retrieve, copy, delete, check (a minimal client sketch follows below)
- Extended interface: cut-out services
- Adaptable to new or existing solutions at the SDCs
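
To make the "simple interface" concrete, here is a minimal sketch of what a DSS client wrapper could look like; it is not the actual Euclid DSS protocol. The endpoint URL, the route names (/store, /retrieve, /copy) and the use of plain HTTPS requests are assumptions made for the example.

```python
import requests  # plain HTTPS backend; the real DSS also speaks sftp, gridftp, etc.

class DSSClient:
    """Minimal client for a hypothetical DSS server exposing store/retrieve/copy/delete/check."""

    def __init__(self, base_url: str):
        self.base_url = base_url.rstrip("/")

    def store(self, filename: str, local_path: str) -> None:
        with open(local_path, "rb") as fh:
            requests.put(f"{self.base_url}/store/{filename}", data=fh).raise_for_status()

    def retrieve(self, filename: str, local_path: str) -> None:
        resp = requests.get(f"{self.base_url}/retrieve/{filename}")
        resp.raise_for_status()
        with open(local_path, "wb") as fh:
            fh.write(resp.content)

    def copy(self, filename: str, target_dss_url: str) -> None:
        """Ask this DSS server to push a file to another SDC's DSS (illustrative only)."""
        requests.post(f"{self.base_url}/copy/{filename}",
                      data={"target": target_dss_url}).raise_for_status()

    def delete(self, filename: str) -> None:
        requests.delete(f"{self.base_url}/store/{filename}").raise_for_status()

    def check(self, filename: str) -> bool:
        """True if the file exists at this SDC (HEAD request in this sketch)."""
        return requests.head(f"{self.base_url}/retrieve/{filename}").status_code == 200

# Usage (hypothetical endpoint at an SDC):
# dss = DSSClient("https://dss.sdc-nl.example.org")
# dss.store("EUC_VIS_00042.fits", "/data/EUC_VIS_00042.fits")
# print(dss.check("EUC_VIS_00042.fits"))
```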

EAS Infrastructure at SDC-NL (Slide 9: infrastructure diagram)
The Euclid community and the other SDCs connect to the Millipede HPC cluster and the NL Big Grid, which run the Euclid IAL. A DSS storage node and the application nodes (DSS, metadata request, ingest server, DBView, monitoring & control support) are connected over the LAN and backed by the Target GPFS storage infrastructure. The DPS metadata are held in an Oracle RAC database, with a DPS metadata mirror and the SAS metadata at ESAC.

EAS Interfaces (Slide 10: interface diagram)
The DPS at SDC-NL is linked through the DPS-MRS to the DPS at ESAC, which feeds the SAS at ESAC through the MTS; the SAS serves EC users and external users.

EAS-SAS Overview (Slide 11)
EAS-SAS @ ESDC: the SAS is being built at the ESAC Science Data Centre (ESDC), which is responsible for the development and operations of the scientific archives of ESA's astronomy and Solar System missions.
Science community: the SAS is focused on the needs of the broader scientific community and will provide access to the scientific metadata from the EAS-DPS.
Main capabilities:
- Parametric search of the Euclid scientific metadata and catalogues
- Visualization and exploration of images, spectra and catalogues
- Upload, cross-match and sharing capabilities with the scientific community

EAS-SAS Architecture (Slide 12)
The SAS builds on top of the latest ESDC common Archives Building System Infrastructure, which defines the components shared by the latest ESA science archives (e.g. the Gaia Archive).
(Architecture diagram: users reach the archive through a CLI, a Web GUI and advanced application services; the server layer exposes HTTP, security, TAP+, UWS and SAMP interfaces on top of the catalogue + metadata store, which is fed through the MTS and the SEDM; further components shown are the MDR, MAL and AUS.)

EAS-SAS Technology (Slide 13)
- The SAS backend is based on Java
- The SAS web portal is based on the Google Web Toolkit (JavaScript)
- The SAS database is based on PostgreSQL and is accessed through the TAP+ interface
- The pgSphere and Q3C modules provide spherical data types, functions and operators for PostgreSQL (see the query sketch below)
- The SAS provides VO (IVOA) interfaces
- EAS access relies on the ESA LDAP and CAS services for authentication and authorization (A&A)
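
As a small illustration of how Q3C is typically used on the database side, the snippet below runs a cone search against a PostgreSQL table through psycopg2. The connection parameters, the table name euclid_catalogue and the column names are invented for the example; only the q3c_radial_query function itself is standard Q3C.

```python
import psycopg2  # PostgreSQL driver; assumes the Q3C extension is installed in the database

# Hypothetical connection to a catalogue database (not the real SAS credentials).
conn = psycopg2.connect(host="localhost", dbname="euclid", user="reader", password="secret")
cur = conn.cursor()

# Cone search: all sources within 0.1 degrees of (RA, Dec) = (150.0, 2.2),
# using the index-aware Q3C radial query function.
cur.execute(
    """
    SELECT source_id, ra, dec, mag_vis
    FROM euclid_catalogue
    WHERE q3c_radial_query(ra, dec, %s, %s, %s)
    """,
    (150.0, 2.2, 0.1),
)
for source_id, ra, dec, mag_vis in cur.fetchall():
    print(source_id, ra, dec, mag_vis)

cur.close()
conn.close()
```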

EAS-SAS Visualization and Exploration Interface (Slide 14)
The interface will provide the following key functionalities:
- Full-sky visualization interface (HiPS maps)
- Comparison service for Euclid bands and ground-based surveys
- Catalogue exploration through the TAP+ query interface
- Cross-matching capabilities
- Functionalities on query results:
  o Spectra service
  o Download in different formats (VOTable, CSV, etc.)
  o Overlay of catalogues

EAS-SAS Visualization and Exploration Interface (Slide 15)
The SAS visualization and exploration interface will provide full-sky visualization and online catalogue exploitation based on ESASky technology:
- Visual exploration interface based on ESASky technology (CDS Aladin Lite)
- Visualization of HiPS (Hierarchical Progressive Survey) maps
(ESASky: J. Salgado et al., BiDS'16 presentation)

EAS-SAS and VO Interfaces (Slide 16)
The SAS will provide the tools and VO interfaces that enable the software-to-data paradigm and "bring the software to the data" for Euclid science. The VO protocols are part of the ESDC infrastructure (a short usage sketch follows the list):
- TAP+ parametric search of metadata and catalogues based on ADQL
- Universal Worker Service (UWS) to manage synchronous/asynchronous queries
- VOSpace to provide data-sharing capabilities to the user community
- SAMP (Simple Application Messaging Protocol) to interoperate with other astronomical analysis applications (e.g. from the browser to Aladin or TOPCAT)
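
To show what the TAP+/ADQL and SAMP pieces look like from a user's side, here is a short sketch using astroquery's TapPlus client and astropy's SAMP module. The TAP endpoint URL, the table name sas.catalogue and the column names are placeholders invented for the example; the ADQL geometry functions and the library calls themselves are standard.

```python
import os
from astroquery.utils.tap.core import TapPlus   # ESAC TAP+ client
from astropy.samp import SAMPIntegratedClient   # SAMP messaging (needs a running SAMP hub)

# Hypothetical SAS TAP+ endpoint; replace with the real archive URL.
tap = TapPlus(url="https://eas.esac.esa.int/tap-server/tap")

# ADQL cone search as a synchronous job (UWS-managed async jobs use launch_job_async).
job = tap.launch_job("""
    SELECT TOP 100 source_id, ra, dec, mag_vis
    FROM sas.catalogue
    WHERE 1 = CONTAINS(POINT('ICRS', ra, dec),
                       CIRCLE('ICRS', 150.0, 2.2, 0.1))
""")
table = job.get_results()
table.write("euclid_sources.vot", format="votable", overwrite=True)

# Broadcast the result over SAMP so Aladin or TOPCAT (if connected to the hub) can load it.
samp = SAMPIntegratedClient()
samp.connect()
samp.notify_all({
    "samp.mtype": "table.load.votable",
    "samp.params": {
        "url": "file://" + os.path.abspath("euclid_sources.vot"),
        "name": "Euclid cone search",
    },
})
samp.disconnect()
```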

Added Application Services (Slide 17)
A VOSpace service connected to the Euclid archive will allow the user community to exchange scientific results in an interactive and useful way.

Questions/Answers (Slide 18)

(Slide 19) The key notion is that a fruitful approach to handling the data deluge focuses on the administration, modelling, standardization and tracking of the data rather than on its processing by CPUs. This is a data-centric approach, in which the design of operational systems is driven by data-management and data-standard issues rather than by data-processing, CPU and pipelining concerns.