THE EUCLID ARCHIVE SYSTEM: A DATA-CENTRIC APPROACH TO BIG DATA Rees Williams on behalf of A.N.Belikov, D.Boxhoorn, B. Dröge, J.McFarland, A.Tsyganov, E.A. Valentijn University of Groningen, Groningen, The Netherlands B.Altieri, G.Buenadicha, S.Nieto, J. Salgado, P. de Teodoro European Space Astronomy Center, European Space Agency, Spain
EUCLID ESA mission to map dark matter & dark energy Launch on Soyuz in 2020 Wide field survey 1.2m telescope 3 Optical & IR instruments 15 countries 130 institutes 1300 consortium members 700 scientists Slide 2
Data Processing Challenge Unprecedented data volumes for astronomy satellite mission 1 Peta byte images from Euclid 10-100 Peta byte images from earth observatories Massive simulation efforts required Key features Very strict requirements of quality and calibration heavy (re)processing needed from raw data to science products mission could generate 25 Pbytes/year 10 billion objects in catalogue distributed nature of collaboration 9 national science data centres (SDC) distributed data processing and Slide 3
Euclid Archive System Data Centric approach adopted Builds on experience with the WISE information system used by o OmegaCAM, Muse, LOFAR LTA Builds on experience with earlier ESA archives Traceability and data lineage are strong requirements Central role for the Euclid Mission Archive (EAS) o EAS acts as interface between all ground system components o EAS data distributed across national SDCS o EAS meta-data is centrally stored o EAS metadata contains all information except images o EAS stores dependencies of data products Avoid unnecessary reprocessing Data provenance ensured Slide 4
Euclid Common Data Model Original Euclid Data Model (XSD) Python classes Detailed ECDM objects implemented in full in the EAS Provides lineage & traceability Database schema User Interface and services Slide 5
EAS General Architechure Data Processing System (EAS-DPS) Science Archive System (EAS-SAS) Distributed Storage System (EAS-DSS) Computing facilities SOC Computing facilities SDC-FR Computing facilities SDC-ES Data files Data files Data files Computing facilities SDC-NL Computing facilities SDC-DE Computing facilities SDC-UK Data files Data files Data files Computing facilities SDC-IT Computing facilities SDC-CH Computing facilities SDC-FI Data files Data files Data files Slide 6 6/6
Euclid data model in XSD Euclid Data Model Data Processing System Metadata Access Layer Metadata Science Archive System Metadata COORS Consortium Processing Service Metadata Ingest Service Consortium User Service EC user EC user public user IAL DSS server HPC DSS Storage Node Interface Storage node Storage node Storage node SDC-Y SDC-X DSS server DSS server SDC-Z
Euclid DSS Servers File location in metadata provides a global file system DSS server installed at each SDC provides data access to data at that SDC to other SDCs. No constraint on or file system on nodes DSS server supports the access of data in SDCs using Data product definition different protocols o sftp, https, gridftp, Astro-Wise dataserver, IRods Simple interface: store, retrieve, copy, delete, check Extended interface: cut-out services Adaptable to new or existing solutions at SDCs Slide 8
EAS Infrastructure at SDC-NL Euclid community & SDCs Millipede HPC and NL Big Grid with Euclid IAL. DSS Storage Node LAN Application Nodes: DSS, Metadata request, Ingest server, DBView, M&C Support Slide 9 Target GPFS Storage Infrastucture ESAC DPS Metadata Mirror ESAC SAS Metadata DPS Metadata Oracle RAC
EAS Interfaces DPS @SDC-NL DPS-MRS DPS @ESAC MTS SAS @ESAC EC Users External Users Slide 10
EAS-SAS Overview EAS-SAS @ESDC The SAS is being built at the ESAC Science Data Centre (ESDC), which is responsible for the development and operations of the scientific archives for the Astronomy & Solar System missions of ESA. Science Community The SAS is focused on the needs of the broader scientific community and it will provide access to the scientific metadata from the EAS-DPS. Main Capabilities - Parametric search for Euclid scientific metadata and catalogues - Visualization and exploration of images, spectra and catalogues - Upload, cross-match and sharing capabilities with scientific community Slide 11
EAS-SAS Architecture SAS builds on top of the latest ESDC s common Archives Building System Infrastructure, which defines the common components to the latest ESA Science Archives (i.e. Gaia Archive) HTTP Security MTS SEDM Catalogue + Metadata TAP+ UWS SAMP Server Layer CLI Web GUI Advanced Application Services Users MDR MAL AUS Slide 12
EAS-SAS Technology SAS backend is based on Java SAS Web portal is based on Google Web Toolkit (JavaScript) SAS database is based on PostgreSQL and accessed through TAP+ Interface Modules pgsphere and Q3C provide spherical data types, functions and operators to PostgreSQL SAS provides VO Interfaces (IVOA) EAS access relies on ESA LDAP and CAS service for A&A control Slide 13
EAS-SAS Visualization and Exploration Interface It will provide the following key functionalities: Full sky visualization interface (HiPS maps) Comparison service for Euclid bands and ground-based surveys Catalogue exploration through TAP+ query interface Cross-matching capabilities Functionalities on query results: Spectra service Download in different formats (VOTable, CSV, etc.) Overlay of catalogues Slide 14
EAS-SAS Visualization and Exploration Interface SAS visualization and exploration interface will provide full sky visualization capabilities based on ESASky technology and online catalogue exploitation. Visual exploration interface based on ESASky technology (CDS Aladin Lite) Visualization of HiPS (Hierarchical Progressive Survey) maps Slide 15 ESASky J. Salgado et Al. BiDS 16 presentation
EAS-SAS and VO Interfaces SAS will provide the tools and VO interfaces to enable the Software-to- Data Paradigm and "bring the software to the data" for the Euclid science. VO protocols are part of the ESDC infrastructure TAP+ parametric search for metadata en catalogues based on ADQL Universal Worker Service (UWS) to manage sync/async queries VOSpace to provide data sharing capabilities to the users community SAMP (Simple Application Messaging Protocol) to interoperate with other astronomical analysis applications (Aladin, Topcat). SAMP Browser Aladin, Topcat Slide 16
Added Application Services VOSpace service connected to the Euclid archive will allow the users community to exchange the scientific results in an interactive an useful way. Slide 17
Questions/Answers Slide 18 18/22
The key notion is that a fruitful approach towards handling the data deluge focuses on the administration, modeling, standardizing and tracking of the data rather than on the processing of the data by CPUs. This is a datacentric approach, in which the design of operational systems is driven by data management and data standard issues rather than data processing/cpu/pipelining issues. Slide 19