ODC and future EIDA/ EPOS-S plans within EUDAT2020. Luca Trani and the EIDA Team Acknowledgements to SURFsara and the B2SAFE team

Similar documents
Current status of the Romanian EIDA (European Integrated Data Archive)

USE CASES IN SEISMOLOGY. Alberto Michelini INGV

ORFEUS. making seismic waveform data available from 1987 to EIDA from a centralized data center to a federated data archive history and present status

irods workflows for the data management in the EUDAT pan-european infrastructure

Data Replication: Automated move and copy of data. PRACE Advanced Training Course on Data Staging and Data Movement Helsinki, September 10 th 2013

EUDAT- Towards a Global Collaborative Data Infrastructure

EPOS-S Waveform Services

Federating Data Centers within ORFEUS: The EIDA experience and future plans. The EIDA community Presented by Angelo Strollo* Prague June 29.

National and Regional data node implementation

EIDA European Integrated Data Archives

6 th International Conference on Earthquake Geotechnical Engineering 1-4 November 2015 Christchurch, New Zealand

EUDAT - Open Data Services for Research

EPOS a long term integration plan of research infrastructures for solid Earth Science in Europe

irods Scalable Architecture

EUDAT Data Services & Tools for Researchers and Communities. Dr. Per Öster Director, Research Infrastructures CSC IT Center for Science Ltd

Ways to improve Global Data Sharing and Re-use

EPOS: European Plate Observing System

I data set della ricerca ed il progetto EUDAT

Data management and discovery

EUDAT Training 2 nd EUDAT Conference, Rome October 28 th Introduction, Vision and Architecture. Giuseppe Fiameni CINECA Rob Baxter EPCC EUDAT members

Inge Van Nieuwerburgh OpenAIRE NOAD Belgium. Tools&Services. OpenAIRE EUDAT. can be reused under the CC BY license

EUDAT. Towards a pan-european Collaborative Data Infrastructure

The VERCE Architecture for Data-Intensive Seismology

Data Staging and Data Movement with EUDAT. Course Introduction Helsinki 10 th -12 th September, Course Timetable TODAY

Seismology and EPOS Integrated Core Services

Scientific data management

EUDAT and Cloud Services

EUDAT & AAI. Daan Broeder MPI for Psycholinguistics

The EUDAT Collaborative Data Infrastructure

EUDAT. Towards a pan-european Collaborative Data Infrastructure. Damien Lecarpentier CSC-IT Center for Science, Finland EUDAT User Forum, Barcelona

European scale coordination of waveform data exchange

Remote Workflow Enactment using Docker and the Generic Execution Framework in EUDAT

Building a Dutch National Research Infrastructure IRODS UGM 2017

Deliverable 6.4. Initial Data Management Plan. RINGO (GA no ) PUBLIC; R. Readiness of ICOS for Necessities of integrated Global Observations

GEOFON and its Role in Global Seismic Monitoring and Tsunami Warning

Using EUDAT services to replicate, store, share, and find cultural heritage data

EUDAT & SeaDataCloud

EUDAT. Towards a pan-european Collaborative Data Infrastructure - A Nordic Perspective? -

EUDAT. Towards a Collaborative Data Infrastructure. Ari Lukkarinen CSC-IT Center for Science, Finland NORDUnet 2012 Oslo, 18 August 2012

EUDAT. Towards a pan-european Collaborative Data Infrastructure

Europeana Core Service Platform

EUDAT. Towards a pan-european Collaborative Data Infrastructure. KNMI Workshop, Utrecht, Netherlands

A national approach for storage scale-out scenarios based on irods

BEYOND Ground Segment Facility The Hellenic Sentinel Data Hub (Mirror Site)

Safe Replication and Data Staging

Future Core Ground Segment Scenarios

SHale gas. Beata Orlecka-Sikora, Monika Staszek & WP2 Team Institute of Geophysics, Polish Academy of Sciences. Blackpool - June 5-7, 2017

EUDAT. A European Collaborative Data Infrastructure. Daan Broeder The Language Archive MPI for Psycholinguistics CLARIN, DASISH, EUDAT

B2SAFE metadata management

EUDAT. Towards a pan-european Collaborative Data Infrastructure

Minutes for FDSN WG III Products, Tools and Services Cape Town, South Africa 13 January 2009

Pascal Gilles H-EOP-GT. Meeting ESA-FFG-Austrian Actors ESRIN, 24 th May 2016

Big Data, exploiter de grands volumes de données

Data Discovery - Introduction

European Network and ICOS

Deduplication Storage System

Towards a joint service catalogue for e-infrastructure services

Dutch View on URN:NBN and Related PID Services

Indiana University Research Technology and the Research Data Alliance

EUDAT Towards a Collaborative Data Infrastructure

VERCE : Virtual Earthquake and Seismology Research Community in Europe e-science environment

Data Staging: Moving large amounts of data around, and moving it close to compute resources

SeisComP3 Overview. VS workshop - ETH/SED Zurich. Jan Becker. gempa GmbH, Potsdam, Germany. March, 2013

Mitigating Risk of Data Loss in Preservation Environments

Greece s Collaborative Ground Segment Initiatives

Giovanni Lamanna LAPP - Laboratoire d'annecy-le-vieux de Physique des Particules, Université de Savoie, CNRS/IN2P3, Annecy-le-Vieux, France

Data Intensive processing with irods and the middleware CiGri for the Whisper project Xavier Briand

Greece s Collaborative Ground Segment Initiatives

EUDAT B2FIND A Cross-Discipline Metadata Service and Discovery Portal

B2FIND and Metadata Quality

NEUROIMAGING RESEARCH DATA LIFE- CYCLE MANAGEMENT

EUDAT Common data infrastructure

Data Staging: Moving large amounts of data around, and moving it close to compute resources

European Cloud Initiative: implementation status. Augusto BURGUEÑO ARJONA European Commission DG CNECT Unit C1: e-infrastructure and Science Cloud

EOSC Services & Architecture: the EOSC-hub approach Tiziana Ferrari, Project Coordinator, EGI Founda?on

7 Things ISVs Must Know About Virtualization

GENeric European Sustainable Information Space for Environment.

Coupled Computing and Data Analytics to support Science EGI Viewpoint Yannick Legré, EGI.eu Director

High Performance Computing Course Notes Grid Computing I

Copernicus Space Component. Technical Collaborative Arrangement. between ESA. and. Enterprise Estonia

Data Centres in the Virtual Observatory Age

European Collaborative Data Infrastructure EUDAT - Training on EUDAT Principles -

IM&T Data Management Strategy Overview March 2013 CSIRO INFORMATION MANAGEMENT & TECHNOLOGY

A Cloud WHERE PHYSICAL ARE TOGETHER AT LAST

Introduction to Grid Computing

Disaster Recovery and Mitigation: Is your business prepared when disaster hits?

The IRIS Data Management Center maintains the world s largest system for collecting, archiving and distributing freely available seismological data.

PID System for eresearch

Post Digitization: Challenges in Managing a Dynamic Dataset. Jasper Faase, 12 April 2012

Open Science Commons: A Participatory Model for the Open Science Cloud

Making Open Data work for Europe

natural disasters: the role of the Group

EGI federated e-infrastructure, a building block for the Open Science Commons

ESA EO Programmes for CM16. EOEP-5 Block 4. Bilateral meeting with AT Delegation and Industry Vienna, 24/05/2016. ESA UNCLASSIFIED - For Official Use

NATIONAL DATA REPOSITORY

CLARIN s central infrastructure. Dieter Van Uytvanck CLARIN-PLUS Tools & Services Workshop 2 June 2016 Vienna

Data, Data, Everywhere. We are now in the Big Data Era.

Data Curation Practices at the Oak Ridge National Laboratory Distributed Active Archive Center

The Global Seismographic Network and recording of the Earthquake of December 26, 2004

The CEDA Archive: Data, Services and Infrastructure

Transcription:

B2Safe @ ODC and future EIDA/ EPOS-S plans within EUDAT2020 Luca Trani and the EIDA Team Acknowledgements to SURFsara and the B2SAFE team 3rd Conference, Amsterdam, The Netherlands, 24-25 September 2014

Outline ORFEUS European Integrated Data Archives concept ORFEUS Data Center Why B2SAFE and description of the implemented solution Current limitations EIDA/EPOS-S future plans within EUDAT2020

ORFEUS ORFEUS (Observatories and Research Facilities for European Seismology), is the non-profit foundation that aims at co-ordinating and promoting digital, broadband (BB) seismology in the European-Mediterranean area. It was founded in 1987 by corporate founders from 13 European countries forming the board of directors. Its activities are distributed between the ORFEUS Data Centre (ODC), gathering, archiving and providing waveform data, and four working groups, coordinating data availability and relevant developments. EIDA since Nov 2012 officially under ORFEUS EIDA GOALs safe, persistent archival and dissemination of high quality seismic waveforms and products via distributed archives easy access for scientists - support multiple access methods, standards and tools open, where possible, but also closed / restricted access provide best datasets available

LMU Users: Geoscientists etc EIDA today BGR ETH/SED GFZ Virtual ArcLink Network web portals arclink_fetch obspy.arclink web services breq_fast SeisComP3 other tools INGV ODC IPGP EIDA

EIDA in numbers EIDA currently provides uniform, unrestricted and rapid access to about ~400 TB of data 80 permanent networks 58 temporary networks 4500 open access stations 1600 real-time permanent stations 25-50 GB downloads per day Permanent station Temporary station EIDA node

ORFEUS Data Center The Orfeus Data Center (ODC) acts as regional data centre within the International Federation of Digital Seismograph Networks (FDSN) and is hosted by the seismological division of the KNMI in The Netherlands The VEBSN (Virtual European Broadband Seismograph Station Network) is the pool of broadband seismographs from which data is shared in real-time with ORFEUS Data Center The centralised archive contains waveform data from 1988 to present, from about 60 networks

ODC: waveform data 28TB Continuous waveforms data growth from 1988 Earthquake waveforms 2000

Seismic continuous waveform Seismic stations continuously record ground motion and send their data in real time to data centers which collect them Each station is equipped with different sensors: typically targeting different spatial and frequency components Data are usually stored as binary files encoded in a standard format called SEED (Standard for the Exchange of Earthquake Data) or better MSEED (its compact version) A common convention is to arrange data in daily files divided per network,station,channel, location code E.g.: SABA.HHE.NA.2014.001, SABA.HHN.NA.2014.001, SABA.HHZ.NA. 2014.001

Why B2SAFE Orfeus mission: Provide safe, persistent and long term archival and preservation of high quality waveform data It matches perfectly with the features offered by B2SAFE Moreover it allows full control on the choice of data replication facility vs generic cloud service => very important for computation Possibility to implement customised rules and data workflows

Solution: overview Main components: - irods: micro services utilised to create customisable replication rules - EPIC handle system: PID minting and management PID Seismic acquisition system irods irods HPSS ODC NFS share ODC NFS mount dcache DMF Thanks to Robert Verkerk

Solution: workflow 1) Register dataset in irods (locally at KNMI): ireg 2) Generate PID at KNMI 3) Replicate dataset to SURFsara 4) Generate PID of the replica at SURFsara 5) Update parent PID with replica information at KNMI

Solution: PIDs

Solution: some numbers - Replication of the ODC archive from 1988 to Dec 2013 completed ~ 6.6 M files - Registration of the first 6 months of 2014 into irods at KNMI ~800.000 files - Registration rate in irods ~100.000 files per day - Replication rate ~ 80.000 files per day (Typical file size range: from a few KB up to 20-30MB) Planned actions: - Finalise replication up to current day - Set up an automatic procedure to register and replicate new files

Solution: current limitations - PID assigned only to archived datasets: no real time yet Datasets are assumed as frozen at a certain point in time - Replication applied to continuous waveforms but it could be extended also to other products. E.g.: stations, qc, - PID granularity - Replication mainly for backup purposes - The current system does not support data provenance Impossible to trace the history of changes on a specific dataset: related to Dynamic Data - PID not coupled with domain specific metadata

ODC Metadata Integration

EIDA and EPOS: EPOS-S European Plate Observing System The existing national Research Infrastructures (RIs) for solid Earth science in Europe generate data and information and are responsible for the operation of instrumentation in each country. These RIs are integrated into the EPOS Thematic Services, which represent dedicated services for each specific community. The distinct Thematic Services are further joined up to create the EPOS Integrated Services consisting of a variety of multidisciplinary services that will allow the access to data, data products, processing and visualisation tools and computational codes and resources for different stakeholders, not limited to the scientific community.

EIDA/EPOS-S plans within EUDAT2020 - Set up a robust, scalable, reliable and secure infrastructure to support the federation of datacenters - Enrich data streams with PIDs and detailed metadata - Enable reproducibility and provenance - Automatic safe replication of datasets on external data resources - Effective exploitation of replicas: reliability, failover, disaster recovery, computation, optimisation - Federated Discovery and Access - Federated Identity Management (in discussion with edugain/geant)

EIDA in EUDAT2020 LMU ETH/SED GFZ Users: Geoscientists etc BGR INGV ODC IPGP EIDA

LMU ETH/SED GFZ Users: Geoscientists etc BGR INGV ODC IPGP EIDA

LMU ETH/SED GFZ Users: Geoscientists etc BGR INGV ODC IPGP EIDA

LMU ETH/SED GFZ Users: Geoscientists etc BGR INGV ODC IPGP EIDA

LMU B2? Users: Geoscientists etc BGR ETH/SED GFZ Data Discovery Data Access Data Computation AAI Citation Workflows B2? New node B2? INGV ODC B2? IPGP EIDA

For more info visit us at http://www.orfeus-eu.org/eida/eida.html

For more info visit us at http://www.orfeus-eu.org/eida/eida.html Thank You