European Collaborative Data Infrastructure EUDAT - Training on EUDAT Principles -

Similar documents
Data Replication: Automated move and copy of data. PRACE Advanced Training Course on Data Staging and Data Movement Helsinki, September 10 th 2013

EUDAT. Towards a pan-european Collaborative Data Infrastructure

EUDAT. Towards a pan-european Collaborative Data Infrastructure - A Nordic Perspective? -

EUDAT. Towards a pan-european Collaborative Data Infrastructure

EUDAT. Towards a pan-european Collaborative Data Infrastructure

EUDAT Data Services & Tools for Researchers and Communities. Dr. Per Öster Director, Research Infrastructures CSC IT Center for Science Ltd

EUDAT Training 2 nd EUDAT Conference, Rome October 28 th Introduction, Vision and Architecture. Giuseppe Fiameni CINECA Rob Baxter EPCC EUDAT members

EUDAT Common data infrastructure

EUDAT. A European Collaborative Data Infrastructure. Daan Broeder The Language Archive MPI for Psycholinguistics CLARIN, DASISH, EUDAT

I data set della ricerca ed il progetto EUDAT

EUDAT. Towards a pan-european Collaborative Data Infrastructure. Damien Lecarpentier CSC-IT Center for Science, Finland EUDAT User Forum, Barcelona

EUDAT. Towards a pan-european Collaborative Data Infrastructure. KNMI Workshop, Utrecht, Netherlands

EUDAT. Towards a Collaborative Data Infrastructure. Ari Lukkarinen CSC-IT Center for Science, Finland NORDUnet 2012 Oslo, 18 August 2012

EUDAT- Towards a Global Collaborative Data Infrastructure

Using EUDAT services to replicate, store, share, and find cultural heritage data

EUDAT & AAI. Daan Broeder MPI for Psycholinguistics

EUDAT Towards a Collaborative Data Infrastructure

Data Staging and Data Movement with EUDAT. Course Introduction Helsinki 10 th -12 th September, Course Timetable TODAY

Data Discovery - Introduction

Inge Van Nieuwerburgh OpenAIRE NOAD Belgium. Tools&Services. OpenAIRE EUDAT. can be reused under the CC BY license

Safe Replication and Data Staging

ODC and future EIDA/ EPOS-S plans within EUDAT2020. Luca Trani and the EIDA Team Acknowledgements to SURFsara and the B2SAFE team

EUDAT - Open Data Services for Research

irods workflows for the data management in the EUDAT pan-european infrastructure

European Cloud Initiative: implementation status. Augusto BURGUEÑO ARJONA European Commission DG CNECT Unit C1: e-infrastructure and Science Cloud

The EUDAT Collaborative Data Infrastructure

USE CASES IN SEISMOLOGY. Alberto Michelini INGV

Data management and discovery

EUDAT & SeaDataCloud

EUDAT and Cloud Services

Making research data repositories visible and discoverable. Robert Ulrich Karlsruhe Institute of Technology

Ways to improve Global Data Sharing and Re-use

Towards FAIRness: some reflections from an Earth Science perspective

re3data.org Registry of Research Data Repositories Peter Schirmbacher Humboldt-Universität zu Berlin ETD Hong Kong, September 25.

Key Elements of Global Data Infrastructures

EUDAT B2FIND A Cross-Discipline Metadata Service and Discovery Portal

dcache: challenges and opportunities when growing into new communities Paul Millar on behalf of the dcache team

re3data.org Registry of Research Data Repositories

NorStore. a national infrastructure for scientific data. Andreas O Jaunsen UNINETT Sigma as

Indiana University Research Technology and the Research Data Alliance

Federated Identities and Services: the CHAIN-REDS vision

On the way to Language Resources sharing: principles, challenges, solutions

European Network and ICOS

e-infrastructures in Horizon 2020 e-infrastructures for data and computing

B2SAFE metadata management

Data Staging: Moving large amounts of data around, and moving it close to compute resources

Science Europe Consultation on Research Data Management

Striving for efficiency

Open, Digital Science in Europe

Progress towards the EOSC

Open Science Commons: A Participatory Model for the Open Science Cloud

OpenAIRE Open Knowledge Infrastructure for Europe

Towards repository interoperability

Trust and Certification: the case for Trustworthy Digital Repositories. RDA Europe webinar, 14 February 2017 Ingrid Dillo, DANS, The Netherlands

THE NATIONAL DATA SERVICE(S) & NDS CONSORTIUM A Call to Action for Accelerating Discovery Through Data Services we can Build Ed Seidel

INSPIRE & Environment Data in the EU

Giovanni Lamanna LAPP - Laboratoire d'annecy-le-vieux de Physique des Particules, Université de Savoie, CNRS/IN2P3, Annecy-le-Vieux, France

EUDAT-B2FIND A FAIR and Interdisciplinary Discovery Portal for Research Data

The 5G Infrastructure Association

KIC Added-value Manufacturing: Exploiting synergies and complementarities with EU policies and programmes

Data Management: the What, When and How

GEOSS Data Management Principles: Importance and Implementation

Security in Big and Open Data - WG. Alessandra Scicchitano GÉANT - Project Development Officer Ralph Niederberger Jülich Supercomputing Center (FZJ)

Horizon 2020 and the Open Research Data pilot. Sarah Jones Digital Curation Centre, Glasgow

Supporting European e-infrastructure service providers to join the European Open Science Cloud

PID System for eresearch

Caring for research data and what about software? Peter Doorn, director DANS

e-research Infrastructures for e-science Axel Berg SARA national HPC & e-science support center RAMIRI, June 15, 2011

Data Staging: Moving large amounts of data around, and moving it close to compute resources

OpenAIRE From Pilot to Service

e-infrastructure: objectives and strategy in FP7

EOSC Services & Architecture: the EOSC-hub approach Tiziana Ferrari, Project Coordinator, EGI Founda?on

Federated Identity Management for Research Collaborations. Bob Jones IT dept CERN 29 October 2013

EPOS: European Plate Observing System

GRIDS INTRODUCTION TO GRID INFRASTRUCTURES. Fabrizio Gagliardi

Towards a joint service catalogue for e-infrastructure services

European Open Science Cloud Implementation roadmap: translating the vision into practice. September 2018

VI-SEEM Data Repository. Presented by: Panayiotis Charalambous

CLARIN s central infrastructure. Dieter Van Uytvanck CLARIN-PLUS Tools & Services Workshop 2 June 2016 Vienna

Deliverable 6.4. Initial Data Management Plan. RINGO (GA no ) PUBLIC; R. Readiness of ICOS for Necessities of integrated Global Observations

European digital repository certification: the way forward

META-SHARE: An Open Resource Exchange Infrastructure for Stimulating Research and Innovation

Global Data Sharing The Research Data Alliance

1 Overview chart. PIDs: talk with EPIC PIDs: MoU or advice. Assessment wave 3. VLO overhaul CMDI 1.2

OpenAIRE. Fostering the social and technical links that enable Open Science in Europe and beyond

EPOS a long term integration plan of research infrastructures for solid Earth Science in Europe

InfraStructure for the European Network for Earth System modelling. From «IS-ENES» to IS-ENES2

Petaflop Computing in the European HPC Ecosystem

Some Big Data Challenges

Research Data Management & Preservation: A Library Perspective

RDA DFT Data Models 1: Overview

SDI and the Key Elements

The PICTURE project, ICT R&I priorities in EaP, areas of cooperation

Remote Workflow Enactment using Docker and the Generic Execution Framework in EUDAT

EGM, 9-10 December A World that Counts: Mobilising the Data Revolution for Sustainable Development. 9 December 2014 BACKGROUND

Making Open Data work for Europe

Sharing research data globally and supporting re-use - Playing your part Leif Laaksonen /National Archive of Finland

Managing very large Multimedia Archives and their Integration into Federations

GATE: Big Data for Smart Society Dessislava Petrova-Antonova Sofia University St. Kliment Ohridski Faculty of Mathematics and Informatics

Certification. F. Genova (thanks to I. Dillo and Hervé L Hours)

Transcription:

European Collaborative Data Infrastructure EUDAT - Training on EUDAT Principles - Dr.-Ing. Morris Riedel Federated Systems and Data Juelich Supercomputing Centre (JSC) Adjunct Associate Professor School of Engineering and Natural Sciences University of Iceland

Outline Motivation & EUDAT & IAGOS Context Short Training on Key Principles How to create a collaborative data infrastructure How to create a registered domain of data How to perform policy-based data replication Summary & Possible Actions References 2

Motivation & EUDAT & IAGOS Context 3

Big Data Waves Surfboards Breakwaters How can we manage the rising tide of scientific data High Level Expert Group on Scientific Data Report Lists unsolved questions Outlines challenges Provides visions A Surfboard for Riding The Wave Report Lists 4 key action drivers Identifies 3 strategic goals Clarifies Data Scientists Concrete Next Steps Breakwaters

Data trends Exponential growth Zettabytes Exabytes Petabytes Terabytes Gigabytes Increasing complexity and variety Where to store it? How to find it? How to make the most of it? How to ensure interoperability? How to engage in cross-disciplinary science? 5

In commercial environments Big Data is all about Volume Variety Velocity Big Data is data that becomes large enough that it cannot be processed using conventional methods. [1] O Reilly Radar Team, Big Data Now: Current Perspectives from O'Reilly Radar LOFAR test site Jülich The next generation radio telescope for science The square kilometre array 1 PB in 20 seconds

[2] EUDAT Web page EUDAT Collaborate to tackle big data 7

Training & Working habits with Communities Example: Persistent Identifiers for Data (PIDs) MINDSET Do we all think that PIDs for scientific data are important? Do we agree that safe replication of data is important for unique scientific data sets? SKILLSET How do we use PIDs and what type of PID structures are relevant? What are the techniques to perform data replication and how it relates to PIDs and metadata? TOOLSET Do we use (common) services and tools to work with PIDs being part of community practice? Are there established (common) data replication services and tools available we can use daily? 8

Explore possibilities for IAGOS & EUDAT MINDSET Smooth metadata sharing with other communities (e.g. climate communities), MOZAIC and IAGOS data policies good example in science for re-use Clarify long-term relationships to sustain the ecosystem around the IAGOS/MOZAIC database and ensure its free access for science & society SKILLSET TOOLSET IAGOS/MOZAIC database: observations (1994 today) Persistent Identifier (PIDs) for IAGOS/MOZAIC datasets to get referenced clearly in publications (example US & Owen Cooper this morning) Safe Replication of Datasets for better re-use by scientific research communities in Europe and beyond (e.g. good contacts to US) 9

Possible Concrete example: Discussions over lunch Morris + Owen IAGOS & EUDAT & Research Data Alliance (RDA) Promote a case for world-wide scientific IAGOS research RDA : Share Open Research Data w/o barriers Here: EU (Morris&IAGOS) & US (Owen et al.) + China? Create interest group (e.g. similar as the agricultural interoperability group, but with IAGOS interests) Align with RDA Big Data Analytics group that is interested to work with MOZAIC, and show interop use cases (Owen) IAGOS/MOZAIC database: observations (1994 today) 10

How to create a collaborative data infrastructure Training on Skillset Level 11

Blueprint of a Collaborative Data Infrastructure CLARIN, need experts LifeWatch, close to ENES, the communities EPOS, VPH, knowing INCF, etc. the 6 methods, Core Infrastructures traditions and - more second culturesround (research infrastructures) need a layer to take care => about 12 EUDAT all common, data centers crossdiscipline services How can we link to the existing IAGOS data infrastructure to create benefits for science? [4] High Level Expert Group on Scientific Data, Riding The Wave How Europe can gain from the rising tide of scientific data, October 2010 12

Conceptual View of a CDI [1] M. Riedel and P. Wittenburg et al. A Data Infrastructure Reference Model with Applications: Towards Realization of a ScienceTube Vision with a Data Replication Service 13

Analysis of existing community data infrastructure community interactions based on abstract model (Kahn & Wilensky, 2006) o triple : Data + Metadata + Handle (PID) use it as orientation point! used in many meetings and interactions accepted quickly as reference model helped even in improving community organization plans originator depositor repository A user work ownership requests hands over We might start investigating the IAGOS infrastructure like this deposits via RAP data metadata (Key MD) PID access rights requests stores registered DO data metadata (Key MD) handle generator maintains receives disseminations via RAP PID property record rights type (from central registry) ROR flag mutable flag transaction record replicates repository B Definitions/Entities originator = creates digital works and is owner; depositor = forms work into DO (incl. metadata), digital object (DO) = instance of an abstract data type; registered DOs are such DOs with a Handle; repository (Rep) = network accessible storage to store DOs; RAP (Rep access protocol) = simple access protocol Dissemination = is the data stream a user receives ROR (repository of record) = the repository where data was stored first; Meta Objects (MO) = are objects with properties mutable DOs = some DOs can be modified property record = contains various info about DO type = data of DOs have a type transaction record = all disseminations of a DO 14

[2] EUDAT Web page Example: EUDAT Centres Overlap of several formal partners in IAGOS & EUDAT community centre EUDAT centre 15

Clear Task: Identify Common Services If there are hundreds of Research Infrastructures, how many different data management systems can we sustain? Are there services you might want to share/re-use with/from other communities? 16

Example: Current EUDAT Services Focus Common Services Safe Replication Data Staging Simple Store CLA RIN LW VPH EN ES EP OS IN CF EC RIN Bio Vel Dixa CES SDA X o X X X X x x o o X X X DAR IAH X X X X X X x x x x x x Pan Dat a BB MRI EM SO Metadata X X o X x x x x x x x x x x Webservice platform X o X o X = needed now, x = interested, o = interest, not direct priority Are there services where IAGOS is interested in to use across Europe? 17

Example: EUDAT Services in Preparation Metadata Catalogue Aggregated EUDAT metadata domain. Data inventory Safe Replication Data preservation and access optimization Data Staging Dynamic replication to HPC workspace for processing Simple Store Researcher data store (simple upload, share and access) AAI Network of trust among authentication and authorization actors PID Anchor for identification and integrity Data PID Metadata [2] EUDAT Web page 18

Example: EUDAT user communities EPOS: European Plate Observatory System CLARIN: Common Language Resources and Technology Infrastructure ENES: Service for Climate Modelling in Europe LifeWatch: Biodiversity Data and Observatories VPH: The Virtual Physiological Human INCF: The neuroscience community All share common challenges: Reference models and architectures Persistent data identifiers Metadata management Distributed data sources Data interoperability 19

Example: EUDAT Community Centres

ScienceTube : User perspective of CDI [1] M. Riedel and P. Wittenburg et al. A Data Infrastructure Reference Model with Applications: Towards Realization of a ScienceTube Vision with a Data Replication Service 21

Lessons Learned in this Training Section Accept that many communities have already a data infrastructure, so we need to connect it Knowing triple to organize/understand data plans Understand the major blueprint of a Collaborative Data Infrastructure (CDI) Capable of identifying common data services Knowing the difference between mono-thematic community center and multi-disciplinary centers 22

How to create a registered domain of data Training on Skillset Level 23

Blueprint for a registered domain of data We need to understand if this is interesting to IAGOS users, e.g. use in publications 24

Key Principle of Handle System PID 25

Example: EUDAT Safe Replication Service 26

Use Persistent Identifiers to Identify Data Use Persistent Identifiers (PIDs) Based on the Handle System Used to reference data, including different locations Requires a PID Service One example is the EPIC PID service E.g. register a PID specifying a URI EPIC = European Persistent Identifier Consortium 27

Example: EUDAT Use of EPIC Service 28

Lessons Learned in this Training Section Understand the structure of one possible registered domain of (scientific) data Accept that the handle system is a pragmatic way to identify data not bound to location Knowing that you need Persistent Identifier (PIDs) as reference to digital objects (data) Capable of creating a theoretical use case that is using PIDs and an associated PID service (EPIC) 29

How to perform policy-based data replication Training on Skillset Level 30

Blueprint for safe data replication Better accessibility of scientific data High degrees of reliability and trust Make data referencable More optimal data curation 31

Safe Data Replication Approach Idea: Safe replication between 1 scientific community center and N data centers Replication within a registered domain of data (i.e. PID assigment) Flexibility, scalability and management require policy-based data management (i.e. rule engine) With local policies at centers and global policies for infrastructure(s) Islands (community + data centers) in parallel & close interaction merge? Enabling community as process for acknowledging existing data management plans of communities Safe Replication Data preservation and access optimization

Example EUDAT: Forming strong relationships We could form long-term partner relationships to preserve the observation data and services and to make it largely available 33

Example: EUDAT Science Relationships Are there scientific relationships across EU (or US) we could support? 34

Example: EUDAT Safe Replication Service Metadata Catalogue Aggregated EUDAT metadata domain. Data inventory Safe Replication Data preservation and access optimization Data Staging Dynamic replication to HPC workspace for processing Simple Store Researcher data store (simple upload, share and access) AAI Network of trust among authentication and authorization actors PID Anchor for identification and integrity Data PID Metadata 35

Federated Approach for Use Cases Create M replications at different data centers for N years, exclude data centers X to data centers Z from the replication scheme and make them all accessible by maintaining the given access permissions. 36

EUDAT: Example of Safe Replication 37

Overview: Manual Upload Replicated File Need to understand federations and zones in irods 38

Overview: Rule-based data management Need to understand rules & micro-services in irods 39

Use of Persistent Identifier (PID) Service 40

Lessons Learned in this Training Section Understand that long-term relationships matter Knowing the difference between simple backups and safe data replication Understand key aspects of policy-based replication by defining policies on different levels (i.e. rules global/local/infrastructure) Having an idea for what rules can be used in the context of the registered domain of data 41

Summary & Possible Actions 42

Summary Services are available Safe Replication and Data Staging in operation for a few data centers of core communities Simple Store and MetaData Services will come soon Production means enabling the services together with user communities Worked hard to get this done and to understand how to interface with communities Each community is different it is a long term process of working together Needed to chose for some technologies but take care of technology lock in irods just as a thin layer for example and not as a system doing all There is a far way between we know how it works and having a real service Communities & researchers are interested in operational services Go ahead and extend the collaborative infrastructure with three levels of thinking Working habit of Mindset, Skillset, Toolset 43

Possible Actions Together Synergies between EUDAT and IAGOS seem to exist (also through common partners) Opening data to much more communities, increase IAGOS/MOZAIC uptake Long term preservation and link of metadata to others We need to understand IAGOS better from EUDAT perspective Data Management Plans and links to added value services EUDAT is forming Working Groups Dynamic Data (database, real time transmissions), Scientific Workflows, etc. Explore possibilities for creating an MoU between IAGOS and EUDAT Research Data Alliance (RDA) for research data sharing without barriers Community Group for the IAGOS community to link with US(+China) activities? Something similiar as Agriculture Interoperability Interest Group 44

Thanks for the attention. Get in contact with us: http://www.eudat.eu Join the Research Data Alliance http://rd alliance.org/ 45

References 46

References [1] M. Riedel, P. Wittenburg, J. Reetz, M. van de Sanden, J. Rybicki, B. von St. Vieth, G. Fiameni, G. Mariani, A. Michelini, C. Cacciari, W. Elbers, D. Broeder, R. Verkerk, E. Erastova, M. Lautenschlaeger, R. Budig, H. Thielmann, P. Coveney, S. Zasada, A. Haidar, O. Buechner, C. Manzano, S. Memon, S. Memon, H. Helin, J. Suhonen, D. Lecarpentier, K. Koski and Th. Lippert, A Data Infrastructure Reference Model with Applications: Towards Realization of a ScienceTube Vision with a Data Replication Service, Journal of Internet Services and Applications, Volume 4, Issue 1 [2] EUDAT Web Page, Available online: http://www.eudat.eu [3] RDA Web Page, Available online: http://rd-alliance.org [4] High Level Expert Group on Scientific Data, Riding The Wave How Europe can gain from the rising tide of scientific data, Submission to the European Commission, October 2010, Available online: http://cordis.europa.eu/fp7/ict/e-infrastructure/docs/hlg-sdi-report.pdf [5] Knowledge Exchange Partners, A Surfboard for Riding the Wave Towards a four country action programme on research data, published 2011, updated 2012, Available online: http://www.knowledgeexchange.info/surfboard 47