Data management and discovery

Data management and discovery using EUDAT services
Hans van Piggelen, SURFsara, The Netherlands

EUDAT Mission
- Offer common data services in the CDI to all European researchers
- Services will address the needs of big data volumes as well as of the long tail of data
- Respect the communities' choices of data organization
- Achieve harmonization and efficiency in the long term
Life Science Workshop, November 13th 2014

Data discovery
How to make sure your published data set can be discovered by anyone, anytime?
- How to store it online?
- How to make it findable?
- How to make it uniquely identifiable?
- How to make the most out of your data?
- How to make it available, even after 20 years?
- How to find other data sets?
Data repositories and search services!

Data management and policies
- Process of controlling information generated during a (research) project
- Availability, authenticity, discoverability, curation
- Preferably:
  - Manage data sets online with ease
  - Set policies, e.g. data access per user or group, automatic data replication on file level
  - Share your data sets (publicly)
  - Integration with other services
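As an illustration of the kind of policy listed above, the following minimal sketch models access per user or group together with a replica count. The `DataPolicy` class, its fields, and `can_read` are hypothetical names invented for this example; they are not part of any EUDAT service.

```python
from dataclasses import dataclass, field


@dataclass
class DataPolicy:
    """Hypothetical policy attached to one data set (illustrative only)."""
    readers: set = field(default_factory=set)  # users or groups allowed to read
    public: bool = False                       # True if the data set is shared publicly
    replicas: int = 1                          # desired number of copies per file


def can_read(policy, user, groups):
    """Return True if the user, or any of the user's groups, may read the data set."""
    if policy.public:
        return True
    return user in policy.readers or bool(set(groups) & policy.readers)


# Example: one named user and one group have access; two copies are requested.
policy = DataPolicy(readers={"alice", "genomics-lab"}, replicas=2)
print(can_read(policy, "alice", set()))
print(can_read(policy, "bob", {"genomics-lab"}))
print(can_read(policy, "carol", {"physics"}))
```

Keeping the policy as plain data, separate from the check that enforces it, is one possible design; it lets the same record drive both access decisions and replication.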

Data repositories
- Easily share data with collaborators and other researchers
- Uniquely identify data sets using persistent identifiers
- Add metadata to improve quality and discoverability
- Calculate checksums for data objects
- For small data up to large data sets
- Curate data for long-term storage
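Calculating a checksum for a data object can be sketched with the Python standard library. This is a generic example, not the method used by any particular repository; the choice of SHA-256 and the chunk size are assumptions.

```python
import hashlib
import os
import tempfile


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, read in chunks so large files fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Example: checksum a small temporary file standing in for a data object.
with tempfile.TemporaryDirectory() as workdir:
    sample = os.path.join(workdir, "sample.dat")
    with open(sample, "wb") as handle:
        handle.write(b"example data object")
    print(file_checksum(sample))
```

Storing the digest next to the data set makes it possible to detect later whether the stored bytes have changed.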

Metadata templates
- Standardised metadata schemes for improved discoverability
- Defined by research community or organisation
- In addition to obligatory minimal metadata
- Generally searchable in any connected service
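The idea of a community template layered on top of obligatory minimal metadata can be sketched as a simple completeness check. The field names (`title`, `creator`, `publication_date`, `species`) are assumptions chosen for illustration, not an actual EUDAT schema.

```python
# Assumed minimal fields; a real service defines its own obligatory set.
MINIMAL_FIELDS = {"title", "creator", "publication_date"}


def missing_fields(record, community_fields=()):
    """Return the sorted names of required fields that are absent or empty."""
    required = MINIMAL_FIELDS | set(community_fields)
    return sorted(name for name in required if not record.get(name))


# Example: a record checked against a hypothetical community template.
record = {"title": "Leaf samples 2014", "creator": "A. Researcher"}
print(missing_fields(record, community_fields={"species"}))
```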

Persistent Identifiers (PIDs)
- Unique identifier for every uploaded data set
- Ensures long-term: authenticity, traceability, discoverability, integrity
- Persistent: the identifier will never change
- But also: the referred data will never change
- EUDAT: EPIC PID service
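The two persistence properties above can be illustrated with a toy in-memory registry: the identifier and the recorded checksum stay fixed, while the storage location may be updated. This is a sketch under stated assumptions and does not use the EPIC PID service API; the `PidRegistry` class, the prefix `00000`, and the URLs are all hypothetical.

```python
import uuid


class PidRegistry:
    """Toy PID registry: identifiers and checksums are fixed, locations may move."""

    def __init__(self, prefix):
        self.prefix = prefix
        self._records = {}

    def register(self, location, checksum):
        """Mint a new identifier of the form prefix/suffix and store its record."""
        pid = f"{self.prefix}/{uuid.uuid4()}"
        self._records[pid] = {"location": location, "checksum": checksum}
        return pid

    def relocate(self, pid, new_location):
        """Point an existing identifier at a new location; the checksum is untouched."""
        self._records[pid]["location"] = new_location

    def resolve(self, pid):
        """Return a copy of the record so callers cannot alter the registry."""
        return dict(self._records[pid])


# Example: the data moves to another site, the identifier stays valid.
registry = PidRegistry(prefix="00000")  # hypothetical prefix
pid = registry.register("https://repo-a.example.org/data/42", checksum="abc123")
registry.relocate(pid, "https://repo-b.example.org/data/42")
print(pid, registry.resolve(pid))
```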

B2FIND: Metadata Search Service
- Easily find collections of scientific data generated either by various communities or via EUDAT services
- Access those data collections through the given references in the metadata to the relevant data stores
- Europeana of scientific data
- EUDAT CDI: domain of registered data
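The pattern described above, searching metadata and then following the reference to the data store, can be sketched as a keyword search over a small in-memory catalogue. This is not the B2FIND interface; the record layout and the `hdl:00000/...` references are hypothetical.

```python
def search(records, term):
    """Return the data references of all records whose metadata mentions the term."""
    needle = term.lower()
    hits = []
    for record in records:
        # Search every metadata value except the reference itself.
        searchable = " ".join(str(v) for k, v in record.items() if k != "reference")
        if needle in searchable.lower():
            hits.append(record["reference"])
    return hits


# Example catalogue aggregated from two hypothetical communities.
catalogue = [
    {"title": "Seismic waveforms 2013", "community": "seismology",
     "reference": "hdl:00000/example-1"},
    {"title": "Spoken language corpus", "community": "linguistics",
     "reference": "hdl:00000/example-2"},
]
print(search(catalogue, "seismic"))
```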

B2SHARE: Data sharing service
- Online data repository
- Web interface
- Easy deposit and sharing of data sets
- Public metadata and metadata schemes
- Multiple sharing levels
- Embargos
- EUDAT CDI: domain of registered data

B2SHARE and B2FIND portals (diagram)
- B2SHARE portal: simple upload, add metadata, PID registration
- B2FIND portal: metadata
- EUDAT CDI: domain of registered data

B2SAFE: Safe Replication Service
- Robust, safe and highly available data replication service for small- and medium-sized repositories
- Guard against data loss in long-term archiving and preservation
- Optimize access for users from different regions
- Bring data closer to powerful computers for compute-intensive analysis
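Guarding against data loss through replication can be sketched as copying a file to several targets and verifying each copy against the source checksum. Local directories stand in for remote sites here; this is an assumption made for illustration and not how B2SAFE itself is implemented.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a (small) file."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def replicate(source, target_dirs):
    """Copy source into each target directory and verify every replica."""
    source = Path(source)
    expected = sha256_of(source)
    replicas = []
    for directory in target_dirs:
        directory = Path(directory)
        directory.mkdir(parents=True, exist_ok=True)
        copy = directory / source.name
        shutil.copy2(source, copy)
        if sha256_of(copy) != expected:
            raise IOError(f"checksum mismatch for replica {copy}")
        replicas.append(copy)
    return replicas


# Example: two local directories play the role of two storage sites.
with tempfile.TemporaryDirectory() as workdir:
    original = Path(workdir) / "dataset.csv"
    original.write_text("id,value\n1,42\n")
    copies = replicate(original, [Path(workdir) / "site-a", Path(workdir) / "site-b"])
    print([str(copy.relative_to(workdir)) for copy in copies])
```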

B2STAGE: Data Staging Service
- Supports researchers in transferring large data collections from EUDAT storage to HPC facilities
- Reliable, efficient, and easy-to-use tools to manage data transfers
- Provide the means to re-ingest computational results back into the EUDAT infrastructure
- Diagram labels: EUDAT CDI (domain of registered data), PRACE HPC, HPC

B2DROP: Sync & Exchange Service
- Allow registered users to upload long tail data
- Enable sharing objects and collections with other researchers
- Utilize other EUDAT services to provide reliability and data retention

EUDAT services overview

Research timeline (diagram)
- Phases: before, during, after
- Components shown: B2FIND, Data Ingest Service, B2DROP / SURFdrive, BEEHUB, B2STAGE, GRID SE, Central Archive, Research, B2SAFE, B2SHARE, EPIC PID, Research Data Storage, Trusted Digital Repository

EUDAT services overview
- B2FIND: aggregated EUDAT metadata domain, data inventory
- B2SAFE: data curation and access optimization
- B2STAGE: dynamic replication to HPC workspace for processing
- B2SHARE: researcher data store (simple upload, share and access)
- B2DROP: easy sharing, local syncing
- AAI: network of trust among authentication and authorization actors
- PID: identity, integrity, authenticity, locations