Towards FAIRness in research data. Per Öster, 3 October 2018

Similar documents
Towards FAIRness: some reflections from an Earth Science perspective

The EUDAT Collaborative Data Infrastructure

Data Discovery - Introduction

Data management and discovery

Webinar: Getting started with CSC's IaaS cloud computing services Pouta

EUDAT- Towards a Global Collaborative Data Infrastructure

Deliverable 6.4. Initial Data Management Plan. RINGO (GA no ) PUBLIC; R. Readiness of ICOS for Necessities of integrated Global Observations

Assessing the FAIRness of Datasets in Trustworthy Digital Repositories: a 5 star scale

Mercè Crosas, Ph.D. Chief Data Science and Technology Officer Institute for Quantitative Social Science (IQSS) Harvard

Inge Van Nieuwerburgh OpenAIRE NOAD Belgium. Tools&Services. OpenAIRE EUDAT. can be reused under the CC BY license

DATA MANAGEMENT PLANS Requirements and Recommendations for H2020 Projects. Matthias Razum April 20, 2018

EUDAT. Towards a pan-european Collaborative Data Infrastructure. Damien Lecarpentier CSC-IT Center for Science, Finland EUDAT User Forum, Barcelona

EUDAT. Towards a pan-european Collaborative Data Infrastructure

EUDAT and Cloud Services

EUDAT. Towards a pan-european Collaborative Data Infrastructure

GEOSS Data Management Principles: Importance and Implementation

Coupled Computing and Data Analytics to support Science EGI Viewpoint Yannick Legré, EGI.eu Director

ELIXIR Human Data Use Case

EUDAT-B2FIND A FAIR and Interdisciplinary Discovery Portal for Research Data

How FAIR am I? FAIR Principles and Interoperability of Data and Tools

EUDAT B2FIND A Cross-Discipline Metadata Service and Discovery Portal

EUDAT Data Services & Tools for Researchers and Communities. Dr. Per Öster Director, Research Infrastructures CSC IT Center for Science Ltd

Welcome to the Pure International Conference. Jill Lindmeier HR, Brand and Event Manager Oct 31, 2018

I data set della ricerca ed il progetto EUDAT

Developing a Research Data Policy

FAIR-aligned Scientific Repositories: Essential Infrastructure for Open and FAIR Data

DuraSpace FAIRness and GDPR

EGI federated e-infrastructure, a building block for the Open Science Commons

Persistent Identifier the data publishing perspective. Sünje Dallmeier-Tiessen, CERN 1

EUDAT Training 2 nd EUDAT Conference, Rome October 28 th Introduction, Vision and Architecture. Giuseppe Fiameni CINECA Rob Baxter EPCC EUDAT members

Science Europe Consultation on Research Data Management

EUDAT - Open Data Services for Research

Dataverse and DataTags

Data Replication: Automated move and copy of data. PRACE Advanced Training Course on Data Staging and Data Movement Helsinki, September 10 th 2013

Towards a joint service catalogue for e-infrastructure services

Checklist and guidance for a Data Management Plan, v1.0

Fundamentals of Data Infrastructures

Data Management Plans. Sarah Jones Digital Curation Centre, Glasgow

ISMTE Best Practices Around Data for Journals, and How to Follow Them" Brooks Hanson Director, Publications, AGU

The Materials Data Facility

Perspectives on Open Data in Science Open Data in Science: Challenges & Opportunities for Europe

EUDAT. Towards a pan-european Collaborative Data Infrastructure

Indiana University Research Technology and the Research Data Alliance

EUDAT & SeaDataCloud

BPMN Processes for machine-actionable DMPs

EarthCube and Cyberinfrastructure for the Earth Sciences: Lessons and Perspective from OpenTopography

Make your own data management plan

Scientific Data Policy of European X-Ray Free-Electron Laser Facility GmbH

ELIXIR Compute platform

Data Staging and Data Movement with EUDAT. Course Introduction Helsinki 10 th -12 th September, Course Timetable TODAY

European Cloud Initiative: implementation status. Augusto BURGUEÑO ARJONA European Commission DG CNECT Unit C1: e-infrastructure and Science Cloud

Data Curation Handbook Steps

Webinar Annotate data in the EUDAT CDI

Why CERIF? Keith G Jeffery Scientific Coordinator ERCIM Anne Assserson eurocris. Keith G Jeffery SDSVoc Workshop Amsterdam

DEVELOPING, ENABLING, AND SUPPORTING DATA AND REPOSITORY CERTIFICATION

EUDAT. Towards a pan-european Collaborative Data Infrastructure - A Nordic Perspective? -

Data management Backgrounds and steps to implementation; A pragmatic approach.

WP4: Data Forum. Øystein Godøy, Boris Radosavljević, Boris Biskaborn, Anna Irrgang

Using EUDAT services to replicate, store, share, and find cultural heritage data

TEXT MINING: THE NEXT DATA FRONTIER

Open Science, FAIR data and effective data management

Research Data Preserve, Share, Reuse, Publish, or Perish Mark Gahegan Director, Centre for eresearch 24 Symonds St

Making Sense of Data: What You Need to know about Persistent Identifiers, Best Practices, and Funder Requirements

Data Insight Feature Briefing Box Cloud Storage Support

EUDAT & AAI. Daan Broeder MPI for Psycholinguistics

The iplant Data Commons

EOSC Services & Architecture: the EOSC-hub approach Tiziana Ferrari, Project Coordinator, EGI Founda?on

B2SAFE metadata management

Enabling Open Science: Data Discoverability, Access and Use. Jo McEntyre Head of Literature Services

Introduction to Data Management

Federated Identity Management for Research Collaborations. Bob Jones IT dept CERN 29 October 2013

B2FIND and Metadata Quality

Fusion Registry 9 SDMX Data and Metadata Management System

Adding Cloud Based Interactive Compute Capabilities to Globus Endpoints

Mass Digitisation Enabling Access, Use and Reuse

Open Data Policy City of Irving

Big Data infrastructure and tools in libraries

DOIs for Research Data

A USER S GUIDE TO REGISTERING AND MAINTAINING DATA SERVICES IN HIS CENTRAL 2.0

Tools for Data Management. Research Data Management : Session 3 9 th June 2015

B2DROP The EUDAT Personal Cloud Storage

Basic Requirements for Research Infrastructures in Europe

EUDAT Towards a Collaborative Data Infrastructure

How to make your data open

How to share research data

Fair data and open data: differences and consequences

Key Elements of Global Data Infrastructures

Data publication and discovery with Globus

Retain, search, review and produce government mobile text messages

Striving for efficiency

META-SHARE: An Open Resource Exchange Infrastructure for Stimulating Research and Innovation

MemSQL Partner Program Guide

Reproducible & Transparent Computational Science with Galaxy. Jeremy Goecks The Galaxy Team

Digital repositories as research infrastructure: a UK perspective

Cheshire 3 Framework White Paper: Implementing Support for Digital Repositories in a Data Grid Environment

SEAD Data Services. Jim Best Practices in Data Infrastructure Workshop. Cooperative agreement #OCI

Versioning with PIDs

clarin:el an infrastructure for documenting, sharing and processing language data

Interoperability and transparency The European context

ZB MED Information Center Life Sciences

Transcription:

Towards FAIRness in research data Per Öster, 3 October 2018 CSC Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus

Towards FAIR data! Anyone expecting that an advertising company or a retailer of merchandise and media will take us there? 2

3 17.10.2018

Data Generation Data Archiving Secure data access remote API ( GA4GH ) Sequencing centers EGA at High speed encrypted data transfer GridFTP/Globus/Aspera Supporting sample logistics Services and Coordination Managing Access Data Owner Data Access Agreement Data Access Committee Data Request Authorization Management Tools ( EGA and CSC REMS ) Federated Authentication Authorization Dataset registry Data transfer hub Policy and Legal Framework Data Users Bringing users to data 4 Secure Compute Clouds

From field measurements to open data 17.10.2018 5 Questions: Sensitive data? Requirements on Authentication and authorization?

6 Instrument Measuring PCs: Raw data at the stations File servers at stations: Raw data and field diaries, cal documents A/D conversion unit conversion Field documentation 17.10.2018 SMEAR data flow Routine data processing = (- unit conversion) - calibration correction - quality check, gapfilling - averaging over space or time Researchers, Data processing server File servers in Helsinki: Raw and intermediate data, documents, scripts ICOS, EBAS,... databases: Near real time and processed data outside UH Feedback on data quality Metadata Metadata Metadata SMEAR database: Processed data in Helsinki IDA (CSC data service): Raw data & document archive, database datasets https://avaa.tdata.fi/web/smart/smear

I t's real! Myexperiment.org Figshare.com Citizens science Dataintensive Scistarter.com Open code An#emerging# ecosystem#of# services#and# standards# Runmycode. org Open workflow s Analysis Preprint ArXiv Open data Datadryad.org Data gathering Publication Open access Roar.eprints. org Open annotation Openannotation.org Researchgate.com Conceptualisation Scient ific blogs Academia.edu Review Collaborative bibliographies Mendeley.com Alternative Reputation syst em s Impact Story Altmetric.com 2! 7 10/17/2018

Support in All Phases of Research Process Plan Customer Portal Experts Guides Websites Training Service Desk Produce & Collect Data International resources Modelling Software Supercomputers Analyse Cloud Services Training Data science Computing Software Store B2SAFE B2SHARE HPC Archive IDA Databases Research longterm preservation (LTP) Share & Publish AVAA B2DROP B2SHARE Databank Etsin Funet FileSender 8 2016

Termination. Company may terminate your access to all or any part of the Service at any time, with or without cause, with or without notice, effective immediately, which may result in the forfeiture and destruction of all information associated with your account, including User Submissions. If you wish to terminate your account, you may do so by following instructions available on the Site. Any fees paid hereunder are non-refundable. All provisions of the Terms of Use which by their nature should survive termination shall survive termination, including, without limitation, ownership provisions, warranty disclaimers, indemnity and limitations of liability.

ARTICLE 2: DISCLAIMER 1. The Service is provided "as is" and the Provider disclaims any and all representations and warranties, whether express or implied, including;- but not limited to;- implied warranties of title, merchantability, fitness for any particular purpose or non-infringement. The Provider does not promise any specific results, effects or outcome from the use of the Service. 2. 3. The Provider reserves the right to change, reduce, interrupt or discontinue the Service or parts of it at any time. 4. No one has a right to use the Service; the Provider reserves the right to exclude certain Users.

Are the commercial services sufficient? Nice complement but can not serve as the fundamental infrastructure for research data of national and international interest Need for publicly funded and operated infrastructure e-science Data Factory 11 17.10.2018

THE FAIR DATA PRINCIPLES FINDABLE: o Data are assigned a globally unique and eternally persistent identifier. o Data are described with rich metadata. o (Meta)data are registered or indexed in a searchable resource. o metadata specify the data identifier. ACCESSIBLE: o (Meta)data are retrievable by their identifier using a standardized communications protocol. o The protocol is open, free, and universally implementable. o The protocol allows for an authentication and authorization procedure, where necessary. o Metadata are accessible, even when the data are no longer available. INTEROPERABLE o (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation. o (Meta)data use vocabularies that follow FAIR principles. o (Meta)data include qualified references to other (meta)data. RE-USABLE: o Meta(data) have a plurality of accurate and relevant attributes. o Meta(data) are released with a clear and accessible data usage license. o Meta(data) are associated with their provenance. o Meta(data) meet domain-relevant community standards. https://www.force11.org/group/fairgroup/fairprinciples 13 17.10.2018

RESEARCH DATA F FINDABLE A ACCESSIBLE Described in relevant catalog with enough detail Landing page with globally unique identifier Can be retrieved over the internet Versioning and lifecycle documented Tombstone page if data is deleted I INTEROPERABLE Common, documented, and open formats R RE-USABLE Well documented and intelligible Rights clearly stated 14

Services to support the data lifecycle Plan data management with DMPTuuli Discover research datasets via Etsin and B2Find and make sure your data is discoverable, too Store data needed in analysis in CSC s user directories or within your cloud environment Re-use Plan Discover, produce & collect Store stable data in IDA, B2SHARE, HPC Archive or in dedicated databases (Kaivos etc.) Share stored data via B2SHARE or IDA, send large files with FUNET FileSender Share Analyse Reuse open research datasets: geo-informatics data (PaiTuli), speech and text corpora (Language Bank) and from various fields of science (AVAA) Store

The EUDAT Service Suite

Services to support the data lifecycle https://research.csc.fi/data-management-and-analytics

Fairdata services IDA store research data ETSIN find research data QVAIN describe metadata FAIRDATA-PAS Preserve research outputs In addition (1) metadata repository and (2) authentication solution Authentication, ontology, and other concurrent services

Storage Describting Availability Management Components Data producer Data user Data preserver Front end IDA Qvain Etsin Fairdata PAS management interface Back end Fairdata PAS packaging service AAI PAS solution Metax Metadata repository

Persistent Identifiers Resolver Landing page Data file PID Read me License Metadata repository Configuration files

Operational data Active data Not citable Dynamic, volatile 22

Published data Static and persistent Assigned a PID Product of specific research Possibly low reusability Needed for review and transparence 23

Generic Research data Validated by and for researchers Possible to cite Documented (metadata) and versioned Can be dynamic or cumulative 24

25

The Target Local - Storage - Backup - Computing National - Sharing - Long term preservation International - Sharing - Collaboration - Publishing 26

Acknowledgment Timo Vesala, INAR RI, Helsinki University Ville Tenhunen, Helsinki Univeristy Jessica Parland-von Essen, CSC and Fairdata.fi Tommi Nyrönen, ELIXIR-Finland Head of Node, CSC Mikael Linden, CSC Damien Lecarpentier, EUDAT, CSC 27 17.10.2018

facebook.com/cscfi CSC IT Center for Science Ltd twitter.com/cscfi Per Öster Director, Research Infrastructures & Policy per.oster@csc.fi youtube.com/cscfi linkedin.com/company/csc---it-center-for-science github.com/cscfi Kuvat CSC:n arkisto ja Thinkstock