How FAIR am I? FAIR Principles and Interoperability of Data and Tools

Peter Doorn, DANS (@pkdoorn, @dansknaw)
PLAN-E (Platform of National eScience Centers in Europe) meeting, April 27 & 28, 2017, PSNC, Poznań, Poland
Acknowledgments: Ingrid Dillo (DSA) and Emily Thomas (FAIR Data Assessment Tool)
www.dans.knaw.nl
DANS is an institute of KNAW and NWO

DANS is about keeping data FAIR
- Mission: promote and provide permanent access to digital research resources
- Institute of the Dutch Academy (KNAW) and the Dutch Research Funding Organisation (NWO) since 2005
- First predecessor dates back to 1964 (Steinmetz Foundation); Historical Data Archive, 1989

What is interoperability?
Interoperability (pronounced IHN-tuhr-AHP-uhr-uh-BIHL-ih-tee) is the ability of a system or a product to work with other systems or products without special effort on the part of the user.
Source: various dictionaries

Interoperability is a characteristic of a product or system, whose interfaces are completely understood, to work with other products or systems, present or future, in either implementation or access, without any restrictions.
Source: http://interoperability-definition.info/en/
Degrees of interoperability:

Interoperable data definitions
- Interoperability describes the extent to which systems and devices can exchange data, and interpret that shared data, without any restriction on access and implementation of the data.
- Data interoperability reflects our ability to let computers find, access and utilise data from physically separate and heterogeneous data repositories. (Condition: the data must be machine-readable, i.e. a computer must be able to extract a description of the terms and conditions from a licence document in order to compare and combine the data with similar data sets.)
Note that the last definition treats interoperability as the resultant of findability, accessibility and usability.

Three levels of data interoperability
- Foundational interoperability establishes the basic ability of two or more systems to exchange data. This level allows data sent by one IT system to be received by another (and does not require the receiving IT system to interpret the data).
- Structural interoperability defines the syntax of the data exchange. It ensures that data exchanged between IT systems can be interpreted at the data field level.
- Semantic interoperability is the ability of two or more systems to effectively exchange, interpret and use data and information.
Source: https://swc.net/general/blogs/what-interoperability
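The three levels can be illustrated with a small, hypothetical exchange between two systems. This is a sketch under stated assumptions, not part of the talk: the CSV payload, the function names parse_fields and interpret, and the vocabulary URI are all made up for illustration.

```python
import csv
import io

# Hypothetical payload sent by system A: a tiny CSV table encoded as bytes.
payload = "station,temp_c\nDe Bilt,11.5\n".encode("utf-8")

# Foundational level: system B only needs to receive the bytes intact.
received = bytes(payload)

# Structural level: both systems agree on the syntax (UTF-8 CSV with a header
# row), so B can interpret the data at field level.
def parse_fields(raw):
    return list(csv.DictReader(io.StringIO(raw.decode("utf-8"))))

# Semantic level: both systems map field names onto a shared vocabulary, so B
# also knows what each field means. The vocabulary URI is a made-up example.
SHARED_VOCAB = {"temp_c": "http://example.org/vocab#airTemperatureCelsius"}

def interpret(rows):
    # Fields without a vocabulary entry keep their local name.
    return [{SHARED_VOCAB.get(key, key): value for key, value in row.items()}
            for row in rows]

rows = parse_fields(received)
print(rows[0]["temp_c"])  # 11.5
print(interpret(rows)[0]["http://example.org/vocab#airTemperatureCelsius"])  # 11.5
```

Note that "station" has no vocabulary entry, so at the semantic level it remains only structurally interoperable.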

DANS and the Data Seal of Approval (DSA)
- 2005: DANS set up to promote and provide permanent access to digital research resources; formulate quality guidelines for digital repositories, including DANS itself
- 2006: 5 basic principles as the basis for 16 DSA guidelines
- 2009: international DSA Board
- Almost 70 seals acquired around the globe, but with a focus on Europe

The certification pyramid
- ISO 16363:2012 - Audit and certification of trustworthy digital repositories (http://www.iso16363.org/)
- DIN 31644 standard - Criteria for trustworthy digital archives (http://www.langzeitarchivierung.de)
- Data Seal of Approval (http://www.datasealofapproval.org/) and ICSU World Data System (https://www.icsu-wds.org/)

New common requirements for data repositories by DSA and the World Data System (WDS)
18 requirements:
- Context (1)
- Organizational infrastructure (6)
- Digital object management (8)
- Technology (2)
- Additional information and applicant feedback (1)
https://goo.gl/kzb1ga

Resemblance between the DSA and FAIR principles
DSA principles (for data repositories):
- data can be found on the internet
- data are accessible
- data are in a usable format
- data are reliable
- data can be referred to
FAIR principles (for data sets):
- Findable
- Accessible
- Interoperable
- Reusable
- (citable)
The resemblance is not perfect:
- "usable format" (DSA) is an aspect of "interoperable" (FAIR)
- FAIR explicitly addresses machine readability
A certified TDR already offers a baseline level of data quality.

Combine and operationalize: DSA & FAIR
- Growing demand for quality criteria for research datasets and for a way to assess their fitness for use
- Combine the principles of core repository certification and FAIR
- Use the principles as quality criteria:
  - core certification for digital repositories
  - FAIR for research data (sets)
- Operationalize the principles as an instrument to assess the FAIRness of existing datasets in certified TDRs

All data sets in a Trusted Repository are FAIR, but some are more FAIR than others

Implementing the FAIR principles
To be Findable:
F1. (meta)data are assigned a globally unique and eternally persistent identifier.
F2. data are described with rich metadata.
F3. (meta)data are registered or indexed in a searchable resource.
F4. metadata specify the data identifier.
To be Accessible:
A1. (meta)data are retrievable by their identifier using a standardized communications protocol.
A1.1. the protocol is open, free, and universally implementable.
A1.2. the protocol allows for an authentication and authorization procedure, where necessary.
A2. metadata are accessible, even when the data are no longer available.
To be Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
I2. (meta)data use vocabularies that follow FAIR principles.
I3. (meta)data include qualified references to other (meta)data.
To be Re-usable:
R1. (meta)data have a plurality of accurate and relevant attributes.
R1.1. (meta)data are released with a clear and accessible data usage license.
R1.2. (meta)data are associated with their provenance.
R1.3. (meta)data meet domain-relevant community standards.
See: http://datafairport.org/fair-principles-living-document-menu and https://www.force11.org/group/fairgroup/fairprinciples

Badges for assessing aspects of data quality and openness
These badges do not define good practice; they certify that a particular practice was followed.
Sources: Open Data Institute (UK), Center for Open Science (US), Tim Berners-Lee's 5-star deployment scheme for Open Data

DANS: FAIR badge scheme
(Example badge: F, A, I, R scores; 2 user reviews, 1 archivist assessment, 24 downloads)
- First badge system based on the FAIR principles: a proxy for data quality assessment
- Operationalise the original principles so that there are no interactions among the dimensions, to ease scoring
- Consider Reusability as the resultant of the other three: the average FAIRness as an indicator of data quality, R = (F + A + I) / 3
- Manual and automatic scoring
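A minimal sketch of the badge arithmetic, assuming each of F, A and I is scored on the 1-5 scale of the rubric. The class name FairBadge is hypothetical and not part of the DANS tool; it only shows that R is derived rather than scored.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FairBadge:
    """Scores on the 1-5 scale for the three operationalised dimensions."""
    f: int
    a: int
    i: int

    def __post_init__(self) -> None:
        # Reject scores outside the 1-5 rubric scale.
        for name, score in (("F", self.f), ("A", self.a), ("I", self.i)):
            if not 1 <= score <= 5:
                raise ValueError(f"{name} score must be between 1 and 5, got {score}")

    @property
    def r(self) -> float:
        # Reusability is not scored separately: R = (F + A + I) / 3
        return (self.f + self.a + self.i) / 3

badge = FairBadge(f=4, a=5, i=3)
print(f"R = {badge.r:.2f}")  # R = 4.00
```

Because R is an average, a data set can reach a high R while scoring low on one dimension; this is a consequence of the averaging choice, not a separate rule.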

First we attempted to operationalise R (Reusable) as well, but we changed our mind.
- Is Reusable a separate dimension?
- It is partly subjective: it depends on what you want to use the data for!
Idea for operationalization and solution, per reusability principle:
- R1. plurality of accurate and relevant attributes: F2, data are described with rich metadata (solution: covered by F)
- R1.1. clear and accessible data usage license (solution: covered by A)
- R1.2. provenance (for replication and reuse): an explication of how the data was or can be used is available (solution: covered by F)
- R1.3. meet domain-relevant community standards: data is automatically usable by machines (solution: covered by I)
- Data is in a TDR: unsustained data will not remain usable (solution: an aspect of the repository, covered by the Data Seal of Approval)

Findable (defined by metadata (PID included) and documentation)
1. No PID and no metadata/documentation
2. PID without or with insufficient metadata
3. Sufficient/limited metadata without PID
4. PID with sufficient metadata
5. Extensive metadata and rich additional documentation available
Accessible (defined by presence of user license)
1. Neither metadata nor data are accessible
2. Metadata are accessible but data are not (no clear terms of reuse in license)
3. User restrictions apply (e.g. privacy, commercial interests, embargo period)
4. Public access (after registration)
5. Open access, unrestricted
Interoperable (defined by data format)
1. Proprietary (privately owned), non-open format
2. Proprietary format, accepted by a Certified Trustworthy Data Repository
3. Non-proprietary, open format (= preferred format)
4. In addition to a preferred format, data is standardised using a standard vocabulary (for the research field to which the data pertain)
5. Data is additionally linked to other data to provide context
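The Findable scale can be expressed as a function of two inputs: whether a PID is present and how rich the metadata are. This is a sketch, not the DANS questionnaire logic. The names MetadataLevel and findable_score are hypothetical, and two cases the rubric leaves open are assumptions: no PID with insufficient metadata scores 1, and level 5 requires a PID (as the automation table's "PID / Rich Metadata" suggests).

```python
from enum import IntEnum

class MetadataLevel(IntEnum):
    NONE = 0
    INSUFFICIENT = 1
    SUFFICIENT = 2
    EXTENSIVE = 3  # extensive metadata plus rich additional documentation

def findable_score(has_pid: bool, metadata: MetadataLevel) -> int:
    """Map the 1-5 Findable rubric onto a PID flag and a metadata level."""
    if has_pid:
        if metadata == MetadataLevel.EXTENSIVE:
            return 5
        if metadata == MetadataLevel.SUFFICIENT:
            return 4
        return 2  # PID without or with insufficient metadata
    if metadata >= MetadataLevel.SUFFICIENT:
        return 3  # sufficient metadata without PID (assumed to cover extensive too)
    return 1  # no PID and no usable metadata (assumption for the insufficient case)

print(findable_score(True, MetadataLevel.SUFFICIENT))  # 4
print(findable_score(False, MetadataLevel.EXTENSIVE))  # 3
```

Deciding whether metadata are "sufficient" or "extensive" remains a human judgement; the function only fixes how that judgement maps onto the scale.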

Creating a FAIR data assessment tool using an online questionnaire system

Website FAIRDAT
- Neutral, independent
- Analogous to the DSA website
- To contain FAIR data assessments from any repository or website, linking to the location of the data set via its (persistent) identifier
- The repository can show the resulting badge, linking back to the FAIRDAT website

Display FAIR badges in any repository (Zenodo, Dataverse, Mendeley Data, figshare, B2SAFE, ...)

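Can FAIR data assessment be automatic?
(For each criterion: automatic? Y/N/S(emi); subjective? Y/N/S(emi); comments)
- F1 No PID / no metadata: automatic Y; subjective N
- F2 PID / insufficient metadata: automatic S; subjective S (insufficient metadata is subjective)
- F3 No PID / sufficient metadata: automatic S; subjective S (sufficient metadata is subjective)
- F4 PID / sufficient metadata: automatic S; subjective S (sufficient metadata is subjective)
- F5 PID / rich metadata: automatic S; subjective S (rich metadata is subjective)
- A1 No license / no access: automatic Y; subjective N
- A2 Metadata accessible: automatic Y; subjective N
- A3 User restrictions: automatic Y; subjective N
- A4 Public access: automatic Y; subjective N
- A5 Open access: automatic Y; subjective N
- I1 Proprietary format: automatic S; subjective N (depends on list of proprietary formats)
- I2 Accepted format: automatic S; subjective S (depends on list of accepted formats)
- I3 Archival format: automatic S; subjective S (depends on list of archival formats)
- I4 + Harmonized: automatic N; subjective S (depends on domain vocabularies)
- I5 + Linked: automatic S; subjective N (depends on semantic methods used)
Optional: qualitative assessment / data review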
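Why the I criteria are only semi-automatic can be shown with a sketch: levels 1-3 follow mechanically from format lists, while levels 4 and 5 need input that a file extension cannot provide. The format lists and the function name interoperability_score are hypothetical; treating unlisted formats as level 1, and requiring harmonization before linking counts, are assumptions.

```python
from pathlib import PurePath

# Hypothetical format lists. A real assessment tool would load these from the
# repository's own policy on preferred (archival) and accepted formats.
PREFERRED_FORMATS = {".csv", ".txt", ".xml", ".ods"}
ACCEPTED_FORMATS = {".xlsx", ".sav"}

def interoperability_score(filename: str, harmonized: bool = False,
                           linked: bool = False) -> int:
    """Score one file on the 1-5 Interoperable scale.

    Levels 1-3 are derived automatically from the file extension. The
    `harmonized` and `linked` flags must come from a human or semantic check,
    which is what makes the assessment only semi-automatic.
    """
    ext = PurePath(filename).suffix.lower()
    if ext in ACCEPTED_FORMATS:
        return 2  # proprietary, but accepted by the repository
    if ext not in PREFERRED_FORMATS:
        return 1  # treated as proprietary, non-open (assumption for unlisted formats)
    if not harmonized:
        return 3  # non-proprietary, preferred format
    # Level 5 is "additionally" linked, so it builds on level 4.
    return 5 if linked else 4

print(interoperability_score("survey.sav"))                   # 2
print(interoperability_score("survey.CSV", harmonized=True))  # 4
```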

Focus: the I of FAIR
To be Interoperable (according to the FORCE11 FAIR group):
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
I2. (meta)data use vocabularies that follow FAIR principles.
I3. (meta)data include qualified references to other (meta)data.
Note:
- The criteria apply to both data and metadata (data descriptions)
- I1 and I2 have to do with a common understanding (semantics) of the data
- I3 has to do with linking data

One reaction:

"[...] a data object is not interoperable per se or in an absolute manner. Interoperability is a property of two (or more) systems, it is achieved once the involved systems manage to exchange and use the target object. [...] In addition to that, interoperability is not a yes/no property. Rather there are several levels of "interoperability" that can be achieved in a given context. Although "Interoperable" is a very appealing term I suggest to not use it. Rather I propose to use Intelligible (this will not change the acronym)."
Leonardo Candela, National Research Council of Italy, Institute of Information Science and Technologies (CNR-ISTI)

Tim Berners-Lee's 5-star deployment scheme for Linked Open Data
1 star: available on the web (whatever format) but with an open licence, to be Open Data
2 stars: available as machine-readable structured data (e.g. Excel instead of an image scan of a table)
3 stars: as (2), plus a non-proprietary format (e.g. CSV instead of Excel)
4 stars: all the above, plus use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff
5 stars: all the above, plus link your data to other people's data to provide context
Note: 1 star has to do with the access licence; 2-4 stars have to do with (standard) data formats; 5 stars have to do with links between data.
https://www.w3.org/designissues/linkeddata.html
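The scheme is cumulative: each star presupposes all the previous ones. A minimal sketch of that rule follows. The function name lod_stars and its boolean inputs are hypothetical, and returning 0 stars when there is no open licence is an assumption (such data would not count as Open Data).

```python
def lod_stars(open_licence: bool, machine_readable: bool, non_proprietary: bool,
              uses_w3c_standards: bool, linked_to_other_data: bool) -> int:
    """Return 0-5 stars; each star requires all the previous ones."""
    stars = 0
    for met in (open_licence, machine_readable, non_proprietary,
                uses_w3c_standards, linked_to_other_data):
        if not met:
            break  # the scheme is cumulative, so stop at the first gap
        stars += 1
    return stars

# An openly licensed Excel sheet: machine-readable but proprietary.
print(lod_stars(True, True, False, False, False))  # 2
# The same table as CSV.
print(lod_stars(True, True, True, False, False))   # 3
```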

DANS ideas for measuring data interoperability
1. Proprietary (privately owned), non-open format
2. Proprietary format, but accepted by a Certified Trustworthy Data Repository ("accepted format")
3. Non-proprietary, open format = "preferred format" or "archival format"
4. In addition to a preferred format, data is standardised using a standard vocabulary (for the research field to which the data pertain)
5. Data is additionally linked to other data to provide context
Note: we are struggling with two basic problems:
- How to score multi-file data sets whose files have different levels of interoperability
- Perhaps it is enough to distinguish between proprietary, non-open formats and preferred/archival formats
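For the multi-file problem, one option is to score each file separately and then aggregate. The sketch below compares three hypothetical aggregation rules: the weakest file, the average, and the share of files in a preferred/archival format (the simpler binary view). The function name, the method labels and the file scores are illustrative assumptions, not a DANS decision.

```python
from statistics import mean

def aggregate_interoperability(file_scores: dict[str, int], method: str = "min") -> float:
    """Combine per-file Interoperable scores (1-5) into one data-set score."""
    if not file_scores:
        raise ValueError("the data set contains no files")
    scores = list(file_scores.values())
    if method == "min":
        # Weakest link: the data set is only as interoperable as its worst file.
        return float(min(scores))
    if method == "mean":
        return float(mean(scores))
    if method == "share_preferred":
        # Binary view: fraction of files in a preferred/archival format (score >= 3).
        return sum(score >= 3 for score in scores) / len(scores)
    raise ValueError(f"unknown aggregation method: {method!r}")

# Hypothetical data set with files at different interoperability levels.
files = {"data.csv": 3, "codebook.xml": 3, "analysis.sav": 2, "notes.docx": 1}
print(aggregate_interoperability(files, "min"))              # 1.0
print(aggregate_interoperability(files, "mean"))             # 2.25
print(aggregate_interoperability(files, "share_preferred"))  # 0.5
```

The three rules can rank the same data set very differently, which is one way to see why the multi-file question is hard.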

Further complications concerning data interoperability
- Domain vocabularies, thesauri, ontologies, classification schemes, etc. exist in a multitude of forms, with very different levels of applicability and acceptance
- For instance, how should the degree of harmonization be evaluated if, in a database of 100 variables, two use standard definitions?
- How can we incorporate the divergent levels of technological skill among users, which influence their preferences with respect to interoperability?
- Example: users preferring a data set in Excel rather than in RDF, because they know how to use Microsoft Excel and do not know how to use semantic web technologies
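A first, admittedly crude, answer to the harmonization question is the share of variables defined with a standard vocabulary, optionally weighted by how widely each vocabulary is accepted. This is a hypothetical sketch: the function name, the vocabulary name vocab_a and the acceptance weights are assumptions, and choosing such weights is exactly the open problem raised in the first point.

```python
from typing import Dict, Optional

def harmonization_degree(variable_vocab: Dict[str, Optional[str]],
                         acceptance: Optional[Dict[str, float]] = None) -> float:
    """Share of variables defined with a standard vocabulary, optionally weighted.

    variable_vocab maps each variable to the vocabulary it uses (None if none).
    acceptance is a hypothetical weight in [0, 1] per vocabulary, reflecting how
    widely that vocabulary is accepted; vocabularies not listed count as 1.0.
    """
    if not variable_vocab:
        raise ValueError("the database has no variables")
    weights = acceptance or {}
    covered = sum(weights.get(vocab, 1.0)
                  for vocab in variable_vocab.values() if vocab is not None)
    return covered / len(variable_vocab)

# The case from the slide: 100 variables, two of which use standard definitions.
variables = {f"var_{n}": None for n in range(100)}
variables["var_0"] = variables["var_1"] = "vocab_a"  # hypothetical vocabulary name
print(harmonization_degree(variables))                    # 0.02
print(harmonization_degree(variables, {"vocab_a": 0.5}))  # 0.01
```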

Questions for discussion
1. What are the indispensable aspects of interoperability of research data?
2. Is it indeed better to replace "interoperable" by "intelligible", as Leonardo Candela suggested?
3. Interoperability seems to be a matter of degree rather than a yes/no matter. If you agree, how should the degree of interoperability (or intelligibility) be measured?
4. If data harmonization reflects a high level of (semantic) interoperability, how can the degree of harmonization in a data set be established?
5. How can the different levels of user skills be taken into account? How can we accommodate users who cannot work with Linked Open Data?

Thank you for listening!
peter.doorn@dans.knaw.nl
www.dans.knaw.nl
http://www.dtls.nl/go-fair/
https://eudat.eu/events/webinar/fair-data-in-trustworthy-data-repositorieswebinar