Using DCAT-AP for research data

Similar documents
Using DCAT-AP for research data

GeoDCAT-AP Representing geographic metadata by using the "DCAT application profile for data portals in Europe"

GeoDCAT-AP: Use cases and open issues

The European Commission s science and knowledge service. Joint Research Centre

Specific requirements on the da ra metadata schema

DCAT-AP FOR DATA PORTALS IN EUROPE

GeoDCAT-AP. Working Group Meeting 1. Tuesday 31 March 2015, 14:00-16:00 CET (UTC+2)

StatDCAT-AP. A Common Layer for the Exchange of Statistical Metadata in Open Data Portals

Illinois Data Bank Metadata Documentation

INSPIRE metadata and DCAT-AP: Mapping summary

Spatial Data on the Web

INSPIRE & Environment Data in the EU

DCAT-AP CHANGE MANAGEMENT & RELEASE POLICY

Reducing Consumer Uncertainty Towards a Vocabulary for User-centric Geospatial Metadata

FREYA Connected Open Identifiers for Discovery, Access and Use of Research Resources

Data and visualization

Towards a joint service catalogue for e-infrastructure services

Spatial Data on the Web

(Geo)DCAT-AP Status, Usage, Implementation Guidelines, Extensions

Data is the new Oil (Ann Winblad)

Reducing Consumer Uncertainty

Services to Make Sense of Data. Patricia Cruse, Executive Director, DataCite Council of Science Editors San Diego May 2017

EUDAT B2FIND A Cross-Discipline Metadata Service and Discovery Portal

EUDAT-B2FIND A FAIR and Interdisciplinary Discovery Portal for Research Data

From Open Data to Data- Intensive Science through CERIF

GeoDCAT-AP: a geospatial extension for the DCAT application profile for data portals in Europe

Why CERIF? Keith G Jeffery Scientific Coordinator ERCIM Anne Assserson eurocris. Keith G Jeffery SDSVoc Workshop Amsterdam

PERSISTENT IDENTIFIERS FOR THE UK: SOCIAL AND ECONOMIC DATA

Persistent Identifier the data publishing perspective. Sünje Dallmeier-Tiessen, CERN 1

Introduction to INEXDA s Metadata Schema

Evolution of INSPIRE interoperability solutions for e-government

Research Data Repository Interoperability Primer

DOIs for Research Data

Unlocking the full potential of location-based services: Linked Data driven Web APIs

Interoperability Framework Recommendations

ISO 2146 INTERNATIONAL STANDARD. Information and documentation Registry services for libraries and related organizations

MAKING INSPIRE DATA DISCOVERABLE AND FINDABLE THROUGH POPULAR SEARCH ENGINES

re3data.org - Making research data repositories visible and discoverable

University of Bath. Publication date: Document Version Publisher's PDF, also known as Version of record. Link to publication

Minimal Metadata Standards and MIIDI Reports

Semantic challenges in sharing dataset metadata and creating federated dataset catalogs

Dataset Documentation Reference Guide for Pure Users

Metadata for Data Discovery: The NERC Data Catalogue Service. Steve Donegan

Part 2: Current State of OAR Interoperability. Towards Repository Interoperability Berlin 10 Workshop 6 November 2012

Paving the Rocky Road Toward Open and FAIR in the Field Sciences

SHARING YOUR RESEARCH DATA VIA

/// INTEROPERABILITY BETWEEN METADATA STANDARDS: A REFERENCE IMPLEMENTATION FOR METADATA CATALOGUES

GEOSS Data Management Principles: Importance and Implementation

Data Citation and Scholarship

What is FAIR? 5 th International Summer School on Rare Disease and Orphan Drug Registries. Claudio Carta 1 and Marco Roos 2

Managing Data in the long term. 11 Feb 2016

Scholix Metadata Schema for Exchange of Scholarly Communication Links

Initial Operating Capability & The INSPIRE Community Geoportal

The MEG Metadata Schemas Registry Schemas and Ontologies: building a Semantic Infrastructure for GRIDs and digital libraries Edinburgh, 16 May 2003

EUDAT. A European Collaborative Data Infrastructure. Daan Broeder The Language Archive MPI for Psycholinguistics CLARIN, DASISH, EUDAT

Reproducibility and FAIR Data in the Earth and Space Sciences

EUDAT Towards a Collaborative Data Infrastructure

First Light for DOIs at ESO

OpenAIRE Guidelines for Data Archive Managers 1.0 December 2012

Create a Data Collection in LSHTM Data Compass

Data Citation Technical Issues - Identification

Methodology and tools for Multilingual Linguistic Linked Data generation

ZB MED Information Center Life Sciences

Science Europe Consultation on Research Data Management

The DataCite Metadata Schema. Frauke Ziedorn Workshop: Metadata and Persistent Identifiers for Social and Economic Data 7th May 2012

Data driven transformation of the public sector Tallinn, Estonia Head of unit 22 September 2016 European Commission

Welcome to the Pure International Conference. Jill Lindmeier HR, Brand and Event Manager Oct 31, 2018

VRE4EIC. Deliverable D4.1. Review of existing VRE Metadata. Document version: 1.0

Introduction to metadata cleansing using SPARQL update queries. April 2014 PwC EU Services

Global standard formats for opening NLK data. Adding to the Global Information Ecosystem

Data Exchange in the Earth Sciences

Metadata Workshop 3 March 2006 Part 1

Guidelines for Multilingual Linked Data generation and publication

OPEN. Promoting the reuse of Open Government Data through the Open Data Interoperability Platform (ODIP) Presentation metadata SUPPORT

SC118DI07171 D Specification of GeoDCAT-AP. GeoDCAT-AP: a geospatial extension for the DCAT application profile for data portals in Europe

RADAR Project. Data Archival and Publication as a Service. Matthias Razum FIZ Karlsruhe RESEARCH DATA REPOSITORIUM. Zurich, December 15, 2014

RADAR. Establishing a generic Research Data Repository: RESEARCH DATA REPOSITORY. Dr. Angelina Kraft

A document-inspired way for tracking changes of RDF data The case of the OpenCitations Corpus

Utilizing PBCore as a Foundation for Archiving and Workflow Management

Horizon 2020 and the Open Research Data pilot. Sarah Jones Digital Curation Centre, Glasgow

Design & Manage Persistent URIs

Neil Jefferies Tanya Gray Jones Bodleian Libraries

A Data Citation Roadmap for Scholarly Data Repositories

Striving for efficiency

Reproducibility and Reuse of Scientific Code Evolving the Role and Capabilities of Publishers

EUDAT. Towards a pan-european Collaborative Data Infrastructure. Damien Lecarpentier CSC-IT Center for Science, Finland EUDAT User Forum, Barcelona

FAIR-aligned Scientific Repositories: Essential Infrastructure for Open and FAIR Data

Introduction to metadata management

Description Set Profiles

Alexander Haffner. RDA and the Semantic Web

Knowledge-Driven Video Information Retrieval with LOD

Making research data repositories visible and discoverable. Robert Ulrich Karlsruhe Institute of Technology

OpenAIRE. Fostering the social and technical links that enable Open Science in Europe and beyond

Mercè Crosas, Ph.D. Chief Data Science and Technology Officer Institute for Quantitative Social Science (IQSS) Harvard

B2FIND and Metadata Quality

Using DCC DMPonline to write a Data Management Plan

DataSTORRE Deposit Guide

Open Data Solution Architecture Template (SAT) v Beta

Deliverable Final Data Management Plan

The Final Updates. Philippe Rocca-Serra Alejandra Gonzalez-Beltran, Susanna-Assunta Sansone, Oxford e-research Centre, University of Oxford, UK

Transcription:

Using DCAT-AP for research data Andrea Perego SDSVoc 2016 Amsterdam, 30 November 2016

The Joint Research Centre (JRC) European Commission s science and knowledge service Support to EU policies with independent scientific evidence Research covers disciplines concerning all the EU policy areas Corporate data policy, based on Open Data principles, adopted in 2015 and now being implemented The implementation of the JRC Data Policy includes the setting up of a corporate data catalogue

The JRC Data Catalogue Single point of access to all data produced and/or maintained at JRC DCAT-AP is the reference metadata standard used, but extended with additional information In particular: Metadata elements relevant across scientific domains Metadata elements needed to support data citation

Information relevant for scientific data Metadata DCAT-AP GeoDCAT-AP DCAT-AP-JRC Dataset authors Data lineage How to use the data Scientific publications Input data dct:provenance dct:source dct:creator vann:usagenote dct:isreferencedby Only 2 additional properties added wrt what supported in Geo/DCAT-AP For more sophisticated specification of data lineage (i.e., machine-readable instead of free-text) we are considering the use of the PROV ontology In such a case, lineage includes all the entities involved in the production of the dataset workflow included

Data citation Making data citable as done with scientific publications Gaining more and more importance in the scientific community Based on the use of persistent identifiers (typically, DOIs for data, ORCIDs for data authors when available) Potentially relevant also outside the scientific domain advantages: Data persistence Data reproducibility Reference metadata standard: DataCite Can DCAT-AP be used for this?

DataCite: Mandatory elements DataCite 4.0 DCAT-AP 1.1 Comments Identifier Partially DataCite requires this to be a DOI, whereas DCAT- AP does not have such requirement Creator No This agent role is supported in GeoDCAT-AP Title Publisher Publication year

DataCite: Recommended elements DataCite 4.0 DCAT-AP 1.1 Comments Subject Contributor Partially Geo/DCAT-AP supports only 2 out of the 21 DataCite contributor types Date Partially Geo/DCAT-AP supports only 3 out of the 9 DataCite date types Resource type Partially Geo/DCAT-AP supports only 2 out of the 14 DataCite resource types Related identifier Description Geolocation

DataCite: Optional elements DataCite 4.0 DCAT-AP 1.1 Comments Language Alternate identifier Size In DCAT-AP, this is a property of the dataset distribution, and not of the dataset itself Format Same as above Version Rights DataCite does not use specific elements for use conditions (i.e., licences) and access rights In DCAT-AP, use conditions are a property of the dataset distribution, whereas access rights are associated with the dataset Funding Reference No

Persistent identifiers Widely used in the scientific community, especially for publications, but now increasingly for authors and data Different approaches are used for representing them in RDF best practices are needed to enable their effective use across platforms But more importantly: How can we make them actionable, irrespective of the platforms they are used in?

Identifiers, by using DCAT-AP Encoding Primary ID Alternative ID Type (DOI, ORCID, etc.) As an HTTP URI @rdf:about owl:sameas - As a literal dct:identifier adms:identifier @rdf:datatype Encoding identifiers as HTTP URIs seems to be the most effective way of making them actionable Quite a few identifier schemes can be encoded as dereferenceable HTTP URIs, and some of them are also returning machine readable metadata (e.g., DOIs, ORCIDs) They can still be encoded as literals, especially if there is the need of knowing the identifier type In such a case, a common identifier type registry would ensure interoperability

Agent roles Each metadata standard has their own. E.g.: ISO 19115: 20 roles DataCite: 20+ roles Geo/DCAT-AP: 4 roles They all use their own vocabularies / code lists Two main issues: How to ensure interoperability? Does it make sense to support all of them across platforms? owner creator distributor supervisor user principal investigator publisher curator sponsor originator producer contact point project processor manager editor project leader custodian funder registration provider agency

Agent roles in research data Important to denote the type of contribution provided by each individual / organization for producing data In some cases, an additional requirement is to specify the temporal dimension of a role i.e., the time frame during which an individual / organisation played a given role And, maybe, also other information e.g., the organisation where the individual held a given position while playing that role The PROV ontology could be used for this purpose, to specify a qualified attribution To address the different use cases, such qualified roles should however be compatible with the corresponding nonqualified forms, and both should be mutually inferable

Agent roles in GeoDCAT-AP Qualified form a:dataset a dcat:dataset; prov:qualifiedattribution [ a prov:attribution ; # The agent role, as per ISO 19115 dct:type <http://inspire.ec.europa.eu/metadatacodelist/responsiblepartyrole/owner> ; # The agent playing that role prov:agent [ a foaf:organization ; foaf:name "European Union"@en ] ]. Non-qualified form a:dataset a dcat:dataset; dct:rightsholder [ a foaf:organization ; foaf:name "European Union"@en ].

Publishing metadata on the Web Why doing this properly? Increase data visibility - as well as of the catalogues giving access to them Simplifying data discovery from the end users side How? Embedding metadata in Web pages by using mechanisms as HTML+RDFa, Microdata, Microformats, JSON-LD Using fit-for-purpose vocabularies, as Schema.org

Mapping DCAT-AP to Schema.org Schema.org includes a number of gaps, preventing the full mapping of DCAT-AP. They include: Identifiers actually, this is being worked upon: http://webschemas.org/identifier Categories and category schemes Is this a problem? DCAT-AP and Schema.org address different use cases Rather, the issue is identifying what is important to be mapped i.e., what information it is useful to be indexed to improve data discovery About the use of Schema.org for research data: https://developers.google.com/search/docs/data-types/datasets

Thanks for your attention! andrea.perego@jrc.ec.europa.eu

For more information JRC Data Policy (doi:10.2788/607378) https://ec.europa.eu/jrc/en/about/jrc-in-brief/data-policy JRC Data Catalogue http://data.jrc.ec.europa.eu/ DataCite to DCAT-AP Mapping DCAT-AP to Schema.org Mapping https://webgate.ec.europa.eu/citnet/stash/projects/odckan/repos/datacite-to-dcatap/ https://webgate.ec.europa.eu/citnet/stash/projects/odckan/repos/dcat-ap-toschema.org/