Linking datasets with user commentary, annotations and publications: the CHARMe project

Jon Blower (j.d.blower@reading.ac.uk), University of Reading, on behalf of all CHARMe partners
http://www.charme.org.uk

CHARMe (January 2013 to December 2014): CHARacterization of Metadata to enable high-quality climate applications and services
How can climate data users decide whether a dataset is fit for their purpose? (N.B. We treat data quality and fitness for purpose as the same thing.) The problem is not specific to climate data!

Commentary metadata

Examples of commentary metadata
- Post-fact annotations, e.g. citations, ad-hoc comments and notes;
- Results of assessments, e.g. validation campaigns, intercomparisons with models or other observations, reanalysis;
- Provenance, e.g. dependencies on other datasets, processing algorithms and chain, data source;
- Properties of data distribution, e.g. data policy and licensing, timeliness (is the data delivered in real time?), reliability;
- External events that may affect the data, e.g. volcanic eruptions, El Niño index, satellite or instrument failure, operational changes to the orbit calculations.
General rule: the information originates from users or external entities, not from the original data providers. However, sometimes the information is simply not available from the data provider!

Primary use case
1. A user searches a data archive for relevant datasets.
2. Each dataset in the results has two CHARMe buttons: one for reading and one for creating commentary metadata about the dataset.
3. Pressing the reading button brings up a pop-up listing all the annotations about that dataset.
This is very much like the METAFOR / ES-DOC system for climate model descriptions. It can be implemented with very little impact on existing websites, using a little JavaScript magic.
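To make step 3 concrete, the sketch below shows in Python the kind of query that could sit behind the reading button. It rests on our own assumption, which the slides do not specify, that annotations are kept in an RDF triple store with a SPARQL endpoint and are modelled with Open Annotation terms from the `http://www.w3.org/ns/oa#` namespace. The helper name and the example URI are hypothetical.

```python
# Characters that may not appear inside a SPARQL IRI reference <...>
_FORBIDDEN_IRI_CHARS = set('<>"{}|^`\\')

OA_PREFIX = "PREFIX oa: <http://www.w3.org/ns/oa#>"


def annotations_for_dataset_query(dataset_uri: str) -> str:
    """Build a SPARQL query listing every annotation that targets dataset_uri.

    Hypothetical helper: assumes annotations are stored as Open Annotation RDF.
    """
    # Refuse anything that could break out of the <...> IRI reference
    if not dataset_uri or any(
        c in _FORBIDDEN_IRI_CHARS or ord(c) <= 0x20 for c in dataset_uri
    ):
        raise ValueError(f"not usable as a SPARQL IRI: {dataset_uri!r}")
    return f"""{OA_PREFIX}
SELECT ?annotation ?body ?motivation WHERE {{
  ?annotation a oa:Annotation ;
              oa:hasTarget <{dataset_uri}> ;
              oa:hasBody ?body .
  OPTIONAL {{ ?annotation oa:motivatedBy ?motivation }}
}}"""


if __name__ == "__main__":
    # Hypothetical dataset URI, for illustration only
    print(annotations_for_dataset_query("http://example.org/dataset/sst-v1"))
```

The pop-up would then render the returned bindings. In practice, a SPARQL client library that binds parameters is preferable to string formatting.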

Other use cases
- Viewing significant events in time-series data (cf. Google Finance)
- Creating and discovering annotations about dataset subsets (cf. maphub.github.io)
- Enabling intercomparisons of data and metadata (cf. ES-DOC)
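The first use case, marking significant events on a time-series plot, reduces to selecting the event annotations for one dataset that fall inside the plotted time window. The Python below is a minimal sketch under our own assumptions. `EventAnnotation` and `events_in_window` are hypothetical names, and the events in the usage example are made up.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class EventAnnotation:
    target: str   # URI of the annotated dataset (hypothetical)
    when: date    # date of the external event
    label: str    # short text shown as a marker on the chart


def events_in_window(
    events: Iterable[EventAnnotation], target: str, start: date, end: date
) -> List[EventAnnotation]:
    """Return events about `target` that fall within [start, end], oldest first."""
    return sorted(
        (e for e in events if e.target == target and start <= e.when <= end),
        key=lambda e: e.when,
    )


if __name__ == "__main__":
    ds = "http://example.org/dataset/sst-v1"  # hypothetical URI
    events = [  # made-up events, for illustration only
        EventAnnotation(ds, date(2012, 6, 1), "orbit calculation changed"),
        EventAnnotation(ds, date(2010, 3, 15), "instrument failure"),
        EventAnnotation("http://example.org/dataset/other", date(2011, 1, 1), "unrelated"),
    ]
    for e in events_in_window(events, ds, date(2010, 1, 1), date(2012, 12, 31)):
        print(e.when.isoformat(), e.label)
```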

Open Annotation
We propose to use Open Annotation (a W3C Community Group specification) for modelling annotations.
- It is based on Linked Data technologies (RDF, SPARQL, etc.), which are used by data.gov.uk, the Australian Bureau of Meteorology, the UK Met Office and many more.
- The data model is simple and flexible: we don't have to design a rigid schema or object model up-front, and it can be added to as time goes on.
- It can record the motivation behind an annotation: bookmarking, classifying, commenting, describing, editing, highlighting, questioning, replying (and more). This covers a lot of CHARMe use cases!
- An annotation can have multiple targets, which is another CHARMe requirement.
- There is even (limited) support for annotating subsets of resources, which is an advanced CHARMe requirement.
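For illustration, here is a minimal Python sketch of an annotation that has several targets and a recorded motivation, serialised as JSON-LD. The terms `oa:Annotation`, `oa:hasBody`, `oa:hasTarget` and `oa:motivatedBy` come from the Open Annotation vocabulary. The `make_annotation` helper and all example URIs are hypothetical. Subset targets (`oa:SpecificResource` with selectors) are left out to keep the sketch short.

```python
import json

# Motivations named on the slide; Open Annotation defines these (and more)
# as individuals in the oa: namespace.
MOTIVATIONS = {
    "bookmarking", "classifying", "commenting", "describing",
    "editing", "highlighting", "questioning", "replying",
}


def make_annotation(annotation_uri, body_uri, target_uris, motivation="commenting"):
    """Return an Open Annotation-style annotation as a JSON-LD dict.

    Hypothetical helper; the inline @context defines only the oa: prefix.
    """
    if motivation not in MOTIVATIONS:
        raise ValueError(f"unsupported motivation: {motivation}")
    if not target_uris:
        raise ValueError("an annotation needs at least one target")
    return {
        "@context": {"oa": "http://www.w3.org/ns/oa#"},
        "@id": annotation_uri,
        "@type": "oa:Annotation",
        "oa:hasBody": {"@id": body_uri},
        # Several targets, e.g. one paper that comments on two datasets
        "oa:hasTarget": [{"@id": uri} for uri in target_uris],
        "oa:motivatedBy": {"@id": f"oa:{motivation}"},
    }


if __name__ == "__main__":
    anno = make_annotation(
        "http://example.org/annotation/1",
        "http://example.org/paper/intercomparison",
        ["http://example.org/dataset/sst-v1", "http://example.org/dataset/sst-v2"],
    )
    print(json.dumps(anno, indent=2))
```

Because the output is plain JSON-LD, any RDF-aware consumer can expand the `oa:` compact IRIs to full URIs. No CHARMe-specific schema is needed.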

http://www.openannotation.org/spec/core/core.html

Important points
- Everything needs a URI!
- "What is a dataset?" is an old chestnut, not yet cracked: it means different things in different communities. But CHARMe doesn't care, because it can annotate anything that has a URI; URI hierarchies are managed elsewhere.
- Choosing common vocabularies is critical (also thesauri, ontologies, etc.).
Kinds of identifier:
- URI: Uniform Resource Identifier (URLs and URNs are both kinds of URI)
- URL: Uniform Resource Locator, e.g. http://www.google.com
- URN: Uniform Resource Name, e.g. urn:ogc:def:crs:crs84
- DOI: Digital Object Identifier, e.g. doi:10753/123.455768
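Since CHARMe only requires that a target has a URI, a client might want to tell the identifier forms above apart. The Python sketch below is a rough, scheme-based heuristic of our own: it is not a CHARMe component, and the list of locator schemes is an assumption. It uses only the standard library's `urllib.parse.urlparse`.

```python
from urllib.parse import urlparse

# Schemes we treat as locators (an assumption; deliberately short list)
_LOCATOR_SCHEMES = {"http", "https", "ftp"}


def identifier_kind(identifier: str) -> str:
    """Classify an identifier string by its scheme (a rough heuristic)."""
    scheme = urlparse(identifier).scheme.lower()
    if scheme in _LOCATOR_SCHEMES:
        return "URL"
    if scheme == "urn":
        return "URN"
    if scheme == "doi":
        return "DOI"
    if scheme:
        return "URI"  # some other scheme: still a URI, just not special-cased
    raise ValueError(f"no scheme, so not an absolute URI: {identifier!r}")


if __name__ == "__main__":
    for s in ["http://www.google.com", "urn:ogc:def:crs:crs84", "doi:10753/123.455768"]:
        print(s, "->", identifier_kind(s))
```

A DOI written as a resolver link (`https://doi.org/...`) would be classed as a URL here. That is harmless for annotation purposes, since any of these forms can serve as the target's URI.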

What CHARMe can enable (some examples)
Users:
- Find me all the documents that have been written about this dataset
  - in both peer-reviewed journals and the grey literature
  - and specifically about precipitation in Africa
  - in both STFC's and Astrium's archives
- What factors might affect the quality of this dataset? e.g. upstream datasets, external events
Data providers:
- Who is using my dataset and what are they saying about it?
- Let me subscribe to new user comments and reply to them
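The user and data-provider requests above can be sketched as queries and subscriptions on an annotation store. The Python below is a minimal in-memory sketch under our own assumptions. The `Annotation` fields, the `kind` and `topics` labels, and the names `AnnotationStore` and `find_across` are all hypothetical; a real deployment would more likely hold RDF and answer SPARQL. Two separate stores stand in for two archives, such as the STFC and Astrium examples.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Annotation:
    target: str                      # dataset URI
    body: str                        # URI of the document or comment
    kind: str                        # e.g. "peer-reviewed" or "grey literature"
    topics: FrozenSet[str] = frozenset()


class AnnotationStore:
    """In-memory stand-in for one archive's annotation repository (hypothetical)."""

    def __init__(self) -> None:
        self._by_target: Dict[str, List[Annotation]] = defaultdict(list)
        self._subscribers: Dict[str, List[Callable[[Annotation], None]]] = defaultdict(list)

    def subscribe(self, target: str, callback: Callable[[Annotation], None]) -> None:
        """Data-provider view: be notified of every new annotation on `target`."""
        self._subscribers[target].append(callback)

    def add(self, annotation: Annotation) -> None:
        self._by_target[annotation.target].append(annotation)
        for notify in self._subscribers.get(annotation.target, []):
            notify(annotation)

    def find(self, target: str, kind: Optional[str] = None,
             topic: Optional[str] = None) -> List[Annotation]:
        """User view: annotations on `target`, optionally filtered by kind and topic."""
        return [
            a for a in self._by_target.get(target, [])
            if (kind is None or a.kind == kind) and (topic is None or topic in a.topics)
        ]


def find_across(stores: List[AnnotationStore], target: str, **filters) -> List[Annotation]:
    """Query several archives' stores and merge the results, store by store."""
    return [a for store in stores for a in store.find(target, **filters)]
```

For example, `find_across([stfc, astrium], dataset_uri, topic="precipitation")` answers the first user question across both archives. A provider's callback passed to `subscribe` receives each new comment as it arrives.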

What this will not enable
- "Give me the best dataset on sea surface temperature": CHARMe will not provide a new quality stamp for datasets, but it will be able to link to such assessments if other people publish them.
- CHARMe will not provide access to the actual data (cf. Web of Science, which enables discovery but does not provide access).
- We are not planning to create (another) one-stop shop for information: we want the information to appear where users are already looking.

Some relevant standards
- ISO 19156 Observations and Measurements: a conceptual model for capturing information about observations, which is fundamental to how data is acquired (estimating the value of some property of a feature of interest with a given procedure). It includes hooks for associating quality information.
- ISO 19115 (Quality Package): D (Discovery) metadata.
- ISO 19157: focuses specifically on quality; improves on and augments the ISO 19115 Quality Package.
- UncertML: a conceptual model for encapsulating probabilistic uncertainties.
- Open Annotation: a collaboration focused on an interoperability framework for annotations; a data model and ontology that uses Linked Data principles.
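To make the O&M description concrete, here is a small Python sketch of an observation: a procedure estimates a property of a feature of interest, and the result carries an explicit probabilistic uncertainty of the kind UncertML encapsulates. The field names loosely follow ISO 19156 concepts. The class names, the string-based quality hook, the Gaussian result and the 1.96-sigma interval are our own simplifications, not either standard's actual encoding.

```python
import math
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Tuple


@dataclass(frozen=True)
class NormalResult:
    """Gaussian result, in the spirit of an UncertML normal distribution (simplified)."""
    mean: float
    variance: float

    def interval_95(self) -> Tuple[float, float]:
        """Approximate central 95% interval: mean +/- 1.96 standard deviations."""
        half_width = 1.96 * math.sqrt(self.variance)
        return (self.mean - half_width, self.mean + half_width)


@dataclass
class Observation:
    """ISO 19156-style observation: a procedure estimates a property of a feature."""
    feature_of_interest: str   # URI of the thing observed, e.g. a sea area
    observed_property: str     # URI of the property, e.g. sea surface temperature
    procedure: str             # URI of the sensor or algorithm used
    phenomenon_time: datetime
    result: NormalResult
    # Hook for attaching quality statements to the result (simplified to strings)
    result_quality: List[str] = field(default_factory=list)


if __name__ == "__main__":
    obs = Observation(  # all URIs and values are hypothetical
        "http://example.org/feature/north-atlantic",
        "http://example.org/property/sst",
        "http://example.org/procedure/radiometer-x",
        datetime(2013, 1, 1),
        NormalResult(mean=290.0, variance=0.25),
    )
    print(obs.result.interval_95())
```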

Some related projects
- GeoViQua: application of ISO 19157 and integration with UncertML for capturing uncertainty information; a proposed enhancement to ISO 19115 for aggregating information to scope metadata.
- MOLES: B (Browse) metadata; an application of ISO 19156 Observations and Measurements (the CEDA MOLES implementation).
- Metafor CIM 2.0 & ES-DOC: Metafor defined a Common Information Model (CIM) to describe climate data and the models that produce it in a standard way; ES-DOC expands this to generic software and tools for different Earth science data applications.
- ESA LTDP (Long-Term Data Preservation): includes post-fact information, e.g. papers.
- PREPARDE, OpenAIRE, ORCID, DataCite, obs4MIPs, EnviLOD and many more!

What have we done so far?
- Collected a set of narrative user scenarios from a variety of users: data providers and data users in various countries
- Currently turning these into formal User Requirements, then into Software Requirements
- Using wireframing and rapid prototyping to help refine requirements
- Made links with related efforts in the US and Europe

Can anyone help us with this?
We would like to find vocabularies/ontologies that:
- describe different kinds of publications (peer-reviewed journals, technical reports, websites, etc.);
- describe the relationship between publications and "the things that they are about", e.g. datasets or sensors. For example, we might want to record that "this publication describes how the dataset was produced" or "this publication reports an issue discovered within the dataset".
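Whatever vocabularies are eventually chosen, the statements to be recorded are simple RDF triples. The Python sketch below writes them as N-Triples to show the shape of the data; it does not propose a vocabulary. It uses a made-up `example.org` namespace with hypothetical class and property names (`TechnicalReport`, `describesProductionOf`, `reportsIssueIn`), and only `rdf:type` is a real term.

```python
from typing import List

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/charme-vocab#"  # hypothetical namespace, not a real vocabulary

# Hypothetical publication classes and publication-to-"thing" relations
PUBLICATION_TYPES = {"JournalArticle", "TechnicalReport", "WebPage"}
RELATIONS = {"describesProductionOf", "reportsIssueIn"}


def iri(value: str) -> str:
    """Wrap a URI as an N-Triples IRI reference."""
    return f"<{value}>"


def describe_link(publication: str, pub_type: str, relation: str, target: str) -> List[str]:
    """Return N-Triples lines typing a publication and linking it to what it is about."""
    if pub_type not in PUBLICATION_TYPES:
        raise ValueError(f"unknown publication type: {pub_type}")
    if relation not in RELATIONS:
        raise ValueError(f"unknown relation: {relation}")
    return [
        f"{iri(publication)} {iri(RDF_TYPE)} {iri(EX + pub_type)} .",
        f"{iri(publication)} {iri(EX + relation)} {iri(target)} .",
    ]


if __name__ == "__main__":
    for line in describe_link(
        "http://example.org/report/42",          # hypothetical publication URI
        "TechnicalReport",
        "reportsIssueIn",
        "http://example.org/dataset/sst-v1",     # hypothetical dataset URI
    ):
        print(line)
```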

Summary / conclusions
- CHARMe will create connected repositories of commentary metadata.
- It will help users tap into existing expert knowledge about climate datasets, but nothing in the project is really specific to climate!
- We will provide this information in existing archives and websites.
- Linked Data technologies will enable CHARMe information to be discovered and used in other systems too.

Thank you!