Introduction to metadata cleansing using SPARQL update queries. April 2014 PwC EU Services

Learning objectives

By the end of this module, you will have an understanding of:
- How to transform your metadata using simple SPARQL Update queries.
- How to conform to the ADMS-AP to get your interoperability solutions ready to be shared on Joinup.
- The main types of errors that you could face when uploading metadata of interoperability solutions on Joinup.

How can this tutorial help you?

Owners of interoperability solutions may be able to generate the descriptive metadata of their solutions automatically in RDF. This metadata does not always conform to the ADMS Application Profile for Joinup (ADMS-AP), which prevents it from being uploaded to Joinup.

This tutorial provides basic knowledge on how to transform and cleanse RDF metadata using SPARQL Update queries so that it conforms to the ADMS-AP.

Joinup: since its launch in 2011, Joinup has been steadily growing in popularity. It currently receives more than 60,000 visits per month and hosts some 130 online communities.

SPARQL: the query language for RDF, which also allows for creating, updating and deleting RDF triples.

ADMS-AP: https://joinup.ec.europa.eu/asset/adms/asset_release/adms-application-profile-joinup

Outline

1. The context
   - ADMS-AP for describing your interoperability solutions
   - About SPARQL
   - About RDF
2. Construct ADMS-AP compliant RDF
   - Why?
   - Construct queries
3. Metadata cleansing
   - Why?
   - The main queries
   - 3 examples
4. Metadata upload to Joinup

What is the ADMS Application Profile for Joinup (ADMS-AP)?

The Asset Description Metadata Schema Application Profile is a common vocabulary used for all types of interoperability solutions.
- It allows providers of interoperability solutions to describe their solutions and easily upload the descriptions to Joinup.
- It allows users to easily discover and re-use interoperability solutions from Joinup, because all of them are described with a common vocabulary.

ADMS-AP for describing your interoperability solutions on Joinup

[Diagram: repositories of public administrations, academia, standardisation bodies and businesses, together with your own repository, are described using the ADMS Application Profile (ADMS-AP) so that users can explore, find, select and obtain solutions.]

Automatic or manual path to generate ADMS-AP

[Diagram: interoperability solutions are either transformed with Open Refine or cleansed with SPARQL.]

This tutorial focuses on the automatic path to generate ADMS-AP compliant RDF.

See how to transform with Open Refine: https://joinup.ec.europa.eu/svn/adms/trainings/introduction_to_open_refine_rdf_tool.pptx

SPARQL Protocol and RDF Query Language (SPARQL)

SPARQL is the standard language to query graph data represented as RDF triples.
- One of the three core standards of the Semantic Web, along with RDF and OWL.
- Became a W3C standard in January 2008.
- SPARQL 1.1 has been the standard since 2013.

The Resource Description Framework (RDF)

RDF represents data as (subject, predicate, object) triples. A set of triples is an RDF graph.

[Diagram: the resource http://myasset.eu/ has rdf:type adms:Asset and dct:title "My asset name".]

The nodes of a graph are:
- Resources (URIs), often abbreviated. NB: subjects and objects may also be blank nodes.
- Plain literals: "Text", "Text"@en
- Typed literals: "42"^^xsd:integer, "2014-01-01"^^xsd:date
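To make the triple model concrete, here is a minimal Python sketch that holds the example graph as a set of (subject, predicate, object) tuples. It deliberately uses plain tuples rather than an RDF library; the helper name `objects` and the way literals are spelled as quoted strings are assumptions made for this illustration only.

```python
# A tiny in-memory model of an RDF graph: a set of (subject, predicate, object) tuples.
# URIs are written in angle brackets or as prefixed names, literals keep their quotes.
# Illustration only, not an RDF library.
graph = {
    ("<http://myasset.eu/>", "rdf:type", "adms:Asset"),
    ("<http://myasset.eu/>", "dct:title", '"My asset name"'),
}

# Adding a triple that is already present changes nothing: a graph is a set.
graph.add(("<http://myasset.eu/>", "rdf:type", "adms:Asset"))


def objects(graph, subject, predicate):
    """Return every object linked to `subject` by `predicate`, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


titles = objects(graph, "<http://myasset.eu/>", "dct:title")
print(titles)
```

Because the graph is a set, order carries no meaning and duplicates collapse, which is also how an RDF graph behaves.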

A graph can be represented with different syntaxes

RDF/XML (required by Joinup):

    <rdf:Description rdf:about="http://myasset.eu/">
      <rdf:type rdf:resource="http://www.w3.org/ns/adms#Asset"/>
      <dct:title>My asset name</dct:title>
      <dct:description>Description of the asset</dct:description>
      <dct:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2014-01-01T00:00:00Z</dct:modified>
    </rdf:Description>

Turtle (used in SPARQL and in this tutorial):

    <http://myasset.eu/> a adms:Asset ;
      dct:title "My asset name" ;
      dct:description "Description of the asset" ;
      dct:modified "2014-01-01T00:00:00Z"^^xsd:dateTime .

The syntaxes are equivalent, and it is easy to transform one into another.

SPARQL is a query language for RDF data

Query:

    SELECT * WHERE {
      ?asset dct:title ?title .
    }

The WHERE clause contains a graph pattern: an RDF graph with placeholder variables (e.g. ?asset).

Data:

    <http://myasset.eu/>
      dct:title "My asset name" ;
      dct:description "Description of the asset" ;
      dct:modified "2014-01-01T00:00:00Z"^^xsd:dateTime .

    <http://yourasset.eu/>
      dct:title "Your asset name" ;
      dct:description "Another asset" .

Results:

    ?asset                    ?title
    <http://myasset.eu/>      "My asset name"
    <http://yourasset.eu/>    "Your asset name"
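The idea of matching a graph pattern against the data can be sketched in a few lines of Python. This is a simplified illustration for a single triple pattern, assuming the graph is held as a set of plain tuples; the function name `match` is hypothetical, and a real SPARQL engine does considerably more (joins, filters, optional parts).

```python
# Match one triple pattern against a set of triples.
# Terms starting with "?" are variables; every other term must match exactly.
def match(pattern, graph):
    """Return one {variable: value} binding per triple that fits `pattern`."""
    results = []
    for triple in sorted(graph):  # sorted only to make the output order stable
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value both times.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


data = {
    ("<http://myasset.eu/>", "dct:title", '"My asset name"'),
    ("<http://myasset.eu/>", "dct:description", '"Description of the asset"'),
    ("<http://yourasset.eu/>", "dct:title", '"Your asset name"'),
    ("<http://yourasset.eu/>", "dct:description", '"Another asset"'),
}

rows = match(("?asset", "dct:title", "?title"), data)
for row in rows:
    print(row["?asset"], row["?title"])
```

Each returned dictionary corresponds to one row of the result table: the pattern selects the two dct:title triples and ignores the descriptions.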

SPARQL queries have many forms

- SPARQL SELECT: to query data from a graph (not used in this tutorial).
- SPARQL CONSTRUCT: to transform one graph into another (used for creating ADMS-AP from existing RDF).
- SPARQL Update: to modify a graph in place (used to cleanse ADMS-AP metadata).

A useful tool to transform RDF files

TopBraid Composer is used to create and edit RDF files and to run SPARQL queries over them. A free version is also available.

"TopBraid Composer is the leading industrial-strength RDF editor and OWL ontology editor, as well as the best SPARQL tool on the market." (Source: http://semanticweb.org/)

For download: http://www.topquadrant.com/downloads/

Construct ADMS-AP from existing RDF: why?

You may already have the metadata description of your interoperability solutions in an RDF file that is not compliant with ADMS-AP (e.g. it lacks mandatory properties or does not use the recommended controlled vocabularies).

The following slides help you to create a compliant ADMS-AP RDF graph from your initial RDF.

Construct ADMS-AP from existing RDF using a SPARQL CONSTRUCT query

    # Result graph to construct
    CONSTRUCT {
      ?asset dct:title ?title ;
        dct:description ?description ;
        dct:modified ?modified ;
        dct:type <http://purl.org/adms/assettype/Ontology> ;
        dct:relation ?related ;
        dcat:distribution ?d .
      ?d a adms:AssetDistribution ;
        dcat:accessURL ?asset .
    }
    # Graph pattern to query
    WHERE {
      ?asset a voaf:Vocabulary ;
        dct:title ?title ;
        dct:description ?description ;
        dct:modified ?modified .
      # Recommended and optional fields
      OPTIONAL { ?asset voaf:similar ?related }
      # Construct new URIs using expressions
      BIND(IRI(CONCAT(STR(?asset), "?type=distribution")) AS ?d)
    }

Construct ADMS-AP from existing RDF: the result is a new RDF graph

Example from the Linked Open Vocabularies (LOV) repository.

Input graph:

    <http://data.lirmm.fr/ontologies/food> a voaf:Vocabulary ;
      dct:title "Food Ontology"@en ;
      dct:description "This ontology ..."@en ;
      dct:modified "2013-09-24" ;
      voaf:similar <http://www.w3.org/TR/2003/PR-owl-guide-20031215/food> .

    <http://www.w3.org/TR/2003/PR-owl-guide-20031215/food> a voaf:Vocabulary ;
      dct:title "Food Ontology in OWL"@en ;
      dct:description "Along with ..."@en ;
      dct:modified "2003-12-15" .

Result graph:

    <http://data.lirmm.fr/ontologies/food>
      dct:title "Food Ontology"@en ;
      dct:description "This ontology ..."@en ;
      dct:modified "2013-09-24" ;
      dct:type <http://purl.org/adms/assettype/Ontology> ;
      dct:relation <http://www.w3.org/TR/2003/PR-owl-guide-20031215/food> ;
      dcat:distribution <http://data.lirmm.fr/ontologies/food?type=distribution> .

    <http://data.lirmm.fr/ontologies/food?type=distribution> a adms:AssetDistribution ;
      dcat:accessURL <http://data.lirmm.fr/ontologies/food> .

    <http://www.w3.org/TR/2003/PR-owl-guide-20031215/food>
      dct:title "Food Ontology in OWL"@en ;
      dct:description "Along with ..."@en ;
      dct:modified "2003-12-15" ;
      dct:type <http://purl.org/adms/assettype/Ontology> ;
      dcat:distribution <http://www.w3.org/TR/2003/PR-owl-guide-20031215/food?type=distribution> .

    <http://www.w3.org/TR/2003/PR-owl-guide-20031215/food?type=distribution> a adms:AssetDistribution ;
      dcat:accessURL <http://www.w3.org/TR/2003/PR-owl-guide-20031215/food> .

Metadata cleansing: why?

- You may need to make some small modifications to your RDF graph to make it fully compliant with ADMS-AP.
- Only ADMS-AP compliant descriptive metadata can be uploaded to Joinup.
- Joinup has a built-in ADMS-AP validation feature to help you pinpoint inconsistencies with the standard.

Metadata cleansing with SPARQL Update queries

- Add static triples (INSERT DATA)
- Remove static triples (DELETE DATA)
- Modify static triples (combine INSERT DATA and DELETE DATA)
- Add triples based on query results (INSERT)
- Remove triples based on query results (DELETE)
- Modify triples based on query results (DELETE/INSERT)

For more info:
http://www.w3.org/TR/sparql11-update/#graphUpdate
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

Metadata cleansing: add static triples

Example: add the title of a specific interoperability solution (modelled as an adms:Asset).

Query:

    INSERT DATA {
      <http://myasset.eu/> dct:title "Asset name"@en .
    }

Before:

    <http://myasset.eu/> dct:description "Description" .

After:

    <http://myasset.eu/> dct:title "Asset name"@en ;
      dct:description "Description" .

Metadata cleansing: remove static triples

Example: remove an erroneous date of a specific asset.

Query:

    DELETE DATA {
      <http://myasset.eu/> dct:issued "2242-01-01"^^xsd:date .
    }

Before:

    <http://myasset.eu/> dct:title "Asset name"@en ;
      dct:description "Description" ;
      dct:issued "2242-01-01"^^xsd:date .

After:

    <http://myasset.eu/> dct:title "Asset name"@en ;
      dct:description "Description" .

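Metadata cleansing: modify static triples

Example: modify the title of a specific asset. The two operations are sent as one update request, separated by a semicolon.

Query:

    DELETE DATA {
      <http://myasset.eu/> dct:title "Asset name"@en .
    } ;
    INSERT DATA {
      <http://myasset.eu/> dct:title "My asset name"@en .
    }

Before:

    <http://myasset.eu/> dct:title "Asset name"@en ;
      dct:description "Description" .

After:

    <http://myasset.eu/> dct:title "My asset name"@en ;
      dct:description "Description" .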

Metadata cleansing: add triples based on query results

Example: add the asset type for all assets whose name contains "Schema".

Query:

    INSERT {
      ?asset dct:type <http://purl.org/adms/assettype/Schema> .
    }
    WHERE {
      ?asset dct:title ?title .
      FILTER(CONTAINS(?title, "Schema"))
    }

Before:

    <http://myasset.eu/> dct:title "My Asset Schema" ;
      dct:description "Description" .

    <http://yourasset.eu/> dct:title "Your Asset Vocabulary" .

After:

    <http://myasset.eu/> dct:title "My Asset Schema" ;
      dct:description "Description" ;
      dct:type <http://purl.org/adms/assettype/Schema> .

    <http://yourasset.eu/> dct:title "Your Asset Vocabulary" .

Metadata cleansing: remove triples based on query results

Example: remove all asset modification dates that lie in the future.

Query:

    DELETE {
      ?asset dct:modified ?date .
    }
    WHERE {
      ?asset dct:modified ?date .
      FILTER(?date > NOW())
    }

Before:

    <http://myasset.eu/> dct:title "Asset name"@en ;
      dct:modified "2242-01-01T00:00:00Z"^^xsd:dateTime .

    <http://yourasset.eu/> dct:title "Your Asset Vocabulary" ;
      dct:modified "2000-08-12T11:42:22Z"^^xsd:dateTime .

After:

    <http://myasset.eu/> dct:title "Asset name"@en .

    <http://yourasset.eu/> dct:title "Your Asset Vocabulary" ;
      dct:modified "2000-08-12T11:42:22Z"^^xsd:dateTime .
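The selection logic of this DELETE ... WHERE query can be mirrored in plain Python, which may help when reasoning about which triples an update will touch. This is only a sketch: the graph is modelled as a set of tuples, the helper names `parse_utc` and `drop_future_modified` are hypothetical, and "now" is fixed to an arbitrary date so that the example is repeatable.

```python
from datetime import datetime, timezone


def parse_utc(lexical):
    """Parse a 'YYYY-MM-DDTHH:MM:SSZ' value into a timezone-aware datetime."""
    parsed = datetime.strptime(lexical, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def drop_future_modified(graph, now):
    """Return a copy of `graph` without dct:modified triples dated after `now`."""
    return {
        (s, p, o)
        for s, p, o in graph
        if not (p == "dct:modified" and parse_utc(o) > now)
    }


graph = {
    ("<http://myasset.eu/>", "dct:title", "Asset name"),
    ("<http://myasset.eu/>", "dct:modified", "2242-01-01T00:00:00Z"),
    ("<http://yourasset.eu/>", "dct:modified", "2000-08-12T11:42:22Z"),
}

# Fixed reference time chosen for this sketch; the SPARQL query uses NOW() instead.
now = datetime(2014, 4, 1, tzinfo=timezone.utc)
cleaned = drop_future_modified(graph, now)
print(sorted(cleaned))
```

Only the triple with the date in the year 2242 is dropped; the title and the past modification date are left untouched, just as in the Before/After example.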

Metadata cleansing: modify triples based on query results

Example: replace a word in all asset titles.

Query:

    DELETE { ?asset dct:title ?title . }
    INSERT { ?asset dct:title ?newtitle . }
    WHERE {
      ?asset dct:title ?title .
      BIND(REPLACE(?title, "grt", "great") AS ?newtitle)
    }

Before:

    <http://myasset.eu/> dct:title "My grt asset" .

    <http://yourasset.eu/> dct:title "Your asset" .

After:

    <http://myasset.eu/> dct:title "My great asset" .

    <http://yourasset.eu/> dct:title "Your asset" .

Metadata cleansing: proposed fixes for 3 common issues

- Ensure all text fields have a language tag
- Transform date strings into xsd:dateTime values
- Add missing asset modification dates

Metadata cleansing: ensure all text fields have a language tag

Query:

    DELETE { ?s ?p ?o . }
    INSERT { ?s ?p ?olang . }
    WHERE {
      ?s ?p ?o .
      FILTER(?p IN (foaf:name, dct:title, dct:description))
      FILTER(LANG(?o) = "")
      BIND(STRLANG(?o, "en") AS ?olang)
    }

Before:

    <http://myasset.eu/> dct:title "Asset name" ;
      dct:description "Description"@en .

After:

    <http://myasset.eu/> dct:title "Asset name"@en ;
      dct:description "Description"@en .

Metadata cleansing: transform YYYY-MM-DD strings into xsd:dateTime values

Query:

    DELETE { ?s dct:modified ?str . }
    INSERT { ?s dct:modified ?date . }
    WHERE {
      ?s dct:modified ?str .
      BIND(xsd:dateTime(CONCAT(?str, "T00:00:00Z")) AS ?date)
    }

Before:

    <http://myasset.eu/> dct:title "Asset name"@en ;
      dct:description "Description"@en ;
      dct:modified "2014-02-24" .

After:

    <http://myasset.eu/> dct:title "Asset name"@en ;
      dct:description "Description"@en ;
      dct:modified "2014-02-24T00:00:00Z"^^xsd:dateTime .
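The conversion step in this query appends a midnight UTC time to a YYYY-MM-DD string. The Python sketch below mirrors that step outside SPARQL, for instance to check which values are plain dates before running the update. The function name `to_xsd_datetime` is hypothetical, and returning None for anything that is not a valid YYYY-MM-DD date is a design choice of this sketch rather than a description of what the query does.

```python
import re
from datetime import date

# Exactly four, two and two ASCII digits separated by hyphens.
DATE_ONLY = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def to_xsd_datetime(value):
    """Return 'YYYY-MM-DDT00:00:00Z' for a valid 'YYYY-MM-DD' string, else None."""
    found = DATE_ONLY.fullmatch(value)
    if found is None:
        return None
    year, month, day = (int(part) for part in found.groups())
    try:
        date(year, month, day)  # rejects impossible dates such as 2014-02-30
    except ValueError:
        return None
    return f"{value}T00:00:00Z"


print(to_xsd_datetime("2014-02-24"))
```

Values that are already full timestamps, or that use another date layout, come back as None, so they can be reviewed by hand instead of being rewritten blindly.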

Metadata cleansing: add missing asset modification dates, copying the creation date

Query:

    INSERT {
      ?asset dct:modified ?date .
    }
    WHERE {
      ?asset dct:issued ?date .
      FILTER NOT EXISTS { ?asset dct:modified ?modified }
    }

Before:

    <http://myasset.eu/> dct:title "Asset name"@en ;
      dct:issued "2014-02-24T00:00:00Z"^^xsd:dateTime .

    <http://yourasset.eu/> dct:title "Your asset"@en ;
      dct:issued "2012-01-01T00:00:00Z"^^xsd:dateTime ;
      dct:modified "2014-03-04T00:00:00Z"^^xsd:dateTime .

After:

    <http://myasset.eu/> dct:title "Asset name"@en ;
      dct:issued "2014-02-24T00:00:00Z"^^xsd:dateTime ;
      dct:modified "2014-02-24T00:00:00Z"^^xsd:dateTime .

    <http://yourasset.eu/> dct:title "Your asset"@en ;
      dct:issued "2012-01-01T00:00:00Z"^^xsd:dateTime ;
      dct:modified "2014-03-04T00:00:00Z"^^xsd:dateTime .

Metadata upload to Joinup: upload an RDF/XML file to Joinup

1. On your repository page, click on "Upload metadata".
2. Select the RDF/XML file.
3. Click on "Upload the metadata file".

Metadata upload to Joinup: get the upload status

1. Log in with your account.
2. Go to the repository page.
3. Click on "Report file".

Metadata upload to Joinup: reading the upload log

Lines have the format "Timestamp Level - Message", for example:

    2013-08-30 17:36:02 INFO - Treatment of the repository

Levels:

    INFO    Information message
    WARN    Warning (you may ignore it)
    ERROR   Error (you should fix it)
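When a log is long, it can help to filter it for the lines that need action. The Python sketch below assumes that every line follows the layout of the single sample shown above ("timestamp level - message"); the names `parse_line` and `error_messages` are hypothetical, and the ERROR line in the sample data is invented for this illustration and is not real Joinup output.

```python
import re

# Assumed layout, based on the one sample line from the slide:
# "YYYY-MM-DD HH:MM:SS LEVEL - message"
LOG_LINE = re.compile(
    r"(?P<timestamp>[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}) "
    r"(?P<level>INFO|WARN|ERROR) - (?P<message>.*)"
)


def parse_line(line):
    """Split one log line into timestamp, level and message, or return None."""
    found = LOG_LINE.fullmatch(line.strip())
    return found.groupdict() if found else None


def error_messages(lines):
    """Return the messages of all lines logged at ERROR level."""
    parsed = (parse_line(line) for line in lines)
    return [entry["message"] for entry in parsed if entry and entry["level"] == "ERROR"]


sample = [
    "2013-08-30 17:36:02 INFO - Treatment of the repository",
    "2013-08-30 17:36:03 ERROR - Hypothetical error message",  # invented for this sketch
]
print(error_messages(sample))
```

Lines that do not fit the assumed layout are skipped rather than raising an exception, so the filter keeps working if the log contains other kinds of lines.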

Related learning resources

- Introduction to ADMS-AP
- How to import and export ADMS-AP conform metadata of interoperability solutions on Joinup
- Introduction to the Open Refine RDF tool
- Using Joinup as catalogue for interoperability solutions
- Introduction to the advanced search functionality of EFIR

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Project Officer: Szabolcs.SZEKACS@ec.europa.eu
Contractors: Nikolaos.Loutas@be.pwc.com, Joan.Bremers@be.pwc.com

Visit our initiatives: ADMS.SW, CISR (Community of Interoperability Solution Repositories)

Get involved:
- Follow @Joinup_EU on Twitter
- Join the CISR community on Joinup

Joinup and ADMS are funded by the ISA Programme.