Neil Jefferies Tanya Gray Jones Bodleian Libraries

Similar documents
Neil Jefferies Bodleian Libraries University of Oxford

Neil Jefferies Bodleian Libraries

Linked Open Data: a short introduction

Multi-agent and Semantic Web Systems: Linked Open Data

Corso di Biblioteche Digitali

Proposal for Implementing Linked Open Data on Libraries Catalogue

Corso di Biblioteche Digitali

The Semantic Web Revisited. Nigel Shadbolt Tim Berners-Lee Wendy Hall

Semantic Web Fundamentals

a paradigm for the Introduction to Semantic Web Semantic Web Angelica Lo Duca IIT-CNR Linked Open Data:

Introduction. October 5, Petr Křemen Introduction October 5, / 31

Design & Manage Persistent URIs

THE GETTY VOCABULARIES TECHNICAL UPDATE

Semantic Web Fundamentals

Semantic web. Tapas Kumar Mishra 11CS60R32

The Semantic Institution: An Agenda for Publishing Authoritative Scholarly Facts. Leslie Carr

Multi-agent Semantic Web Systems: Data & Metadata

Reducing Consumer Uncertainty

From the Web to the Semantic Web: RDF and RDF Schema

The Semantic Web DEFINITIONS & APPLICATIONS

Structured Data To RDF II Deliverable D4.3.2

Metadata. Week 4 LBSC 671 Creating Information Infrastructures

H1 Spring B. Programmers need to learn the SOAP schema so as to offer and use Web services.

Library of Congress BIBFRAME Pilot. NOTSL Fall Meeting October 30, 2015

Alphabet Soup: A Metadata Overview Melanie Schlosser Metadata Librarian

The Politics of Vocabulary Control

The Semantic Web: A Vision or a Dream?

Semantic Web Systems Linked Open Data Jacques Fleuriot School of Informatics

Semantic Web: vision and reality

The CEN Metalex Naming Convention

Linked data and its role in the semantic web. Dave Reynolds, Epimorphics

> Semantic Web Use Cases and Case Studies

Compound or complex object: a set of files with a hierarchical relationship, associated with a single descriptive metadata record.

Metadata Workshop 3 March 2006 Part 1

Towards the Semantic Desktop. Dr. Øyvind Hanssen University Library of Tromsø

Semantic Web. Tahani Aljehani

CHAPTER 1 INTRODUCTION

COMP6217 Social Networking Technologies Web evolution and the Social Semantic Web. Dr Thanassis Tiropanis

Linked Data: What Now? Maine Library Association 2017

The P2 Registry

Web Architecture Part 3

How FAIR am I? FAIR Principles and Interoperability of Data and Tools

WebGUI & the Semantic Web. William McKee WebGUI Users Conference 2009

Semantic Web and Electronic Information Resources Danica Radovanović

An Introduction to PREMIS. Jenn Riley Metadata Librarian IU Digital Library Program

Development of guidelines for publishing statistical data as linked open data

Million Book Universal Library Project :Manual for Metadata Capture, Digitization, and OCR

Semantic Web Test

Semantic Web and Natural Language Processing

Multi-agent and Semantic Web Systems: RDF Data Structures

Reducing Consumer Uncertainty Towards a Vocabulary for User-centric Geospatial Metadata

Building Consensus: An Overview of Metadata Standards Development

Information Retrieval (IR) through Semantic Web (SW): An Overview

CEN MetaLex. Facilitating Interchange in E- Government. Alexander Boer

COMP9321 Web Application Engineering

Chapter 2 SEMANTIC WEB. 2.1 Introduction

COMP9321 Web Application Engineering

Digital Library Curriculum Development Module 4-b: Metadata Draft: 6 May 2008

Mapping from Flat or Hierarchical Metadata Schemas to a Semantic Web Ontology. Justyna Walkowska, Marcin Werla

WHY WE NEED AN XML STANDARD FOR REPRESENTING BUSINESS RULES. Introduction. Production rules. Christian de Sainte Marie ILOG

The Semantic Web & Ontologies

KNOWLEDGE GRAPHS. Lecture 2: Encoding Graphs with RDF. TU Dresden, 23th Oct Markus Krötzsch Knowledge-Based Systems

For each use case, the business need, usage scenario and derived requirements are stated. 1.1 USE CASE 1: EXPLORE AND SEARCH FOR SEMANTIC ASSESTS

Using DCAT-AP for research data

Contents. G52IWS: The Semantic Web. The Semantic Web. Semantic web elements. Semantic Web technologies. Semantic Web Services

Enrichment, Reconciliation and Publication of Linked Data with the BIBFRAME model. Tiziana Possemato Casalini Libri

Web 2.0 and the Semantic Web

RDA Resource Description and Access

Linked data for manuscripts in the Semantic Web

Business to Consumer Markets on the Semantic Web

Semantic Web Systems Introduction Jacques Fleuriot School of Informatics

Datos abiertos de Interés Lingüístico

Towards the Semantic Web

RDF: Resource Description Failures and Linked Data Letdowns

Computer Science Applications to Cultural Heritage. Metadata

B4M36DS2, BE4M36DS2: Database Systems 2

Helmi Ben Hmida Hannover University, Germany

University of Bath. Publication date: Document Version Publisher's PDF, also known as Version of record. Link to publication

Metadata and Encoding Standards for Digital Initiatives: An Introduction

Alphabet Soup: Choosing Among DC, QDC, MARC, MARCXML, and MODS. Jenn Riley IU Metadata Librarian DLP Brown Bag Series February 25, 2005

Metadata Standards and Applications. 4. Metadata Syntaxes and Containers

Using Linked Data and taxonomies to create a quick-start smart thesaurus

RDF and Digital Libraries

METAINFORMATION INCORPORATION IN LIBRARY DIGITISATION PROJECTS

Its All About The Metadata

Database of historical places, persons, and lemmas

Integration of resources on the World Wide Web using XML

Digital Public Space: Publishing Datasets

Practical Experiences with Ingesting Materials for Long-Term Preservation

Web Information System Design. Tatsuya Hagino

Hyperdata: Update APIs for RDF Data Sources (Vision Paper)

Introduction to Linked Data

Building Blocks of Linked Data

ISA Action 1.17: A Reusable INSPIRE Reference Platform (ARE3NA)

Oracle Spatial and Graph: Benchmarking a Trillion Edges RDF Graph ORACLE WHITE PAPER NOVEMBER 2016

OXLOD Pilot Oxford Linked Data. 4 October OeRC

Spatial Data on the Web

JENA: A Java API for Ontology Management

BIBFRAME Update Why, What, When. Sally McCallum Library of Congress NCTPG 10 February 2015

Linked Data and RDF. COMP60421 Sean Bechhofer

Transcription:


Session Structure
Metadata and Data
Modelling using the Prov Ontology

Objects
Common objects reappear in many places:
Items: works, (manifestations), artefacts, components.
Labels: classifications, vocabularies, ontologies, names, attribute values. These sort and group items and are vital for discovery (not everything is full-text indexable).
Context: places, people, geopolitical entities, collections. These locate items.
It is *possible* for something to be more than one type of object, e.g. fictitious creations or automata.
Objects have attributes: literals (properties), relationships to other objects, and internal structure.
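A minimal sketch of the object model described on the Objects slide, assuming a plain Python dataclass rather than any particular standard. The class `Obj`, its field names and all identifiers (`item:book1`, `label:science-fiction`, etc.) are hypothetical, chosen only to show literal attributes, typed relationships, internal parts and an object carrying more than one type.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    """A generic object: literal attributes, typed relationships and internal parts.
    The class and field names are hypothetical, chosen for this sketch."""
    id: str
    types: set = field(default_factory=set)            # an object may have more than one type
    attributes: dict = field(default_factory=dict)     # literal properties, e.g. a title
    relationships: list = field(default_factory=list)  # (relation, target id) pairs
    parts: list = field(default_factory=list)          # internal structure: component Obj instances

    def related(self, relation):
        """Return the ids of the objects linked to this one by `relation`."""
        return [target for rel, target in self.relationships if rel == relation]


# Illustrative identifiers only (not from any real catalogue).
book = Obj("item:book1", {"Item"}, {"title": "So Long, and Thanks for All the Fish"})
book.relationships.append(("creator", "context:person/adams"))       # Context: a person
book.relationships.append(("classifiedAs", "label:science-fiction"))  # Label: groups items
book.parts.append(Obj("item:book1/ch1", {"Item"}, {"title": "Chapter 1"}))

# An automaton can be treated both as an artefact (Item) and as an actor
# placed alongside people (Context).
automaton = Obj("item:automaton1", {"Item", "Context"})

print(book.related("creator"))  # ['context:person/adams']
```

Keeping relationships as (relation, target) pairs rather than fixed fields is what lets the same structure later be expressed as RDF triples.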

Important Considerations
The model should fit the knowledge: if you are working hard to make your information fit, then you are using the wrong approach.
Don't sacrifice accuracy for conformance. Standards have implicit biases and assumptions, which affect the types of question that can be asked or answered.
Efficiency matters!
Preservation: the economics of re-use, file format choice, significant properties. Metadata is critical.
Re-use: final format vs continued use. We cannot anticipate how the information will be re-used; most potential users are not yet born.

No need for a single approach
Standards suffer from scope creep: they handle their initial design targets well and everything else rather less so.
Sooner or later your information will become graph-like. RDF types relationships, unlike an RDBMS.
RDF (like many standards) can technically encode almost anything, but different knowledge types are best treated differently. Mashing it all together is confusing and reduces reusability. It is also inefficient.
There are existing standards (W3C/IETF > DH > Library).
[Diagram: an Author, a Book and its Digitised Images, each described with a different standard: vCard, MODS (bibliographic), EXIF (photographic), PREMIS (preservation), ALTO (text coordinates), CC-BY-SA (rights); content streams: Text (OCR output), Text (abstract), JPEG (image), TIFF (image).]

Data and Metadata
Questions?
Context
Provenance
Evidence
Qualification

A False Dichotomy (partly)
An artefact derives much of its meaning from attributes that are not intrinsic to the artefact itself:
Context: the circumstances under which it was created.
Provenance: the route by which it came to be where it is now.
This is especially true for digital materials. A file is a meaningless stream of bytes. Its name can readily be changed, so it is not intrinsic. Its file format is not intrinsic either: the same bytes might be treated as plain text, XML, HTML or TeX.
The metadata alone can have more meaning than the data alone.
Can we even unambiguously define metadata? Consider image vs transcription vs abstract vs description.
A digital object should be considered a greater whole comprising several streams of information that can be arbitrarily labelled data or metadata, but all of which contribute to the intellectual content of the object.

Original Context
The original context is the context in which the artefact was created; the current context is the product of provenance.
Who created it? Author, illustrator, scribe, typesetter, printer, publisher?
Why did they create it?
How did they create it?
Where and when did they create it?
What was going on when they created it?
[Diagram: Context and Artefact linked by "gives meaning to" and "provides evidence for".]

Context is shared
The Paradise of Dainty Devises???
Chemistry of Insulin: determination of the structure of insulin opens the way to greater understanding of life processes
Nucleotide sequence of 5S-ribosomal RNA from Escherichia coli
The Hitchhiker's Guide to the Galaxy
The Restaurant at the End of the Universe
So Long, and Thanks for All the Fish

Provenance
How did an artefact come to be where and how it is now? How was a digital surrogate created, curated, etc.?
Digital and physical proceed in parallel: conservation and preservation apply to both.
The basic questions are framed in similar terms to original context, but with an emphasis on time and process.
The original context is just the early part of provenance!
[Example: the same text as prose, "Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister...", and as markup, <HTML><HEAD><Title>Alice's Adventures in Wonderland -- Chapter I</Title></HEAD><BODY>]

Provenance/Context Models
Key components:
Objects (entities, "things"):
Items: artefacts, people, places.
Labels: classifications, ontologies, ideas, bibliographic works.
Groups: organisations, collections (geopolitical constructs).
Events, located in space and time.
Agents, who create or change other entities and relationships.
Relationships (typed).
[Diagram: an Agent participates in an Event, which changes an Item.]
Examples:
CIDOC-CRM (www.cidoc-crm.org): ISO 21127, UNESCO/International Council of Museums.
Schema.org (www.schema.org): Google, Bing, Yahoo, etc. (roles and events recently added!)
TEI (www.tei-c.org).
PREMIS (www.loc.gov/standards/premis): originally preservation metadata; version 3.x is a significant revision.
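The event-centric model above can be sketched in a few lines, assuming a simple in-memory structure; this is not the schema of CIDOC-CRM, PROV or any other standard listed. The `Event` class, the functions `provenance` and `original_context`, and all identifiers and dates are hypothetical. It shows provenance as the time-ordered chain of events that changed an item, with the original context as its earliest link.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Event:
    """An event located in space and time. Hypothetical structure, loosely in
    the event-centric spirit of CIDOC-CRM and PROV; not either standard's schema."""
    kind: str
    when: date
    place: str
    agents: list = field(default_factory=list)   # agents who participated
    changes: list = field(default_factory=list)  # items created or changed by the event


def provenance(item, events):
    """All events that touched `item`, in chronological order."""
    return sorted((e for e in events if item in e.changes), key=lambda e: e.when)


def original_context(item, events):
    """The original context is just the earliest part of provenance."""
    chain = provenance(item, events)
    return chain[0] if chain else None


# Hypothetical manuscript history, listed out of order on purpose.
events = [
    Event("digitisation", date(2015, 3, 2), "Oxford", ["agent:imaging-team"], ["ms:1", "img:1"]),
    Event("creation", date(1581, 1, 1), "London", ["person:scribe"], ["ms:1"]),
    Event("deposit", date(1900, 6, 1), "Oxford", ["org:library"], ["ms:1"]),
]

print([e.kind for e in provenance("ms:1", events)])  # ['creation', 'deposit', 'digitisation']
print(original_context("img:1", events).kind)        # digitisation
```

The same digitisation event appears in the provenance of both the physical manuscript and its digital surrogate, which is one way of treating the digital and the physical in parallel.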

Evidence
Data models are about assertions, *NOT* truth or reality!
The provenance of assertions about objects matters; this is a key mechanism of scholarship: Who made the assertion? When? On what basis?
Assertions may be multiple or contradictory.
Some use cases attempt to compute confidence or probability values (!).
In practice:
This can be, and is, ignored in some cases (intrinsic properties of an object).
This is often the starting point for further research (library catalogue, pre-existing data).
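To make "assertions, not truth" concrete, here is a minimal sketch assuming each claim is stored together with who made it, when, and on what basis. The `Assertion` class, the `competing_values` function, and the people, years and sources are all invented for illustration. Contradictory values are kept side by side with their sources instead of being resolved.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """A claim together with its own provenance (hypothetical structure)."""
    subject: str
    predicate: str
    value: str
    source: str   # who made the assertion
    made_on: str  # when (ISO date string, kept simple here)
    basis: str    # on what basis


def competing_values(assertions, subject, predicate):
    """Map each asserted value to the sources that assert it.
    Contradictory values are all kept; nothing is resolved or discarded."""
    grouped = defaultdict(list)
    for a in assertions:
        if a.subject == subject and a.predicate == predicate:
            grouped[a.value].append(a.source)
    return dict(grouped)


# Invented example: two conflicting birth years for the same person.
claims = [
    Assertion("person:p1", "birthYear", "1540", "catalogue:1890", "1890-01-01", "parish register"),
    Assertion("person:p1", "birthYear", "1542", "scholar:a", "2011-05-12", "dated letter"),
    Assertion("person:p1", "birthYear", "1540", "scholar:b", "2014-09-30", "parish register"),
]

print(competing_values(claims, "person:p1", "birthYear"))
# {'1540': ['catalogue:1890', 'scholar:b'], '1542': ['scholar:a']}
```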

Expressing Evidence
Most evidence can be accommodated by adopting an event-oriented expression of information. The mechanism used for expressing context and provenance also works here.
[Diagram, flat record: Manuscript with Title, Abstract, Author, AuthorBirthDate, PlaceOfCreation, DateOfCreation, EvidenceForAuthorPlaceDateOfCreation, CurrentLocation, DateOfDepositAtCurrentLocation, EvidenceForDateOfDepositAtCurrentLocation.]
[Diagram, event-oriented: Manuscript (Title, Abstract) linked to a Creation Event (Time, Place, Evidence) involving an Author (BirthDate, BirthPlace), and to a Deposit Event (Time, Place, Evidence).]
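A minimal sketch of the restructuring the Expressing Evidence diagrams describe, assuming records are plain Python dictionaries. The field names on the input side follow the flat record; the function `to_event_oriented`, the output layout and all sample values are hypothetical. Each event carries its own time, place and evidence, so evidence no longer needs compound field names such as EvidenceForAuthorPlaceDateOfCreation.

```python
def to_event_oriented(flat):
    """Split a flat manuscript record into an author, the manuscript itself and
    two events, each event carrying its own time, place and evidence.
    Input field names follow the flat record; the output layout is hypothetical."""
    author = {
        "name": flat["Author"],
        "birthDate": flat["AuthorBirthDate"],
        "birthPlace": flat.get("AuthorBirthPlace"),  # absent from the flat record, so may be None
    }
    manuscript = {"title": flat["Title"], "abstract": flat["Abstract"]}
    events = [
        {"type": "Creation", "agent": author["name"],
         "time": flat["DateOfCreation"], "place": flat["PlaceOfCreation"],
         "evidence": flat["EvidenceForAuthorPlaceDateOfCreation"]},
        {"type": "Deposit",
         "time": flat["DateOfDepositAtCurrentLocation"], "place": flat["CurrentLocation"],
         "evidence": flat["EvidenceForDateOfDepositAtCurrentLocation"]},
    ]
    return {"author": author, "manuscript": manuscript, "events": events}


# Invented values, for illustration only.
flat = {
    "Title": "Commonplace book", "Abstract": "Verse and recipes.",
    "Author": "person:anon1", "AuthorBirthDate": "c. 1550",
    "PlaceOfCreation": "London", "DateOfCreation": "c. 1580",
    "EvidenceForAuthorPlaceDateOfCreation": "colophon",
    "CurrentLocation": "Oxford", "DateOfDepositAtCurrentLocation": "1900",
    "EvidenceForDateOfDepositAtCurrentLocation": "accession register",
}

record = to_event_oriented(flat)
print([e["type"] for e in record["events"]])  # ['Creation', 'Deposit']
```

In the event form a further episode (a rebinding, a sale) is just another entry in `events`, whereas the flat form would need a new set of date, place and evidence fields for each one.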

Another Viewpoint
We can reframe the previous discussion in terms of a general need to be able to qualify an assertion in terms of one or more of:
Time: when an assertion is true. An obvious case is the existence of a person.
Place: where an assertion is true. "Professor of History at Oxford" is not the same assertion as "Professor of History at Heidelberg". Places can be geopolitical entities such as jurisdictions, which are themselves time-dependent.
Source: who made the assertion. An anonymous text is still a valid source, though.
Evidence: why the assertion has been made, and counter-evidence too.
Confidence: how much the assertion can be trusted. This often depends on the source and evidence.
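The qualifiers listed above map naturally onto optional fields on an assertion. This is a minimal sketch assuming years are enough for the time qualifier; the `QualifiedAssertion` class, the `holds` function and the career data are hypothetical. It checks only time and place; source, evidence and confidence are carried along for a human or a later process to weigh.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QualifiedAssertion:
    """An assertion with optional qualifiers for time, place, source,
    evidence and confidence (hypothetical structure)."""
    statement: tuple                     # (subject, predicate, object)
    start: Optional[int] = None          # first year it holds; None = open/unknown
    end: Optional[int] = None            # last year it holds; None = open/unknown
    place: Optional[str] = None
    source: Optional[str] = None
    evidence: tuple = ()
    confidence: Optional[float] = None   # only if a use case computes one


def holds(a, year, place=None):
    """True unless the time or place qualifiers rule the assertion out."""
    if a.start is not None and year < a.start:
        return False
    if a.end is not None and year > a.end:
        return False
    if place is not None and a.place is not None and place != a.place:
        return False
    return True


# Invented career: the same post, qualified by place and time.
post = ("person:p2", "holdsPost", "Professor of History")
heidelberg = QualifiedAssertion(post, 1890, 1900, "Heidelberg", "source:s1")
oxford = QualifiedAssertion(post, 1901, 1920, "Oxford", "source:s2")

print(holds(oxford, 1905), holds(heidelberg, 1905))  # True False
```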

Different Knowledge Types
(Moving up the list, uncertainty and the need for qualification increase.)
Derived knowledge: history, meaning. Relates semantic elements to other objects: who, when, where; iconography.
Context: immediate information available from the object's environment. Metadata! Creator, location in library, accession, documented provenance.
Semantic elements: meaningful chunks of content. Titles/subtitles, personal names, place names, contents lists, indices, dates; image components.
Intrinsic information: raw information content. Raw text, lines, headers, pagination, images, text coordinates; physical attributes: material, page size, font, colour.

Modelling using the Prov Ontology
Questions
The Semantic Web and RDF
The Prov Ontology
Examples

The Semantic Web
Tim Berners-Lee (1998), Semantic Web Roadmap. http://www.w3.org/designissues/semantic.html
Key components:
URI (Uniform Resource Identifier): identifies things and, as a URL, indicates where they can be found online.
Unicode: multilingual from the outset.
RDF (Resource Description Framework): the Semantic Web's way of expressing information as triples.
XML (Extensible Markup Language): one way of encoding RDF information; other formats such as JSON-LD are also used.
RDF-S (RDF Schema): used for expressing RDF schemas (in RDF).
OWL (Web Ontology Language): a general mechanism for expressing ontologies/vocabularies/schemas. A superset of RDF-S and a lot more complex (there is also OWL Lite).
RIF (Rule Interchange Format): intended to define rules for processing RDF; in practice it maps between many existing rule formats.
SPARQL (SPARQL Protocol and RDF Query Language): a query language for RDF, usually run against a triple store.
Crypto: encryption and signing technologies to ensure data can be transmitted securely.
Phew! Fortunately, it is possible to generate RDF without knowing about much of this! If you need to, there are tools available!

Linked Open Data
The Semantic Web is necessary but not sufficient.
Tim Berners-Lee (2006), Linked Data. http://www.w3.org/designissues/linkeddata.html
Four rules:
Identify everything with URIs (avoid literals if possible).
Use Web URIs, i.e. URLs.
Return meaningful semantic information when a URI is requested; this could be simple RDF or a SPARQL endpoint.
Make links.
2010 addendum: Five Stars for Linked Open Data
(1) Available on the web (whatever format) but with an open licence.
(2) Available as machine-readable structured data (e.g. text rather than a scan).
(3) As (2) plus: non-proprietary format (e.g. CSV instead of Excel).
(4) As (3) plus: use RDF and SPARQL, so that people can point at your stuff.
(5) As (4) plus: link your data to other people's data to provide context.
URIs are now IRIs (Internationalised Resource Identifiers), with added Unicode support.
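Because each star is defined as "the previous star plus" one more criterion, a rating is cumulative. A minimal sketch, assuming each criterion has already been judged true or false; the function name and parameter names are hypothetical.

```python
def five_star_rating(open_licence, machine_readable, non_proprietary, rdf_and_sparql, linked):
    """Stars are cumulative: each star requires all the earlier ones, so counting
    stops at the first criterion that is not met. Parameter names are hypothetical."""
    stars = 0
    for met in (open_licence, machine_readable, non_proprietary, rdf_and_sparql, linked):
        if not met:
            break
        stars += 1
    return stars


# An openly licensed CSV file without RDF: three stars.
print(five_star_rating(True, True, True, False, False))  # 3
# Links to other people's data do not count if the earlier stars are missing.
print(five_star_rating(True, False, False, False, True))  # 1
```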

Basic RDF Construct
RDF 1.1 Concepts and Abstract Syntax: http://www.w3.org/tr/2014/rec-rdf11-concepts-20140225/
RDF triple: subject, predicate, object. The subject is an IRI or a blank node, the predicate is an IRI, and the object may be an IRI, a literal or a blank node.
A literal has a datatype and may be restricted to a controlled vocabulary (defined by RDF-S, OWL etc.).
An IRI points to a resource that returns more RDF, giving more detail about the part in question.
Links to non-RDF data (e.g. images) are possible and necessary.
RDF graph: a collection of triples (which are related in some way).
RDF dataset: a collection of graphs (one is the default graph; the others are named for convenience).
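The building blocks above can be modelled directly. This is a minimal sketch in plain Python, not rdflib or any other RDF library; the class names, the `nt()` method and the example IRIs under example.org are hypothetical. It enforces the RDF 1.1 position rules for triples, serialises each triple as one N-Triples line, treats a graph as a set of triples and a dataset as a mapping from graph names (with `None` for the default graph) to graphs.

```python
from dataclasses import dataclass

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class IRI:
    value: str

    def nt(self):
        return f"<{self.value}>"


@dataclass(frozen=True)
class BlankNode:
    label: str

    def nt(self):
        return f"_:{self.label}"


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str = XSD_STRING

    def nt(self):
        # Escape the characters N-Triples requires inside a quoted string.
        escaped = (self.lexical.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
        return f'"{escaped}"^^<{self.datatype}>'


@dataclass(frozen=True)
class Triple:
    subject: object
    predicate: IRI
    obj: object

    def __post_init__(self):
        # RDF 1.1: subject is an IRI or blank node; predicate is an IRI.
        if not isinstance(self.subject, (IRI, BlankNode)):
            raise TypeError("subject must be an IRI or a blank node")
        if not isinstance(self.predicate, IRI):
            raise TypeError("predicate must be an IRI")
        if not isinstance(self.obj, (IRI, BlankNode, Literal)):
            raise TypeError("object must be an IRI, blank node or literal")

    def nt(self):
        return f"{self.subject.nt()} {self.predicate.nt()} {self.obj.nt()} ."


# A graph is a set of triples; a dataset maps graph names to graphs,
# with None standing for the default graph. Example IRIs are hypothetical.
DCT = "http://purl.org/dc/terms/"
book = IRI("http://example.org/book/1")
graph = {
    Triple(book, IRI(DCT + "title"), Literal("Alice's Adventures in Wonderland")),
    Triple(book, IRI(DCT + "creator"), IRI("http://example.org/person/carroll")),
}
dataset = {None: graph, IRI("http://example.org/graph/provenance"): set()}

for line in sorted(t.nt() for t in dataset[None]):
    print(line)
```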

The PROV Ontology
W3C standard: http://www.w3.org/tr/prov-o
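As a starting point, PROV-O describes entities, the activities that use and generate them, and the agents associated with those activities. The sketch below writes a digitisation scenario as N-Triples using only core PROV-O terms (`prov:Entity`, `prov:Activity`, `prov:Agent`, `prov:used`, `prov:wasGeneratedBy`, `prov:wasDerivedFrom`, `prov:wasAssociatedWith`, `prov:startedAtTime`). The `ex:` namespace, the resource names, the timestamp and the helper functions `iri` and `typed_literal` are hypothetical; no RDF library is assumed.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "prov": "http://www.w3.org/ns/prov#",
    "ex": "http://example.org/",  # hypothetical namespace for the example resources
}


def iri(curie):
    """Expand a prefixed name such as 'prov:used' into an N-Triples IRI."""
    prefix, local = curie.split(":", 1)
    return f"<{PREFIXES[prefix]}{local}>"


def typed_literal(value, datatype):
    """Render a typed literal, e.g. an xsd:dateTime, in N-Triples form."""
    return f'"{value}"^^{iri(datatype)}'


# Digitising a manuscript, described with core PROV-O terms.
# Subjects and predicates are prefixed names; objects are already rendered.
triples = [
    ("ex:ms1", "rdf:type", iri("prov:Entity")),
    ("ex:image1", "rdf:type", iri("prov:Entity")),
    ("ex:digitisation1", "rdf:type", iri("prov:Activity")),
    ("ex:photographer1", "rdf:type", iri("prov:Agent")),
    ("ex:digitisation1", "prov:used", iri("ex:ms1")),
    ("ex:digitisation1", "prov:wasAssociatedWith", iri("ex:photographer1")),
    ("ex:digitisation1", "prov:startedAtTime", typed_literal("2015-03-02T09:00:00Z", "xsd:dateTime")),
    ("ex:image1", "prov:wasGeneratedBy", iri("ex:digitisation1")),
    ("ex:image1", "prov:wasDerivedFrom", iri("ex:ms1")),
]

ntriples = "\n".join(f"{iri(s)} {iri(p)} {o} ." for s, p, o in triples)
print(ntriples)
```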

Relationship in Context
The basic PROV-O relationships are rather generic, so they need to be qualified.
Roles define how an entity relates to an activity (entity here includes agents).
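PROV-O's qualification pattern replaces a bare relationship with a blank node that can carry extra detail, such as a role. This sketch emits Turtle for two such patterns, `prov:qualifiedAssociation` (for an agent) and `prov:qualifiedUsage` (for an entity), each with `prov:hadRole`. The `ex:` resources and roles (`ex:transcriber`, `ex:sourceText`) and the helper `qualified` are hypothetical; the output is plain text rather than anything produced by an RDF library.

```python
QUALIFIED = {
    # kind: (unqualified property, qualifying property, class, property for the target)
    "Association": ("prov:wasAssociatedWith", "prov:qualifiedAssociation", "prov:Association", "prov:agent"),
    "Usage": ("prov:used", "prov:qualifiedUsage", "prov:Usage", "prov:entity"),
}


def qualified(activity, kind, target, role):
    """Turtle for an activity-target relationship plus its qualified form,
    which uses a blank node to attach a role."""
    plain, qualifier, cls, target_prop = QUALIFIED[kind]
    return (
        f"{activity} {plain} {target} ;\n"
        f"    {qualifier} [\n"
        f"        a {cls} ;\n"
        f"        {target_prop} {target} ;\n"
        f"        prov:hadRole {role}\n"
        f"    ] .\n"
    )


turtle = (
    "@prefix prov: <http://www.w3.org/ns/prov#> .\n"
    "@prefix ex: <http://example.org/> .\n\n"
    + qualified("ex:transcription1", "Association", "ex:alice", "ex:transcriber")
    + qualified("ex:transcription1", "Usage", "ex:ms1", "ex:sourceText")
)
print(turtle)
```

Keeping the unqualified triple alongside the qualified one means simple queries still work, while the blank node holds the role for queries that need it.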

Start Modelling
I have not discussed how data is actually captured and stored. This is intentional, and it should not be considered until you:
understand what information you have;
understand what questions you want to answer;
understand what tools you have available;
understand what additional information you need to acquire.
The data modelling process will help with some of these (to some extent; no promises).
(It's only a model.)
RDF can be represented in many different ways: http://camelot-dev.bodleian.ox.ac.uk/
Non-RDF data: where possible, consider expressing your outputs in a similar manner. This will enrich the basic dataset and allow further development.
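To illustrate that one RDF graph can be written in several ways, this sketch serialises the same two triples as N-Triples and as expanded JSON-LD using only the Python standard library. The book and person IRIs under example.org and the functions `to_ntriples` and `to_jsonld` are hypothetical, and the literal escaping is simplified (backslash, double quote and newline only).

```python
import json
from collections import defaultdict

# Two triples about a hypothetical book; objects are tagged "iri" or "literal".
TRIPLES = [
    ("http://example.org/book/1", "http://purl.org/dc/terms/title",
     ("literal", "Alice's Adventures in Wonderland")),
    ("http://example.org/book/1", "http://purl.org/dc/terms/creator",
     ("iri", "http://example.org/person/carroll")),
]


def to_ntriples(triples):
    """One line per triple; escapes backslashes, double quotes and newlines in literals."""
    lines = []
    for s, p, (kind, value) in triples:
        if kind == "iri":
            obj = f"<{value}>"
        else:
            escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
            obj = f'"{escaped}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


def to_jsonld(triples):
    """Expanded JSON-LD: one node object per subject, full IRIs as keys."""
    nodes = defaultdict(lambda: defaultdict(list))
    for s, p, (kind, value) in triples:
        nodes[s][p].append({"@id": value} if kind == "iri" else {"@value": value})
    return json.dumps([{"@id": s, **props} for s, props in nodes.items()], indent=2)


print(to_ntriples(TRIPLES))
print(to_jsonld(TRIPLES))
```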

Questions

One more thing: The Rules bit
Inference rules can be defined for machine reasoning, e.g. if (A is-the-son-of B) and (B is-the-son-of C) then (A is-the-grandson-of C).
This simplifies data entry, enriches datasets, and more.
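The grandson rule can be run by naive forward chaining: apply the rule to the known facts, add whatever is new, and stop when nothing new appears. This is a minimal sketch in plain Python with hypothetical predicate names (`isSonOf`, `isGrandsonOf`); in an RDF setting the same rule would more typically be handed to a rule engine or expressed as a SPARQL CONSTRUCT query.

```python
def apply_grandson_rule(triples):
    """Naive forward chaining: apply
    (A isSonOf B) and (B isSonOf C)  =>  (A isGrandsonOf C)
    until no new triples appear. Predicate names are hypothetical."""
    facts = set(triples)
    while True:
        sons = [(a, b) for a, p, b in facts if p == "isSonOf"]
        new = {(a, "isGrandsonOf", c) for a, b in sons for b2, c in sons if b == b2}
        if new <= facts:  # nothing new: a fixed point has been reached
            return facts
        facts |= new


data = {("A", "isSonOf", "B"), ("B", "isSonOf", "C"), ("C", "isSonOf", "D")}
inferred = apply_grandson_rule(data) - data
print(sorted(inferred))  # [('A', 'isGrandsonOf', 'C'), ('B', 'isGrandsonOf', 'D')]
```

Only the three "son of" facts had to be entered; the two "grandson of" facts come for free, which is the data-entry saving the slide refers to.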