Linked data for manuscripts in the Semantic Web

Gordon Dunsire
Summer School in the Study of Historical Manuscripts, Zadar, Croatia, 26-30 September 2011
Topic II: New Conceptual Models for Information Organization
Wednesday, 28 September 2011

Overview
Basic concepts of RDF (Resource Description Framework)
  Basis of linked data in the Semantic Web
Library (+ archive + museum) standards and RDF
Methodology for creating linked data from bibliographic records for manuscripts

Semantic Web
Machine-readable metadata
  Faster! 24/7/365! Global!
In a standard machine-processable format
  Resource Description Framework (RDF)
RDF supports simple, single metadata statements known as triples
  Each statement is in 3 parts

RDF triple
"The title of this manuscript is 'Ode to himself'"
  Subject of the statement = Subject: "This manuscript"
  Nature of the statement = Predicate: "(has) title"
  Value of the statement = Object: "Ode to himself"

subject          predicate     object
This manuscript  has title     Ode to himself
This letter      has author    Jane Doe
This codex       has material  papyrus
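A minimal sketch of the three example statements as data, assuming nothing beyond the Python standard library. The `Triple` type is a hypothetical helper for illustration and is not part of RDF itself; at this point the parts are still human labels, not URIs.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One metadata statement in three parts (hypothetical helper type)."""
    subject: str
    predicate: str
    object: str


# The three example statements from the slide, still using human labels.
statements = [
    Triple("This manuscript", "has title", "Ode to himself"),
    Triple("This letter", "has author", "Jane Doe"),
    Triple("This codex", "has material", "papyrus"),
]

for s in statements:
    print(f"{s.subject} | {s.predicate} | {s.object}")
```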

Identifiers
Need an unambiguous way of identifying each part of the triple for efficient machine-processing
Human labels ("This codex", "has title") are no good
  Same thing, different labels; different things, same label
Exploit the utility of the URL
  Machine-readable, regular syntax, unambiguous, global
Uniform Resource Identifier (URI)

Uniform Resource Identifier
Can be any unique combination of numbers and letters
  No intrinsic meaning; it's just an identifying label
Can look like a URL
  http://iflastandards.info/ns/isbd/elements/P1004
  But does not lead to a Web page (in principle...)
RDF requires the subject and predicate of a triple to be URIs
Object can be a URI, or a literal string ("Ode to himself")
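The rule in the last two points can be sketched as a rough check. This is an illustration only: `is_uri` is a deliberately crude, hypothetical test (a scheme and no spaces), not a full URI validator, and `http://example.org/ms1` is a made-up subject URI.

```python
from urllib.parse import urlparse


def is_uri(value: str) -> bool:
    """Crude check: a string with a scheme and no spaces is treated as a URI."""
    return " " not in value and bool(urlparse(value).scheme)


def is_valid_triple(subject: str, predicate: str, obj: str) -> bool:
    """Subject and predicate must be URIs; the object may be a URI or a literal."""
    return is_uri(subject) and is_uri(predicate) and isinstance(obj, str)


# Hypothetical manuscript URI, ISBD property URI from the slide, literal object.
ok = is_valid_triple(
    "http://example.org/ms1",
    "http://iflastandards.info/ns/isbd/elements/P1004",
    "Ode to himself",
)
# Human labels in subject and predicate position are rejected.
bad = is_valid_triple("This codex", "has title", "Ode to himself")
print(ok, bad)
```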

Identifying bibliographic metadata
Represent bibliographic schema attributes and relationships as RDF properties (= predicates)
  Each property has its own URI
  Resource Description and Access (RDA), International Standard Bibliographic Description (ISBD), Functional Requirements for Bibliographic Records (FRBR), etc.
Assign URIs to specific bibliographic resources
  The things described in catalogues and finding aids
  Manuscripts, collections, digital surrogates, etc.
  Vocabularies, subject headings, classifications, etc.

Ms1URI     hasTitleURI        "Ode to himself"
Ms1URI     hasAuthorURI       Name1URI
Name1URI   hasNameURI         "Jonson, Ben"
Name1URI   hasBirthplaceURI   Place1URI
Place1URI  hasCoordinatesURI  "abcxyz"

[Diagram: "This ms" linked by "has title" to "Ode to himself", by "has author" to "Ben Jonson", and by "has material" to "Parchment".]
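Because the object of one triple can be the subject of another, the statements chain together. A minimal sketch of following such a chain, using the placeholder identifiers from the slide; the `follow` helper is hypothetical and simply takes the first match at each step.

```python
# Placeholder identifiers as used on the slide (not real URIs).
triples = [
    ("Ms1URI", "hasTitleURI", "Ode to himself"),
    ("Ms1URI", "hasAuthorURI", "Name1URI"),
    ("Name1URI", "hasNameURI", "Jonson, Ben"),
    ("Name1URI", "hasBirthplaceURI", "Place1URI"),
    ("Place1URI", "hasCoordinatesURI", "abcxyz"),
]


def follow(start, path, triples):
    """Walk from `start` along a list of predicates; return the final object or None."""
    current = start
    for predicate in path:
        matches = [o for s, p, o in triples if s == current and p == predicate]
        if not matches:
            return None
        current = matches[0]  # first match only; enough for this sketch
    return current


# From the manuscript to the coordinates of its author's birthplace.
coords = follow("Ms1URI", ["hasAuthorURI", "hasBirthplaceURI", "hasCoordinatesURI"], triples)
print(coords)
```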

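[Diagram: the statements about the manuscript joined into one graph. "This ms" has title "Ode to himself", material "Parchment" (which "Requires... treatment") and author "Ben Jonson"; "Ben Jonson" has normalised name "Jonson, Ben"; location and birthplace point to "Place X", which has coordinates "abcxyz". The first triple, Ms1URI hasTitleURI "Ode to himself", is shown alongside.]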

IFLA standards
RDF representations of standards for universal bibliographic control are being developed
FR (Functional Requirements) family of models
  For Bibliographic Records (FRBR)
  For Authority Data (FRAD)
  For Subject Authority Data (FRSAD)
International Standard Bibliographic Description (ISBD)
  Record structure and content standard for exchange of national metadata
UNIMARC
  Encoding for ISBD records (Bibliographic) and FRAD (Authorities)

Representation in RDF
Entities => RDF classes
  Class = category of thing
  E.g. FRBR Person
Attributes, tags, (sub)fields, relationships => RDF properties
  Property = category of statement about things
  E.g. ISBD title proper
  E.g. UNIMARC 200 $a (title proper)
  E.g. FRBR title of the manifestation
Controlled term values => SKOS vocabularies
  SKOS = Simple Knowledge Organization System
  E.g. ISBD Area 0 (content and media type)

Namespaces
Each element set of RDF classes + properties, and each vocabulary, has its own namespace
Namespace is a set of URIs with the same common root or base domain
  E.g. http://iflastandards.info/ns/isbd/terms/contentform/
Local part is added to the root to form a URI
  E.g. http://iflastandards.info/ns/isbd/terms/contentform/ + T1009 = http://iflastandards.info/ns/isbd/terms/contentform/T1009
  URI for "text" in the ISBD Content form vocabulary
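The root-plus-local-part construction is plain string concatenation. A small sketch using the namespace and local part from the slide; `make_uri` and `split_uri` are hypothetical helper names, and `split_uri` assumes a slash-terminated namespace.

```python
ISBD_CONTENT_FORM = "http://iflastandards.info/ns/isbd/terms/contentform/"


def make_uri(namespace: str, local_part: str) -> str:
    """Form a URI by appending the local part to the namespace root."""
    return namespace + local_part


def split_uri(uri: str) -> tuple:
    """Split a slash-style URI back into (namespace, local part)."""
    root, _, local = uri.rpartition("/")
    return root + "/", local


text_uri = make_uri(ISBD_CONTENT_FORM, "T1009")
print(text_uri)
print(split_uri(text_uri))
```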

FR family
Each model has its own namespace
  To reflect historical development
  Each re-uses earlier RDF elements
Consolidated model under development
  Being informed by analysis of RDF representation
FRBR RDF published
  FRBRer (entity-relationship) ontology
    Namespace elements plus OWL
  FRBRoo (object-oriented)
    Extension of CIDOC Conceptual Reference Model (for museums)
FRAD and FRSAD now also published
  Approved at IFLA 2011 conference

ISBD
Element set, and vocabularies for content and media types
  Namespaces now published
DC Application Profile in development
  Models the ISBD record
  What properties (fields)
  Mandatory? Repeatable?
  Aggregated statements
  Sub-elements and punctuation

ISBD AP snippet

<!-- Area 0 is mandatory and non-repeatable -->
<StatementTemplate ID="hasContentFormAndMediaTypeArea" minoccurs="1" maxoccurs="1" type="nonliteral">
  <Property>http://iflastandards.info/ns/isbd/elements/P1158</Property>
  <!-- Area 0 is an aggregated statement with SES -->
  <NonLiteralConstraint descriptiontemplateref="dthascontentformandmediatypearea">
    <ValueStringConstraint>
      <SyntaxEncodingScheme>http://iflastandards.info/ns/isbd/elements/C2003</SyntaxEncodingScheme>
    </ValueStringConstraint>
  </NonLiteralConstraint>
</StatementTemplate>
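What "mandatory and non-repeatable" means in practice can be sketched as an occurrence count over a record's statements. This is only an illustration of the minimum/maximum idea, not an implementation of the Description Set Profile language; the record URI and the blank-node label `_:area0` are hypothetical.

```python
ISBD_AREA_0 = "http://iflastandards.info/ns/isbd/elements/P1158"


def occurrences_ok(statements, prop, min_occurs, max_occurs):
    """True if `prop` is used between min_occurs and max_occurs times."""
    count = sum(1 for _subject, predicate, _obj in statements if predicate == prop)
    return min_occurs <= count <= max_occurs


# Hypothetical record with exactly one Area 0 statement.
record = [("http://example.org/record/1", ISBD_AREA_0, "_:area0")]

print(occurrences_ok(record, ISBD_AREA_0, 1, 1))      # one occurrence: allowed
print(occurrences_ok([], ISBD_AREA_0, 1, 1))          # missing: mandatory rule broken
print(occurrences_ok(record * 2, ISBD_AREA_0, 1, 1))  # repeated: non-repeatable rule broken
```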

UNIMARC
Proposal for RDF representation made at IFLA 2011
  http://conference.ifla.org/sites/default/files/files/papers/ifla77/187-dunsire-en.pdf
Discussed with Permanent UNIMARC Committee
Now seeking funds for implementing a project

Other library standards in RDF (1)
RDA: Resource Description and Access
  Content standard based on FR models
  Refines the FR properties
  Many more controlled vocabularies than AACR (Anglo-American Cataloguing Rules)
MARC21
  Preliminary construction of unofficial namespace underway
MODS/MADS (Metadata Object/Authority Description Schema)
  Metadata structure based on MARC21
  Library of Congress Name Authority File in MADS RDF
  RDF representation of MODS just beginning...

Other library standards in RDF (2)
BIBO: Bibliographic Ontology
  Classes and properties for citations and bibliographic references
DCMI Metadata Terms (Dublin Core)
  High-level common-denominator classes and properties for memory institution metadata
Lots of controlled vocabularies
  Library of Congress Subject Headings, Rameau (French subject headings), SWD (German subject headings), Dewey Decimal Classification, RDA vocabularies, etc.

Manuscripts in other namespaces
Collex: Tools for Digital Research in the Humanities
  http://www.performantsoftware.com/nines_wiki/index.php/submitting_rdf
BIBO (Bibliographic Ontology)
  http://bibotools.googlecode.com/svn/biboontology/trunk/doc/index.html

Text strings; no URIs

Demo: SKOS, browsing and alignment
Acknowledgement: Antoine Isaac, STITCH

[Screenshot: Subject vocabulary, collection 1; Subjects]

[Screenshot: Hierarchical path from root to selected subject; Possible specialization for selected subject]

[Screenshot: Semantic alignment of subjects activated; Document from Collection 2]

[Screenshot: Subject from voc2 aligned to voc1:amphibians]

From record to triples (in 9 stages)
Very large numbers of records
  Catalogue records, finding aids, etc.
  300 million; 1 billion?
High quality metadata
  In comparison with many other communities
Each record may generate many triples
  30 raw triples (no inferences) per MARC record?
Very, very large numbers of triples
  Billions? Trillions?
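The scale implied by these figures is a single multiplication. A back-of-envelope sketch, taking the slide's own tentative numbers (300 million to 1 billion records, about 30 raw triples each) at face value; they are estimates, not measurements.

```python
TRIPLES_PER_RECORD = 30  # the slide's rough guess for a MARC record, no inferences


def estimate_triples(records: int, per_record: int = TRIPLES_PER_RECORD) -> int:
    """Raw triple count for a given number of records."""
    return records * per_record


for records in (300_000_000, 1_000_000_000):
    print(f"{records:,} records -> {estimate_triples(records):,} raw triples")
```

On these assumptions the raw count is already in the billions to tens of billions before any inferred triples are added.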

1. Take a record

Field/attribute  Value
Record ID        54321
Title            Notes on an electrical experiment
Author           Michael Faraday
Date             1845
LCSH             Impedance (electricity)
Material         Paper
Content form     Text

2. Disaggregate to single statements

Record  Attribute           Value
54321   (has) title         Notes on an electrical experiment
54321   (has) author        Michael Faraday
54321   (has) date          1845
54321   (has) LCSH          Impedance (electricity)
54321   (has) material      Paper
54321   (has) content form  Text
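Stages 1 and 2 can be sketched as turning one record into a list of single statements. The dictionary layout and the `disaggregate` helper are assumptions for illustration; a real catalogue record would arrive in MARC or another encoding.

```python
# Stage 1: the example record, held as a simple field/value mapping.
record = {
    "Record ID": "54321",
    "Title": "Notes on an electrical experiment",
    "Author": "Michael Faraday",
    "Date": "1845",
    "LCSH": "Impedance (electricity)",
    "Material": "Paper",
    "Content form": "Text",
}


def disaggregate(record, id_field="Record ID"):
    """Stage 2: one (record id, attribute, value) statement per non-ID field."""
    record_id = record[id_field]
    return [
        (record_id, field, value)
        for field, value in record.items()
        if field != id_field
    ]


statements = disaggregate(record)
for statement in statements:
    print(statement)
```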

3. Create URI for record
Must be unique, so 54321 is no good on its own
http URIs are a good ("cool") thing (W3C)
So add the record ID to a unique http domain
  E.g. http://mylibraryx.com (unique to the library) + 54321 = http://mylibraryx.com/54321 (or http://mylibraryx.com#54321)
This is not a URL!
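A minimal sketch of minting the record URI. The base is written as `http://mylibraryx.com` to match the `mlx` shorthand introduced in stage 4; `mint_record_uri` is a hypothetical helper, and the choice between the slash and hash forms is left to the publisher.

```python
BASE = "http://mylibraryx.com"  # illustrative domain, assumed unique to the library


def mint_record_uri(record_id: str, base: str = BASE, use_fragment: bool = False) -> str:
    """Join a library-unique base with a local record ID (slash or hash style)."""
    separator = "#" if use_fragment else "/"
    return f"{base.rstrip('/')}{separator}{record_id}"


slash_uri = mint_record_uri("54321")
hash_uri = mint_record_uri("54321", use_fragment=True)
print(slash_uri)
print(hash_uri)
```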

4. Replace record ID with URI

URI        Attribute           Value
mlx:54321  (has) title         Notes on an electrical experiment
mlx:54321  (has) author        Michael Faraday
mlx:54321  (has) date          1845
mlx:54321  (has) LCSH          Impedance (electricity)
mlx:54321  (has) material      Paper
mlx:54321  (has) content form  Text

mlx = qname (xmlns) = shorthand for http://mylibraryx.com/
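The qname shorthand is a prefix lookup followed by concatenation. A small sketch with the one prefix defined on the slide; `expand` is a hypothetical helper name.

```python
PREFIXES = {"mlx": "http://mylibraryx.com/"}


def expand(qname: str, prefixes=PREFIXES) -> str:
    """Expand prefix:local into a full URI using the prefix table."""
    prefix, separator, local = qname.partition(":")
    if not separator or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {qname!r}")
    return prefixes[prefix] + local


record_uri = expand("mlx:54321")
print(record_uri)
```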

5. Find URIs for attributes
Attributes are modelled as RDF properties (predicates) in element set namespaces
  E.g. Dublin Core terms (dct); ISBD (isbd); FRBR (frbrer); RDA (rdaxxx); Bibliographic Ontology (bibo); etc.
Choose namespace, find property with same (or closest) meaning (e.g. definition) as attribute
  Nearest property minimises loss of information
Get URI for property
If no suitable property, choose another namespace
  Properties do not have to come from a single namespace
  Match and mix!

5 (cont). Find URI for title
http://purl.org/dc/terms/title (dct:title)
http://iflastandards.info/ns/isbd/elements/P1014 (isbd:P1014) hasTitleProper
http://rdvocab.info/elements/titleproper (rdagr1:titleproper)

5 (cont). Find URI for author
dct:creator
rdarole:author
(ISBD does not cover "headings")

5 (cont). Find URI for date
dct:date
isbd:P1018 hasDateOfPublicationProductionDistribution
rdagr1:dateofproduction
  Unbounded version: no domain or range

5 (cont). Find URI for LCSH
LCSH is a subject vocabulary
  Controlled terms
So the attribute is really "subject"
  And the term itself is the value
dct:subject

5 (cont). Find URI for material
rdagr1:basematerial
  Unbounded version: no domain or range

5 (cont). Find URI for content form
Assuming record uses new ISBD Area 0...
isbd:P1001 hasContentForm

6. Replace attributes with URIs

URI        URI                  Value
mlx:54321  isbd:P1014           Notes on an electrical experiment
mlx:54321  rdarole:author       Michael Faraday
mlx:54321  isbd:P1018           1845
mlx:54321  dct:subject          Impedance (electricity)
mlx:54321  rdagr1:basematerial  Paper
mlx:54321  isbd:P1001           Text
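Stage 6 amounts to a lookup from each attribute to the property chosen in stage 5. A sketch under that assumption, with properties drawn from more than one namespace; the mapping table reflects the choices made on the slides, and `replace_attributes` is a hypothetical helper.

```python
# Attribute -> property qname, as chosen in stage 5 (mixed namespaces).
ATTRIBUTE_TO_PROPERTY = {
    "title": "isbd:P1014",
    "author": "rdarole:author",
    "date": "isbd:P1018",
    "LCSH": "dct:subject",
    "material": "rdagr1:basematerial",
    "content form": "isbd:P1001",
}

statements = [
    ("mlx:54321", "title", "Notes on an electrical experiment"),
    ("mlx:54321", "author", "Michael Faraday"),
    ("mlx:54321", "date", "1845"),
    ("mlx:54321", "LCSH", "Impedance (electricity)"),
    ("mlx:54321", "material", "Paper"),
    ("mlx:54321", "content form", "Text"),
]


def replace_attributes(statements, mapping):
    """Swap each attribute label for its property; fail loudly on unmapped ones."""
    result = []
    for subject, attribute, value in statements:
        if attribute not in mapping:
            raise KeyError(f"no property chosen for attribute {attribute!r}")
        result.append((subject, mapping[attribute], value))
    return result


triples = replace_attributes(statements, ATTRIBUTE_TO_PROPERTY)
for triple in triples:
    print(triple)
```

Raising on an unmapped attribute, rather than silently dropping it, keeps any loss of information visible.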

7. Find URIs for values
If the object of a triple is a URI, it can link to the subject of another triple with the same URI
  Linked data!
Values from controlled vocabularies may have URIs
  Possible vocabularies: author, subject, material, content form
  NOT: title, date
For author: Virtual International Authority File (VIAF)
For LCSH: Library of Congress Authorities & Vocabularies
For ISBD Area 0: Open Metadata Registry
For RDA: Open Metadata Registry

7 (cont). Find URI for author
Author: Michael Faraday
viaf: = http://viaf.org/viaf/
viaf:38158158

7 (cont). Find URI for subject (LCSH)
LCSH: Impedance (electricity)
lcsh: = http://id.loc.gov/authorities/subjects/
lcsh:sh85064610

7 (cont). Find URIs for other values
Material: Paper
  RDA base material
  rdabm:1011
Content form: Text
  ISBD Content form
  isbdcf:T1009

8. Replace values with URIs

subject    predicate            object
mlx:54321  isbd:P1014           Notes on an electrical experiment
mlx:54321  rdarole:author       viaf:38158158
mlx:54321  isbd:P1018           1845
mlx:54321  dct:subject          lcsh:sh85064610
mlx:54321  rdagr1:basematerial  rdabm:1011
mlx:54321  isbd:P1001           isbdcf:T1009
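Stage 8 can be sketched as a second lookup, this time keyed on the predicate and the controlled value together, so that free-text values such as the title and date pass through unchanged. The lookup table holds the URIs found in stage 7; `replace_values` is a hypothetical helper.

```python
# (predicate, controlled value) -> value URI found in stage 7.
VALUE_URIS = {
    ("rdarole:author", "Michael Faraday"): "viaf:38158158",
    ("dct:subject", "Impedance (electricity)"): "lcsh:sh85064610",
    ("rdagr1:basematerial", "Paper"): "rdabm:1011",
    ("isbd:P1001", "Text"): "isbdcf:T1009",
}

triples = [
    ("mlx:54321", "isbd:P1014", "Notes on an electrical experiment"),
    ("mlx:54321", "rdarole:author", "Michael Faraday"),
    ("mlx:54321", "isbd:P1018", "1845"),
    ("mlx:54321", "dct:subject", "Impedance (electricity)"),
    ("mlx:54321", "rdagr1:basematerial", "Paper"),
    ("mlx:54321", "isbd:P1001", "Text"),
]


def replace_values(triples, lookup):
    """Use the value URI where one is known; otherwise keep the literal."""
    return [(s, p, lookup.get((p, o), o)) for s, p, o in triples]


linked = replace_values(triples, VALUE_URIS)
for triple in linked:
    print(triple)
```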

9. Publish triples (linked data)

mlx:54321  isbd:P1014           Notes on an electrical experiment
mlx:54321  rdarole:author       viaf:38158158
mlx:54321  isbd:P1018           1845
mlx:54321  dct:subject          lcsh:sh85064610
mlx:54321  rdagr1:basematerial  rdabm:1011
mlx:54321  isbd:P1001           isbdcf:T1009
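The slides do not say which serialisation to publish in. As one possibility, the sketch below writes four of the triples as N-Triples, a line-based W3C format. It covers only the triples whose namespaces are spelled out in the slides; the author and material triples are omitted because the `rdarole`, `rdagr1` value and `rdabm` namespaces would have to be assumed. The helper names are hypothetical.

```python
PREFIXES = {
    "mlx": "http://mylibraryx.com/",
    "isbd": "http://iflastandards.info/ns/isbd/elements/",
    "isbdcf": "http://iflastandards.info/ns/isbd/terms/contentform/",
    "dct": "http://purl.org/dc/terms/",
    "lcsh": "http://id.loc.gov/authorities/subjects/",
}


def expand(qname: str) -> str:
    """Expand prefix:local into a full URI."""
    prefix, _, local = qname.partition(":")
    return PREFIXES[prefix] + local


def to_ntriple(subject, predicate, obj, object_is_uri):
    """Render one triple as an N-Triples line (plain string literals only)."""
    if object_is_uri:
        rendered = f"<{expand(obj)}>"
    else:
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        rendered = f'"{escaped}"'
    return f"<{expand(subject)}> <{expand(predicate)}> {rendered} ."


# (subject, predicate, object, object_is_uri)
triples = [
    ("mlx:54321", "isbd:P1014", "Notes on an electrical experiment", False),
    ("mlx:54321", "isbd:P1018", "1845", False),
    ("mlx:54321", "dct:subject", "lcsh:sh85064610", True),
    ("mlx:54321", "isbd:P1001", "isbdcf:T1009", True),
]

lines = [to_ntriple(*t) for t in triples]
print("\n".join(lines))
```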

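[Diagram: the published triples as a graph, with labels pulled in from the linked vocabularies]

mlx:54321        isbd:P1014                  "Notes on an electrical experiment"
mlx:54321        isbd:P1018                  "1845"
mlx:54321        rdarole:author              viaf:38158158
mlx:54321        dct:subject                 lcsh:sh85064610
mlx:54321        rdagr1:basematerial         rdabm:1011
mlx:54321        isbd:P1001                  isbdcf:T1009
viaf:38158158    foaf:name                   "Faraday, Michael, 1791-1867"
lcsh:sh85064610  madsrdf:authoritativeLabel  "Impedance (electricity)"
rdabm:1011       skos:prefLabel              "paper"
isbdcf:T1009     skos:prefLabel              "text", "tekst"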

Thank you!
gordon@gordondunsire.com
Open Metadata Registry: http://metadataregistry.org