Integration of Heterogeneous Metadata in Europeana. Cesare Concordia Institute of Information Science and Technology-CNR

Integration of Heterogeneous Metadata in Europeana Cesare Concordia cesare.concordia@isti.cnr.it Institute of Information Science and Technology-CNR

Outline
What is Europeana
The Europeana data model
The Europeana Semantic Elements (ESE)
Case study: the data ingestion in the Europeana prototype
Conclusion and next steps

What is Europeana
European Digital Library
Open access to the digitized objects of European cultural institutions
Cross multilingual search
European cultural heritage in a single place
Across cultural domains and across countries
General public - User centered
Digital Library technologies + Web 2.0

What is Europeana
The Europeana Digital Library will be the result of a number of projects run by different cultural heritage institutions, among them:
  Athena: an aggregator that helps museums bring their content to Europeana
  APENet: a Best Practice Network (BPN) whose objective is to build an Internet Gateway for Documents and Archives in Europe
  EUROPEANAlocal: aims to improve the interoperability of the digital content held by regional and local institutions
  European Film Gateway: aims to find solutions for providing integrated access to Europe's cinematographic heritage
All are part-funded by the European Commission's eContentplus programme.

What is Europeana
The full implementation of Europeana is the goal of two core technology projects:
  EuropeanaConnect, which will provide the technologies and resources to semantically enrich the digital content in Europeana
  Europeana v1.0, which will implement the technology platform
They are successors to the EDLNet thematic network, which created the EDL Foundation and the Europeana prototype (www.europeana.eu)
In short, Europeana v1.0 and EuropeanaConnect will turn the prototype into an operational service

Europeana architecture
Europeana is not a Web Portal
Europeana is a services platform providing an Application Programming Interface (API) that enables cultural institutions and users to:
  Access Europeana content
  Provide content to Europeana
  Build applications using Europeana functionalities for their own use
According to the DELOS classification, Europeana is a Digital Library System (DLS)
The Europeana Portal is a web application that uses the Europeana API to access the Europeana Digital Library

Europeana DLS functional architecture

Europeana data flow

Europeana data space
Europeana will create a data space that is a representation of the content providers' data spaces
The Europeana data space will contain Digital Surrogate Objects (DSO), defined as follows: "the minimal significant documentary object unit a given content provider is able / willing to identify (in the case of textual object there thus can be surrogates on the level of the entire document, on chapter level or on page, paragraph, sentence or even word levels)" [EDLNet Deliverable 2.5]
Each DSO will be a web resource, identified by a URI owned by Europeana

Digital Surrogate Objects
There are several kinds of DSO, depending on the kinds of objects to be represented:
  Real Physical Object (RPO): the physical object, for instance a painting, a building, a book
  Digital Representation Object (DRO): a digital object obtained by digitizing an RPO, usually created by the data provider
  Digital Primary Object (DPO): a "born digital" object, i.e. a digital object that is not a DRO

Digital Surrogate Objects
Each DSO contains at least an identifier, a link to the object in the content provider's data space, the metadata record describing the object, and some elementary technical and licensing information
There should be a one-to-one correspondence between remote object entities and DSOs
Surrogates can be linked to each other
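The minimal DSO content listed above can be pictured as a small record type. This is only an illustrative sketch, not the EDLNet data model: the class and field names (`DigitalSurrogateObject`, `provider_link`, `linked_surrogates`) and the example.org URIs are assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class ObjectKind(Enum):
    """The three kinds of object a surrogate can stand for."""
    RPO = "real physical object"
    DRO = "digital representation object"
    DPO = "digital primary object"


@dataclass
class DigitalSurrogateObject:
    uri: str                 # identifier of the surrogate (hypothetical URI pattern)
    provider_link: str       # link to the object in the provider's data space
    kind: ObjectKind
    metadata: dict           # the metadata record, e.g. {"dc:title": ["..."]}
    rights: str = ""         # elementary licensing information
    linked_surrogates: list = field(default_factory=list)

    def link_to(self, other: "DigitalSurrogateObject") -> None:
        """Record a link to another surrogate, ignoring duplicates."""
        if other.uri not in self.linked_surrogates:
            self.linked_surrogates.append(other.uri)


painting = DigitalSurrogateObject(
    uri="http://example.org/dso/1",
    provider_link="http://example.org/museum/painting/1",
    kind=ObjectKind.RPO,
    metadata={"dc:title": ["A painting"]},
)
scan = DigitalSurrogateObject(
    uri="http://example.org/dso/2",
    provider_link="http://example.org/museum/scan/1.jpg",
    kind=ObjectKind.DRO,
    metadata={"dc:title": ["Scan of a painting"]},
)
scan.link_to(painting)
```

Storing links as URIs rather than object references keeps each surrogate an independent web resource, in line with the slide's statement that every DSO is identified by a URI.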

Digital Surrogate Objects
On a very abstract level, Europeana can be seen as a large collection of DSOs representing born-digital or digitised cultural heritage objects
Surrogates will be linked to semantic resources representing concepts, as well as to reference entities such as persons, places and periods in time (contextualization)

Surrogate Object Data Model (EDLNet)

Europeana Semantic Elements
In EDLNet the Surrogate Data Model has been implemented using the Europeana Semantic Elements (ESE) metadata format
The ESE consists of the Dublin Core (DC) metadata elements, a subset of the DC terms, and a set of twelve elements created to meet Europeana's needs

Europeana Semantic Elements
Source      Element       Refinement(s)
DC          title         alternative
DC          creator
DC          subject
DC          description   tableOfContents
DC          publisher
DC          contributor
DC          date          created; issued
DC          type
DC          format        extent; medium
DC          identifier
DC          source
DC          relation      isVersionOf; hasVersion; isReplacedBy; replaces; isRequiredBy;
DC          coverage      spatial; temporal
DC          rights
DC terms    provenance
Europeana   relation      isShownBy; isShownAt
Europeana   userTag
Europeana   unstored
Europeana   object
Europeana   language
Europeana   provider
Europeana   type
Europeana   uri
Europeana   year
Europeana   hasObject
Europeana   country

Europeana Semantic Elements (ESE) DSOs are created during the Europeana data ingestion process using the information provided by the content suppliers

EDLNet Data Ingestion Workflow
Source collections are acquired via harvesting or received from content providers as XML files
The mapper checks the source collections and creates the mapping rules
XSLT is used to implement the mapping
ESE collections are stored and indexed
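The mapping step of this workflow can be sketched in a few lines. The prototype implements it in XSLT; the Python below is only an illustration of the idea of applying per-collection mapping rules. The rule table, the `map_record` and `ingest` names, and the choice of mapping `dc:identifier` to `europeana:isShownAt` are assumptions for the example, and the storing and indexing steps are left out.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical mapping rules, as a mapper might write them after checking
# a Dublin Core source collection: source tag -> ESE element name.
RULES = {
    DC + "title": "dc:title",
    DC + "creator": "dc:creator",
    # Assumption: in this collection the identifier is a landing-page URL.
    DC + "identifier": "europeana:isShownAt",
}


def map_record(record, rules, constants=None):
    """Apply the mapping rules to one source record; return an ESE-like dict."""
    ese = {}
    for child in record:
        target = rules.get(child.tag)
        value = (child.text or "").strip()
        if target and value:  # unmapped and empty elements are dropped
            ese.setdefault(target, []).append(value)
    for name, value in (constants or {}).items():
        ese.setdefault(name, []).append(value)  # per-collection constants
    return ese


def ingest(xml_text, rules, constants=None):
    """Map every <record> element of a source collection."""
    root = ET.fromstring(xml_text)
    return [map_record(rec, rules, constants) for rec in root.iter("record")]


SAMPLE = """<ListRecords xmlns:dc="http://purl.org/dc/elements/1.1/">
  <record>
    <dc:title>Sample title</dc:title>
    <dc:subject/>
    <dc:identifier>http://example.org/item/1</dc:identifier>
  </record>
</ListRecords>"""

records = ingest(SAMPLE, RULES, {"europeana:type": "TEXT"})
```

Keeping the rules in a table separate from the code that applies them is one way to make the per-collection part of the work explicit.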

Europeana prototype data space
The first Europeana prototype was presented in November 2008 as a result of the EDLNet project
The next prototype will be released in September 2009
As of April 2009, it contains:
  DSOs referring to information objects provided by 54 cultural institutions from 24 European countries
  4.5 million surrogate digital objects stored in the data space
  15 different metadata formats

Heterogeneity: records by countries

Heterogeneity: records by languages

Heterogeneity: records by data providers

Heterogeneity: records grouped by metadata format

Source collection snippet: DC
<ListRecords>
  <record>
    <dc:title>από τα γλυκοχαράματα της ζωής μου: Σαλαμίς</dc:title>
    <dc:creator>κ. Ν. Κωνσταντινίδης</dc:creator>
    <dc:subject/>
    <dc:description/>
    <dc:publisher>νέα Ζωή</dc:publisher>
    <dc:contributor/>
    <dc:date>1970-01-01</dc:date>
    <dc:type>articles</dc:type>
    <dc:format>image/jpeg</dc:format>
    <dc:identifier>http://xantho.lis.upatras.gr/kosmopolis/index.php/nea_zoi/article/view/313</dc:identifier>
    <dc:source>νέα Ζωή; Vol 1, No 1 (1904); σελ. 07-08</dc:source>
    <dc:language>gr</dc:language>
    <dc:coverage/>
    <dc:rights/>
  </record>

Example: DC mapping

Source collection snippet: MemoireSDAP
<BASE>
  <NAME>Mémoire</NAME>
  <DOMAINE>SDAP</DOMAINE>
  <NOTICES>
    <NOTICE ID="AP080050805330033">
      <REF> AP080050805330033 </REF>
      <ADRESSE> rue Petit ; villa "Rosario", ilot A</ADRESSE>
      <AUTP>Richard, Françoise</AUTP>
      <AUTOR>Delamotte, Patrick (architecte)</AUTOR>
      <COM> Mers-les-Bains </COM>
      <MCL>Secteur sauvegardé</MCL>
      <LEG>cartouche céramique en relief façon "cuir" ; décor briques vernissées.</LEG>
      <COULEUR>OUI</COULEUR>
      <LIB>Epoque 19ème</LIB>
      <PAYS>France</PAYS>
      <INSEE>80533</INSEE>
      <TYPEIMG>JPG ; oui</TYPEIMG>
      <TYPSUPP>DS1</TYPSUPP>
      <REFIM>AP080_050805330033NUCA_P.JPG,DS1,,</REFIM>
      <VIDEO>/Wave/image/memoire/1047/ap080_050805330033nuca_p.jpg;/Wave/image/memoire/1047/ap080_050805330033nuca_v.jpg</VIDEO>
    </NOTICE>

Analysis file of the MemoireSDAP collection
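An analysis of a source collection can be as simple as counting, for every leaf element, how often it occurs and how often it is empty. The sketch below is not the xml-analyzer program used in the project; it is a hypothetical stand-in (function name `analyze`, made-up sample data) showing the kind of statistics a mapper might look at before writing mapping rules.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def analyze(xml_text):
    """Return {tag: (occurrences, empty_occurrences)} for all leaf elements."""
    root = ET.fromstring(xml_text)
    total, empty = Counter(), Counter()
    for element in root.iter():
        if len(element) > 0:  # container element: skip, only leaves hold values
            continue
        total[element.tag] += 1
        if not (element.text or "").strip():
            empty[element.tag] += 1
    return {tag: (total[tag], empty[tag]) for tag in total}


SAMPLE = """<NOTICES>
  <NOTICE><COM> Mers-les-Bains </COM><LEG/></NOTICE>
  <NOTICE><COM> </COM><LEG>cartouche</LEG></NOTICE>
</NOTICES>"""

report = analyze(SAMPLE)
```

A high share of empty occurrences for an element is a hint that it cannot be relied on when building surrogates for that collection.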

Example: MemoireSDAP mapping

XSLT snippet
<xsl:template match="BASE">
  <metadata>
    <xsl:comment>europeana:type has 'image' value</xsl:comment>
    <xsl:apply-templates select="NOTICES/NOTICE"/>
  </metadata>
</xsl:template>
<xsl:template match="NOTICES/NOTICE">
  <record>
    <xsl:apply-templates select="DENO"/>
    <xsl:apply-templates select="IDPROD"/>
    <xsl:apply-templates select="ADRESSE"/>
    <xsl:apply-templates select="AUTOR"/>
    <xsl:apply-templates select="AUTTI"/>
    <xsl:apply-templates select="COM"/>
    <xsl:apply-templates select="VIDEO"/>

XSLT snippet
<xsl:template match="VIDEO">
  <europeana:isShownBy>
    <xsl:text>http://www.culture.gouv.fr</xsl:text><xsl:value-of select="substring-before(., ';')"/>
  </europeana:isShownBy>
  <europeana:object>
    <xsl:text>http://www.culture.gouv.fr</xsl:text><xsl:value-of select="substring-before(., ';')"/>
  </europeana:object>
</xsl:template>
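To make the effect of the VIDEO template concrete, here is a rough Python equivalent of what it computes: the first of the semicolon-separated image paths is prefixed with the site address and used for both target elements. The helper names `substring_before` and `map_video` are assumptions for this sketch; only the XPath semantics of `substring-before` and the URL prefix come from the snippet.

```python
BASE_URL = "http://www.culture.gouv.fr"


def substring_before(value, separator):
    """Mimic XPath substring-before: empty string when the separator is absent."""
    head, found, _ = value.partition(separator)
    return head if found else ""


def map_video(video_text):
    """Build the two ESE values produced by the VIDEO template."""
    url = BASE_URL + substring_before(video_text, ";")
    return {"europeana:isShownBy": url, "europeana:object": url}


video = ("/Wave/image/memoire/1047/ap080_050805330033nuca_p.jpg;"
         "/Wave/image/memoire/1047/ap080_050805330033nuca_v.jpg")
mapped = map_video(video)
```

One consequence of `substring-before` is worth noting: a VIDEO value holding a single path and no semicolon would yield only the bare prefix, so a rule like this is tied to the value conventions of one collection.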

Analysis file of the mapped ESE collection

Theory and Practice
A lot of manual work is needed to write the mapping rules and implement them
Mapping files often cannot be reused
The same metadata element holds different kinds of values in different collections
It is difficult to distinguish metadata records describing DR, DP or RP objects
Many content providers supply minimal metadata records, which makes it difficult to build significant DSOs
Few relationships among digital objects are expressed in the metadata

Theory and Practice
Surrogate contextualization is a complex task
Implicit contextualization, i.e. matching element and attribute values with (few) classification schemes and/or authority files, has been applied to several collections
Explicit contextualization, i.e. using element values that link directly to semantic resources (e.g. IsAbout), has in practice never been applied
Need to adopt authority files
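Implicit contextualization, as described above, amounts to looking up element values in an authority file after some normalization. The sketch below illustrates the idea under strong assumptions: the authority entries, the example.org URIs and the function names are invented for the example, and real matching would need far more care (ambiguity, language, partial matches).

```python
import unicodedata

# Hypothetical authority file: resource URI -> known labels.
AUTHORITY = {
    "http://example.org/place/france": ["France", "Frankreich", "Francia"],
    "http://example.org/period/19th-century": ["19th century", "Epoque 19ème"],
}


def normalize(label):
    """Fold case, strip accents and collapse whitespace before matching."""
    decomposed = unicodedata.normalize("NFKD", label)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.casefold().split())


def build_index(authority):
    """Invert the authority file into {normalized label: URI}."""
    return {
        normalize(label): uri
        for uri, labels in authority.items()
        for label in labels
    }


def contextualize(values, index):
    """Map each metadata value to a URI, or None when nothing matches."""
    return {value: index.get(normalize(value)) for value in values}


INDEX = build_index(AUTHORITY)
links = contextualize(["  FRANCE ", "époque 19EME", "Mers-les-Bains"], INDEX)
```

Values that come back as None are the ones that would need a richer authority file or manual review.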

Data normalization problem: example

DSO Data Model (EDLNet): theory and practice

Ingestion: from EDLNet to Europeana
The DSO data model is currently being reviewed; it should move from DC to CIDOC-CRM
Extracting and adding snippets (research)
Provenance
Events
The model adopted for Europeana will be released in September 2009
More involvement of content providers in the ingestion workflow (EuropeanaConnect)
Extracting knowledge from the data space to contextualize and create relationships among DSOs (research)
Try to make the mapping a semi-automatic process (research)
Going open source Europeana Labs

Acknowledgements
The Digital Surrogate Object model was defined by the leaders of EDLNet Work Package 2 (Makx Dekkers, Stefan Gradmann and Carlo Meghini) and reviewed by EDLNet members.
The ESE model was defined by Go Sugimoto (EDL Office), EDLNet Interoperability Manager, and reviewed by EDLNet members.
The xml-analyzer program was developed by Gerald de Jong and Sjoerd Siebinga (EDL Office).

Thanks