ARKIVO: an Ontology for Describing Archival Resources

Similar documents
Towards an Ontology for Describing Archival Resources

Linking library data: contributions and role of subject data. Nuno Freire The European Library

a paradigm for the Introduction to Semantic Web Semantic Web Angelica Lo Duca IIT-CNR Linked Open Data:

Syrtis: New Perspectives for Semantic Web Adoption

Contribution of OCLC, LC and IFLA

Proposal for Implementing Linked Open Data on Libraries Catalogue

Racer: An OWL Reasoning Agent for the Semantic Web

Development of an Ontology-Based Portal for Digital Archive Services

Opus: University of Bath Online Publication Store

OntoXpl Exploration of OWL Ontologies

The Semantic Planetary Data System

The Semantic Web & Ontologies

On the Reduction of Dublin Core Metadata Application Profiles to Description Logics and OWL

Library of Congress BIBFRAME Pilot. NOTSL Fall Meeting October 30, 2015

Knowledge and Ontological Engineering: Directions for the Semantic Web

Semantic Web Fundamentals

TrOWL: Tractable OWL 2 Reasoning Infrastructure

Mapping between Digital Identity Ontologies through SISM

COMP6217 Social Networking Technologies Web evolution and the Social Semantic Web. Dr Thanassis Tiropanis

Towards the Semantic Desktop. Dr. Øyvind Hanssen University Library of Tromsø

Semantic Web: vision and reality

Table of contents for The organization of information / Arlene G. Taylor and Daniel N. Joudrey.

DBpedia-An Advancement Towards Content Extraction From Wikipedia

Linked data for manuscripts in the Semantic Web

Probabilistic Information Integration and Retrieval in the Semantic Web

ProLD: Propagate Linked Data

Linked Data: What Now? Maine Library Association 2017

An Evaluation of Geo-Ontology Representation Languages for Supporting Web Retrieval of Geographical Information

Ontology Matching with CIDER: Evaluation Report for the OAEI 2008

A service based on Linked Data to classify Web resources using a Knowledge Organisation System

The role of vocabularies for estimating carbon footprint for food recipies using Linked Open Data

Building Consensus: An Overview of Metadata Standards Development

New Tools for the Semantic Web

Corso di Biblioteche Digitali

Corso di Biblioteche Digitali

Reducing Consumer Uncertainty

Introduction. October 5, Petr Křemen Introduction October 5, / 31

Information Retrieval (IR) through Semantic Web (SW): An Overview

datos.bne.es: a Library Linked Data Dataset

A Dublin Core Application Profile in the Agricultural Domain

Annotation for the Semantic Web During Website Development

Libraries, classifications and the network: bridging past and future. Maria Inês Cordeiro

OSDBQ: Ontology Supported RDBMS Querying

Towards Development of Ontology for the National Digital Library of India

Representing Product Designs Using a Description Graph Extension to OWL 2

An FCA Framework for Knowledge Discovery in SPARQL Query Answers

University of Wisconsin-Milwaukee: School of Information Studies

Ontology Exemplification for aspocms in the Semantic Web

A Community-Driven Approach to Development of an Ontology-Based Application Management Framework

Semantic web. Tapas Kumar Mishra 11CS60R32

CHAPTER 1 INTRODUCTION

Grounding OWL-S in SAWSDL

Making Ontology Documentation with LODE

Semantic Web Fundamentals

a paradigm for the Semantic Web Linked Data Angelica Lo Duca IIT-CNR Linked Open Data:

Semantic Interoperability of Dublin Core Metadata in Digital Repositories

WebGUI & the Semantic Web. William McKee WebGUI Users Conference 2009

Approach for Mapping Ontologies to Relational Databases

OntoShare An Ontology-based Knowledge Sharing System for Virtual Communities of Practice

Automating Instance Migration in Response to Ontology Evolution

Assessing Metadata Utilization: An Analysis of MARC Content Designation Use

Research Article A Method of Extracting Ontology Module Using Concept Relations for Sharing Knowledge in Mobile Cloud Computing Environment

SEMANTIC WEB AND COMPARATIVE ANALYSIS OF INFERENCE ENGINES

Interoperability for Digital Libraries

Extracting knowledge from Ontology using Jena for Semantic Web

Ontologies for Multimedia Annotation: An overview

Adaptable and Adaptive Web Information Systems. Lecture 1: Introduction

The Semantic Web DEFINITIONS & APPLICATIONS

Linking FRBR Entities to LOD through Semantic Matching

Ontology Servers and Metadata Vocabulary Repositories

Web Ontology for Software Package Management

Performance Evaluation of Semantic Registries: OWLJessKB and instancestore

Semantic Web and Electronic Information Resources Danica Radovanović

Context-aware Semantic Middleware Solutions for Pervasive Applications

Design Process Ontology Approach Proposal

Contents. G52IWS: The Semantic Web. The Semantic Web. Semantic web elements. Semantic Web technologies. Semantic Web Services

A Tagging Approach to Ontology Mapping

Two Layer Mapping from Database to RDF

Learning Probabilistic Ontologies with Distributed Parameter Learning

Towards Ontology Mapping: DL View or Graph View?

Navigating Getty Provenance Resources & Datasets 2016 Pacific Neighborhood Consortium Annual Conference

Semantic Web Mining and its application in Human Resource Management

Linked Data and RDF. COMP60421 Sean Bechhofer

Linked Open Data in Aggregation Scenarios: The Case of The European Library Nuno Freire The European Library

Publishing a Scorecard for Evaluating the Use of Open-Access Journals Using Linked Data Technologies

NOTSL Fall Meeting, October 30, 2015 Cuyahoga County Public Library Parma, OH by

Semantic Web and Natural Language Processing

A Map-based Integration of Ontologies into an Object-Oriented Programming Language

The Data Web and Linked Data.

Reducing Consumer Uncertainty Towards a Vocabulary for User-centric Geospatial Metadata

Cluster-based Instance Consolidation For Subsequent Matching

Digital Archives: Extending the 5S model through NESTOR

State of the Art of Semantic Web

Ontology Creation and Development Model

Generating and Managing Metadata for Web-Based Information Systems

Temporality in Semantic Web

Software Configuration Management Using Ontologies

Linked data implementations who, what, why?

Development of Contents Management System Based on Light-Weight Ontology

Ontology-based Architecture Documentation Approach

Transcription:

ARKIVO: an Ontology for Describing Archival Resources Laura Pandolfo 1, Luca Pulina 1, and Marek Zieliński 2 1 Dipartimento di Chimica e Farmacia, Università di Sassari, via Vienna 2, 07100 Sassari Italy, laura.pandolfo@uniss.it, lpulina@uniss.it 2 Pi lsudski Institute of America, 138 Greenpoint Avenue, Brooklyn, NY 11222 USA MZielinski@pilsudski.org Abstract. In this paper we present arkivo, an ontology designed to accommodate the archival description of historical document collections. The aim of arkivo is to provide a reference schema for a rich representation of data elements in digital historical archives. This paper briefly reports design and implementation of arkivo, as well as its application on a real world case study, namely the Józef Pi lsudski Institute of America digitized collections. 1 Context & Motivation Information technologies changed the way of doing archival research. The real change happened in the 1990s, after the advent of the Web, when a great number of historical documents were published online stored in digital archives, by providing the users the possibility to have a direct access to millions of documents in a rapid and easy way. Recently, digital archives are facing new challenges in order to overcome traditional data management and information browsing. In this context, Semantic Web (SW) [1] technologies can improve digital archives by facilitating archival metadata storage and adding semantic capabilities, which increase the quality of the information retrieval process. In particular, ontologies, which are defined as a formal specification of domain knowledge conceptualization [2], play a key role in several aspects, e.g., resources description by means of taxonomies and vocabularies in order to promote interoperability and consistency between different sources [3]. In the last decade, there has been a great amount of effort in designing vocabularies and metadata standards to catalogue documents and collections, such as Functional Requirements for Bibliographic Records (FBRB) 1, MAchine- Readable Cataloging (MARC) 2, Metadata Object Description Schema (MODS) 3, and Encoded Archival Description (EAD) 4, just to cite a few well-known examples. Metadata standards such as FRBR, EAD and MODS seem to be more 1 http://www.cidoc-crm.org/frbroo/ 2 https://www.loc.gov/marc/ 3 http://www.loc.gov/standards/mods/ 4 https://www.loc.gov/ead/

devoted to human consumption rather than machine processing [4], while concerning MARC some experts experienced that it is not suitable neither for machine processable nor for actionable metadata [5, 6]. Also, MODS is focused on objects such as books, and EAD, even reflecting the hierarchy of an archive, is focused on finding aids and the support for digitized objects is limited. Despite the wide range of metadata standards, there is an ongoing lack of clarity regarding the use of these resources, which leads to the conclusion that in absence of a standardized vocabulary or ontology, different institutions will continue to use their own distinct systems and different metadata schemas. In this paper we present arkivo, an ontology designed to accommodate the archival description, supporting archive workers by encompassing both the hierarchical structure of archives and the rich metadata attributes used during the annotation process. The strength of arkivo is not only to provide a reference schema for publishing Linked Data [7], but also to describe the relevant elements contained in these documents. The paper is organized as follows. In Section 2, we describe the design process of arkivo and we give some details about the implementation. In Section 3, we describe the application of arkivo to the digitized collections of the Józef Pi lsudski Institute of America, and we conclude the paper in discussing future work. 2 ARKIVO Ontology In this Section, we describe the design process of arkivo, in order to define key concepts and relations. We also give some insights about its implementation. As a starting point, we considered the archival management practices and the most common methods used by archives for storing and cataloging materials. Archival resources are typically organized following a hierarchy composed of different layers (i.e., fonds, file, series, item), in which fonds represent the highest level of that structure, while item the lowest. In the design phase, we considered the best practices used by archive workers in the metadata collection process. Usually, this tasks is conducted by selecting items and trying to locate, in addition to a title and abstract, key persons, places, dates and events mentioned in the item. These data are especially important in describing an historical document. One can sometimes identify the author, or document creation date, in which case the assignment of the role can be more specific. In most cases, however, one can only note that the name, place or date was mentioned in the document. We decided to model these categories of information into the ontology, in order to give all the useful elements to support the annotation process. Next, we examined some existing standards and metadata vocabularies potentially compatible with the considered domain, in order to re-use them as core ontologies. In the following, we report core ontologies and vocabularies used in arkivo. Dublin Core [8] and schema.org [9] represent the most used vocabularies for describing and cataloging both Web and physical resources, such as

Web pages and books. Due to the generic nature of their terms, we also refer to BIBO 5 ontology in order to have a detailed and exhaustive document classification. Concerning the description of personal information, we decided to use FOAF 6, a widely used vocabulary to describe persons as well as organizations. arkivo uses also Geonames 7 ontology for linking a place name to its geographical location and LODE [10] ontology which is one of the most often-used model for publishing events as Linked Data. arkivo has been developed using the OWL 2 DL language [11]. In the following, we report some implementation details of the ontology 8. The main classes in arkivo are Collection, which represents the set of documents or collections, and Item, which is the smallest indivisible unit of an archive. In order to describe the structure of the archive, different subclasses of the class Collection are modeled, namely the subclasses Fonds, File and Series. Using existential quantification property restriction (owl:somevaluesfrom), we defined that the class Item as the class of individuals that are linked to individuals in the class Fonds by the ispartof property, as shown below using the DL syntax [12] Item isp artof.f onds This means that there is an expectation that every instance of Item is part of a collection, and that collection is a member of the class Fonds. This is useful to capture incomplete knowledge. For example, if we know that the individual 701.180/11884 is an item, we can infer that it is part at least of one collection. Moreover, we defined some union of classes for those classes that perform a specific function on the ontology. In this case, we used owl:unionof constructor to combine atomic classes to complex classes, as we describe in the following: CreativeT hing Collection HistoricalEvent Item This class denotes things created by agents and it includes individuals that are contained in at least one of the classes Collection, HistoricalEvent or Item. NamedT hing P lace Date Agent It refers to things, such as date, place and agent, that are related to individuals in the CreativeThing by the object property ismentionedin, and it includes individuals that belong to at least one of the classes Place, Date or Agent. 3 Case Study: the Józef Pi lsudski Archival Collections We applied the arkivo ontology to describe the archival holdings, partially digitized, of the Pi lsudski Institute of America. In order to support data integration process of combining data residing at different sources, we used external identifiers. In this way, the resources of Pi lsudski Digital Archival Collections have been linked to external 5 http://bibliontology.com/ 6 http://www.foaf-project.org/ 7 http://www.geonames.org/ontology 8 The full arkivo documentation is available at https://github.com/arkivoteam/ ARKIVO

Fig. 1. Graphical representation of archival collections data using arkivo. datasets of the Linked Data Cloud in order to enrich the information provided with each resource. We selected, among others, Wikidata 9, DBpedia 10, and VIAF (Virtual International Authority File) 11, as the most common source of identifiers of people, organizations and historical events. In Figure 1, we report an example of individuals and properties stored in the Pi lsudski digital archive, and how these data can be connected to resources belonging to external datasets. The individuals are drawn as labeled ellipses, object properties between individuals are shown as labeled edges, while boxes represent data properties related to individuals. Looking at this figure, we can see that the fonds A701.025 is stored by the organization Pi lsudski Institute of America and it has different data properties, such as its title, i.e., Wojny Polskie, its creation start date, i.e., 1914, and end date, i.e., 1984. The object property isabout reveals the main topic addressed in this fonds, represented by the individual R10001, which has its title, i.e., Kampania wrześniowa, and its date, i.e.,1939. The individual R10001, which belongs to both the class HistoricalEvent and the class Subject, is in relationship to the DBpedia s instance Invasion of Poland through the property owl:sameas, which indicates that these two URI references actually refer to the same thing. The integration with external dataset, such as DBpedia, makes it possible to discover and share information. The archival collections of the Pi lsudski Institute of America can be effectively explored and queried at http://stardog.vuotto.tech/#/databases/arkivo_x3 12. The triple store has been implemented using Stardog 5 Community edition [13]. 9 https://www.wikidata.org/wiki/wikidata:main_page 10 https://wiki.dbpedia.org 11 https://viaf.org 12 We provided a test user with username user test and password test4jpi.

4 Conclusion and Future Work In this paper we briefly presented arkivo, an ontology designed to accommodate the archival description of historical document collections. We reported the current usage of arkivo in the context of the historical archive of the Józef Pi lsudski Institute of America. In our work on the Józef Pi lsudski archival collections, we collected approximately 300 thousand triples and its knowledge base is populated, among others, by 14,199 authors, 12,848 collections, 28,8644 items, 1,572 places and 6,615 dates. Currently, we are working on the realization of the ontology-based digital archive of the Pi lsudski Institute. As future work, we will investigate methodologies for the automatization of the ontology population process exploiting the techniques presented in [14, 15]. References 1. Berners-Lee, T., Hendler, J., Lassila, O., et al.: The semantic web. Scientific american 284(5) (2001) 28 37 2. Guarino, N., Oberle, D., Staab, S.: What is an ontology? In: Handbook on ontologies. Springer (2009) 1 17 3. Kruk, S.R., McDaniel, B.: Semantic Digital Libraries. Springer (2009) 4. Alemu, G., Stevens, B., Ross, P., Chandler, J.: Linked data for libraries: Benefits of a conceptual shift from library-specific record structures to rdf-based data models. New Library World 113(11/12) (2012) 549 570 5. Coyle, K., Hillmann, D.: Resource description and access (rda): Cataloging rules for the 20th century. D-Lib 13(1/2) (2007) 6. Tennant, R.: Marc must die. LIBRARY JOURNAL-NEW YORK- 127(17) (2002) 26 27 7. Bizer, C., Heath, T., Berners-Lee, T.: Linked data the story so far. Semantic services, interoperability and web applications: emerging concepts (2009) 205 227 8. Weibel, S.L., Koch, T.: The dublin core metadata initiative. D-lib magazine 6(12) (2000) 1082 9873 9. Patel-Schneider, P.F.: Analyzing schema.org. In: International Semantic Web Conference, Springer (2014) 261 276 10. Shaw, R., Troncy, R., Hardman, L.: Lode: Linking open descriptions of events. In: Asian Semantic Web Conference, Springer (2009) 153 167 11. Grau, B.C., Horrocks, I., Motik, B., Parsia, B., Patel-Schneider, P., Sattler, U.: Owl 2: The next step for owl. Web Semantics: Science, Services and Agents on the World Wide Web 6(4) (2008) 309 322 12. Baader, F., Lutz, C.: Description logic. In: Studies in Logic and Practical Reasoning. Volume 3. Elsevier (2007) 757 819 13. Inc., C.: Stardog 5: The manual. Available online: http://docs. stardog.com/. Last accessed on June 2018 (2017) 14. Pandolfo, L., Pulina, L., Adorni, G.: A framework for automatic population of ontology-based digital libraries. In: AI* IA 2016 Advances in Artificial Intelligence. Springer (2016) 406 417 15. Pandolfo, L., Pulina, L.: Adnoto: A self-adaptive system for automatic ontologybased annotation of unstructured documents. In: To appear in Proc. of the 30th International Conference on Industrial, Engineering, Other Applications of Applied Intelligent Systems. Springer (2017)