The Semantics of Semantic Interoperability: A Two-Dimensional Approach for Investigating Issues of Semantic Interoperability in Digital Libraries

Similar documents
Interoperability for Digital Libraries

Opus: University of Bath Online Publication Store

Comparing Open Source Digital Library Software

Surveying the Digital Library Landscape

Metadata: The Theory Behind the Practice

67th IFLA Council and General Conference August 16-25, 2001

Metadata Workshop 3 March 2006 Part 1

Assessing Metadata Utilization: An Analysis of MARC Content Designation Use

Metadata and Encoding Standards for Digital Initiatives: An Introduction

MAINTAINING QUALITY METADATA: TOWARD EFFECTIVE DIGITAL RESOURCE LIFECYCLE MANAGEMENT

Metadata Quality Assessment: A Phased Approach to Ensuring Long-term Access to Digital Resources

Table of contents for The organization of information / Arlene G. Taylor and Daniel N. Joudrey.

A Repository of Metadata Crosswalks. Jean Godby, Devon Smith, Eric Childress, Jeffrey A. Young OCLC Online Computer Library Center Office of Research

Creating descriptive metadata for patron browsing and selection on the Bryant & Stratton College Virtual Library

Cross-domain Metadata Interoperability for Integrated Information Services

Development of an Ontology-Based Portal for Digital Archive Services

A Comparative Study of the Search and Retrieval Features of OAI Harvesting Services

IMLS National Leadership Grant LG "Proposal for IMLS Collection Registry and Metadata Repository"

1-9 August 2003, Berlin

data elements (Delsey, 2003) and by providing empirical data on the actual use of the elements in the entire OCLC WorldCat database.

Better Guidelines, Better Functionality: How Metadata Supports the Cycle of System Improvement at UNT

Networked Access to Library Resources

Enabling Interaction and Quality in a Distributed Data DRIS

COAR Interoperability Roadmap. Uppsala, May 21, 2012 COAR General Assembly

Metadata for the Web A Necessary Evil? CS 431 March 2, 2005 Carl Lagoze Cornell University

Building Consensus: An Overview of Metadata Standards Development

EUDAT B2FIND A Cross-Discipline Metadata Service and Discovery Portal

Digital Library Curriculum Development Module 4-b: Metadata Draft: 6 May 2008

Digital repositories as research infrastructure: a UK perspective

Metadata Standards and Applications. 6. Vocabularies: Attributes and Values

Managing Image Metadata

Open Archives Initiatives Protocol for Metadata Harvesting Practices for the cultural heritage sector

Evaluation and Design Issues of Nordic DC Metadata Creation Tool

If you build it, will they come? Issues in Institutional Repository Implementation, Promotion and Maintenance

Metadata for general purposes

A GML SCHEMA MAPPING APPROACH TO OVERCOME SEMANTIC HETEROGENEITY IN GIS

Document Title Ingest Guide for University Electronic Records

Development of a Digital Repository Prototype applied to Faculty of Pharmacy, University of Lisbon. Sílvia Lopes Pedro Faria Lopes Fernanda Campos

Integrating Multi-dimensional Information Spaces

The Semantic Institution: An Agenda for Publishing Authoritative Scholarly Facts. Leslie Carr

1 of 9 16-Mar-15 09:32

Applying the Levels of Conceptual Interoperability Model to a Digital Library Ecosystem a Case Study

RDA Resource Description and Access

Archives in a Networked Information Society: The Problem of Sustainability in the Digital Information Environment

SNHU Academic Archive Policies

Update on the TDL Metadata Working Group s activities for

PRIOR System: Results for OAEI 2006

Research on the Interoperability Architecture of the Digital Library Grid

Metadata for Digital Collections: A How-to-Do-It Manual. Introduction to Resource Description and Dublin Core

MPEG-21 Current Work Plan

University of Bath. Publication date: Document Version Publisher's PDF, also known as Version of record. Link to publication

Semantic Technologies for Nuclear Knowledge Modelling and Applications

The Dublin Core Metadata Element Set

A Service-Oriented Architecture for Digital Libraries

AN EXPLORATORY STUDY OF THE DESCRIPTION FIELD IN THE DIGITAL PUBLIC LIBRARY OF AMERICA

Towards Development of Ontology for the National Digital Library of India

Preserving repository content: practical steps for repository managers

Formalizing Dublin Core Application Profiles Description Set Profiles and Graph Constraints

Comparing Curricula for Digital Library. Digital Curation Education

Functional & Technical Requirements for the THECB Learning Object Repository

Institutional Repository using DSpace. Yatrik Patel Scientist D (CS)

RVOT: A Tool For Making Collections OAI-PMH Compliant

Preservation Planning in the OAIS Model

Towards a joint service catalogue for e-infrastructure services

Meta-Bridge: A Development of Metadata Information Infrastructure in Japan

re3data.org - Making research data repositories visible and discoverable

Headings: Academic Libraries. Database Management. Database Searching. Electronic Information Resource Searching Evaluation. Web Portals.

Information Retrieval and Knowledge Organisation

Linked Open Data in Aggregation Scenarios: The Case of The European Library Nuno Freire The European Library

Internal Structure of Information Packages in Digital Preservation

Increasing access to OA material through metadata aggregation

Automation of Semantic Web based Digital Library using Unified Modeling Language Minal Bhise 1 1

Functional Requirements of Metadata System: From User Needs Perspective

Managing the Transition from Large-Scale Oral History Research to Digital Archive: The Digital Librarian s Perspective

The Scottish Collections Network: landscaping the Scottish common information environment. Gordon Dunsire

LORE: A Compound Object Authoring and Publishing Tool for the Australian Literature Studies Community

Re-designing Online Terminology Resources for German Grammar

Verification of Multiple Agent Knowledge-based Systems

Nuno Freire National Library of Portugal Lisbon, Portugal

DL User Interfaces. Giuseppe Santucci Dipartimento di Informatica e Sistemistica Università di Roma La Sapienza

Metadata Catalogue Issues. Daan Broeder Max-Planck Institute for Psycholinguistics

Networking European Digital Repositories

Ubiquitous and Open Access: the NextGen library. Edmund Balnaves, Phd. Information Officer, IFLA IT Section

An Intelligent Retrieval Platform for Distributional Agriculture Science and Technology Data

The Ohio State University s Institutional Repository: The Knowledge Bank. Knowledge Bank Users Group. 2 nd Annual Meeting May 11, Welcome!

Metadata Standards and Applications. 4. Metadata Syntaxes and Containers

Cheshire 3 Framework White Paper: Implementing Support for Digital Repositories in a Data Grid Environment

Metadata Overview: digital repositories

Contribution of OCLC, LC and IFLA

Navigating the Universe of ETDs: Streamlining for an Efficient and Sustainable Workflow at the University of North Florida Library

Specific requirements on the da ra metadata schema

TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES

Guidelines for Developing Digital Cultural Collections

An Approach to Resolve Data Model Heterogeneities in Multiple Data Sources

PERFORMANCE EVALUATION OF ONTOLOGY AND FUZZYBASE CBIR

ARKive-ERA Project Lessons and Thoughts

Representing Multiple Standards in a Single DAM: Use of Atomic Classes

Enabling Seamless Sharing of Data among Organizations Using the DaaS Model in a Cloud

Sharing Data on the Aquileia Heritage: Proposals for a Research Project

Library Technology Conference, March 20, 2014 St. Paul, MN

Transcription:

The Semantics of Semantic Interoperability: A Two-Dimensional Approach for Investigating Issues of Semantic Interoperability in Digital Libraries EunKyung Chung, eunkyung.chung@usm.edu School of Library and Information Science, University of Southern Mississippi William E. Moen, wemoen@unt.edu School of Library and Information Science, University of North Texas Introduction The networked information environment comprising digital libraries, digital collections, and digital repositories increase people s expectations for information access. Specifically, users anticipate better search capabilities across these networked information resources and the metadata records associated with the resources. The opportunities also exist for improved information exchange or integration in and between communities and applications (Arms et al., 2002; Chen, 1999; Moen, 2001; Tennant, 2001; Zeng & Chan, 2004). To improve capability for information exchange or integration, and to improve search across these resources, it is essential to enhance interoperability of systems and data. This involves addressing not only interoperability at the syntactic and functional levels but at the semantic level (Moen, 2001). Various information services such as federated searching, harvesting metadata, and gathering are often dependent on the capability and quality of semantic interoperability across different collections, systems, domains, and communities (Arms et al., 2002). Improved semantic interoperability provides a basis for more effective information services for users. Effective semantic interoperability is required to meet users expectations of discovering relevant networked information resources via searches across heterogeneous as well as within homogeneous systems and collections (Chen, 1999; Tennant, 2001). Although there have been efforts in improving interoperability, many of those focused on protocol and other aspects of interoperability. It is important at this point to examine the current status of semantic interoperability in terms of digital libraries, collections, and repositories across domains, communities, and applications. The purpose of this preliminary study described here is to suggest one approach for addressing some aspects of semantic interoperability and to use this approach to examine four implementations in different communities and applications using Greenstone, Fedora, DSpace, and EPrints. Specifically, four selected implementations were examined with respect to two fundamental dimensions of semantic interoperability: data attribute (i.e., names of fields or elements in the metadata records) and data value (i.e., the actual content of the fields or elements). This study intends to show how this methodological approach can identify potential challenges to semantic interoperability and subsequent studies using this method can yield results and findings that may lead to common guidelines, suggestions, and solutions for the digital library community in terms of semantic interoperability. Related Research and Research Methodology Different types or levels of search and data interoperability, such as syntactic, functional and semantic, have been identified (Moen, 2001). While syntactic and functional levels 1

may focus on protocols used for information retrieval system communication, the semantic level concerns the understandings and meanings of interchanged data. (Ouksel & Sheth, 1999). In the hierarchy of challenges for effective interoperability, semantic interoperability is regarded as a grand-challenge research area for achieving high-quality interoperability (Lynch & Garcia-Molina, 1995). Although there are complex challenges to achieve semantic interoperability, many efforts have been made to improve semantic interoperability (Chen, 1999) to support high quality information exchange and integration across communities. Semantic interoperability research falls generally into two primary areas: data attributes and data values. The data attribute area addresses questions as to the names, labels, semantics, and granularity of metadata scheme elements and database fields. The data value area addresses the data or information provided in an element or database field. As networked information resources expand in type and numbers in collections, libraries, and repositories, various general or domain-specific, standard and nonstandard metadata schema are being used. Moen (2001) and Zeng and Chan (2004) pointed out, that the diversity of metadata schemes, schemas, and element sets in use means that the data-attribute area is one to be considered to improve semantic interoperability. Accordingly, a range of approaches have been attempted to improve semantic data attribute interoperability, and these approaches can be categorized as either precoordinated or post-coordinated. The a priori agreement approach consists of specifying through the use of application profiles or registries or other methods the data attributes (i.e., metadata elements) to be used in systems. Diverse repositories or libraries all agree to use the same data attributes. A post-coordinated approach, on the other hand, endeavors to map, match, or crosswalk different data attributes after specific digital libraries, collections, and repositories are implemented. Buckland (1999) proposed yet another way of dealing with the diversity of data attributes by mapping users query vocabularies to metadata vocabularies within various systems. Semantic interoperability related to the data-value area focuses on the data content or information associated with the data attributes, and one of the most critical aspects of this is related to subject-related search and retrieval. Ontologies and controlled vocabularies are constructed with different scopes and levels of abstraction and granularity as needed by various disciplines, and this increases the difficulty to achieve semantic interoperability with data values than with the data-attribute dimension. Proposals for dealing with data value interoperability have been suggested such as Kuhr s (2003) mega-thesaurus to map multiple controlled vocabularies. Doerr (2001) approached the issue with a concept-based mapping between numerous vocabularies for subject retrieval. Figure 1 illustrates the relationship between the two dimensions data attribute and data value in the context of semantic interoperability. As semantic interoperability of the two dimensions increases, overall semantic interoperability increases accordingly. 2

Semantic Interoperability of Data Values Overall Semantic Interoperability Semantic Interoperability of Data Attributes Figure 1. Dimensions of Semantic Interoperability Table 1 specifies components of the two dimensions. The data attribute dimension s components are: semantics and contents (Chan & Zeng, 2006). The semantics aspect is designed to constrain the attributes according agreed upon meaning and the data to be associated with that attribute. Using a searching example, if a user wants to search across repositories for author information, the target repositories should execute the search against the author attribute (or element or field). Thus, the system maintains the user s semantic intention with the search (and not expand the search to other attributes). The other variation is to match users searching intentions with semantically related attributes. There might be a case in which a user wants to search information with an author attribute, but the target systems label that attribute as originator, creator, or writer. To achieve a higher degree of semantic interoperability between these systems, the seemingly dissimilar but semantically related attributes should be connected (or mapped ) with each other in a proper way. The contents aspect is defined as declarations or instructions of what and how values should be assigned to the elements (Chan & Zeng, 2006). For instance, the creator attribute could be prescribed differently in terms of the data values in it depending on specific collections, communities, and domains. The data-value dimension of semantic interoperability is more complex and difficult to achieve, since the main concerns of the dimension are dependent on specific controlled vocabularies and ontologies. In addition to addressing various controlled vocabularies and ontologies, the data-value dimension is interrelated with aspects of linguistics and meanings of words such as polysemy, synonymity, and granularity, especially in subjectrelated attributes. Based on the elements of the data-value dimension, with respect to interoperability among different collections, domains, and communities, it becomes much more complicated to achieve semantic interoperability than in homogeneous settings. 3

Table 1. The Elements in two dimensions Dimension Element Example Data Attribute Semantics The meaning of Creator is defined as the party or parties responsible for creating the specific resource Content The Creator attribute should be assigned to individuals or organizations responsible for the intellectual contents of a specific resource, rather than the physical contents Data Value Vocabularies and Ontologies Different controlled vocabularies and ontologies Granularity Different levels of terms assigned in the same concept (e.g., vehicle vs. sedan) Word Polysemy and synonymity Applying the Two Dimensional Approach Digital libraries and repositories have been implemented in many communities and domains for a variety of collections, and using several software platforms and applications such as Greenstone, DSpace, Eprints, and Fedora. Four implementations, The Writing University Archive, Duke Law Faculty Scholarship Repository, Drexel University E-repository and Archives, and Tufts Digital Library were selected for analysis using the two-dimensional perspective. These collections primarily contain scholarly works from specific institutions and were designed to assist teaching and research activities. Table2 summaries key details of these implementations. Duke Law Faculty Scholarship Repository Drexel University E- repository and archives Tufts Digital Library Table 2. DL examples across different applications DL Title URL Application Data Attributes The Writing University Archive http://iwp.infoscience.uiowa.edu/cgi- Greenstone Titles, Authors, Country, Year bin/library http://eprints.l aw.duke.edu/ http://dspace.l ibrary.drexel.e du/ http://dl.tufts.e du/ Eprints DSpace Fedora Title, Authors, Abstract, Keywords, Subjects, Type, Conference, Department, Editors, Status, Refereed, Publication, Year Collections, Titles, Authors, Subjects Creator, Collection, Date, Description, Organizations, People, Places, Topics, Title, Type 4

Table 2 also lists the data attributes identified through an examination of each of these implementations. This shows that data attributes occur that are semantically related but given different attribute names such as authors, creators, and contributors. Secondly, certain data attributes may occur in one or more implementations but not in all. For example, there is no subject attribute in The Writing University Archive, while it occurs in the others. In addition, there are more granular attributes in an implementation that could be subsumed by more broadly defined attributes in other implementations. For example, the Tufts Digital Library has data attributes of topics, people, places, and organization that could be considered subject-related attributes. The contents component in data attributes were examined in terms of how and what to assign the values in. In the case of the author or creator data attribute in The Writing University Archive and Drexel University E-repository and archives, the primary description method is in the order of last name and first name, but there is a differently abbreviated description issue, especially in the first name. In addition, the data-value dimension was examined with respect to subject or topic data attributes. There is variation in which controlled vocabularies and ontologies are used in the different implementations for assigning subject terms as data values to the appropriate data attributes. The Tufts Digital Library defines the data value of the subject attribute as terms or values selected for identification of broader categorization of an object 1 without specifying any controlled vocabularies. The Duke Law Faculty Scholarship Repository uses Library of Congress Subject Headings for terms to use in the subject attribute. In addition, Drexel University E-repository and archives uses inhouse controlled vocabulary for the subject attribute. This preliminary study primarily examined the utilization of various controlled vocabularies and ontologies, rather than granularity and word elements in the data-value dimension, but this method will support examining those other components in this dimension. Conclusion Semantic interoperability plays a key role in providing quality information services in the networked information environment. This preliminary study related to semantic interoperability examined four implementations within similar domains of knowledge from a two-dimensional perspective. The data-attribute dimension, semantics and contents of the data attribute were examined in terms of different naming practices, occurring and non-occurring attributes, granular and more general attributes that may be semantically related. and abbreviated-designation issues. The data-value dimension was examined with respect to different controlled vocabularies and ontologies in the subject-related attributes. Future research using this two-dimensional perspective can investigate issues and potential challenges for semantic interoperability in digital libraries, repositories, and archives across domains of knowledge, communities, and applications. Extending and expanding this research can contribute to a more comprehensive survey these networked information resources, and the findings from the research can result in suggestions and solutions to improve semantic interoperability for these collections and systems. 1 http://dca.tufts.edu/tdl/search_help/index.html#advanced_search 5

References Arms, W. Y., Hillmann, D., Lagoze, C., Kraft, D., Marisa, R., Saylor, J., et al. (2002). A spectrum of interoperability. D-Lib Magazine, 8(1). Retrieved Feb 10 2007, from http://www.dlib.org/dlib/january02/arms/01arms.html. Buckland, M. (1999). Mapping entry vocabulary to unfamiliar metadata vocabularies. D- Lib Magazine, 5(1). Retrieved Retrieved Feb 10 2007, from http://www.dlib.org/dlib/january99/buckland/01buckland.html. Chan, L. M. & Zeng, M. L. (2006). Metadata interoperability and standardization a study of methodology I: Achieving interoperability at the schema level. D-Lib Magazine, 12(6). Retrieved Feb 11 2007 from http://www.dlib.org/dlib/june06/zeng/06zeng.html. Chen, H. (1999). Semantic research for digital libraries. D-Lib Magazine, 5(10). Retrieved Feb 9 2007 from http://www.dlib.org/dlib/october99/chen/10chen.html. Doerr, M. (2001). Semantic problems of thesaurus mapping. Journal of Digital Information, 1(8). Retrieved Feb 1 2007 from http://jodi.tamu.edu/articles/v01/i08/doerr/. Kuhr, P. S. (2003). Putting the world back together: Mapping multiple vocabularies into a single thesaurus. In Subject Retrieval in a Networked Environment: Proceedings of the IFLA Satellite Meeting, Dublin, OH (pp. 37-42). München: K.G. Saur. Lynch, C., & Garcia-Molina, H. (1995). Interoperability, scaling, and the digital libraries research agenda. Retrieved February, 1, 2007, from http://wwwdiglib.stanford.edu/diglib/pub/reports/iita-dlw/main.html. Moen, W. (2001). Mapping the interoperability landscape for networked information retrieval. In Proceedings of 1st ACM/IEEE-CS Joint Conference on Digital Libraries (pp. 50-51). New York: ACM Press. Ouksel, A. M., & Sheth, A. (1999). Semantic interoperability in global information systems: A brief introduction to the research area and the special section. SIGMOD Record, 28(1), 5-12. Tennant, R. (2001). Different paths to interoperability. Library Journal, 126(3), 118-119. Zeng, M. L., & Chan, L. M. (2004). Trends and issues in establishing interoperability among knowledge organization systems. Journal of the American Society for Information Science and Technology, 55(5), 377-395. 6