Digital Archives: Extending the 5S model through NESTOR

Nicola Ferro and Gianmaria Silvello
Department of Information Engineering, University of Padua, Italy
{ferro, silvello}@dei.unipd.it

Abstract. Archives are an extremely valuable part of our cultural heritage. Despite their importance, the models and technologies that have been developed over the past two decades in the Digital Library (DL) field have not been specifically tailored to archives, and this is especially true of formal and foundational frameworks such as the Streams, Structures, Spaces, Scenarios, Societies (5S) model. Therefore, we propose an innovative formal model for archives, called NEsted SeTs for Object hierarchies (NESTOR), and use it to extend the 5S model in order to take into account the specific features of archives and to tailor the notion of digital library accordingly.

1 Motivation

Over the past two decades, Digital Libraries (DLs) have been steadily evolving and have been shaping the way in which people and institutions access and interact with our cultural heritage, study, and learn. Nowadays, their reach goes far beyond what has been the realm of traditional libraries and also encompasses other kinds of cultural heritage institutions, such as archives and museums. In particular, this work focuses on archives. An archive is not simply a series of objects that have been accumulated and filed with the passing of time, as usually happens with libraries that collect, for example, individual published books, journals, and serials. Instead, it represents the trace of the activities of a physical or juridical person in the course of their business, which is preserved because of its continued value.

DLs benefit from the existence of sophisticated formal models, such as the Streams, Structures, Spaces, Scenarios, Societies (5S) model [4], which allow us to formally describe them and to prove their properties and features.
Notwithstanding the importance of archives, there has so far been no attempt to develop a dedicated formal model built around their peculiar constituents, such as the notion of archival bond. Nor can we exploit the 5S model as it is for archives because, as we will discuss later on, it needs some extension and tailoring. We think that the archival domain deserves a formal theory as well, and that this theory has to be reconciled with the more general theories for digital libraries in order to open up to archives the full breadth of methodologies and technologies which have been developed over the last two decades in the DL field. To this purpose we proposed a formal model for archives, built around the notions of archival bond and hierarchy: the NEsted SeTs for Object hierarchies (NESTOR) model [1]. Furthermore, we exploit NESTOR to formally extend the 5S model so that a digital archive can be defined as a specific case of digital library that takes into consideration the peculiar features of archives.

The paper is organized as follows: in Section 2 we provide some background on archives and the 5S formal model. In Section 3 we present the basics of the NESTOR model and in Section 4 we introduce our extension to the 5S model via NESTOR. Finally, in Section 5 we draw some final remarks.

2 Related Work

2.1 Digital Archives

In an archive, the context and the relationships between the documents are preserved thanks to the hierarchical organization of the documents inside the archive. Indeed, an archive is divided into fonds, then into sub-fonds, then into series, and so on; at every level we can find documents belonging to a particular division of the archive or documents describing the nature of the considered level of the archive. The union of all these documents, the relationships, and the context information permits the full informational power of the archival documents to be maintained. The archival documents are analyzed, organized, and recorded by means of archival descriptions, which have to reflect the peculiarities of the archive. In the digital environment archival descriptions are encoded by means of metadata; these need to be able to express and maintain the structure of the descriptions and their relationships [3]. The standard metadata format for representing the hierarchical structure of the archive is the Encoded Archival Description (EAD)¹, which reflects the archival structure and holds relations between entities in an archive.
On the other hand, an archive is described by means of a single EAD file, and this may be problematic when we need to access and exchange archival metadata with a variable granularity [2].

¹ http://www.loc.gov/ead/

2.2 The 5S Model

The Streams, Structures, Spaces, Scenarios, Societies (5S) model [4] is a formal model that draws upon the broad DL literature in order to have a comprehensive base of support. It has been developed largely bottom up, starting with key definitions and with elucidation of the DL concepts from a minimalist approach. It is built around five main concepts:

- streams are sequences of elements of an arbitrary type, e.g. bits, characters, images, and so on;
- structures specify the way in which parts of a whole are arranged or organized, e.g. hypertexts, taxonomies, and so on;
- spaces are sets of objects together with operations on those objects that obey certain constraints, e.g. vector spaces, probabilistic spaces, and so on;
- scenarios are sequences of related transition events, for instance, a story that describes possible ways to use a system to accomplish some functions that a user desires;
- societies are sets of entities and relationships between them, e.g. humans, hardware and software components, and so on.

Starting from these five main concepts, it provides a definition for a minimal DL, which is constituted by: (i) a repository of digital objects; (ii) a set of metadata catalogs containing metadata specifications for those digital objects; (iii) a set of services containing at least services for indexing, searching, and browsing; and (iv) a society. While these broad concepts are shared with archives, the specific way in which they are formally defined shows that the definitions cannot be straightforwardly applied to archives without at least some extension. We will discuss this in further detail, presenting an extension of 5S via NESTOR in Section 4.

3 The Basics of the NESTOR Formal Model

We define both the Nested Sets Model (NS-M) and the Inverse Nested Sets Model (INS-M) in terms of set theory, as collections of subsets where specific conditions must hold.

Definition 1 Let A be a set and let C be a collection of subsets of A. Then C is a Nested Sets Collection (NS-C) if:

\[ A \in C, \tag{3.1} \]
\[ \forall H, K \in C \mid H \cap K \neq \emptyset \Rightarrow H \subset K \lor K \subset H. \tag{3.2} \]

Therefore, we define a NS-C as a collection of subsets where two conditions must hold. The first condition (3.1) states that the set A, which contains all the subsets of the collection, must belong to the NS-C itself. The second condition (3.2) states that the intersection of every pair of sets in the NS-C is not the empty set only if one set is a proper subset of the other one.
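As an illustration of Definition 1, the following minimal Python sketch checks conditions (3.1) and (3.2) on a small collection. The function name `is_nested_set_collection` and the example sets are hypothetical and not part of the NESTOR model; the sketch also assumes that condition (3.2) is applied to pairs of distinct sets, since a set cannot be a proper subset of itself.

```python
from itertools import combinations


def is_nested_set_collection(universe, collection):
    """Return True if `collection` satisfies (3.1) and (3.2) over `universe`."""
    universe = frozenset(universe)
    # Work on distinct sets only (assumption: (3.2) concerns distinct pairs).
    sets = {frozenset(s) for s in collection}
    # Every member of the collection must be a subset of A.
    if any(not s <= universe for s in sets):
        return False
    # Condition (3.1): A itself belongs to the collection.
    if universe not in sets:
        return False
    # Condition (3.2): overlapping sets must be strictly nested.
    for h, k in combinations(sets, 2):
        if h & k and not (h < k or k < h):
            return False
    return True


# Hypothetical example: four documents arranged in a small hierarchy.
A = {"a", "b", "c", "d"}
nested = [A, {"b", "c"}, {"c"}, {"d"}]
overlapping = [A, {"b", "c"}, {"c", "d"}]  # {b, c} and {c, d} overlap without nesting
```

In this sketch `nested` passes both conditions, while `overlapping` violates (3.2) because its two smaller sets share the element `c` although neither contains the other.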
Now we can introduce the Inverse Nested Sets Collection (INS-C), which defines the INS-M:

Definition 2 Let A be a set and let C be a collection. Then, C is an Inverse Nested Sets Collection (INS-C) if:

\[ \exists! B \in C \mid \forall K \in C,\ B \subseteq K, \tag{3.3} \]
\[ \forall H, K, L \in C \mid H \subseteq K,\ L \neq K \Rightarrow (L \cap K = H \cap L) \lor (H \subseteq L) \lor (L \subseteq H). \tag{3.4} \]

We define an INS-C as a collection of subsets where two conditions must hold. The first condition (3.3) states that C must contain the bottom set B, which is the common subset of all the sets in C. The second condition (3.4) states that, given three sets H, K, and L such that H is a subset of K and L is not equal to K, either the intersection between L and K is the same as the intersection between H and L, or H is a subset of L, or L is a subset of H.

4 Extending the 5S Model via NESTOR

The notion of descriptive metadata specification² (definition 14 [4, p. 293]) is suitable either to represent a descriptive metadata for each archival division, e.g. a metadata describing a series, a sub-fonds, or an archival unit, or to represent the archive as a whole, as happens in the case of EAD. When it comes to the definition of metadata catalog (definition 18 [4, p. 295]), there is no means to impose a structure over the descriptive metadata in the catalog. Therefore, if you use separate descriptive metadata specifications for each archival division, as in the former case, you cannot express the relationships between these archival divisions, i.e. you would lose the possibility of retaining the archival bond. Moreover, in a metadata catalog, there is no means to associate (sub-)parts of the descriptive metadata specifications with the digital objects (definition 16 [4, p. 294]) that they describe; you can only associate a whole descriptive metadata with a whole digital object. Therefore, if you represent an archive as a whole with a single descriptive metadata specification, as in the latter case, it would not be possible to associate (sub-)parts of that descriptive metadata with the different digital objects corresponding to the various archival divisions.

Our extension to the 5S model is thus organized as follows:

- using the notion of structure (definition 2 [4, p. 288]), we introduce the notion of NESTOR structure, as a structure that complies with the constraints of the NS-M or the INS-M;
- using the notion of metadata catalog, we introduce the notion of NESTOR metadata catalog, as a metadata catalog that exploits a NESTOR structure to retain the archival bonds;
- using the notion of digital library (definition 24 [4, p. 299]), we introduce the notion of digital archive, as a digital library where at least one of the metadata catalogs is a NESTOR metadata catalog.

Definition 3 Let C be a Nested Sets Collection (NS-C) on a set A. A NS-M structure(A) is a structure (NS-G, L, F), where L is a set of label values, F is a labeling function, and NS-G = (V, E) is a directed graph where

\[ \forall v_j \in V,\ \exists! J \in C; \qquad \forall e_{j,k} \in E,\ \exists! J, K \in C \mid K \subseteq J. \]

² In this section, we use italics for highlighting definitions taken from the 5S model.
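To make the graph in Definition 3 more concrete, here is a minimal Python sketch that derives a directed graph and a labeling from a nested sets collection. It rests on assumptions that go beyond the text: the function `ns_graph` is hypothetical, vertices are numbered by decreasing set size, and an edge e_{j,k} is drawn only from a set to its directly contained subsets (the definition itself only requires K to be a subset of J). The level names used as labels are purely illustrative.

```python
def ns_graph(collection):
    """Build (sets, edges) for a nested sets collection.

    Vertex j stands for sets[j]; an edge (j, k) means sets[k] is directly
    contained in sets[j]. Restricting edges to direct containment is an
    assumption of this sketch, not a requirement of Definition 3.
    """
    # Deterministic vertex numbering: larger sets first, ties broken by content.
    sets = sorted({frozenset(s) for s in collection},
                  key=lambda s: (-len(s), sorted(s)))
    edges = []
    for k, child in enumerate(sets):
        # All strict supersets of the child; in a NS-C they form a chain.
        supersets = [j for j, parent in enumerate(sets) if child < parent]
        if supersets:
            # The smallest strict superset is the direct parent.
            j = min(supersets, key=lambda i: len(sets[i]))
            edges.append((j, k))
    return sets, edges


# Hypothetical archive with four documents d1..d4.
example = [{"d1", "d2", "d3", "d4"}, {"d2", "d3"}, {"d3"}, {"d4"}]
sets, edges = ns_graph(example)

# A possible labeling function F, mapping vertices to label values in L.
labels = {0: "fonds", 1: "sub-fonds", 2: "series", 3: "series"}
```

Here vertex 0 is the whole archive, and the resulting edges (0, 1), (1, 2), and (0, 3) reproduce the containment hierarchy as a tree, which is the shape an archival hierarchy of fonds, sub-fonds, and series would take.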

Definition 4 Let C be an Inverse Nested Sets Collection (INS-C) on a set A. An INS-M structure(A) is a structure (INS-G, L, F), where L is a set of label values, F is a labeling function, and INS-G = (V, E) is a directed graph where

\[ \forall v_j \in V,\ \exists! J \in C; \qquad \forall e_{j,k} \in E,\ \exists! J, K \in C \mid J \subseteq K. \]

Definition 3 applies definition 1, ensuring that the resulting structure complies with the NS-M. Note that the set of label values L and the labeling function F are not strictly needed for the NS-M, but they can be useful in the context of the 5S model, and this feature, in turn, may extend the NS-M with semantic possibilities. Similarly, definition 4 applies definition 2.

Definition 5 Given a set A, a NESTOR structure(A) is either a NS-M structure(A) or an INS-M structure(A).

The definition of metadata catalog in the 5S model can be expressed as follows. Let H be a set of handles to digital objects and M a set of descriptive metadata specifications; then a metadata catalog is a function \( DM : H \to 2^M \).

Definition 6 Let H be a set of handles to digital objects and M a set of descriptive metadata specifications. A metadata catalog DM is a NESTOR metadata catalog if:

\[ \forall h_i \in H,\ \exists M_i \in 2^M \mid DM(h_i) = M_i \Rightarrow |M_i| = 1, \tag{4.1} \]
\[ \exists\ \text{NESTOR structure}(M). \tag{4.2} \]

Condition 4.1 imposes that, if it exists, there is only one descriptive metadata specification for a given digital object because, in archival practice, every single metadata describes a unique archival division, be it a level in the archive or a digital object [5]. Condition 4.2 ensures that the relationships among the different archival divisions are compliant with the descriptive metadata specifications in M.

Definition 7 A digital archive (R, DM, Serv, Soc) is a digital library where:

- R is a repository;
- at least one of the metadata catalogs in the set of metadata catalogs DM is a NESTOR metadata catalog;
- Serv is a set of services containing at least services for indexing, searching, and browsing;
- Soc is a society.

Definition 7 extends the definition of digital library in the 5S model by requiring that at least one of the metadata catalogs is a NESTOR one, i.e. there exists at least one metadata catalog capable of retaining the archival bonds.
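The conditions of Definitions 6 and 7 can be sketched in Python as follows. This is only an illustration under several assumptions: a catalog is modelled as a hypothetical triple (DM, M, nesting), where DM is a dictionary from handles to sets of metadata identifiers; condition (4.1) is read as requiring exactly one specification per handle; and condition (4.2) is witnessed by an explicit nested sets collection over M, covering only the NS-M case and leaving the INS-M case aside.

```python
from itertools import combinations


def is_ns_collection(universe, collection):
    """Check that `collection` is a nested sets collection over `universe`."""
    universe = frozenset(universe)
    sets = {frozenset(s) for s in collection}
    if universe not in sets or any(not s <= universe for s in sets):
        return False
    # Overlapping distinct sets must be strictly nested.
    return all(not (h & k) or h < k or k < h
               for h, k in combinations(sets, 2))


def is_nestor_catalog(dm, metadata, nesting):
    """Hypothetical check of conditions (4.1) and (4.2) for one catalog."""
    # (4.1): each handle is described by exactly one specification taken from M.
    one_each = all(len(specs) == 1 and set(specs) <= set(metadata)
                   for specs in dm.values())
    # (4.2): a NESTOR structure on M exists (assumption: NS-M witness only).
    return one_each and nesting is not None and is_ns_collection(metadata, nesting)


def is_digital_archive(catalogs):
    """Definition 7: at least one metadata catalog is a NESTOR catalog."""
    return any(is_nestor_catalog(dm, m, nesting) for dm, m, nesting in catalogs)


# Hypothetical catalogs: one hierarchical, one flat.
M = {"m_fonds", "m_series", "m_unit"}
hierarchical = (
    {"h1": {"m_fonds"}, "h2": {"m_series"}, "h3": {"m_unit"}},
    M,
    [M, {"m_series", "m_unit"}, {"m_unit"}],
)
flat = ({"h1": {"m_a", "m_b"}}, {"m_a", "m_b"}, None)
```

With these inputs, a set of catalogs containing both `flat` and `hierarchical` counts as a digital archive in the sense of Definition 7, whereas a set containing only `flat` does not, since no catalog retains the hierarchy.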

5 Final Remarks

The definition of digital archive we gave in this paper has two main consequences. Firstly, multiple NESTOR metadata catalogs can be present in the same digital archive, thus giving the possibility of expressing different archival descriptions over the same set of digital objects. This extends the current practice in which a system for managing an archive is usually capable of managing only one description of the archive, thus giving only one point of view on the held material. Secondly, you can mix NESTOR and non-NESTOR metadata catalogs, which allows for seamless integration of different views of the managed digital objects within the same digital archive. This opens up the possibility of exploiting the whole breadth of methodologies and tools available in the DL field with archives.

Future work will concern the formal definition of creation, deletion, update, and search operations on digital archives via NESTOR. This, in turn, will open up the possibility of further extending the 5S model. Indeed, according to it, a minimal digital library has to offer, at least, indexing, searching, and browsing services [4, p. 299].

Acknowledgments

The CULTURA³ (Grant agreement no. 269973) and PROMISE network of excellence⁴ (Contract no. 258191) projects, as part of the 7th Framework Program of the European Commission, have partially supported the reported work.

³ http://www.cultura-strep.eu/
⁴ http://www.promise-noe.eu/

References

1. A. Agosti, N. Ferro, and G. Silvello. The NESTOR Framework: Manage, Access and Exchange Hierarchical Data Structures. In Proceedings of the 18th Italian Symposium on Advanced Database Systems, pages 242-253. Società Editrice Esculapio, Bologna, Italy, 2010.
2. N. Ferro and G. Silvello. A Methodology for Sharing Archival Descriptive Metadata in a Distributed Environment. In B. Christensen-Dalsgaard et al., editors, Proc. 12th European Conference on Research and Advanced Technology for Digital Libraries (ECDL 2008), pages 268-279. Lecture Notes in Computer Science (LNCS) 5173, Springer, Heidelberg, Germany, 2008.
3. A. J. Gilliland-Swetland. Enduring Paradigm, New Opportunities: The Value of the Archival Perspective in the Digital Environment. Council on Library and Information Resources, Washington, DC, USA, 2000.
4. M. A. Gonçalves, E. A. Fox, L. T. Watson, and N. A. Kipp. Streams, Structures, Spaces, Scenarios, Societies (5S): A Formal Model for Digital Libraries. ACM Transactions on Information Systems (TOIS), 22(2):270-312, April 2004.
5. International Council on Archives. ISAD(G): General International Standard Archival Description, 2nd edition. Ottawa: International Council on Archives, March 1999.