A methodology for Sharing Archival Descriptive Metadata in a Distributed Environment Nicola Ferro and Gianmaria Silvello Information Management Research Group (IMS) Department of Information Engineering University of Padua, Italy
Outline The Nature of Archives Network of Digital Archives Digital Libraries Technologies and Digital Archives Encoded Archival Description Metadata Format Nested Sets Methodology Conclusions
Archives Archives keep the context and the network of relationships. Archives have a hierarchical structure: archival bond. Archival descriptions need to be able to express and maintain hierarchical structure and relationships.
Archival Descriptions The International Council on Archives has developed a general standard for archival description called International Standard for Archival Description (General), ISAD(G). Archival descriptions produced according to the ISAD(G) standard take the form of a tree which represents the relationships among more general and more specific archive units going from the root to the leaves of the tree. Reference. International Council on Archives. ISAD(G): General International Standard Archival Description, 2nd edition. Ottawa: International Council on Archives, 1999.
Archival Descriptive Metadata Archival descriptive metadata should meet the following three main requisites: 1. Context: archival descriptive metadata have to retain information about the context of a given record. 2. Hierarchy: archival descriptive metadata have to reflect the archive organization which is described in a multi-leveled fashion. 3. Variable Granularity: archival descriptive metadata have to facilitate access to the requested items.
Network of Digital Archives [Figure: a network of archives A, B, C, D and E.] Heterogeneity issues. Archives have a fixed tree structure. Archives must preserve their autonomy and independence. Difficulties in exchanging archival information embedded in a tree hierarchy.
Trees mapped into Sets Archive descriptions assume a tree structure. It is difficult to share trees between archives and to access a precise element of the tree without accessing the whole hierarchy.
Nested Sets Model [Figure: a fonds set containing nested sub-fonds sets, which in turn contain series sets.] Sets permit access to elements with variable granularity. Through nested sets it is possible to express hierarchy and retain context information. An organization of nested sets is flexible and well suited to a distributed environment.
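The idea of expressing a tree through nested sets can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the archive tree, the unit names, and the helper functions `nested_sets` and `is_ancestor` are hypothetical and do not come from the slides. Each unit is mapped to the set made of itself plus all of its descendants, so that hierarchy becomes set inclusion.

```python
from typing import Dict, List, Set

# Hypothetical archive tree (parent -> children); the names are illustrative only.
TREE: Dict[str, List[str]] = {
    "fonds": ["subfonds_1", "subfonds_2"],
    "subfonds_1": ["series_1", "series_2"],
    "subfonds_2": ["series_3"],
    "series_1": [],
    "series_2": [],
    "series_3": [],
}


def nested_sets(tree: Dict[str, List[str]], root: str) -> Dict[str, Set[str]]:
    """Map every unit to the set containing itself and all of its descendants."""
    sets: Dict[str, Set[str]] = {}

    def visit(node: str) -> Set[str]:
        members = {node}
        for child in tree[node]:
            members |= visit(child)
        sets[node] = members
        return members

    visit(root)
    return sets


def is_ancestor(sets: Dict[str, Set[str]], upper: str, lower: str) -> bool:
    """Hierarchy is recovered from inclusion: lower's set is a proper subset of upper's."""
    return sets[lower] < sets[upper]


SETS = nested_sets(TREE, "fonds")
print(sorted(SETS["subfonds_1"]))
print(is_ancestor(SETS, "fonds", "series_3"))
```

Any single set can be accessed on its own (variable granularity), while the inclusion test recovers the context of a unit without walking the whole tree.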
Digital Libraries Digital Library Systems (DLSs) are the technology of choice for managing the information resources of different kinds of organizations. The need for interoperability among different systems is a compelling issue (DELOS Reference Model). Europeana, the European digital library, museum and archive, is a 2-year project that will give users direct access to some 2 million digital objects. Figure taken from the Europeana leaflet available at: http://www.europeana.eu
OAI-PMH The Open Archives Initiative promotes interoperability through the Protocol for Metadata Harvesting (OAI-PMH). The Dublin Core metadata format is the lowest common denominator in OAI-PMH. OAI-PMH is the de facto standard for metadata exchange. It is based on the distinction between two main components: Data Providers and Service Providers.
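As a rough illustration of how a Service Provider addresses a Data Provider, the sketch below builds an OAI-PMH `ListRecords` request URL. The verb and the `metadataPrefix`, `set` and `from` arguments are part of the protocol, and `oai_dc` is the prefix for unqualified Dublin Core; the base URL, the set name and the function `list_records_url` are assumptions made for the example.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical Data Provider endpoint; replace with a real repository base URL.
BASE_URL = "http://archive.example.org/oai"


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    set_spec: Optional[str] = None,
    from_date: Optional[str] = None,
) -> str:
    """Build a ListRecords request; optional arguments are added only when given."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec      # restrict the harvest to one set
    if from_date is not None:
        params["from"] = from_date    # harvest only records changed since this date
    return base_url + "?" + urlencode(params)


url = list_records_url(BASE_URL, set_spec="fonds:subfonds1")
print(url)
# Decode the query string again to show the arguments the repository would see.
print(parse_qs(urlsplit(url).query))
```

The function only builds the URL and does not contact any server, so it can be tried without a live repository.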
OAI Sets OAI sets enable logical data partitioning by defining groups of records. An OAI set is defined by three main components: 1. setSpec 2. setName 3. setDescription. The organization of OAI sets may be flat or hierarchical. Harvesting procedures: incremental and selective harvesting. Harvesting from a set which has subsets will cause the repository to return metadata in the specified set and recursively from all its subsets.
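The recursive behaviour of selective harvesting over a set hierarchy can be sketched as follows. In OAI-PMH a hierarchical setSpec is written as a colon-separated path; everything else here (the record identifiers, the set names, and the functions `in_set` and `selective_harvest`) is a hypothetical in-memory stand-in for a repository, not an actual implementation.

```python
from typing import Dict, List

# Hypothetical repository content: record identifier -> setSpecs it belongs to.
RECORDS: Dict[str, List[str]] = {
    "oai:example:1": ["fonds"],
    "oai:example:2": ["fonds:subfonds1"],
    "oai:example:3": ["fonds:subfonds1:series1"],
    "oai:example:4": ["fonds:subfonds2"],
}


def in_set(record_spec: str, requested: str) -> bool:
    """True if record_spec is the requested set or one of its subsets."""
    return record_spec == requested or record_spec.startswith(requested + ":")


def selective_harvest(records: Dict[str, List[str]], requested: str) -> List[str]:
    """Return identifiers of records in the requested set and, recursively, its subsets."""
    return sorted(
        identifier
        for identifier, specs in records.items()
        if any(in_set(spec, requested) for spec in specs)
    )


print(selective_harvest(RECORDS, "fonds:subfonds1"))
```

Matching on the colon boundary matters: a request for `fonds:sub` must not match `fonds:subfonds1`, because setSpec segments are compared as whole path elements.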
Digital Libraries and Digital Archives The use of OAI-PMH is not widespread in the archival context. Dublin Core metadata format seems to flatten out the archive structure. EAD: Encoded Archival Description. EAD is a standard defined by The Library of Congress in partnership with the Society of American Archivists. EAD reflects and emphasizes ISAD(G).
EAD Structure and Puzzles
<ead>
  <eadheader> [...] </eadheader>
  <archdesc level="fonds">
    [...]
    <did> [...] </did>
    <dsc>
      [...]
      <c01> [...] </c01>
      <c01>
        [...]
        <c02> [...] </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>
Automatic processing: Several degrees of freedom in tagging practice.
Levels: The level of description needs to be inferred by navigating the upper components.
Size: Sharing and searching archival descriptions might be made difficult by the large size of EAD files and their deep hierarchical structure.
User needs: Users are often interested in item-level information, which is typically buried very deeply in the hierarchy and difficult to reach.
Archival metadata requirements: EAD complies with both the context and hierarchy requirements but it disregards the variable granularity one.
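The "Levels" and "User needs" points can be made concrete with a small sketch that walks an EAD-like file and reconstructs the context of each unit from its upper components. The sample document, its titles, and the helpers `components` and `walk` are assumptions made for illustration; real EAD files are far richer and tagged less uniformly.

```python
import re
import xml.etree.ElementTree as ET
from typing import Iterator, List, Tuple

# Hypothetical, heavily simplified EAD fragment.
EAD_SAMPLE = """
<ead>
  <archdesc level="fonds">
    <did><unittitle>Example fonds</unittitle></did>
    <dsc>
      <c01 level="subfonds">
        <did><unittitle>Example sub-fonds</unittitle></did>
        <c02>
          <did><unittitle>Example series</unittitle></did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""

# Components are tagged <c> or <c01>, <c02>, and so on.
COMPONENT = re.compile(r"^c(\d\d)?$")


def components(element: ET.Element) -> List[ET.Element]:
    """Return the direct sub-components of <archdesc> (inside <dsc>) or of a component."""
    parent = element.find("dsc") if element.tag == "archdesc" else element
    if parent is None:
        return []
    return [child for child in parent if COMPONENT.match(child.tag)]


def walk(element: ET.Element, ancestors: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], str]]:
    """Yield (titles of the upper components, title) for every unit, top-down."""
    title = element.findtext("did/unittitle", default="(untitled)")
    yield ancestors, title
    for child in components(element):
        yield from walk(child, ancestors + (title,))


root = ET.fromstring(EAD_SAMPLE)
UNITS = list(walk(root.find("archdesc")))
for ancestors, title in UNITS:
    print(len(ancestors), " / ".join(ancestors + (title,)))
```

Note that the `<c02>` in the sample carries no `level` attribute: its position in the hierarchy is known only because the walk passed through its ancestors, which is why a single unit cannot be extracted without processing the file above it.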
Benefits of the Nested Sets Methodology The methodology addresses the shortcomings of EAD when it is used in a distributed environment and with variable granularity access to the resources. EAD items are mapped into separate DC metadata records, which are shareable and natively supported by OAI-PMH. Context and hierarchy are expressed in a straightforward manner by exploiting native functionalities of OAI-PMH and leveraging the role of OAI sets. This approach keeps archival metadata independent of the original EAD file, without losing any context information. This approach can also be applied independently of the EAD standard; indeed, we can also create archival descriptive metadata from scratch by exploiting OAI sets and DC records.
Nested Sets Methodology Internal nodes are mapped into sets.
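A minimal sketch of this mapping is given below. The slides state only that internal nodes are mapped into sets; the rest is an assumption made for the example: leaves become Dublin Core records that belong to the set of their parent, setSpecs are built as colon-separated paths, and the `Unit` class, the `map_to_oai` function and the sample archive are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Unit:
    """One archival unit of a hypothetical description tree."""
    identifier: str
    title: str
    children: List["Unit"] = field(default_factory=list)


def map_to_oai(root: Unit) -> Tuple[Dict[str, str], List[Dict[str, str]]]:
    """Map internal nodes to OAI sets and leaves to DC-like records (assumed rule)."""
    sets: Dict[str, str] = {}              # setSpec -> setName
    records: List[Dict[str, str]] = []     # one flat record per leaf

    def visit(unit: Unit, parent_spec: Optional[str]) -> None:
        if unit.children:
            # Internal node: becomes a set nested under its parent's setSpec.
            spec = unit.identifier if parent_spec is None else f"{parent_spec}:{unit.identifier}"
            sets[spec] = unit.title
            for child in unit.children:
                visit(child, spec)
        else:
            # Leaf: becomes a record whose set membership carries its context.
            records.append({
                "dc:identifier": unit.identifier,
                "dc:title": unit.title,
                "setSpec": parent_spec or "",
            })

    visit(root, None)
    return sets, records


ARCHIVE = Unit("fonds", "Example fonds", [
    Unit("subfonds1", "Example sub-fonds", [
        Unit("item_a", "Item A"),
        Unit("item_b", "Item B"),
    ]),
    Unit("item_c", "Item C"),
])

SETS, RECORDS = map_to_oai(ARCHIVE)
print(SETS)
print(RECORDS)
```

Under these assumptions each record stays a flat DC record, while its `setSpec` path retains the position of the unit in the hierarchy, so a harvester can request a single branch instead of the whole description.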
Conclusions We defined the requisites which must be satisfied in order to obtain shareable metadata and to retain all the fundamental characteristics of archival resources. We presented a methodology for creating shareable archival descriptive metadata which exploits the synergy between OAI-PMH and DC. This methodology enables archival descriptions to be shared in a distributed environment. EAD metadata can be mapped into our methodology without losing information. The methodology can also be applied backwards, generating a new EAD file with a slightly different structure from the original one but with the same informational content.
Conclusions Thank you! Questions? Gianmaria Silvello Department of Information Engineering University of Padova silvello@dei.unipd.it