Different approaches to digital preservation
Hilde van Wijngaarden
Digital Preservation Officer
Koninklijke Bibliotheek / National Library of the Netherlands
www.kb.nl/e-depot
Digital preservation:
- Safe storage
- Preservation metadata
- Permanent access
Safe storage:
- Secure storage media
- Separating storage from access
- Refreshment procedures
- Back-up procedures
- International standard: OAIS
- Trusted depositories
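A refreshment procedure can be sketched as a copy to a new carrier followed by a fixity check. This is a minimal sketch, not a procedure taken from the slides: the function names `sha256_of` and `refresh` and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, target: Path) -> str:
    """Copy source to target (the 'new carrier') and verify the copy.

    Hypothetical helper: raises IOError if the checksums differ,
    otherwise returns the checksum of the verified copy.
    """
    before = sha256_of(source)
    shutil.copyfile(source, target)
    after = sha256_of(target)
    if before != after:
        raise IOError(f"fixity check failed for {target}")
    return after
```

Storing the returned checksum alongside the object would allow the same check to be repeated at every later refreshment or back-up.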
Preservation metadata:
- Content description
- Specific preservation information:
  - Provenance
  - Rights
  - Technical metadata
  - File format information
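The metadata categories above could be held in a single record per object. The sketch below is an assumption for illustration only: the class name `PreservationMetadata`, its field names, and the sample values are made up and do not follow any particular metadata standard.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class PreservationMetadata:
    """Hypothetical record mirroring the categories on the slide."""
    description: str                                          # content description
    provenance: List[str] = field(default_factory=list)       # custody history
    rights: str = "unknown"                                   # access and use rights
    technical: Dict[str, str] = field(default_factory=dict)   # technical metadata
    file_format: str = "unknown"                              # file format information


# Made-up sample data, for illustration only.
record = PreservationMetadata(
    description="Example report (made-up sample data)",
    provenance=["received from depositor", "copied to new carrier"],
    rights="access on site only",
    technical={"size_bytes": "20480"},
    file_format="PDF 1.4",
)

# A plain dictionary is easy to serialise next to the stored object.
record_as_dict = asdict(record)
```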
Permanent access:
- Rendering may become impossible due to obsolescence of software and hardware
- Different strategies possible
- Goal and audience have to be determined
Permanent access policy:
- What kind of digital objects is the repository responsible for?
  - Fixed format texts, web resources, complex digital objects, datasets, programmes
- What do you want to render in the future?
  - Keep the original?
  - What is the original?
  - Offer extended functionalities?
- How do you want to provide this access?
  - Options for the user?
  - Provide the software or give a recommendation?
Possible strategies:
Processing the original:
- Migration
- Normalisation
- Data-extraction
Keeping the original:
- Emulation
- Encapsulation
- Technology preservation (hardware museum)
- Re-engineering / data recovery / digital archaeology
Migration:
- Hardware migration: refreshment
  - Transferring data to new carriers
- Software migration:
  - Migrate to a new version of the same format
  - Migrate to another format
  - Migration at point of access
Examples:
- Dutch digital preservation testbed: migration of word-processing documents
- Scientific data archives like EROS, NASA, SDSC
- CAMiLEON: migration-on-request
Even a simple conversion from WordPerfect to Word 97 shows how many differences can appear.
Migration:
Advantages
- Conversion functionality supplied with software
- Result has a format that is familiar to the user
- New functionalities possible
Disadvantages
- Appearance changes
- Errors occur
- Meaning can be changed
- If applied at point of expected obsolescence, everything has to be migrated, usually repeatedly
- Migration at point of access may not be possible anymore at that time
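Migration at point of access can be sketched as a lookup of a converter at the moment a user requests the object, while the stored original stays untouched. This is a toy sketch under stated assumptions: the registry `CONVERTERS`, the format labels, and the function `render_on_request` are hypothetical, and a character-encoding conversion stands in for a real format converter.

```python
from typing import Callable, Dict, Tuple

Converter = Callable[[bytes], bytes]

# Hypothetical converter registry, keyed by (source format, target format).
CONVERTERS: Dict[Tuple[str, str], Converter] = {
    ("latin-1-text", "utf-8-text"):
        lambda data: data.decode("latin-1").encode("utf-8"),
}


def render_on_request(original: bytes, source_format: str,
                      target_format: str) -> bytes:
    """Convert a stored original only when it is requested."""
    if source_format == target_format:
        return original
    try:
        convert = CONVERTERS[(source_format, target_format)]
    except KeyError:
        # The last disadvantage on the slide: no converter may exist
        # any more at the time of access.
        raise LookupError(
            f"no converter from {source_format} to {target_format}"
        ) from None
    return convert(original)
```

The `LookupError` branch is the point of the sketch: the approach only works as long as a converter for the stored format is still maintained.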
Normalisation: converting all objects into
- One or more preferred formats
- A chosen preservation format, for instance XML
- A more generic format
Normalisation is also used to describe data-extraction: creating a logical description of the data, with tags
Examples:
- National Archives of Australia: storing everything in XML
- Universal Virtual Computer
- Public Record Office Victoria: VERS
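Data-extraction into a tagged logical description can be illustrated with a very small example. The element names (`letter`, `sender`, `date`, `body`, `paragraph`) and the function `normalise_letter` are assumptions chosen for this sketch; they are not the schema of any of the projects listed above.

```python
import xml.etree.ElementTree as ET
from typing import List


def normalise_letter(sender: str, date: str, paragraphs: List[str]) -> str:
    """Describe the logical content of a letter with tags.

    Layout and fonts are deliberately dropped; only the logical
    structure of the content is kept.
    """
    root = ET.Element("letter")
    ET.SubElement(root, "sender").text = sender
    ET.SubElement(root, "date").text = date
    body = ET.SubElement(root, "body")
    for text in paragraphs:
        ET.SubElement(body, "paragraph").text = text
    return ET.tostring(root, encoding="unicode")


# Made-up sample content, for illustration only.
xml_text = normalise_letter("A. Author", "2004-01-15",
                            ["First paragraph.", "Second paragraph."])
```

The trade-off matches the slides: the tagged description is easy to understand later, but the appearance of the original is lost.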
Normalisation:
Advantages
- A limited number of formats to maintain
- Formats chosen have a higher chance of surviving longer
- Using a logical description enhances the chances of future comprehension
Disadvantages
- (See migration)
- Not flexible
- Possible wrong choice of formats
Emulation: recreating the behaviour of one computer on another
Possibilities:
- Hardware emulation
- Software emulation
- Emulation of an operating system
- Emulation using an intermediate layer or virtual machine
Examples:
- Emulators for game computers
- Universal Virtual Machine
- Emulation Virtual Computer (Jeff Rothenberg)
Emulation:
Advantages
- Original file is kept accessible
- Applicable to every sort of digital object, including programmes
- One-time effort for large groups of digital objects
Disadvantages
- Never operationalised for digital preservation
- Technologically challenging
- Result may not be what the user wants
Encapsulation: wrapping the content in a description
Possibilities:
- Including the original file in an XML document
- Including links to software with the file in the description
- Including the software itself
Examples:
- Archival Information Packages (AIP) that contain metadata and content files
- VERS
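Including the original file in an XML document, together with a pointer to rendering software, could look roughly like the sketch below. The package layout, the element names, and the functions `encapsulate` and `unwrap` are assumptions for illustration; this is not the AIP or VERS format.

```python
import base64
import xml.etree.ElementTree as ET


def encapsulate(content: bytes, file_name: str, file_format: str,
                software_hint: str) -> str:
    """Wrap the unchanged original bytes in a descriptive XML package."""
    package = ET.Element("package")
    description = ET.SubElement(package, "description")
    ET.SubElement(description, "fileName").text = file_name
    ET.SubElement(description, "fileFormat").text = file_format
    # A link or recommendation only; it guarantees nothing about the future.
    ET.SubElement(description, "software").text = software_hint
    payload = ET.SubElement(package, "content", encoding="base64")
    payload.text = base64.b64encode(content).decode("ascii")
    return ET.tostring(package, encoding="unicode")


def unwrap(package_xml: str) -> bytes:
    """Recover the original bytes from a package made by encapsulate()."""
    package = ET.fromstring(package_xml)
    return base64.b64decode(package.findtext("content"))
```

The round trip returns the original bit stream, but rendering it still requires one of the other strategies.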
Encapsulation:
Advantages
- Keeping options open through extensive descriptions
Disadvantages
- Updating metadata difficult
- In fact, nothing has really been done yet: a strategy still has to be chosen
- Including (links to) software does not offer any guarantees
Technology preservation:
- Often referred to as a hardware museum
- Saving everything (files, software and hardware) and keeping it alive
- Maintenance almost impossible
- Unworkable for larger quantities
Re-engineering:
- Also called data recovery or digital archaeology
- Saving the bits and restoring their readability/usability
- Labour-intensive and technically challenging
- The original is not available, so there is no way to know what it should look like
Current choices:
- Most repositories keep their options open
- Migration usually preferred
- Choices depend on sort of digital objects
  - Normalisation applied if content is considered the first priority
  - Encapsulation if context is important
  - Emulation (although not operational yet) for complex digital objects
- Choices depend on state of R&D
  - Large-scale migration not necessary yet because digital archiving is new
  - Hesitation about emulation because no working example is available
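The tendencies listed above can be written down as a simple rule of thumb. This is only a sketch: the function `suggest_strategy`, its parameters, and the order in which the rules are checked are assumptions, and a real repository would weigh many more factors.

```python
def suggest_strategy(is_complex: bool, priority: str) -> str:
    """Map the tendencies on the slide to a suggested strategy.

    priority is a hypothetical label: "content", "context", or anything else.
    """
    if is_complex:
        return "emulation"        # complex digital objects
    if priority == "content":
        return "normalisation"    # content is the first priority
    if priority == "context":
        return "encapsulation"    # context is important
    return "migration"            # the usually preferred default
```

For example, `suggest_strategy(False, "content")` returns `"normalisation"`, and an object with no special requirements falls through to `"migration"`.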
Strategies are not enough: we need tools that
- Make a strategy possible (emulators, virtual machines)
- Help choose a strategy
- Help perform the strategy
- Maintain the link between originals and conversions
- Enable interoperability and co-operation between different repositories
Tools have to be implemented
- In the digital archiving system
- In the digital archiving workflow
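One of the tools named above, maintaining the link between originals and conversions, can be sketched as a small registry that records each conversion and can trace any version back to its original. The class names `Conversion` and `ConversionRegistry` and the identifiers used are hypothetical; the sketch assumes every conversion has exactly one source and that no cycles are recorded.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Conversion:
    source_id: str   # object that was converted
    result_id: str   # object produced by the conversion
    tool: str        # software used, kept for later assessment


class ConversionRegistry:
    """Hypothetical registry linking conversions to their originals."""

    def __init__(self) -> None:
        self._by_result: Dict[str, Conversion] = {}

    def record(self, source_id: str, result_id: str, tool: str) -> None:
        self._by_result[result_id] = Conversion(source_id, result_id, tool)

    def lineage(self, object_id: str) -> List[str]:
        """Return the chain from object_id back to the original."""
        chain = [object_id]
        while chain[-1] in self._by_result:
            chain.append(self._by_result[chain[-1]].source_id)
        return chain
```

With repeated migrations, `lineage` shows how many conversion steps separate the version a user sees from the deposited original.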
Any questions?