The Choice For A Long Term Digital Preservation System or why the IISH favored Archivematica

At the beginning of 2017 the IISH decided to use Archivematica as its central system for the long-term preservation of its digital collections. This decision was made after a thorough comparison of different systems and solutions. This blog gives an overview of that process.

Demands for the new digital preservation system

In 2016 the IISH started a project to choose a system for the long-term preservation of digital collections. The original project plan provided an overview of the gaps within the IISH digital infrastructure that the new system would have to cover. From a bird's-eye perspective, the system had to be OAIS compliant (https://en.wikipedia.org/wiki/open_archival_information_system) and had to enable the digital repository to reach Trusted Digital Repository (TDR) status (http://www.trusteddigitalrepository.eu/trusted%20digital%20repository.html).

From a more practical point of view, the most important requirements for the system were:

- Able to follow the archiving workflow, delivering clear messages on the status of individual processes
- Able to perform appraisal and selection
- Able to use PREMIS (https://www.loc.gov/standards/premis/) for administrative and technical metadata
- Able to provide access to born-digital and digitized material
- Able to manage access rights, copyrights and permissions
- Able to automatically OCR and index content
- Able to automatically and manually assemble DIPs
- Able to handle both a lot of very small files and some exceptionally big files

Some general starting points:

- We aim for a modular system
- To work on the basis of migration, not emulation
- We aim for an open and transparent system
- Everything must be exportable (no vendor lock-in)
- No preference for open source or closed source software; access to enough technical support is more important
- We want to store born-digital material exactly as we receive it
- No longer use too many custom-made IISH scripts, but be able to profit from technical support or community input

These requirements were used to create a first selection of suitable products.

A first selection

A first selection of suitable preservation systems was made using the Powrr (Preserving digital Objects With Restricted Resources) tool evaluation grid, which offers an exhaustive survey of digital preservation tools (http://digitalpowrr.niu.edu/tool-grid). One of the strong points of this tool grid is that it takes OAIS functionality as a starting point. Based on the Powrr grid the IISH selected eight products for further inspection: Archivematica, Fedora, Preservica, Rosetta, Scope, Goobi, Roda and Vital.

The products were investigated on the basis of online documentation, demos, and user reviews. Of these eight choices, a few were taken off the list quite quickly. Goobi was more focused on digitized collections and not suitable for born-digital material. Roda seemed to be a good product, but lacked a wide user community. Vital and Rosetta were too closely tied to (commercial) products which we didn't want to use. Scope was out because we couldn't get a good impression of its users or performance. Fedora stayed on our list longer, being a widely used and stable system. Fedora was installed, but it didn't have all the out-of-the-box functionality that both Archivematica and Preservica had.

And then there were two: Archivematica and Preservica

So, for us, only Archivematica and Preservica remained as serious candidates. To make a well-informed choice between these products, both were quite intensely tested and compared. The companies behind Archivematica and Preservica, Artefactual (https://www.artefactual.com/) and Preservica Digital Preservation (https://preservica.com/) respectively, both paid the IISH a visit to better match the institute's requirements with the functionality of the products. Both systems were installed on IISH servers to thoroughly test how they would work and interact with other systems. This resulted in a list of selection criteria, a selection of which you can find in the (still comprehensive) table below.

Concerning Preservica it is important to note that the IISH tested the enterprise version and not the cloud version. This is because of the IISH collection policy, which states that collections have to fall under Dutch jurisdiction, so cloud storage outside the Netherlands is impossible. It is also important to note that this comparison was made at the beginning of 2017 between version 1.4 of Archivematica and version 5.7 of Preservica.
Some of the conclusions might therefore be outdated with regard to newer versions of both products.

Products compared: Archivematica 1.4 and Preservica Enterprise version 5.7

Financial
    Archivematica:
      - Artefactual development to help implement Archivematica
      - Optional Artefactual yearly support contract: $25,000
    Preservica:
      - Implementation service package (one-time)
      - Yearly licence costs, including (developer's) support, of up to 100,000 a year for the enterprise version
      - New versions of Preservica are at additional cost

Policy
  Open source vs closed source software (flexibility, new demands)
    Archivematica: In principle the software is totally adaptable, but of course developments have to be in line with the Archivematica releases of the maintainer (Artefactual). Open source; costs of development can sometimes be shared with other users, and the result can become part of the official release.
    Preservica: Even though it is closed source, the Preservica SDK offers the possibility to adapt workflows and to connect with other software. New demands can be proposed to the Preservica User Group (http://preservica.com/preservica-usergroup/). These might or might not be implemented by Preservica. Some functionality can sometimes be custom developed, but only where it does not belong to the core application. As the customer base is quite large, and growing, the influence of an individual institution is small. So if you want to realize some new piece of functionality within the core application, you will have to forge coalitions with other institutions to really get the point across.
  IISH software policy (mostly open source, importance of having a grip on IT processes, not too many custom-made IISH scripts/tools)
    Archivematica: Archivematica will fit well into the broader IISH infrastructure and will contribute to transparent and well-managed IT processes. It might cost a bit more in development costs to adapt the system specifically to IISH needs.
    Preservica: Though the SDK gives the user a lot of control over the workflows and the connection to the IISH infrastructure, there is some doubt as to whether the IISH can truly get a grip on the IT processes within Preservica and on the future development of the product. Preservica is more of a finished product, but it would still need development work to connect it to other IISH applications.
  Exit strategy: is the AIP (including all metadata) independent of the archival software? Can all information be exported?
    Archivematica: Yes, Archivematica natively uses the standards METS and PREMIS. The AIP can always be exported.
    Preservica: Preservica internally uses its own XIP metadata, but this can be exported to METS and PREMIS. This is part of the software, so no custom work is necessary.

Technique
  Application code language
    Archivematica: Python
    Preservica: Java
  Construction of the software
    Archivematica: Modular. The workflow is run by microservices: small software tools with a specific task, which can be switched on or off and configured individually. The microservices are connected through the Gearman application framework.
    Preservica: Preservica is a so-called "boilerplate" application: standard and logically built, using open source libraries. Preservica also makes use of the microservice structure and Gearman.
  Performance: how to deal with the ingest of a lot of small files or big files
    Archivematica: This depends on the hardware used during ingest and on a smart configuration of the workflow. This has not yet been thoroughly tested.
    Preservica: Idem.

Infrastructure: integration with IISH systems
  Storage system(s) and backup
    Archivematica: The Archivematica storage service connects every pipeline to external storage.
    Preservica: Storage adapters connect to a great variety of storage systems.
  Connection with IISH acquisition database
    Archivematica: A standard connection is possible with ArchivesSpace and AtoM; for other systems development is necessary.
    Preservica: N/A. Preservica begins with the ingest of the SIP.
  Possibility to connect to PID Handle system
    Archivematica: No, development necessary.
    Preservica: No, development necessary.
  Metadata systems (Evergreen, "EAD" via XMetaL)
    Archivematica:
      - Not standard, development necessary.
      - Archivematica has a standard connection to ArchivesSpace or AtoM.
    Preservica:
      - Can make use of the Preservica sync workflow to update AIP metadata from the catalogue (catalogue sync).
      - Preservica has a standard connection to ArchivesSpace.
  Future systems: IIIF and rights management system
    Archivematica: Not standard, development necessary.
    Preservica: Not standard, development necessary.

Permissions
  How granular are access rights in the system?
    Archivematica: By default the only distinction made is between an "active" user and an admin user.
    Preservica: Different roles (which can be expanded) can be combined with different access rights.
  LDAP support
    Archivematica: Not standard, development necessary.
    Preservica: [answer missing]

IISH preferred standards used
  PREMIS
    Archivematica: Yes.
    Preservica: No, but conversion to PREMIS is supported.
  METS
    Archivematica: Yes.
    Preservica: No, but conversion to METS is supported.

Workflows
  Amount of freedom to configure a workflow
    Archivematica: Highly adaptable. Tools can be added to the default tools and workflows can be customized.
    Preservica: Workflows are XML based and can be customized. It is not possible to add your own tools.
  Is it possible to monitor the workflow and receive messages?
    Archivematica and Preservica: [answer missing]
  How well will the product fit with existing IISH workflows?
    Archivematica: The expectation is that existing workflows can be translated to Archivematica without much trouble.
    Preservica: A test with the IISH digitisation workflow made clear that it was relatively simple to translate an existing workflow to a Preservica workflow.

Pre-ingest
  Is appraisal and selection possible?
    Archivematica: Version 1.7 has a special tab for appraisal and selection included.
    Preservica: Preservica functionality starts with the ingest of a SIP. This can be created by a separate SIP creator application (made by Preservica).
  Transfer of offloaded files to a pre-SIP staging area?
    Archivematica: Yes, via the transfer tab.
    Preservica: [answer missing]
  Does the software offer any solutions for the transfer of files from the archival donor or scanning service to the archive?
    Archivematica and Preservica: [answer missing]

Ingest
  Configure error messages?
    Archivematica and Preservica: [answer missing]
  Configure if the ingest process stops after an error message?
    Archivematica and Preservica: [answer missing]
  Configure which tools are active for different workflows?
    Archivematica and Preservica: [answer missing]
  Add your own ingest tools?
    Archivematica and Preservica: [answer missing]
  Can the system handle long file names and deep directory structures?
    Archivematica and Preservica: [answer missing]
  Can the system handle non-western languages (Unicode support)?
    Archivematica and Preservica: [answer missing]
  Are we able to decide to which file formats we want to normalise?
    Archivematica: Yes, via normalization rules and commands (under the Preservation planning tab).
    Preservica: Yes, via the normalization workflow and the migration pathways.

Post-ingest
  Supplement or change an AIP?
    Archivematica: No. This comes from the principle that an AIP should never change. If, for instance, new preservation copies are made during a preservation action, the AIP with the new files will have to be re-ingested.
    Preservica: Yes, but only for descriptive metadata. The objects and other metadata cannot be changed. Also, a re-ingest is necessary if files are added to the AIP.
  Can an AIP be deleted?
    Archivematica: Yes, but only by someone with the correct access rights.
    Preservica: Yes, but only by someone with the correct access rights.
  Possibility for reporting on processed AIPs, DIPs, formats, preservation actions, etc.
    Archivematica: Yes, but only to a certain degree. This was not thoroughly tested.
    Preservica: Yes, Preservica by default can produce all kinds of reports.

Dissemination
  Does the product disseminate the content?
    Archivematica: Archivematica itself doesn't offer access to the material. The DIP is placed on a server, from which other access applications will have to deal with the dissemination.
    Preservica: Preservica itself doesn't offer access to the material. But Preservica can offer a separate dissemination web application.
  Can you give access on the basis of the access rights (specific user, reading room only)?
    Archivematica: No. This has to be arranged outside of Archivematica.
    Preservica: No. But there is CMIS support.
  Connect to a rights system?
    Archivematica: Yes, but only after development.
    Preservica: Yes, but only after development.
  CMIS support
    Archivematica: Yes, but only after development.
    Preservica: Yes.

Nice-to-have functionality
  Email archiving
    Archivematica: Yes, Archivematica can ingest different email formats such as PST and MBOX (and normalize them if necessary).
    Preservica: Yes, Preservica can ingest different email formats such as PST and MBOX (and normalize them if necessary). There is also a separate email processing workflow available.
  Web archiving
    Archivematica: Not a separate workflow for web archiving, but it can ingest WARC files.
    Preservica: Yes, via a separate web archiving module.
  OCR service
    Archivematica and Preservica: [answer missing]

Quality of the documentation
    Archivematica: Not always easy to find and not always up to date.
    Preservica: Exhaustive and up-to-date documentation.

User friendliness
  SIP creation
    Archivematica: Part of the application. Appraisal and selection are also possible.
    Preservica: Happens in another application (SIP creator).
  Ingest process
    Archivematica:
      - Starting up and following the workflow is a relatively intuitive process.
      - The ingest workflow is not connected to the browser session.
    Preservica: Idem.
  Preservation planning
    Archivematica: A relatively intuitive process.
    Preservica: A relatively intuitive process.

Stability of the software
  Regular updates
    Archivematica: Artefactual and a few of the bigger users are the driving force behind the updates.
    Preservica: The user group has influence on the roadmap.
  Stability of company and community
    Archivematica: Archivematica has a healthy community of users. The stability of Artefactual is unknown.
    Preservica: Preservica seems to be a healthy company with a big and diverse client base. It has recently split off from Tessella as a daughter company.

Research data
  Dataverse support
    Archivematica: No, it is planned for 2017.
    Preservica: [answer missing]

Community
  Forum (lively?)
    Archivematica: Yes, it seems like a lively discussion forum: https://groups.google.com/forum/?fromgroups#!forum/archivematica
    Preservica: A closed-off online forum where experiences and plugins can be shared. It is unknown how lively the user forum is.
  User group
    Archivematica: There is a user group in the UK: https://wiki.archivematica.org/community/regional_user_groups
    Preservica: [answer missing]
  Nearby users
    Archivematica: No, not really. Users in the UK and Germany (Berlin) are the closest.
    Preservica: Yes, the National Archives of the Netherlands and Dutch regional archives (RACs, tenants of the NA).
  Users from which domain mostly?
    Archivematica: Universities
    Preservica: Archives (government, companies)

There can be only one: Archivematica

As can be understood from all the criteria listed in the table above, the choice was certainly not an easy one.
Both products are able to offer complex OAIS preservation workflows for a wide variety of materials, and both could fit within the IISH infrastructure. They both meet the requirements mentioned above. Not included in the above table were the internal arguments concerning how much time our developers would have to invest in the implementation, development and structural technical management of the two systems. As difficult as it was to predict this at the time, the feeling was that it would be more or less the same for both. In the end, what made us choose Archivematica were, in essence, the following points:

- More functionality in the pre-ingest phase (the transfer tab). As a private archive we have little influence on how archives are transferred. Therefore the extra functionality that Archivematica offers for a first check and appraisal is very welcome.
- Lower costs. Of course Archivematica, as with all open source software, does not come for free. Even without being able to account for the total cost of ownership of Archivematica in advance, the difference with the yearly licence costs of the Preservica Enterprise product is considerable. For an organization the size of the IISH this is an important point.
- Preservica may be the more ready and finished product, but the differences are not so big that this could be enough of an argument for the IISH to choose Preservica over Archivematica. Besides, both systems will require the same (one-time) development costs to connect them to the other IISH systems.
- The company behind Archivematica has the advantage that they are willing to do all kinds of development work for us, including more IISH-specific custom work, while the Preservica company only wanted to work on the Preservica core software. This was an important argument, as it would relieve some of the pressure on our small IT department.
- Although the choice for open source software was not a decisive argument in this case, the choice for Archivematica means that any money the IISH invests in new Archivematica functionality will also benefit the rest of the Archivematica community. As the IISH is a publicly funded organisation, this serves as an extra argument for Archivematica.

On the whole, Archivematica gave the feeling that it fit better in our IT infrastructure, that we could have more influence on its development, and that it would be easier to adapt to our needs.
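
Part of that adaptability comes from the microservice design mentioned in the comparison: small tools with a single task that can be switched on or off and configured individually. As a rough illustration of the idea only, such a chain could be sketched in Python as follows. This is not Archivematica's actual code or API: every class, function and setting name below is made up for the example, the format check is a naive stand-in for a real identification tool, and a plain loop takes the place of the Gearman job server.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# All names below are hypothetical; this is not Archivematica's API.
Package = Dict[str, dict]


@dataclass
class Microservice:
    """One small tool with a single task that can be switched on or off."""
    name: str
    task: Callable[[Package, dict], Package]
    enabled: bool = True
    config: dict = field(default_factory=dict)


def add_checksums(package: Package, config: dict) -> Package:
    """Record a checksum for every file, using the configured algorithm."""
    algorithm = config.get("algorithm", "sha256")
    checksums = {
        name: hashlib.new(algorithm, data).hexdigest()
        for name, data in package["files"].items()
    }
    return {**package, "checksums": checksums}


def identify_formats(package: Package, config: dict) -> Package:
    """Guess a format from the file extension (stand-in for a real identifier)."""
    formats = {
        name: name.rsplit(".", 1)[-1].lower() if "." in name else "unknown"
        for name in package["files"]
    }
    return {**package, "formats": formats}


def plan_normalization(package: Package, config: dict) -> Package:
    """List which files would be normalised, based on configurable rules."""
    rules = config.get("rules", {})
    plan = {
        name: rules[fmt]
        for name, fmt in package.get("formats", {}).items()
        if fmt in rules
    }
    return {**package, "normalization_plan": plan}


def run_pipeline(
    package: Package, services: List[Microservice]
) -> Tuple[Package, List[str]]:
    """Run the enabled services in order and keep a status message per step."""
    log = []
    for service in services:
        if not service.enabled:
            log.append(f"{service.name}: skipped")
            continue
        package = service.task(package, service.config)
        log.append(f"{service.name}: done")
    return package, log


if __name__ == "__main__":
    transfer = {"files": {"letter.doc": b"dear archive", "scan.TIF": b"II"}}
    services = [
        Microservice("checksums", add_checksums, config={"algorithm": "sha512"}),
        Microservice("identify formats", identify_formats),
        Microservice("plan normalization", plan_normalization, enabled=False,
                     config={"rules": {"doc": "pdf"}}),
    ]
    result, log = run_pipeline(transfer, services)
    print("\n".join(log))
```

Switching a step on or off, or changing its settings, only touches the list of services and not the code of the steps themselves, which is roughly the kind of flexibility we were looking for.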