Lessons from the implementation

Similar documents
Thomas Ledoux Project manager, IT Department

Dynamic management of digital preservation

Session Two: OAIS Model & Digital Curation Lifecycle Model

Legal Deposit of Online Newspapers at the BnF - Clément Oury - IFLA PAC Paris 2012

Copyright 2008, Paul Conway.

Meeting researchers needs in mining web archives: the experience of the National Library of France

Persistent identifiers, long-term access and the DiVA preservation strategy

Agenda. Bibliography

August 14th - 18th 2005, Oslo, Norway. Web crawling : The Bibliothèque nationale de France experience

Data Curation Handbook Steps

Defining Economic Models for Digital Libraries :

Certification Efforts at Nestor Working Group and cooperation with Certification Efforts at RLG/OCLC to become an international ISO standard

MAPPING STANDARDS! FOR RICHER ASSESSMENTS. Bertram Lyons AVPreserve Digital Preservation 2014 Washington, DC

DRS Policy Guide. Management of DRS operations is the responsibility of staff in Library Technology Services (LTS).

Protecting Future Access Now Models for Preserving Locally Created Content

<goals> 10/15/11% From production to preservation to access to use: OAIS, TDR, and the FDLP

Developing a Research Data Policy

ISO Self-Assessment at the British Library. Caylin Smith Repository

Preserving French Scientific Data. Olivier Rouchon Sun PASIG Malta June 25th 2009

Digital Preservation Standards Using ISO for assessment

Conducting a Self-Assessment of a Long-Term Archive for Interdisciplinary Scientific Data as a Trustworthy Digital Repository

UNT Libraries TRAC Audit Checklist

Susan Thomas, Project Manager. An overview of the project. Wellcome Library, 10 October

Strategy for long term preservation of material collected for the Netarchive by the Royal Library and the State and University Library 2014

Preservation and Access of Digital Audiovisual Assets at the Guggenheim

Archiving the Web: What can Institutions learn from National and International Web Archiving Initiatives

Preservation Health Check: introduction to the pilot

Importance of cultural heritage:

Swedish National Data Service, SND Checklist Data Management Plan Checklist for Data Management Plan

Digital Preservation DMFUG 2017

Trusted Digital Repositories. A systems approach to determining trustworthiness using DRAMBORA

Big Data, exploiter de grands volumes de données

DataFlow and VIDaaS Workshop

31 March 2012 Literature Review #4 Jewel H. Ward

Certification. F. Genova (thanks to I. Dillo and Hervé L Hours)

The distant future of the recent past

Robin Dale RLG

Developing a social science data platform. Ron Dekker Director CESSDA

An overview of the OAIS and Representation Information

MetaArchive Cooperative TRAC Audit Checklist

State Government Digital Preservation Profiles

Building a Digital Repository on a Shoestring Budget

Interoperability & Archives in the European Commission

OAIS: What is it and Where is it Going?

Ensuring Proper Storage for Earth Science Data: The USGS Process to Certify Trusted Digital Repositories

Introduction to Digital Preservation. Danielle Mericle University of Oregon

Building for the Future

Promoting Open Standards for Digital Repository. case study examples and challenges

DIGITAL ARCHIVES & PRESERVATION SYSTEMS

Deliverable 6.4. Initial Data Management Plan. RINGO (GA no ) PUBLIC; R. Readiness of ICOS for Necessities of integrated Global Observations

Report on compliance validation

A Model for Managing Digital Pictures of the National Archives of Iran Based on the Open Archival Information System Reference Model

Archives in a Networked Information Society: The Problem of Sustainability in the Digital Information Environment

What do you do when your file formats become obsolete? Lydia T. Motyka Florida Center for Library Automation USETDA 2011

Harvesting digital newspapers at the Bibliothèque nationale de France

A Micro-Services-Based Approach for Curation and Preservation Solutions

David Minor UC San Diego Library Chronopolis Preservation Network

Safeguarding Digital Heritage through Sustained Use of Legacy Software

Preservation at scale

The ESA CASPAR Scientific Testbed and the combined approach with GENESI-DR

Digital Preservation Policy. Principles of digital preservation at the Data Archive for the Social Sciences

Digital Preservation with Special Reference to the Open Archival Information System (OAIS) Reference Model: An Overview

DRS Update. HL Digital Preservation Services & Library Technology Services Created 2/2017, Updated 4/2017

Selecting the Right Method

XFDU packaging contribution to an implementation of the OAIS reference model

Can a Consortium Build a Viable Preservation Repository?

Emory Libraries Digital Collections Steering Committee Policy Suite

Metadata and Encoding Standards for Digital Initiatives: An Introduction

Geospatial Records and Your Archives

DRI: Preservation Planning Case Study Getting Started in Digital Preservation Digital Preservation Coalition November 2013 Dublin, Ireland

Cyber Threat Intelligence Debbie Janeczek May 24, 2017

Woodson Research Center Digital Preservation Policy

Digital Preservation and The Digital Repository Infrastructure

Document Title Ingest Guide for University Electronic Records

Striving for efficiency

Enabling Collaboration for Digital Preservation

GEOSS Data Management Principles: Importance and Implementation

Presented By. Chris Lacinak

Managing Records in Electronic Formats. An Introduction

Digital Preservation at NARA

NEW YORK PUBLIC LIBRARY

A Collaboration Model between Archival Systems to Enhance the Reliability of Preservation by an Enclose-and-Deposit Method

Realizing the Army Net-Centric Data Strategy (ANCDS) in a Service Oriented Architecture (SOA)

Defining OAIS requirements by Deconstructing the OAIS Reference Model Date last revised: August 28, 2005

JISC WORK PACKAGE: (Project Plan Appendix B, Version 2 )

Research Data Management and Institutional Repositories

Accessioning. Kevin Glick Yale University AIMS

Edinburgh DataShare: Tackling research data in a DSpace institutional repository

Survey of Research Data Management Practices at the University of Pretoria

Table of Contents ARCHIVAL CONTENT STANDARD 7. Kris Kiesling. Cory L. Nimer. Kelcy Shepherd. Katherine M. Wisser. Aaron Rubinstein.

Non-text theses as an integrated part of the University Repository

Preserving Electronic Mailing Lists as Scholarly Resources: The H-Net Archives

Data management and discovery

Applying Archival Science to Digital Curation: Advocacy for the Archivist s Role in Implementing and Managing Trusted Digital Repositories

An Institutional Approach to Developing Research Data Management Infrastructure

DRI: Dr Aileen O Carroll Policy Manager Digital Repository of Ireland Royal Irish Academy

DIGITAL STEWARDSHIP SUPPLEMENTARY INFORMATION FORM

University of British Columbia Library. Persistent Digital Collections Implementation Plan. Final project report Summary version

Trust and Certification: the case for Trustworthy Digital Repositories. RDA Europe webinar, 14 February 2017 Ingrid Dillo, DANS, The Netherlands

NDSA Web Archiving Survey

Transcription:

Bibliothèque nationale de France (BnF) Distributed Archiving & Preservation System Système de Préservation et d Archivage Réparti (SPAR) Lessons from the implementation Sun-PASIG Malta 24/06/2009 Distributed Archiving & Preservation System (SPAR) 1

Agenda Missions and context From theory to practice the model the expectations handling bad data lessons Management through risk analysis Focus shift Dealing with human skills 24/06/2009 Distributed Archiving & Preservation System (SPAR) 2

Missions of the libary Missions : to build up the collections, to preserve and communicate them to the public, to produce a reference catalog, to cooperate with other institutions, to participate to research programs. Legal deposit : legal deposit since 1537 for printed materials 1648:engravings and maps 1793: musical scores 1925: photos 1938: phonograms 1975: videograms 1992: electronic documents 2006: Web legal deposit 24/06/2009 Distributed Archiving & Preservation System (SPAR) 3

Digital archiving at BnF Production applications Dissemination applications Preservation digitization. Record management. wayback WEB Archiving 24/06/2009 Distributed Archiving & Preservation System (SPAR) 4

Decomposition in channels To deal with the variability and heterogeneity of the data, definition of channels build on the relation between the digital objects and the archival system, independently of any given organization: Preservation digitization Audiovisual material Negotiated legal deposit (dark Web, regional press) Automatic legal deposit (surface Web) Administrative production Deposit / Third party archiving Acquisition / Donation 24/06/2009 Distributed Archiving & Preservation System (SPAR) 5

Agenda Missions and context From theory to practice the model the expectations handling bad data lessons Management through risk analysis Focus shift Dealing with human skills 24/06/2009 Distributed Archiving & Preservation System (SPAR) 6

Each channel is formally defined by a reference package: description of the Service Level Agreement (SLA) includes human readable definition machine actionable parameters links to accepted formats at various levels of commitment (stored, identified, known, managed) links to the used tools Model 24/06/2009 Distributed Archiving & Preservation System (SPAR) 7

Channel description (sample) <sla:servicelevelagreement> <sla:header> <sla:channelidentifier>fil_num_cons_a</sla:channelidentifier> <sla:type>info:bnf/spar/context/channel#ingest</sla:type> </sla:header> <sla:packageattribute> <sla:minsize unit="kilobyte">10</sla:minsize> <sla:maxsize unit="gigabyte">40</sla:maxsize> <sla:maxnumberoffiles>3000</sla:maxnumberoffiles> <sla:packagecontent> <sla:formatcategory type="info:bnf/spar/representation#storedformat" order="deny,allow"> <sla:formatlist action="deny"><format>*</format></sla:formatlist> </sla:formatcategory> <sla:formatcategory type="info:bnf/spar/representation#managedformat" order="deny,allow"> <sla:formatlist action="allow"> <format type="ark">ark:/12148/ftiff_6_0w</format> </sla:formatlist> <sla:formatlist action="deny"><format>*</format></sla:formatlist> </sla:formatcategory> </sla:packagecontent> </sla:servicelevelagreement> 24/06/2009 Distributed Archiving & Preservation System (SPAR) 8

Expectations For the digitized collection, all formats are managed BUT we deal with 10 years of data creation historical choices changes in requirements changes in scope learning mechanisms 24/06/2009 Distributed Archiving & Preservation System (SPAR) 9

Bad HTML files Testing the formal method, we discover bad HTML files => not valid against the W3C validator BUT these files have been produced for 2 years displayed for 8 years Possible strategies: correct the data before ingestion accept bad data and plan for correction 24/06/2009 Distributed Archiving & Preservation System (SPAR) 10

Previous experience : SGML files Previous experience : production made on SGML format (special TEI profile) uncontrolled conversion made in XML (the current format at the time) at ingestion time, no way to display correctly the files need for archeological search, to come up with: the original XML files the original specifications (SGML DTD) the expected XML files the needed tools (SGML parsers) the required skills 24/06/2009 Distributed Archiving & Preservation System (SPAR) 11

Strategy applied The idea is to accept the files in a control way Lax the requirements on the channel (some identified formats) Use the ingest mechanisms to characterize the files as much as possible Formally determine the set of bad data Plan for curation with a well defined and documented migration plan Benefits the history of transformation is preserved, the decision are traced and the tools used are known 24/06/2009 Distributed Archiving & Preservation System (SPAR) 12

Lifecycle of an Archival Package (AIP) V0 DO V1 DO V2 DO V3 Destructive operation on a dataobject: update or deletion MD V3' DO V4 MD Digital original 1/ Update of metadata only (PDI, IR, etc.) : addition, update, deletion 2/ Addition of data-object Each version/edition has its own internal identifier The persistent identifier is unique, in ark: format V4' MD V4'' ark:/12148/ 24/06/2009 Distributed Archiving & Preservation System (SPAR) 13

Some lessons Lesson 1: never think your data is perfect Lesson 2: the model works!!! Lesson 3: migration is part of the preservation process keep the original data and trace the operations made 24/06/2009 Distributed Archiving & Preservation System (SPAR) 14

Instructions to use the original tool 24/06/2009 Distributed Archiving & Preservation System (SPAR) 15

Agenda Missions and context From theory to practice the model the expectations handling bad data lessons Management through risk analysis Focus shift Dealing with human skills 24/06/2009 Distributed Archiving & Preservation System (SPAR) 16

Risk management is at the core of preservation strategy Risk analysis Necessity for reassessing the risks on a regular basis Inclusion of the preservation system in the whole organization: sustainability of the economics add value to the service 24/06/2009 Distributed Archiving & Preservation System (SPAR) 17

Handling the transition Building a smooth transition in two phases ingesting and auditing direct access BUT dissemination has became critical in our ecosystem: access through Gallica cooperation with Europeana experimentation with publishers Risk to high to modify access at the moment 24/06/2009 Distributed Archiving & Preservation System (SPAR) 18

Focus shift to preservation repository The SPAR project focus shifts to ingesting data storing it safely and managing it It acts as a digital repository for preservation 24/06/2009 Distributed Archiving & Preservation System (SPAR) 19

Need for enhanced skills in the library Dealing with human skills preservation expert: digital archeologist, model and formats digital curator: management of digital collections computer scientists: advise on tools administrator: dealing with the data deluge manager: understanding of digital issues, endorsement of channels 24/06/2009 Distributed Archiving & Preservation System (SPAR) 20

Implementation Pre Ingest Ingest Administration Rights management Preservation Access Data management Storage SAS SPAR 24/06/2009 Distributed Archiving & Preservation System (SPAR) 21

Planning 2004 2005 2006 2007 2008 2009 2010 2011 2012 Study Infrastructure Study RFP Core part Other channels 24/06/2009 Distributed Archiving & Preservation System (SPAR) 22

BnF Thank you for your attention Questions? More information : http://bibnum.bnf.fr/spar Thomas Ledoux thomas.ledoux_at_bnf.fr 24/06/2009 Distributed Archiving & Preservation System (SPAR) 23