Mitigating Preservation Threats: Standards and Practices in the National Digital Newspaper Program

Deborah Thomas and David Brunton, Library of Congress
NISO Digital Preservation Forum, Washington, DC
March 14, 2008

Preservation Threats
- Organizational threats
- Errors (creation, conversion)
- Obsolescence (formats, hardware)
- Failure (hardware, humans)

Rosenthal, David S. H., Thomas Robertson, Tom Lipkis, Vicky Reich, Seth Morabito. Requirements for Digital Preservation Systems: A Bottom-Up Approach. D-Lib Magazine. <doi:10.1045/november2005-rosenthal>
Littman, Justin. Actualized Preservation Threats: Practical Lessons from Chronicling America. D-Lib Magazine. <doi:10.1045/july2007-littman>

Overview
- The Program
- Format Challenges
- Data Specifications
- Repository Development
- Mitigating Threats
- Enduring Access

The Many Facets of the NDNP
- Enhancing access to historic American newspapers
- Cooperative national digitization effort (LC/NEH/state-level institutions)
- Digital preservation in support of enduring access
- National resource free to use and reuse

Historic Newspapers: Old Problems
Challenges to using historic newspapers:
- Enormous corpus of content
- Characteristics: hierarchical structure, multiple information elements, layouts
- Very little indexing
- Limitations of physical formats
- Far-flung collections (distributed holders)
- Competition for use

Historic Newspapers: New Opportunities
- Large scale collection in digital form
  - Chronologically, geographically diverse
- Reuse of USNP Products
  - bibliographic info, location info, film
- Cross-search content
  - By dates, keywords
- High-quality images
  - Grayscale, resolution
- Available to multiple users, worldwide, 24/7

Program Infrastructure
- Mitigate organizational threats to preservation
- NEH/LC successful partnership for USNP
- LC experience with digitization
- Roles: NEH, LC, awardee institutions
- Phased development

NDNP: Technical Challenges
- How do we ensure good, consistent content produced over time? (Scope)
- What do we have to do to be able to support 54 producers and aggregate the content over the course of 20 years? (Scale)
- How do we provide enduring access? (Sustain)

Good Digital Objects for NDNP
- Mitigate preservation threats through standards and validation
- Should be as simple as is practical, producible with current technology (multi-producer)
- Known before and after being aggregated into LC infrastructure (apples and apples)
- Specifications should support desired research functions of the system
  - Find it, Browse it, View it
- Provide enduring access

View (and Preserve): Images
Archival Image: TIFF
- Conforms with TIFF 6.0, metadata
- 8-bit grayscale, 400 dpi uncompressed
Production Image: JPEG 2000
- JPEG 2000, Part 1 (.jp2), lossy 8:1
- RDF/Dublin Core metadata in XML box
Printable Image: PDF
- Compatible with Acrobat 5.0 (PDF 1.4)
- XMP/RDF/Dublin Core metadata
OCR: ALTO
- Modified ALTO (Analyzed Layout and Text Object) XML schema with word coordinates
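As a rough illustration of how the archival image profile above could be enforced, the sketch below checks already-extracted technical metadata against the stated values (8-bit grayscale, 400 dpi, uncompressed). The `TiffInfo` record and `check_archival_tiff` function are hypothetical names, not part of any NDNP tool, and treating 400 dpi as an exact target is an assumption. In practice the values would come from a tool that reads the TIFF tags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TiffInfo:
    """Hypothetical record of technical metadata read from one TIFF file."""
    bits_per_sample: int
    samples_per_pixel: int   # 1 means a single (grayscale) channel
    compression: str         # "none" means uncompressed
    dpi: float


def check_archival_tiff(info: TiffInfo) -> list[str]:
    """Return a list of profile violations; an empty list means the image passes."""
    problems = []
    if info.bits_per_sample != 8:
        problems.append(f"expected 8 bits per sample, found {info.bits_per_sample}")
    if info.samples_per_pixel != 1:
        problems.append(f"expected 1 sample per pixel (grayscale), found {info.samples_per_pixel}")
    if info.compression != "none":
        problems.append(f"expected uncompressed data, found {info.compression!r}")
    # Assumption: the slide's "400 dpi" is read as an exact target resolution.
    if round(info.dpi) != 400:
        problems.append(f"expected 400 dpi, found {info.dpi}")
    return problems
```

Returning a list of problems rather than a single pass/fail flag lets a producer see every deviation in one run.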

Browse, Search, Find: METS Objects
METS (Metadata Encoding & Transmission Standard): XML Schema-based specification for wrapping varying metadata types, filenames, and locations into information packets.
Title METS Object
- CONSER records, from OCLC (biblio + holdings)
Issue METS Object
- Date, title, LCCN, producer, source, etc.
- Individual page data folded into Issue METS
- PREMIS/MIX metadata added at validation
Reel METS Object
- Technical film measurements
- Technical target images
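To make the "wrapping filenames and locations" idea concrete, here is a minimal sketch that builds an issue-level METS document with one file group and one structural division per page. It uses only generic METS elements (`fileSec`, `fileGrp`, `file`, `FLocat`, `structMap`, `div`, `fptr`). It does not reproduce the actual NDNP METS profile, and the function name, ID scheme, and `USE` values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(namespace: str, tag: str) -> str:
    """Build a namespace-qualified name in ElementTree's {uri}tag form."""
    return f"{{{namespace}}}{tag}"


def build_issue_mets(label: str, pages: list[dict[str, str]]) -> ET.Element:
    """Wrap the files of each page into a single issue-level METS element.

    `pages` holds one dict per page mapping a USE value (hypothetical:
    "master", "service", "ocr") to a file location.
    """
    root = ET.Element(q(METS_NS, "mets"), {"TYPE": "issue", "LABEL": label})
    file_sec = ET.SubElement(root, q(METS_NS, "fileSec"))
    struct_map = ET.SubElement(root, q(METS_NS, "structMap"))
    issue_div = ET.SubElement(struct_map, q(METS_NS, "div"), {"TYPE": "issue"})
    for number, files in enumerate(pages, start=1):
        group = ET.SubElement(file_sec, q(METS_NS, "fileGrp"), {"ID": f"pageFileGrp{number}"})
        page_div = ET.SubElement(
            issue_div, q(METS_NS, "div"), {"TYPE": "page", "ORDER": str(number)}
        )
        for use, href in files.items():
            file_id = f"{use}File{number}"
            file_el = ET.SubElement(group, q(METS_NS, "file"), {"ID": file_id, "USE": use})
            ET.SubElement(
                file_el, q(METS_NS, "FLocat"), {"LOCTYPE": "URL", q(XLINK_NS, "href"): href}
            )
            # The structural map points back at each file by its ID.
            ET.SubElement(page_div, q(METS_NS, "fptr"), {"FILEID": file_id})
    return root
```

Calling `ET.tostring(build_issue_mets("Example issue", [{"master": "0001.tif", "ocr": "0001.xml"}]), encoding="unicode")` serializes the result; the label and filenames here are placeholders.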

Our Goal: Mitigate Threats
Our goal is to mitigate threats to the content by creating, using, and enforcing standards from the birth of digital content through the eventual viewing of the content by visitors to the website.
1. Deb covered the ways in which we create and select standards to use.
2. I will talk about technical ways we enable and enforce standards.

Repository Development Center (RDC)
Our group at the Library of Congress builds processes, environments, and tools to help manage collections:
- Historic Newspapers
- Electronic Journals
- Multilingual Content
- Library Directories
- Inventory Management
- Content Transfer

Newspapers Come to LC in Batches
- Created by publishers
- Digitized by awardees
- Validated in the field
- Transferred to The Library
- Reviewed by digital conversion specialists
- Archived for posterity
- Processed into our access systems
- Made available online
[Diagram labels: USNP Union List, Produce, Get, Make Available, Sustain]

Enforcing Validity Early and Often
Digital Viewer and Validator
The Validator
- Structure
- Formats
- Sanity
- Fixity
The Viewer
- Content
- Form
- Metadata

Tracking Changes
Upon receipt of batches, The Library assumes custodianship, keeping track of all intentional and unintentional changes:
1. Verify fixities
2. Quality review
3. Archive a true copy
4. Transfer to processing
5. Verify fixities
6. Process
7. Create fixities
8. Transfer to access systems
9. Verify fixities
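The "create fixities" and "verify fixities" steps can be sketched as a checksum manifest that is built once and then compared after each transfer or processing stage. This is only an illustration: the choice of SHA-256, the in-memory manifest, and the function names are assumptions, not a description of the Library's tooling.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large image files do not have to fit in memory."""
    hasher = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def create_fixities(batch_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to the batch root) to its digest."""
    return {
        path.relative_to(batch_dir).as_posix(): file_digest(path)
        for path in sorted(batch_dir.rglob("*"))
        if path.is_file()
    }


def verify_fixities(batch_dir: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the batch on disk with a stored manifest and report differences."""
    current = create_fixities(batch_dir)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "unexpected": sorted(set(current) - set(manifest)),
        "changed": sorted(
            name for name in manifest.keys() & current.keys()
            if manifest[name] != current[name]
        ),
    }
```

A report with three empty lists means the batch still matches the manifest; anything else is a change that has to be explained as intentional or treated as damage.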

Preparing for Data Loss
Data loss will happen.
- Data archived to tape
- Tapes stored off-site
- Batches periodically retrieved
- Fixities checked
- Decay repaired
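One way to picture the check-and-repair cycle is a routine that audits one copy of a batch against its recorded digests and restores any damaged file from a second copy that still verifies. The two-directory layout, SHA-256 digests, and function names below are assumptions for illustration; the slide does not describe how the Library's tape retrieval or repair actually works.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks and return the hex digest."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def audit_and_repair(
    live_dir: Path, archive_dir: Path, manifest: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Check every manifest entry in live_dir; repair from archive_dir when possible.

    Returns (repaired, unrecoverable) lists of relative file names.
    """
    repaired, unrecoverable = [], []
    for name, expected in sorted(manifest.items()):
        live = live_dir / name
        if live.is_file() and sha256_of(live) == expected:
            continue  # this copy is intact
        archived = archive_dir / name
        # Only trust the second copy if it still matches the recorded digest.
        if archived.is_file() and sha256_of(archived) == expected:
            live.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(archived, live)
            repaired.append(name)
        else:
            unrecoverable.append(name)
    return repaired, unrecoverable
```

The key design point is that the repair source is verified against the same manifest before it is used, so a decayed archive copy never overwrites a file.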

Managing Transfers
As files are moved throughout our system, an appreciable threat is that one of our staff may miss an important step or cause an error, intentionally or unintentionally. Managed transfer reduces this risk:
- Automated receipt
- Automated movement from machine to machine
- Tracking inventory
- Auditing live systems
- Reporting progress through the system
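A managed transfer can be approximated as a scripted copy that verifies each file on arrival and appends an inventory event for it, so no step depends on a person remembering it. The sketch below assumes local directories and a JSON-lines log; the function name, log format, and field names are hypothetical and not drawn from the Library's systems.

```python
import hashlib
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks and return the hex digest."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def transfer_batch(src_dir: Path, dst_dir: Path, log_path: Path) -> bool:
    """Copy every file, verify it at the destination, and log one event per file.

    Returns True only if every copied file matches its source digest.
    The log file is assumed to live outside src_dir.
    """
    all_verified = True
    with log_path.open("a", encoding="utf-8") as log:
        for src in sorted(p for p in src_dir.rglob("*") if p.is_file()):
            relative = src.relative_to(src_dir)
            dst = dst_dir / relative
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(src, dst)
            digest = sha256_of(src)
            verified = sha256_of(dst) == digest
            all_verified = all_verified and verified
            event = {
                "file": relative.as_posix(),
                "sha256": digest,
                "verified": verified,
                "time": datetime.now(timezone.utc).isoformat(),
            }
            log.write(json.dumps(event) + "\n")
    return all_verified
```

The append-only log doubles as a simple inventory: it records what moved, when, and whether it arrived intact, which is the information an audit or progress report would need.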

Providing Enduring Access
http://www.loc.gov/chroniclingamerica

What's Available Now?
- 138,000 bib records and associated holdings
- 568,000 pages, 1897-1910, from 61 titles
- California, DC, Florida, Kentucky, New York, Utah, Virginia
- Next update June 1
Coming soon: more, more, more
- 1880-1910/1880-1922/1860-1922/1835-1922
- California, DC, Kentucky, Minnesota, Nebraska, New York, Texas, Utah, Virginia!!!

Thank You
NDNP Technical Information: http://www.loc.gov/ndnp/
NDNP Web Portal, Chronicling America: Historic American Newspapers: http://www.loc.gov/chroniclingamerica
Questions?