DRS Update. HL Digital Preservation Services & Library Technology Services Created 2/2017, Updated 4/2017



AGENDA DRS DRS DRS Architecture DRS DRS DRS Work

COLLABORATIVELY MANAGED
DRS Business Owner: Digital Preservation Services
Key Responsibilities:
  Preservation & usage policies, strategies
  Manage & communicate about service
  Represent users & content's preservation needs
  Define high-level enhancement roadmaps
  Preservation plans
  Preservation outreach, consulting, guidelines
DRS Technology Owner: Library Technology Services
Key Responsibilities:
  Technology & security policies, strategies
  Manage hardware, software, development
  Bug fixes & enhancements
  System monitoring & scaling
  Refine roadmaps based on resources
  System testing & documentation
  User support & training on systems

WHAT IS THE DRS?
Harvard-maintained service for digital collections & content for:
  preservation
    keep the content safe
    keep the information usable long-term on modern platforms
  delivery to users
A service - not storage or a tool
  Includes preservation & IT staff actively monitoring the content and systems
  Includes documented policies, practices & preservation plans
  Uses technology & systems but these change over time

KEY POLICIES
What can be deposited into the DRS
Who can deposit to the DRS
Obligations of collection managers
Responsibilities of DRS staff
Retention policies
Discovery & access policies
Delivery services
Preservation services
Review began in March 2017
DRS Policy Guide: https://wiki.harvard.edu/confluence/pages/viewpage.action?pageid=204385879

DRS STORAGE FEES
[Chart: fee per GB per year (y-axis, $0 to $6.00) by year (x-axis, 2004 to 2020)]

KEY STRATEGIES
Format guidance (preferred & accepted for deposit)
Deposit tools with automatic technical characterization validated against documented content models
Constant bit integrity checking of multiple distributed copies
Regular storage refreshes
Format migrations
Expert networks
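The bit integrity checking strategy above can be illustrated with a minimal sketch. It assumes a checksum was recorded for each file at deposit and that every replicated copy is reachable as a local path; the function names, the choice of SHA-256, and the paths are hypothetical and are not taken from the DRS implementation.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_copies(expected_digest, copy_paths):
    """Return the copies that are missing or whose digest differs from the recorded one.

    expected_digest: digest recorded at deposit time (assumed to be trusted).
    copy_paths: locations of the replicated copies (hypothetical local paths).
    """
    damaged = []
    for path in copy_paths:
        candidate = Path(path)
        if not candidate.is_file() or sha256_of(candidate) != expected_digest:
            damaged.append(str(path))
    return damaged
```

A copy reported by check_copies would be a candidate for repair from another copy that still matches the recorded digest.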

HIGHLIGHTS
In production since 2000
Integrated with Library infrastructure & services from over 50 Harvard units
Supports curatorial management
Rich set of metadata
Accepts any format but provides incentives to deposit in recommended preservable formats
Supports HRCI content
Proven track record
  Experience with several metadata migrations, maintenance of persistent names, several storage refreshes, planning for format migrations this year
  Never lost a file!
Spawned well-used open-source tools (JHOVE, FITS)

ARCHITECTURAL LAYERS
Discovery (outside the DRS but links maintained between the DRS and discovery systems)
Delivery
  Applications and APIs for users & external systems
  Utility services (authentication, name resolution, etc.)
Ingest
  Deposit tools and services
  Automated ingest of batches
Management
  Infrastructure for metadata and content management and reporting
Storage
  Replicated, geographically-distributed content and metadata
  Automated tools for monitoring content integrity

WHAT'S IN THE DRS?
Quantity: 64.7 million files, 212 TB per copy as of 4/2017
Many formats
  Images, audio, text, digitized books, web sites, documents, biomedical image stacks, email and video
  But primarily digitized images and text
Format distribution 4/2017
  Image 58%
  Text 41%
  Container 1%
  Other 0%

WHO OWNS THE CONTENT?
[Chart: DRS by Owner. TB stored (y-axis, 0 to 160) by year (x-axis, 2000 to 2016), broken down by owner: Art Museum, Business, Design, Divinity, Education, FAS, Government, HU Archives, Law, Medical, Other, Radcliffe]
Does not include content from the Harvard-Google project

GROWING USAGE

LARGE PARALLEL PROJECTS
Timeline: 6/2016, 12/2016, 4/2017 (we are here), 6/2017
IMLS DRS Audit (1 res. until 6/16)
Planning for Next-Gen DRS Storage
Migration to Next-Gen DRS Storage
Arcadia Easier DRS Deposits Project (1 term position through 11/16)
Arcadia Long-Term Preservation Project (2 term positions through 11/16)
DP/DRS and Media Pres. Roadmap
DRS2 Metadata Migration (1 term position through 6/17)
Planning for Format Migrations
Audio Format Migration
Ongoing Enhancements, Upgrades and Project Support (2+ developers + systems and user support + management)

OUTCOMES
Timeline: 6/2016, 12/2016, 4/2017 (we are here), 6/2017
IMLS DRS Audit
  Prepared for eventual certification as a trustworthy repository; indicated areas to improve that have since been addressed
Planning for Next-Gen DRS Storage, Migration to Next-Gen DRS Storage
  Refreshed storage with greater geographic distribution; less preservation risk
Arcadia Easier DRS Deposits Project
  Requirements ready for a BatchBuilder replacement; DRS discussion group and DRS community platform established
Arcadia Long-Term Preservation Project
  Video fully supported for preservation and basic delivery; requirements ready for the other most requested formats
DP/DRS and Media Pres. Roadmap
DRS2 Metadata Migration
  All DRS content described consistently to support preservation planning and achieves Level 3B PREMIS conformance (highest level)
Planning for Format Migrations, Audio Format Migration
  All DRS audio accessible to users on modern platforms; have framework for format migrations
Ongoing Enhancements, Upgrades and Project Support

UPDATE ON DRS2 METADATA MIGRATION

DRS2 METADATA MIGRATION PROJECT
Last piece of the DRS2 Project (move to the next-generation DRS)
Previously completed:
  Transition all infrastructure to standards-based object model and metadata schemas
  New modern management tools and API-based services layer
  Support for more formats
Metadata migration
  Re-describe all the content at the object and file level
  Result: more accurate and detailed metadata to support curatorial management and preservation planning

MIGRATION SEQUENCED BY CONTENT TYPE
Status: 2/2017
1. Color Profiles - DONE
2. Text Methodologies - DONE
3. PDF Documents - DONE
4. Target Images - DONE
5. Still Images & Page-turned Documents - ALMOST DONE
6. Google books - SPECS READY
7. Audio - SPECS READY
8. Opaque Containers - SPECS READY
9. Web Harvests - SPECS READY
10. Biomedical Images - SPECS READY

MIGRATION SEQUENCED BY CONTENT TYPE
Status: 4/2017
1. Color Profiles - DONE
2. Text Methodologies - DONE
3. PDF Documents - DONE
4. Target Images - DONE
5. Still Images & Page-turned Documents - ALMOST DONE
6. Google books - DONE
7. Audio - IN PROGRESS
8. Opaque Containers - SPECS READY
9. Web Harvests - SPECS READY
10. Biomedical Images - SPECS READY

OVERALL MIGRATION STATUS (2/2017)
Files migrated: 82%
Left to migrate: 18%
  16% = Still image & Page-turned objects
  2% = Google books
  0.1% = Audio
  0.1% = Web harvests
  0.04% = Opaque containers
  0.01% = Biomedical images

OVERALL MIGRATION STATUS (4/2017)
Files migrated: 99%
Left to migrate: 1%
  0.6% = Still image & Page-turned objects
  0.1% = Audio
  0.1% = Web harvests
  0.04% = Opaque containers
  0.01% = Biomedical images
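As a rough cross-check of the two status slides, the sketch below sums the per-type percentages and converts them to approximate file counts. It assumes the percentages are shares of the 64.7 million files reported for 4/2017, which the slides do not state explicitly; all names are illustrative.

```python
# Shares still to migrate (percent of files), as listed on the two status slides.
LEFT_2017_02 = {
    "Still image & Page-turned objects": 16,
    "Google books": 2,
    "Audio": 0.1,
    "Web harvests": 0.1,
    "Opaque containers": 0.04,
    "Biomedical images": 0.01,
}
LEFT_2017_04 = {
    "Still image & Page-turned objects": 0.6,
    "Audio": 0.1,
    "Web harvests": 0.1,
    "Opaque containers": 0.04,
    "Biomedical images": 0.01,
}

# Assumption: shares are relative to the file count reported as of 4/2017.
TOTAL_FILES = 64_700_000


def total_left(shares):
    """Sum the per-type shares, rounded to two decimals."""
    return round(sum(shares.values()), 2)


def approx_files(share_pct, total_files=TOTAL_FILES):
    """Convert a percentage share into an approximate number of files."""
    return round(total_files * share_pct / 100)


if __name__ == "__main__":
    for label, shares in (("2/2017", LEFT_2017_02), ("4/2017", LEFT_2017_04)):
        print(label, "left to migrate:", total_left(shares), "%")
        for content_type, pct in shares.items():
            print("  ", content_type, "~", approx_files(pct), "files")
```

The itemized shares sum to 18.25% and 0.85%, which is consistent with the rounded 18% and 1% shown on the slides.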

STATUS OF METADATA MIGRATION FOR STILL IMAGES & PAGE-TURNED DOCUMENTS (4/2017)
Live chart at: https://wiki.harvard.edu/confluence/display/digitalpreservation/drs2+metadata+migration+project

OVERALL STATUS OF METADATA MIGRATION (2/2017)
Live chart at: https://wiki.harvard.edu/confluence/display/digitalpreservation/drs2+metadata+migration+project

OVERALL STATUS OF METADATA MIGRATION (4/2017)
Live chart at: https://wiki.harvard.edu/confluence/display/digitalpreservation/drs2+metadata+migration+project

DATA CLEANUP
Fixed metadata errors for thousands of objects
Report is in progress including:
  Overall findings
  Recommendations for preventing future metadata errors
  Individual sections for each of the 55 DRS content-owning units
To share this experience a paper will be written to present at a conference

UPDATE ON AUDIO FORMAT MIGRATION

AUDIO FORMAT MIGRATION STEPS
Status: 4/2017
1. Migrate audio metadata (Spring 2017)
  1. SMIL playlists - DONE
  2. Audio objects - IN TESTING
2. Prepare for format migration (Spring-Summer 2017)
  1. Format migration spec - IN PROCESS
  2. Format migration tools - IN PROCESS
  3. Update DRS ingest tools - NOT STARTED
  4. Test migration - NOT STARTED
3. Run format migration (late Summer 2017)
  1. SMIL playlists - NOT STARTED
  2. RealAudio - NOT STARTED

UPDATE ON LONG-TERM PRESERVATION PROJECT

LONG-TERM PRESERVATION PROJECT
3-year project enabled by Arcadia (ended Nov. 30, 2016)
Goal to add support to the DRS for formats most requested by curators:
  video
  word processing
  CAD (2D and 3D)
  disk images
  RAW camera images (image sequences for scanned film)
New fast-tracking process working with consultants to help with the analysis
Work carries on through the digital preservation / DRS and media roadmaps

FORMAT SUPPORT BY PROJECT END (11/30/2016)
Chart shows format distribution of 60 requests made from 2004-2016 by curators to add support for new formats in the DRS
  Video 19%
  Word Processing 14%
  CAD 11%
  Disk Images 8%
  Spreadsheets 8%
  Datasets 6%
  Other Still Images 6%
  Vector Graphics 6%
  DNG 6%
  Databases 5%
  Software 5%
  Other OCR Text 1%
  Presentations 1%
  Newspapers 1%
  GIS 1%
  Articles 1%
Legend:
  Analysis Complete, Development Complete
  Analysis Complete, Development Partially Complete
  Analysis Complete, Development to Start When Resources are Available
  Work to be Scheduled in the

VIDEO DELIVERY
http://nrs.harvard.edu/urn-3:fhcl:32605325
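The delivery link above appears to be a persistent name handled by a resolver (the architecture slide lists name resolution among the utility services). Below is a minimal sketch, using only Python's standard library, of separating the resolver host from the name. The function name is hypothetical, and the meaning of the colon-separated segments is not documented in these slides.

```python
from urllib.parse import urlparse


def split_persistent_link(url):
    """Split a persistent link into resolver host, name, and colon-separated segments."""
    parsed = urlparse(url)
    name = parsed.path.lstrip("/")  # everything after the host, without the leading slash
    return {
        "resolver": parsed.netloc,
        "name": name,
        "segments": name.split(":"),  # segment meanings are not assumed here
    }


if __name__ == "__main__":
    print(split_persistent_link("http://nrs.harvard.edu/urn-3:fhcl:32605325"))
```

Keeping the resolver host separate from the name reflects the general idea behind persistent names: the name stays stable while the location it resolves to can change.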

COMPLIANCE WITH THE NDSA LEVELS OF PRESERVATION*
1. Before the DRS2 project:
[Chart: DRS compliance for each category (Storage & Geographic Location, File Fixity and Data Integrity, Information Security, Metadata, File Formats) at Level One (Lowest), Level Two, Level Three and Level Four (Highest)]
* http://www.digitalpreservation.gov:8081/ndsa/activities/levels.html

COMPLIANCE WITH THE NDSA LEVELS OF PRESERVATION
2. After the DRS2 project:
[Chart: DRS compliance for each category (Storage & Geographic Location, File Fixity and Data Integrity, Information Security, Metadata, File Formats) at Level One (Lowest), Level Two, Level Three and Level Four (Highest)]

COMPLIANCE WITH THE NDSA LEVELS OF PRESERVATION
3. After the latest DRS storage upgrade and the audio format migration:
[Chart: DRS compliance for each category (Storage & Geographic Location, File Fixity and Data Integrity, Information Security, Metadata, File Formats) at Level One (Lowest), Level Two, Level Three and Level Four (Highest)]

FUTURE WORK
Depends on Library priorities
Currently on horizon:
  Complete projects & enhancements in-flight (audio migration, CJK support, scaling, easier deposits, ...)
  Full support through delivery for disk images, RAW camera images, email, CAD files, image sequences
  Delivery service improvements
  Expose more DRS documentation, roadmap
  DRS certification
  Additional DRS deposit streams from Harvard & external infrastructure (e.g. Dataverse, Archive-It, Kaltura)
  Evaluate options and start planning for next-gen DRS

RESOURCES
DRS has become foundational HL infrastructure
Average IT maintenance requires 4-5 FTE
Major enhancements require additional staff
Relative priority  Description                                        Timing given current level of staffing
1                  Audio format migration & delivery improvements     FY18
2                  Easier deposits                                    FY18-FY19
3                  Support for disk image content model               FY19
4                  Make DRS content easier to embed in Canvas, etc.   FY19
5                  Video delivery enhancements                        FY20
6                  Depositor efficiencies                             FY20
With existing staff we estimate this could be accomplished in FY18
With 1 additional developer we estimate this could be accomplished in FY18
Slide adapted from Library IT Planning for FY18, Feb. 10, 2017

DISCUSSION