Update
HL Digital Preservation Services & Library Technology Services
Created 2/2017, Updated 4/2017
AGENDA
DRS DRS DRS Architecture DRS DRS DRS Work
COLLABORATIVELY MANAGED
DRS Business Owner: Digital Preservation Services
Key responsibilities:
  Preservation & usage policies, strategies
  Manage & communicate about service
  Represent users' & content's preservation needs
  Define high-level enhancement roadmaps
  Preservation plans
  Preservation outreach, consulting, guidelines
DRS Technology Owner: Library Technology Services
Key responsibilities:
  Technology & security policies, strategies
  Manage hardware, software, development
  Bug fixes & enhancements
  System monitoring & scaling
  Refine roadmaps based on resources
  System testing & documentation
  User support & training on systems
WHAT IS THE DRS?
A Harvard-maintained service for digital collections & content, providing:
  Preservation: keep the content safe and keep the information usable long-term on modern platforms
  Delivery to users
A service, not storage or a tool:
  Includes preservation & IT staff actively monitoring the content and systems
  Includes documented policies, practices & preservation plans
  Uses technology & systems, but these change over time
KEY POLICIES
What can be deposited into the DRS
Who can deposit to the DRS
Obligations of collection managers
Responsibilities of DRS staff
Retention policies
Discovery & access policies
Delivery services
Preservation services
Review began in March 2017
DRS Policy Guide: https://wiki.harvard.edu/confluence/pages/viewpage.action?pageid=204385879
DRS STORAGE FEES
[Chart: DRS storage fee per GB per year (USD), 2004 to 2020]
KEY STRATEGIES
Format guidance (preferred & accepted formats for deposit)
Deposit tools with automatic technical characterization, validated against documented content models
Constant bit-integrity checking of multiple distributed copies
Regular storage refreshes
Format migrations
Expert networks
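Bit-integrity (fixity) checking of multiple copies generally works by recomputing a checksum for each stored copy and comparing it with the value recorded at deposit. The sketch below illustrates that idea in Python; the choice of SHA-256, the helper names `sha256_of` and `check_copies`, and the file names are assumptions for illustration, not the DRS's actual implementation.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_copies(recorded: str, copies: list[Path]) -> dict[str, bool]:
    """Compare each replica against the checksum recorded at deposit.

    Hypothetical helper: maps each copy's path to True if it still matches.
    """
    return {str(copy): sha256_of(copy) == recorded for copy in copies}


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        good = Path(tmp, "copy_a.tif")
        bad = Path(tmp, "copy_b.tif")
        good.write_bytes(b"original content")
        bad.write_bytes(b"original contenT")  # simulated bit rot in one replica
        recorded = hashlib.sha256(b"original content").hexdigest()
        for copy, ok in check_copies(recorded, [good, bad]).items():
            print(copy, "OK" if ok else "MISMATCH: repair from a good copy")
```

A mismatch in one copy is the signal to restore it from a replica that still matches; keeping several distributed copies is what makes that repair possible.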
HIGHLIGHTS
In production since 2000
Integrated with Library infrastructure & services; holds content from over 50 Harvard units
Supports curatorial management
Rich set of metadata
Accepts any format, but provides incentives to deposit in recommended preservable formats
Supports HRCI content
Proven track record:
  Experience with several metadata migrations, maintenance of persistent names, and several storage refreshes; planning for format migrations this year
  Never lost a file!
Spawned well-used open-source tools (JHOVE, FITS)
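The "accepts any format but provides incentives" policy can be pictured as a two-step check: identify the format from the file's leading bytes, then look it up in the format guidance. Tools such as JHOVE and FITS do far more (validation, detailed technical metadata); this Python sketch only shows the shape of the idea. The byte signatures are standard file-format prefixes, but which formats count as "preferred" or "accepted" here is a hypothetical example, not the DRS's published guidance.

```python
# Leading-byte signatures for a few common formats (checked in order).
SIGNATURES: list[tuple[bytes, str]] = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"II*\x00", "TIFF"),  # little-endian TIFF
    (b"MM\x00*", "TIFF"),  # big-endian TIFF
    (b"\xff\xd8\xff", "JPEG"),
]

# Hypothetical guidance tiers, for illustration only.
PREFERRED = {"TIFF", "PDF"}
ACCEPTED = {"JPEG", "PNG"}


def identify(header: bytes) -> str:
    """Return a format name based on the file's first bytes, or UNKNOWN."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "UNKNOWN"


def deposit_tier(header: bytes) -> str:
    """Classify a deposit; anything unrecognized is still accepted as 'other'."""
    fmt = identify(header)
    if fmt in PREFERRED:
        return "preferred"
    if fmt in ACCEPTED:
        return "accepted"
    return "other"


if __name__ == "__main__":
    for sample in (b"%PDF-1.7", b"\xff\xd8\xff\xe0", b"RIFF....WAVE"):
        print(identify(sample), deposit_tier(sample))
```

Returning "other" instead of rejecting mirrors the policy of accepting any format while reserving the incentive for recommended ones.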
ARCHITECTURAL LAYERS
Discovery (outside the DRS, but links are maintained between the DRS and discovery systems)
Delivery
  Applications and APIs for users & external systems
  Utility services (authentication, name resolution, etc.)
Ingest
  Deposit tools and services
  Automated ingest of batches
Management
  Infrastructure for metadata and content management and reporting
Storage
  Replicated, geographically distributed content and metadata
  Automated tools for monitoring content integrity
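Name resolution is what lets persistent names survive storage refreshes: users and discovery systems cite a stable name, and only the name-to-location mapping changes when content moves. A minimal in-memory sketch in Python, assuming a simple dictionary-backed resolver; the `NameResolver` class, its methods, and the example.org locations are hypothetical, and only the URN string is taken from this deck.

```python
class NameResolver:
    """Toy persistent-name resolver: names stay fixed, locations may change."""

    def __init__(self) -> None:
        self._locations: dict[str, str] = {}

    def register(self, name: str, location: str) -> None:
        """Assign a persistent name to an object's current location."""
        if name in self._locations:
            raise ValueError(f"{name} is already registered")
        self._locations[name] = location

    def relocate(self, name: str, new_location: str) -> None:
        """Update the mapping after content moves (e.g. a storage refresh)."""
        if name not in self._locations:
            raise KeyError(name)
        self._locations[name] = new_location

    def resolve(self, name: str) -> str:
        """Return the current location for a persistent name."""
        return self._locations[name]


if __name__ == "__main__":
    resolver = NameResolver()
    urn = "urn-3:fhcl:32605325"
    resolver.register(urn, "https://old-storage.example.org/objects/32605325")
    resolver.relocate(urn, "https://new-storage.example.org/objects/32605325")
    print(resolver.resolve(urn))  # the cited name is unchanged; the target moved
```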
WHAT'S IN THE DRS?
Quantity: 64.7 million files, 212 TB per copy (as of 4/2017)
Many formats: images, audio, text, digitized books, web sites, documents, biomedical image stacks, email and video
But primarily digitized images and text
[Chart: format distribution, 4/2017: Image 58%, Text 41%, Container 1%, Other 0%]
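The two headline figures also give a rough sense of typical file size. Assuming decimal units (1 TB = 10^12 bytes), dividing the per-copy storage by the file count gives the average; the function name is illustrative only, and an average hides the wide spread between small text files and large images.

```python
def avg_file_size_mb(total_tb: float, file_count: int) -> float:
    """Average file size in MB, assuming decimal units (1 TB = 10**6 MB)."""
    return total_tb * 10**6 / file_count


if __name__ == "__main__":
    # Figures from the slide: 212 TB per copy, 64.7 million files (4/2017).
    print(f"{avg_file_size_mb(212, 64_700_000):.2f} MB per file")  # prints "3.28 MB per file"
```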
WHO OWNS THE CONTENT?
[Chart: DRS content by owner, in TB, 2000 to 2016. Owners: Art Museum, Business, Design, Divinity, Education, FAS, Government, HU Archives, Law, Medical, Other, Radcliffe]
Does not include content from the Harvard-Google project
GROWING USAGE
LARGE PARALLEL PROJECTS
[Timeline chart, 6/2016 to 6/2017; "We are here" at 4/2017]
IMLS DRS Audit (1 res. until 6/16)
Planning for Next-Gen DRS Storage
Migration to Next-Gen DRS Storage
Arcadia Easier DRS Deposits Project (1 term position through 11/16)
Arcadia Long-Term Preservation Project (2 term positions through 11/16)
DP/DRS and Media Pres. Roadmap
DRS2 Metadata Migration (1 term position through 6/17)
Planning for Format Migrations
Audio Format Migration
Ongoing Enhancements, Upgrades and Project Support (2+ developers + systems and user support + management)
OUTCOMES
[Timeline chart, 6/2016 to 6/2017; "We are here" at 4/2017]
IMLS DRS Audit: prepared for eventual certification as a trustworthy repository; identified areas for improvement that have since been addressed
Planning for and Migration to Next-Gen DRS Storage: refreshed storage with greater geographic distribution; less preservation risk
Arcadia Easier DRS Deposits Project: requirements ready for a BatchBuilder replacement; DRS discussion group and DRS community platform established
Arcadia Long-Term Preservation Project: video fully supported for preservation and basic delivery; requirements ready for the other most-requested formats
DRS2 Metadata Migration: all DRS content described consistently to support preservation planning, achieving Level 3B PREMIS conformance (highest level)
Audio Format Migration: all DRS audio accessible to users on modern platforms; a framework for format migrations
UPDATE ON DRS2 METADATA MIGRATION
DRS2 METADATA MIGRATION PROJECT
Last piece of the DRS2 Project (the move to the next-generation DRS)
Previously completed:
  Transition of all infrastructure to a standards-based object model and metadata schemas
  New modern management tools and an API-based services layer
  Support for more formats
Metadata migration:
  Re-describe all the content at the object and file level
  Result: more accurate and detailed metadata to support curatorial management and preservation planning
MIGRATION SEQUENCED BY CONTENT TYPE
Status: 2/2017
1. Color Profiles - DONE
2. Text Methodologies - DONE
3. PDF Documents - DONE
4. Target Images - DONE
5. Still Images & Page-turned Documents - ALMOST DONE
6. Google books - SPECS READY
7. Audio - SPECS READY
8. Opaque Containers - SPECS READY
9. Web Harvests - SPECS READY
10. Biomedical Images - SPECS READY
MIGRATION SEQUENCED BY CONTENT TYPE
Status: 4/2017
1. Color Profiles - DONE
2. Text Methodologies - DONE
3. PDF Documents - DONE
4. Target Images - DONE
5. Still Images & Page-turned Documents - ALMOST DONE
6. Google books - DONE
7. Audio - IN PROGRESS
8. Opaque Containers - SPECS READY
9. Web Harvests - SPECS READY
10. Biomedical Images - SPECS READY
OVERALL MIGRATION STATUS (2/2017)
Files migrated: 82%
Left to migrate: 18%
  16% = Still image & Page-turned objects
  2% = Google books
  0.1% = Audio
  0.1% = Web harvests
  0.04% = Opaque containers
  0.01% = Biomedical images
OVERALL MIGRATION STATUS (4/2017)
Files migrated: 99%
Left to migrate: 1%
  0.6% = Still image & Page-turned objects
  0.1% = Audio
  0.1% = Web harvests
  0.04% = Opaque containers
  0.01% = Biomedical images
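The per-content-type breakdowns on the two status slides are rounded shares of all files, so their sums should round to the headline "left to migrate" figures. A quick Python check using the numbers from the slides (the dictionary layout and the `summary` helper are just for illustration) also shows that Google books dropped off the remaining list between February and April, consistent with its DONE status in the April sequencing.

```python
# Remaining share of files (percent) by content type, as listed on the slides.
REMAINING = {
    "2/2017": {
        "Still image & Page-turned objects": 16,
        "Google books": 2,
        "Audio": 0.1,
        "Web harvests": 0.1,
        "Opaque containers": 0.04,
        "Biomedical images": 0.01,
    },
    "4/2017": {
        "Still image & Page-turned objects": 0.6,
        "Audio": 0.1,
        "Web harvests": 0.1,
        "Opaque containers": 0.04,
        "Biomedical images": 0.01,
    },
}


def summary(parts: dict[str, float]) -> tuple[int, int]:
    """Return (percent left, percent migrated), each rounded to a whole number."""
    left = sum(parts.values())
    return round(left), round(100 - left)


if __name__ == "__main__":
    for date, parts in REMAINING.items():
        left, migrated = summary(parts)
        print(f"{date}: ~{left}% left, ~{migrated}% migrated")
    finished = set(REMAINING["2/2017"]) - set(REMAINING["4/2017"])
    print("Completed between snapshots:", sorted(finished))
```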
STATUS OF METADATA MIGRATION FOR STILL IMAGES & PAGE-TURNED DOCUMENTS (4/2017)
Live chart at: https://wiki.harvard.edu/confluence/display/digitalpreservation/drs2+metadata+migration+project
OVERALL STATUS OF METADATA MIGRATION (2/2017)
Live chart at: https://wiki.harvard.edu/confluence/display/digitalpreservation/drs2+metadata+migration+project
OVERALL STATUS OF METADATA MIGRATION (4/2017)
Live chart at: https://wiki.harvard.edu/confluence/display/digitalpreservation/drs2+metadata+migration+project
DATA CLEANUP
Fixed metadata errors for thousands of objects
Report in progress, including:
  Overall findings
  Recommendations for preventing future metadata errors
  Individual sections for each of the 55 DRS content-owning units
To share this experience, a paper will be written for presentation at a conference
UPDATE ON AUDIO FORMAT MIGRATION
AUDIO FORMAT MIGRATION STEPS
Status: 4/2017
1. Migrate audio metadata (Spring 2017)
   1. SMIL playlists - DONE
   2. Audio objects - IN TESTING
2. Prepare for format migration (Spring-Summer 2017)
   1. Format migration spec - IN PROCESS
   2. Format migration tools - IN PROCESS
   3. Update DRS ingest tools - NOT STARTED
   4. Test migration - NOT STARTED
3. Run format migration (late Summer 2017)
   1. SMIL playlists - NOT STARTED
   2. RealAudio - NOT STARTED
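One likely reason to migrate SMIL playlists together with the RealAudio files is that a playlist points at audio files by reference, so migrating the audio alone would leave those references stale. A minimal Python sketch of that reference update using the standard library's ElementTree, assuming a simple un-namespaced SMIL document and a hypothetical old-to-new file-name mapping (WAV as the target is an assumption; the real target format and naming come from the format migration spec).

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from legacy RealAudio files to migrated files.
MIGRATED = {"track01.ra": "track01.wav", "track02.ra": "track02.wav"}


def migrate_playlist(smil_text: str, mapping: dict[str, str]) -> tuple[str, list[str]]:
    """Rewrite src references in a SMIL playlist; report any left unmapped."""
    root = ET.fromstring(smil_text)
    unmapped: list[str] = []
    for elem in root.iter():
        src = elem.get("src")
        if src is None:
            continue
        if src in mapping:
            elem.set("src", mapping[src])
        else:
            unmapped.append(src)  # flag for review rather than guessing a target
    return ET.tostring(root, encoding="unicode"), unmapped


if __name__ == "__main__":
    playlist = (
        "<smil><body><seq>"
        '<audio src="track01.ra"/><audio src="track02.ra"/><audio src="track03.ra"/>'
        "</seq></body></smil>"
    )
    new_text, missing = migrate_playlist(playlist, MIGRATED)
    print(new_text)
    print("unmapped:", missing)
```

Reporting unmapped references instead of silently skipping them keeps the playlist migration auditable.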
UPDATE ON LONG-TERM PRESERVATION PROJECT
LONG-TERM PRESERVATION PROJECT
3-year project enabled by Arcadia (ended Nov. 30, 2016)
Goal: add DRS support for the formats most requested by curators:
  Video
  Word processing
  CAD (2D and 3D)
  Disk images
  RAW camera images (image sequences for scanned film)
New fast-tracking process, working with consultants to help with the analysis
Work continues through the digital preservation / DRS and media roadmaps
FORMAT SUPPORT BY PROJECT END (11/30/2016)
[Chart: format distribution of 60 requests made by curators from 2004 to 2016 to add support for new formats in the DRS]
  Video 19%
  Word Processing 14%
  CAD 11%
  Spreadsheets 8%
  Disk Images 8%
  Datasets 6%
  Other Still Images 6%
  Vector Graphics 6%
  DNG 6%
  Databases 5%
  Software 5%
  Other OCR Text 1%
  Presentations 1%
  Newspapers 1%
  GIS 1%
  Articles 1%
Legend:
  Analysis Complete, Development Complete
  Analysis Complete, Development Partially Complete
  Analysis Complete, Development to Start When Resources are Available
  Work to be Scheduled in the
VIDEO DELIVERY
http://nrs.harvard.edu/urn-3:fhcl:32605325
COMPLIANCE WITH THE NDSA LEVELS OF PRESERVATION*
1. Before the DRS2 project:
[Table: NDSA Levels of Preservation, Level One (lowest) to Level Four (highest), for Storage & Geographic Location, File Fixity and Data Integrity, Information Security, Metadata, and File Formats; marked cells = DRS compliance]
* http://www.digitalpreservation.gov:8081/ndsa/activities/levels.html
COMPLIANCE WITH THE NDSA LEVELS OF PRESERVATION
2. After the DRS2 project:
[Table: NDSA Levels of Preservation, Level One (lowest) to Level Four (highest), for Storage & Geographic Location, File Fixity and Data Integrity, Information Security, Metadata, and File Formats; marked cells = DRS compliance]
COMPLIANCE WITH THE NDSA LEVELS OF PRESERVATION
3. After the latest DRS storage upgrade and the audio format migration:
[Table: NDSA Levels of Preservation, Level One (lowest) to Level Four (highest), for Storage & Geographic Location, File Fixity and Data Integrity, Information Security, Metadata, and File Formats; marked cells = DRS compliance]
FUTURE WORK
Depends on Library priorities
Currently on the horizon:
  Complete projects & enhancements in flight (audio migration, CJK support, scaling, easier deposits, …)
  Full support through delivery for disk images, RAW camera images, email, CAD files, image sequences
  Delivery service improvements
  Expose more DRS documentation and the roadmap
  DRS certification
  Additional DRS deposit streams from Harvard & external infrastructure (e.g. Dataverse, Archive-It, Kaltura)
  Evaluate options and start planning for the next-gen DRS
RESOURCES
DRS has become foundational HL infrastructure
Average IT maintenance requires 4-5 FTE
Major enhancements require additional staff

Relative   Description                                         Timing given current
priority                                                       level of staffing
1          Audio format migration & delivery improvements      FY18
2          Easier deposits                                     FY18-FY19
3          Support for disk image content model                FY19
4          Make DRS content easier to embed in Canvas, etc.    FY19
5          Video delivery enhancements                         FY20
6          Depositor efficiencies                              FY20

Slide callouts:
  With existing staff we estimate this could be accomplished in FY18
  With 1 additional developer we estimate this could be accomplished in FY18
Slide adapted from Library IT Planning for FY18, Feb. 10, 2017
DISCUSSION