From Mass Digitization to Description: Indiana University s Strategy to Overcome the Next Great Challenge

Similar documents
AV Description with AVPreserve & Indiana University

Indiana University s Media Digitization and Preservation Initiative

AMP: An Audiovisual Metadata Platform to Support Mass Description

DRI: Preservation Planning Case Study Getting Started in Digital Preservation Digital Preservation Coalition November 2013 Dublin, Ireland

Preserving Digital Content at Scale

Applying Archival Science to Digital Curation: Advocacy for the Archivist s Role in Implementing and Managing Trusted Digital Repositories

University of British Columbia Library. Persistent Digital Collections Implementation Plan. Final project report Summary version

Protecting Future Access Now Models for Preserving Locally Created Content

DLF ENVIRONMENTAL SCAN BY JEN MOHAN

How do Small Archives Steward their Moving Image and Sound Collections? A Qualitative Study

different types of archives

DRS Update. HL Digital Preservation Services & Library Technology Services Created 2/2017, Updated 4/2017

HydraDAM2: Extending Fedora 4 and Hydra for Media Preservation

Article begins on next page

Importance of cultural heritage:

DIGITAL STEWARDSHIP SUPPLEMENTARY INFORMATION FORM

Big Data, exploiter de grands volumes de données

Digital Preservation Policy for the PFA Library and Film Study Center. Paul Grammatico. San Jose State University

Digital The Harold B. Lee Library

Implementing a Data Publishing Service via DSpace. Jon W. Dunn, Randall Floyd, Garett Montanez, Kurt Seiffert

Practical IT Research that Drives Measurable Results OptimizeIT Strategic Planning Bundle

Variations: Implementing an Open Source Digital Music Library System

Recital Preservation: before they fade away. DIGITAL FRONTIERS September 2016 Houston, Texas

A Primer on the Use of TimeReference:!

The Reporter A Newspaper Digitization Project. Jenni Salamon Coordinator, Ohio Digital Newspaper PRogram

The first author s field notes are not included in this dataset because he played more of a

Preserving & Digitizing the Kent Tribune Newspaper. Sandy Halem, Kent Historical Society Jenni Salamon, Ohio History Connection

Computer Vision and Media Analytics

Introduction to Audiovisual Archiving. Workshop Outline and Assignments

Meeting the Challenge of Media Preservation: Strategies and Solutions

Lessons Learned. Implementing Rosetta in the Harold B. Lee Library

UNCOVERING THE HIDDEN TREASURES: ENABLING VIDEO DISCOVERY AND PRESERVATION FOR FUTURE GENERATIONS

PASIG Digital Preservation BOOTcamp Best* Practices in Preserving Common Content Types: AudioVisual files

The Data Census: Assessing Data Services at MSU

The Avalon Media System

Mass Digitisation Enabling Access, Use and Reuse

Collection Policy. Policy Number: PP1 April 2015

ProQuest Dissertations and Theses Overview. Austin McLean and Marlene Coles CGS Summer Workshop, July 2017

HydraDAM2: Extending Fedora 4 and Hydra for Media Preservation Project Narrative

Competency Definition

GETTING STARTED WITH DIGITAL COMMONWEALTH

Information Technology Services. Informational Report for the Board of Trustees October 11, 2017 Prepared effective August 31, 2017

Evaluating and Improving Cybersecurity Capabilities of the Electricity Critical Infrastructure

Building on to the Digital Preservation Foundation at Harvard Library. Andrea Goethals ABCD-Library Meeting June 27, 2016

Archivists Workbench: White Paper

Accession Record. Date(s) Donor. Project Mgr./Staff Contact. Fieldworker(s) Artist/Informant. Fieldwork Location. Project. Project Description

Outline. Recent research projects. Moving images. Research interests 1/06/09. Previous work. Other aspects. Indexing moving images

April 17, Ronald Layne Manager, Data Quality and Data Governance

Emergency Compliance DG Special Case DAMA INDIANA

Information Systems and Tech (IST)

Digitizing of Bulgarian National Radio Archives

A Preliminary Investigation into the Search Behaviour of Users in a Collection of Digitized Broadcast Audio

STRATEGIC PLAN

Emory Libraries Digital Collections Steering Committee Policy Suite

INFORMATION TECHNOLOGY CYBERSECURITY CLOUD COMPUTING

CISER Data Archive Collection Policy

UC Irvine LAUC-I and Library Staff Research

Developing a Research Data Policy

Audit and Compliance Committee - Agenda

Streaming Media Survey Report

Object-based audio production. Chris Baume EBU-PTS - 27th January 2016

Certification. F. Genova (thanks to I. Dillo and Hervé L Hours)

Five-Year Capital Plan Request FY through FY

What steps to take. when AV is yet to become a priority for your organisation

Redesign Strategy: UIC Library Website Redesign

A Finding Aid to the Murals of Aztlán Film Production Records, 1981, in the Archives of American Art

Horizon2020/EURO Coordination and Support Actions. SOcietal Needs analysis and Emerging Technologies in the public Sector

MOVING IMAGE ARCHIVING & PRESERVATION PROGRAM ACCESS TO MOVING IMAGE COLLECTIONS, H

Getting Started with the Digital Commonwealth. Robin L. Dale Director of Digital & Preservation Services LYRASIS

Search Framework for a Large Digital Records Archive DLF SPRING 2007 April 23-25, 25, 2007 Dyung Le & Quyen Nguyen ERA Systems Engineering National Ar

Records Management at MSU. Hillary Gatlin University Archives and Historical Collections January 27, 2017

How Turner Broadcasting can avoid the Seven Deadly Sins That. Can Cause a Data Warehouse Project to Fail. Robert Milton Underwood, Jr.

Creating descriptive metadata for patron browsing and selection on the Bryant & Stratton College Virtual Library

DRS Policy Guide. Management of DRS operations is the responsibility of staff in Library Technology Services (LTS).

School of Library & Information Science, Kent State University. NOR-ASIST, April 4, 2011

ISO Self-Assessment at the British Library. Caylin Smith Repository

PERSPECTIVES ON STREAMING VIDEO

Best Practices for Campus Security. January 26, 2017

ACCI Recommendations on Long Term Cyberinfrastructure Issues: Building Future Development

7.3. In t r o d u c t i o n to m e t a d a t a

Sustainable Governance for Long-Term Stewardship of Earth Science Data

Building Institutional Repositories: Emerging Challenges

BUILDING CYBERSECURITY CAPABILITY, MATURITY, RESILIENCE

Language Resources. Khalid Choukri ELRA/ELDA 55 Rue Brillat-Savarin, F Paris, France Tel Fax.

The library s role in promoting the sharing of scientific research data

Indigenous Languages of Latin America. Heidi Johnson / The University of Texas at Austin

2017 Resource Allocations Competition Results

A Strategy for Network Disclosure using MARC21 Format for Holdings

Integration of New and Big Administrative Data Sources into Turkish Statistical System

Brown University Libraries Technology Plan

DIGITAL Institute for Information and Communication Technologies

Digitally Preserving African Heritage

BUILDING A NEW DIGITAL LIBRARY FOR THE NATIONAL LIBRARY OF AUSTRALIA

NDSA Web Archiving Survey

Considerations and potential solutions for digital media stream: Current needs and available resources for streaming Audio and video content

Mansfield Center for Pacific Affairs records,

Indiana University Research Technology and the Research Data Alliance

Academic Programs. Esfan HaghverdiExecutive Associate Dean for Academic Affairs

Value & Role of Business Analyst in Agile. Presented by: Jagruti Shah Associate Business Consultant Mastek Ltd

Data publication and discovery with Globus

Transcription:

From Mass Digitization to Description: Indiana University s Strategy to Overcome the Next Great Challenge Chris Lacinak, AVPreserve Jon Dunn, Indiana University AMIA Conference - Pittsburgh November 10, 2016

MDPI: Media Digitization and Preservation Initiative Goal: To digitize, preserve and make universally available by IU s Bicentennial subject to copyright or other legal restrictions all of the time-based media objects on all campuses of IU judged important by experts. 280,000+ audio and video items ~7PB over 4 years 9TB per day peak 80 different units 20+ different physical formats INDIANA UNIVERSITY

Public-Private Partnership for Digitization Memnon Archiving Services, A Sony Company INDIANA UNIVERSITY

4 Digitization and Preservation: The Phases Digitization Discovery and Access Pre-Digitization Inventory Catalog Prioritize Batch & Queue à Memnon Massive parallel digitization IU Operation Digitization of selected unique and highly vulnerable formats Quality control à à Metadata Rights Issues Technical aspects Metadata Technical infrastructure Ongoing monitoring and migration Digital Preservation and Storage INDIANA UNIVERSITY

Progress to Date mdpi.iu.edu INDIANA UNIVERSITY

Many Types of Units Some Examples: Libraries Herman B Wells Library Cook Music Library Lilly Library Archives Archives of African American Music and Culture Archives of Traditional Music Black Film Center/Archive IUL Moving Image Archive University Archives Museums Sage Costume Collection Mathers Museum Production Units Radio and Television Services Media Design and Production Performing Groups African American Arts Institute Department of Bands Academic Units School of Journalism Department of Folklore and Ethnomusicology Jacobs School of Music Research Centers and Institutes Kinsey Institute Russian and East European Institute Athletics! INDIANA UNIVERSITY

Rights Goal: make universally available Many rights situations: Copyright held by IU Copyright held by others Copyright held by others with license or deed of gift to IU Public domain Copyrighted orphaned work Copyright status and ownership unknown Also, ethical and cultural considerations Decision trees being developed to guide rights and access determinations INDIANA UNIVERSITY

Existing metadata sources: IUCAT INDIANA UNIVERSITY

Existing metadata sources: POD INDIANA UNIVERSITY

Access: Avalon Media System INDIANA UNIVERSITY

Digital Preservation: HydraDAM2 Media Files and Metadata Masters, Mezzanines Digital Preservation Repository (HydraDAM2) SDA Out-of-Region Storage Transcodes Access Repository (Avalon) INDIANA UNIVERSITY

Bit Storage: Scholarly Data Archive INDIANA UNIVERSITY

Collection and Metadata Examples

INDIANA UNIVERSITY

INDIANA UNIVERSITY

INDIANA UNIVERSITY

INDIANA UNIVERSITY

INDIANA UNIVERSITY

INDIANA UNIVERSITY

INDIANA UNIVERSITY

INDIANA UNIVERSITY

INDIANA UNIVERSITY

INDIANA UNIVERSITY

INDIANA UNIVERSITY

Hours 280k

Media Content Description

Core Principles Technological Heroism Human Assisted Computing People First Phased and Iterative Service Oriented Architecture

Methodology +

Applied permission Applied perm note Collection *Date created *Default permissions *Department/Unit Duration Full text Geographic Identifiers Links & relationships *MDPI barcode Music/Speech *Names Notes Parts/components Physical description *Publication status Relationship *Rights holder *Rights status *Role (for names) Sound/silent Source Subjects Target audience *Title Type of Resource Type of title Year of renewal Date other Event Geographic other Keywords Language Publisher Beats per minute Color info Color/B&W Ethnicity Frequency info Gender Genre Music present

Metadata Generation Mechanisms AV images text metadata MGM

Phase One MGMs Phase Two MGMs Phase Three MGMs Commercial content ID Human generation Human transcription Legacy closed caption Legacy date & timecode Memnon output Music vs speech detection Queried/Imported data Segmentation Shipping manifest (POD) Silence detection Tech-embedded metadata Facial recognition Linked data Named entity recognition Natural language processing Paper & video OCR Phoneme analysis Speech to text Subtitle recovery Speech-to-text comparison & weighting Artificial intelligence Audio frequency analysis Beats per minute detection Chroma analysis Content matching Genre detection Speaker identification

Phase One Phase Two Phase Three 1) Staffing & Training 1) SMART team embed 2) Acquire/Develop & implement core infrastructure & MGMs 3) Rights analysis 4) Associated materials 5) Phase 2 Prep Required & prioritized Manual Low hanging fruit 1) Develop/Acquire & implement phase 2 MGMs 2) Additional system dev 3) Phase 3 prep Semi/Automated Time-based MD Linked data 1) Develop/Acquire & implement phase 3 MGMs 2) Additional system dev 3) Focus on AI Synthesized/AI Specialized/ Domain specific

Without the Project

Possible Searches Wells Radio and Television Services Collection

Phase One Title A Conversation with HB Wells Title (Type of Title Alternative) A Conversation with Herman B. Wells Collection Radio and Television Services Department/Unit Radio and Television Services Physical Description Betacam Duration 31:07 MDPI Barcode 40000000784944 Type of Resource moving image Date Created 1970/1979 Music/speech Speech Sound/silent Sound Subjects Indiana University (http://id.loc.gov/authorities/names/n8 0126369) Universities and colleges United States Administration (http://id.loc.gov/authorities/subjects/s h85141150) Universities and colleges United States History 20 th century (http://id.loc.gov/authorities/subjects/s h2010117482) Wells, Herman B. (http://id.loc.gov/authorities/names/n8 1012520)

Phase One Names Mundt, Bruce (Producer) Petranoff, Robert (Producer) Office of University Relations (Production company) Wells, Herman B. (Interviewee) Clark, Thomas D. (Interviewer) WTIU Bloomington (Production place) Geographic (Recording location) United States Parts/components 1 of 1 Notes Betacam is transfer from film original based on film leader at start of content. Date range is based on main library building being complete and having electricity. Rights holder Indiana University Rights status In Copyright Year of Renewal 2067 Publication status Unpublished Applied permissions Campus only Default permissions Campus only

Phase One Search Unpublished videos Produced by Radio and TV Audio type speech Herman B. Wells is interviewee IU is rights holder

Phase Two (Additional) Language English Keywords Board of Trustees Faculty University administration University president Music Arts Midwest universities President's spouses 20 th -century leaders Names Bryan, William Lowe Kinsey, Alfred Roosevelt, Eleanor Rockefeller Foundation Date other 1940/1970 Full Text OCR from text within video Speech to Text Geographic other Bloomington (http://www.geonames.org/4254679) Indiana (http://www.geonames.org/4921868)

Phase Two Search Every place within the video Face of Herman B. Wells appears Eleanor Roosevelt exists in text

Phase Three (Additional) Color/bw Color Color Information [time stamp and values representing predominant colors being represented] Relation [Content matching will identify where any portions of content in this video also exist in any portions of videos within the repository] Music present [time stamp at points at which music is present] Genre Interview Speaker Identification [time stamp and name each time a speaker changes, combined with the names field]

Phase Three Searches All repository content with any portions of the audio or video in this video Information for each piece of commercial music

Encompassed in Budget Existing Staff Hours Human Generation MGMs Query and Import Hours SMART Assessment Hours MGM Products & Services MGM Dev Avalon Dev Workflow System Dev RMS Dev Consulting Scanning & Imaging Costs Hardware Costs

Cost Per Item (All In) Phase 1 Phase 2 Phase 3 Total $16.26 $12.58 $4.07 $32.91

A Few Takeaways Organizational effort and preparation Segmentation vs timestamping Lower than anticipated costs

Next Steps 1. Access Workflows task force formed Develop workflows for items/collections with good metadata and clear rights 2. Continued internal messaging Digitization access 3. Longer-term planning Implementation plan and budget for full metadata project Development of platform for workflow engine / data warehouse Grant funding, collaborators? INDIANA UNIVERSITY

Questions? mdpi.iu.edu avpreserve.com jwd@iu.edu chris@avpreserve.com INDIANA UNIVERSITY