Post Digitization: Challenges in Managing a Dynamic Dataset. Jasper Faase, 12 April 2012

Similar documents
Digitising Special Collections Public-Private Partnerships at the KB and abroad

Paul Doorenbosch KB, National Library of the Netherlands Open Repositories 2010, Madrid,

Libraries and Disaster Recovery

Europeana, the prototype EDLfoundation Europeana Network Europeana, vs. 1.0 ThoughtLab Technical requirements

Caring for research data and what about software? Peter Doorn, director DANS

BHL-EUROPE: Biodiversity Heritage Library for Europe. Jana Hoffmann, Henning Scholz

Different approaches to digital preservation

Defining Economic Models for Digital Libraries :

Version 11

Protecting Future Access Now Models for Preserving Locally Created Content

The e-depot in practice. Barbara Sierman Digital Preservation Officer Madrid,

Dutch View on URN:NBN and Related PID Services

Striving for efficiency

Collection Policy. Policy Number: PP1 April 2015

GSCB-Workshop on Ground Segment Evolution Strategy

Research partnerships, user participation, extended outreach some of ETH Library s steps beyond digitization

Digital Preservation at NARA

Building a Dutch National Research Infrastructure IRODS UGM 2017

Collaboration in Digital Preservation. Barbara Sierman Digital Preservation Manager, KB NL CSC Espoo,

Can a Consortium Build a Viable Preservation Repository?

Developing a social science data platform. Ron Dekker Director CESSDA

Google indexed 3,3 billion of pages. Google s index contains 8,1 billion of websites

Big Data, exploiter de grands volumes de données

ARCW Digital Preservation Survey Report

Powering Transformation at Enel: Rebranding around digitization and customer experience

Erkki Tolonen

Bringing Europeana and CLARIN together: Dissemination and exploitation of cultural heritage data in a research infrastructure

Digital preservation activities at the German National Library nestor / kopal

From The European Library to The European Digital Library. Jill Cousins Inforum, Prague, May 2007

Building for the Future

The Functional Extension Parser (FEP) A Document Understanding Platform

International Implementation of Digital Library Software/Platforms 2009 ASIS&T Annual Meeting Vancouver, November 2009

Preserving Digital Cultural Heritage: the PREFORMA Project. Antonella Fresa Promoter Srl PREFORMA Technical Coordinator

The Sunshine State Digital Network

Resolution adopted by the General Assembly. [on the report of the Second Committee (A/64/417)]

CARARE: project overview

TEI, METS and ALTO, why we need all of them. Günter Mühlberger University of Innsbruck Digitisation and Digital Preservation

Practical Experiences with Ingesting Materials for Long-Term Preservation

Making research data repositories visible and discoverable. Robert Ulrich Karlsruhe Institute of Technology

verapdf after PREFORMA

Infrastructure PA Stephen Lecce

How to harvest born digital conspiracy theories: webarchiving Dutch digital culture in the post-truth era Dr Kees

WorldCat Digital Collection Gateway Visibility for Digital Collections

NEW YORK PUBLIC LIBRARY

Igitur Archive: Institutional Repository Utrecht University. May , Martin Slabbertje

Digitally Preserving African Heritage

Enabling Collaboration for Digital Preservation

A distributed network of digital heritage information

Utilizing Digital Library Infrastructure to Build Modern Research Collections

Persistent Identifiers for Audiovisual Archives and Cultural Heritage

Data Archiving and Networked Services. Valentijn Gilissen, MA

Public Private Partnerships for sustainable and smart cities. Milano, 4 July 2017

NARCIS: The Gateway to Dutch Scientific Information

STRATEGIC PLAN

Museums in the international information infrastructure. Do we know what will happen? And what are we going to do?

Cloud Computing Standards C-SIG Plenary Brussels, 15 February Luis C. Busquets Pérez DG CONNECT E2

Digital Single Market Technologies and Public Service Modernisation Package -DSM. Grazyna Wojcieszko DG CONNECT

IT Town Hall Meeting

Thrive in today's digital economy

STRATEGIC PLAN. USF Emergency Management

Thrive in today's digital economy

Current Digital Preservation Trends and SDB4 Features. Pauline Sinclair, Tessella, PASIG, Madrid, 5 July 2010

Electronic Records Archives: Philadelphia Federal Executive Board

Achieving a Secure and Resilient Cyber Ecosystem: A Way Ahead

Ex Libris Rosetta A Complete Digital Asset Management and Preservation System

PREservation FORMAts for culture information/e-archives

DCH-RP and PREFORMA Two case studies on the digital preservation of cultural heritage

Technical documentation. D2.4 KPI Specification

World Summit on the Information Society (WSIS) and the Digital Divide

Digitisation of historic newspapers and voluntary digital deposit of newspaper pre-print files in the the National Library of Estonia

Defense Security Service. Strategic Plan Addendum, April Our Agency, Our Mission, Our Responsibility

Mitigating Preservation Threats: Standards and Practices in the National Digital Newspaper Program

PROJECT FINAL REPORT. Tel: Fax:

Coordination, Collaboration and Integration: Canada s National Crime Prevention Strategy

The Minister s Selection

University of British Columbia Library. Persistent Digital Collections Implementation Plan. Final project report Summary version

RADAR Introduction and Basic Concepts. Matthias Razum

Different Aspects of Digital Preservation

EU DISASTER MANAGEMENT: NEW TOOLS FOR POLICY MAKING. Geneva, 22 May 2013

DRIVER Step One towards a Pan-European Digital Repository Infrastructure

Introduction to Metadata for digital resources (2D/3D)

NDL Search -future development- Yasunao Kobayashi Assistant Director Library Support Division, Kansai-kan of the National Diet Library

Leading New ICT The Road to City Digital Transformation

The role of PIONIER network in long term preservation services for cultural heritage institutions in Poland

About the Digital Bible Library

Evolving the digital library for digital scholarship enablement

CONTENTdm & The Digital Collection Gateway New Looks for Discovery and Delivery

EU Liaison Update. General Assembly. Matthew Scott & Edit Herczog. Reference : GA(18)021. Trondheim. 14 June 2018

UAE National Space Policy Agenda Item 11; LSC April By: Space Policy and Regulations Directory

2017 Resource Allocations Competition Results

BUILDING A NEW DIGITAL LIBRARY FOR THE NATIONAL LIBRARY OF AUSTRALIA

Digital Preservation: How to Plan

Agenda. Bibliography

EUDAT. Towards a pan-european Collaborative Data Infrastructure. Damien Lecarpentier CSC-IT Center for Science, Finland EUDAT User Forum, Barcelona

RCS MediaGroup and Allenta

International Smart Grid Action Network TCP (ISGAN)

ISO Self-Assessment at the British Library. Caylin Smith Repository

PID System for eresearch

CNI Fall 2015 Membership Meeting, Washington, D.C. Archivportal-D. The National Platform for Archival Information in Germany

Research Infrastructures and Horizon 2020

Transcription:

Post Digitization: Challenges in Managing a Dynamic Dataset Jasper Faase, 12 April 2012

Post Digitization: Challenges in Managing a Dynamic Dataset Mission The Koninklijke Bibliotheek is the national library of the Netherlands: we bring people and information together. Our core values are: accessibility, sustainability, innovation and cooperation.

Post Digitization: Challenges in Managing a Dynamic Dataset Vision We Offer everyone everywhere access to everything published in the Netherlands Promote permanent access to digital information nationally and internationally Play a central role in the (scientific) information infrastructure of the Netherlands

Post Digitization: Challenges in Managing a Dynamic Dataset From vision to practice Mass-digitisation: both with public funding and in publicprivate partnerships (Metamorfoze, Google, Proquest) Between 2009 and 2013 10% of all Dutch publications (60 million pages) will be digitized. This will be sped up by designating digitization part of the library s primary process Target for newspaper digitization: 9 million pages by the end of 2012

Results so far Digitized: 7.4 million pages of Dutch national, colonial, regional and local newspapers (1618-1995) Contracts with 19 publishers and representative bodies of freelancers (photographers, journalists) to clear 105 titles for free online access 5 million pages published online via www.kranten.kb.nl; this figure will grow to 9 million pages by November 2012 Around 50,000 unique visitors and 80,000 visits a month

Post Digitization challenges 1. Improving results of Optical Character Recognition 2. Preserving large amounts of digital objects in order to guarantee access in the long term 3. Setting up a viable model for cooperation at national level to improve accessibility and completeness of digital collections

Challenge 1: Improving OCR Target groups (especially linguists) value high quality OCR Poor recognition rates of software means OCR of pre-1850 historical texts can often not be used for research Current situation at KB: OCR is a static dataset. No method in place to improve this data structurally Language technology is evolving quickly

Challenge 1: Improving OCR Develop a method for management of OCR as a dynamic dataset Use results of the European Project IMPACT (http://www.impact-project.eu/) within KB s digitization workflow (2012) Introduce crowdsourcing to improve OCR of individual texts to a high degree of accuracy (2013) Implement a toolset so language technology can be used to structurally improve OCR of existing datasets (2013-2014)

Challenge 2: Digital preservation Digital preservation in order to guarantee permanent access Use of digital copies is replacing use of originals Permanent access is not self evident and is not the main priority for many of our partners Risks:

Challenge 2: Digital preservation At the moment KB is only preserving born digital data in its E- Depot Extending scope to digitized collections will mean building a new scalable E-Depot for long-term & safe storage for over 500 TB of data This new depot has to provide the library with one infrastructure for processing, access and storage + + Safe storage Preservation metadata Permanent access

Requirements for the new system: Can be scaled up to deal with the enormous growth of digital collections Able to deal with a growing diversity of digital collections New functionality: tools for characterisation, formatconversion and other preservation functionality now available

Requirement for the library: Preservation and data management at the core of library processes

Challenge 3: Cooperation Fragmented access to digital publications; especially digitized newspapers Lack of standardization and limited use of metedatastandards (such as ALTO and protocols for sharing data OAI and SRU) Lack of coordination of digitization efforts

Challenge 3: Cooperation Centralizing mass-digitisation of books, magazines and newspapers - financed by Metamorfoze (National Programme for Preservation of Cultural Heritage) Starting a joint project between academic libraries and KB to develop a platform for digitized publications, including newspapers Developing a model for cooperating to provide permanent access to digitized heritage from libraries and newspaper archives And the biggest internal challenge: merging the National Library and the National Archive

Thanksforyourattention Jasper.Faase@kb.nl