Mass Digitisation: Enabling Access, Use and Reuse
National Digitisation Centre, Mikkeli, National Library of Finland
Triangelipäivät 30.10.2008
Tiina Ison, Senior Analyst, Project Manager
Organisation of Speech

THE CONTEXT - National Infrastructure Development
1. National Projects: DL and Mass Digitisation
2. Access, Use and Re-Use
3. Memory Organisations

IN PRACTICE - Digitisation Production Unit
1. Mass Digitisation Project 2008-2009 - OPM Hanke
2. National Digitisation Centre, NLF
3. Scaling up towards Mass Digitisation Processes
4. Reviewing structural analysis of content - Level of granularity
5. Providing access to resources
6. Metadata creation and capture
7. How about Use and Re-use?

WRAP UP - two national projects, one wider process
THE CONTEXT - National Infrastructure Development Persona: Tiina as Senior Analyst
1. Education Ministry Funding and National Research Infrastructure Development Projects
- National Digital Library Project 2008-2011 (OPM)
- Mass Digitisation Project 2008-2009 (OPM)
- Hub and Spoke Relationship (NRIDP)
- Closer integration needed (NRIDP)
Ministry of Education slide (diagram labels): New Infrastructure; Digitisation; Existing Infrastructure; Uniform Access Interface (reference data and full text); Production Environments and Systems; Libraries, Archives, Museums, AV; Born Digital; Long Term Preservation System (metadata and documents)
Source: Survey and Roadmap for Research Infrastructure in Finland, Social Sciences and Humanities (SSH) Panel. 7.10.2008 http://www.tsv.fi7tik/ssh report_071008_kokousmateriaali.pdf
2. Users and Communities want Access, Use and Re-use
USER context (build for mobility, user knowledge state, user knowledge construction, etc.)
COMMUNITY context: Communities of Practice (authoring, contribution, building knowledge commons)
Enabling Use and Re-Use: interact
Enabling Access:
- Presentation Layers: User Interface, Search, Navigation (Navigate, Browse, Contribute)
- Resource Discovery Tools (Search and Discovery Services): Metadata, Taxonomy, Ontology, Tagging
- Digital Resource: Item, Collection, Collaboration, Other
DEFINE governs FUNCTIONALITY
3. Memory Organisations - Museums, Archives and Libraries
National Digital Library: front end
Production Units: back end
CRITICAL MASS OF CONTENT
Production Units
1. Logistics for movement of source material
2. Logistics for workflow and storage of digital objects
3. Logistics for metadata creation and capture throughout the production chain
Production dilemma (message to National DL Workshop 16.10.): the chain starts at creation of the digital object, not at ingest to a DL
PAS? ingest: ACCESS and PRESERVATION
IN PRACTICE Digitisation as Production Unit Persona: Tiina as Project Manager
1. Mass Digitisation Project - OPM Digitointi Hanke
- Mass digitisation project funding from the Ministry of Education, 2008-2009
- Project aims: mass digitisation production logistics, management of the whole production chain, and maximising access as widely as possible (enabling access)
- Library-wide workshops held in summer and autumn 2008
- Metadata Working Group and Care & Handling Working Group
- CCS consultancy on production logistics and ingest
- Tools: Item Tracking, Scan Client, docworks (site license)
2. National Digitisation Centre
Now: Digitisation Centre of NLF at Mikkeli
- Project-based digitisation (excluding historical newspapers)
- Digitisation processes and workflows internal to Mikkeli (scanning, conversion, web)
- Production output using interoperable, XML-based metadata standards (MODS, METS) since 2004
- Provision of access is via a web site maintained by Mikkeli: http://digi.lib.helsinki.fi
- Storage locally and at Helsinki
Scaling up towards: National Digitisation Centre as a Production Unit at Mikkeli
- Mass digitisation of cultural heritage (text, audio)
- Library-wide digitisation process
- Cross-organisational production lines (e.g. ephemera with Archives ERS project)
- Production output using interoperable, XML-based metadata standards in MODS, METS, PREMIS and MARC XML, heading towards a METS Profile (KDK?)
- Ingestion to National DL: functionality and information architecture, single federated search, Europeana, EROM?
- PAS: ingestion into a National Trusted Repository?
3. Establishing Mass Digitisation Processes
Helsinki: care and handling of physical item
Mikkeli: Scan Client Module, Item Tracking Module, docworks (since 2004)
1. Selection
2. Verify catalogue, create holding record (catalogued and non-catalogued items)
3. Assign bar code IDs (1:1)
4. Prepare trolleys in batches
5. Transport (230 km)
6. Inspection: enter into Item Tracking (bar code in, scanner selection)
7. Scanning (automatic, manual)
8. Conversion: level of structural analysis; output as JPEG2000, TXT, PDF, OCR, METS
9. Remote QA (Helsinki)
10. Document delivery and ingestion to presentation system
11. Item returns
Catalogue and metadata notes:
- Voyager holding record entry
- Update status in library catalogue using field (583)
- Ingest IN: descriptive metadata into docworks, automatically
- Capture technical metadata at scanning
- Create structural metadata with docworks
- Item returns: update status in library catalogue
- Ingest OUT: page number? How about authority control?
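The production chain above is essentially a fixed sequence of stages that each bar-coded item passes through. A minimal Python sketch of such tracking follows. It is not the Item Tracking Module itself: the names `Stage`, `TrackedItem` and `ItemTracker` and the example bar code are hypothetical, and the stage list simply mirrors steps 3 to 11 of the slide.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Stage(IntEnum):
    """Stages taken from the slide, starting where an item gets its bar code."""
    BARCODE_ASSIGNED = 3
    TROLLEY_PREPARED = 4
    TRANSPORTED = 5
    INSPECTED = 6
    SCANNED = 7
    CONVERTED = 8
    REMOTE_QA = 9
    DELIVERED = 10
    RETURNED = 11


@dataclass
class TrackedItem:
    barcode: str
    stage: Stage = Stage.BARCODE_ASSIGNED
    history: list = field(default_factory=list)

    def advance(self) -> Stage:
        """Move the item to the next stage; stages cannot be skipped."""
        if self.stage is Stage.RETURNED:
            raise ValueError(f"{self.barcode} has already been returned")
        self.history.append(self.stage)
        self.stage = Stage(self.stage + 1)
        return self.stage


class ItemTracker:
    """In-memory registry enforcing the 1:1 link between item and bar code."""

    def __init__(self) -> None:
        self._items: dict[str, TrackedItem] = {}

    def register(self, barcode: str) -> TrackedItem:
        if barcode in self._items:
            raise ValueError(f"bar code {barcode} is already in use")
        item = TrackedItem(barcode)
        self._items[barcode] = item
        return item

    def get(self, barcode: str) -> TrackedItem:
        return self._items[barcode]


if __name__ == "__main__":
    tracker = ItemTracker()
    item = tracker.register("EXAMPLE-0001")  # hypothetical bar code
    item.advance()                           # trolley prepared
    print(item.barcode, item.stage.name)
```

Keeping the stages in one ordered enumeration means a status update in the library catalogue could be derived from a single field per item, although how the real modules store this is not stated in the slides.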
4. Level of Structural Analysis
Determine the level of structural analysis:
- Monographs: chapter level?
- Newspapers: article level?
- Ephemera: none?
Logical structure: chapters, articles, etc.
Physical structure: pages, columns, captions, footnotes, etc.
Turku Dissertation Papers
Metadata working group: Cataloguer Pasi Koste, Sirkka Havu
Research community: submission to requirements
Prioritisation of elements -> researchers really want to enrich TXT due to quality of OCR. Remote QA for researchers?
5. Provision of Access (currently)
- Simple search
- Advanced search
- Full text search (OCR), text highlighting
- Browsing by newspaper name
- Article categorisation according to the 1800s (Eero Hyönen: potential for ontology!)
- Fuzzy searching (spelling)
Currently for historical newspapers; functionality development for monographs, journals and ephemera
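Fuzzy searching matters here because OCR errors and historical spelling variants mean a query rarely matches the indexed text exactly. The following is a minimal sketch using Python's standard `difflib` module. It does not describe how the presentation system at digi.lib.helsinki.fi implements fuzzy search, and the function name `fuzzy_lookup` and the sample vocabulary are assumptions for illustration.

```python
from difflib import get_close_matches


def fuzzy_lookup(query, vocabulary, cutoff=0.8, limit=5):
    """Return vocabulary terms whose spelling is close to the query.

    Matching is case-insensitive; cutoff is the minimum similarity
    ratio (0..1) used by difflib.
    """
    lowered = {term.lower(): term for term in vocabulary}
    hits = get_close_matches(query.lower(), list(lowered), n=limit, cutoff=cutoff)
    return [lowered[hit] for hit in hits]


if __name__ == "__main__":
    index_terms = ["Helsinki", "Mikkeli", "Turku"]  # hypothetical index terms
    print(fuzzy_lookup("Helsingi", index_terms))
```

A lower `cutoff` tolerates more OCR noise at the cost of more false matches, so the threshold would need tuning against real newspaper text.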
6. Creation and Capture of Metadata
- Persistent identifiers of source material and digital object (citation); unique resource identifiers (URI)
- Administrative metadata
- Descriptive metadata (ingestion from library catalogue, MARC to MODS)
- Technical metadata (MIX): image quality, scanner, digital object provenance
- Structural metadata (METS): granularity of search (URN)
- Preservation metadata (PREMIS)
- Rights management metadata (PREMIS): copyright, access restrictions, licensing (e.g. Creative Commons permission rights)
docworks: automatic output of administrative, technical and structural metadata wrap
CAN INGEST TO DLs (National, Europeana, etc.)
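To make the idea of a metadata wrap concrete, here is a minimal Python sketch that places a MODS description inside a METS document and adds a physical structural map with one entry per page. It uses the public METS and MODS namespaces, but it is only an illustration: it is not the output docworks produces, it is not validated against any METS profile, and the function name `build_mets` and the example title and identifier are hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("mods", MODS_NS)


def build_mets(title, identifier, page_count):
    """Wrap descriptive (MODS) and structural metadata in one METS element."""
    mets = ET.Element(f"{{{METS_NS}}}mets")

    # Descriptive metadata section: MODS embedded via mdWrap/xmlData.
    dmd = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", ID="DMD1")
    wrap = ET.SubElement(dmd, f"{{{METS_NS}}}mdWrap", MDTYPE="MODS")
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    mods = ET.SubElement(xml_data, f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title
    ident = ET.SubElement(mods, f"{{{MODS_NS}}}identifier", type="urn")
    ident.text = identifier

    # Structural metadata: one div per scanned page, in reading order.
    struct_map = ET.SubElement(mets, f"{{{METS_NS}}}structMap", TYPE="PHYSICAL")
    item_div = ET.SubElement(
        struct_map, f"{{{METS_NS}}}div", TYPE="item", DMDID="DMD1"
    )
    for number in range(1, page_count + 1):
        ET.SubElement(item_div, f"{{{METS_NS}}}div", TYPE="page", ORDER=str(number))
    return mets


if __name__ == "__main__":
    # Title and identifier below are placeholders, not real records.
    root = build_mets("Example newspaper issue", "urn:example:item-1", 4)
    print(ET.tostring(root, encoding="unicode"))
```

Technical (MIX), preservation and rights (PREMIS) metadata would sit in an administrative metadata section of the same wrapper, which this sketch leaves out for brevity.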
7. How about Use and Re-Use?
Digitisation Centre produces output:
- digital objects with interoperable metadata
- URIs that link resources semantically
- Currently enabling citation with persistent identifiers (physical item, digital object); URNs for identifying down to structural level
Open questions: IPR? Copyright? Access restrictions? Licensing (Creative Commons)? Learning objects? Community contribution?
Wrap Up
One organisation-wide process for digital content
- Unique identification of physical source material
- Automatic ingestion of descriptive metadata from library catalogue
- Remote QA processes for non-catalogued items for post-processing
- Ingestion back to library catalogue is on the discussion table
- Potential for researcher contribution to metadata and OCR TXT enrichment
- Output: interoperable metadata in MODS and METS, heading for PREMIS
Memory Organisations
- Struggling with concept of production unit
- Pressure to build critical mass of content online
- Pressure for provision of access, use, re-use
- As production units, differing levels of maturity
- Metadata chain
One national process
- for long term preservation
- National Digital Library (KDK): great time for co-operation