Grey Literature and Digital Preservation: Standards in Practice GL17 Pre-Conference Workshop 30 November 2015, Amsterdam.

Similar documents
INISat25: Pioneer of the nuclear information highway

The e-depot in practice. Barbara Sierman Digital Preservation Officer Madrid,

International Implementation of Digital Library Software/Platforms 2009 ASIS&T Annual Meeting Vancouver, November 2009

Staff Sourcing for Digitization

P2WW ENZ0. PaperStream Capture 2.5. User's Guide

UWDCC Case Study. Supplementing DPOE Workshop 6 June Steven Dast Digital Asset Librarian UW Digital Collections Center

Invenio: A Modern Digital Library for Grey Literature

GETTING STARTED WITH DIGITAL COMMONWEALTH

GUIDELINES FOR CREATION AND PRESERVATION OF DIGITAL FILES

BHL-EUROPE: Biodiversity Heritage Library for Europe. Jana Hoffmann, Henning Scholz

ABBYY FineReader 14 YOUR DOCUMENTS IN ACTION

fi-5530c2 Image Scanner

fi-6110 Image Scanner

Igitur Archive: Institutional Repository Utrecht University. May , Martin Slabbertje

Session Questions and Responses

Safety of Nuclear Installations

Nuno Freire National Library of Portugal Lisbon, Portugal

Million Book Universal Library Project :Manual for Metadata Capture, Digitization, and OCR

CONTENTdm & The Digital Collection Gateway New Looks for Discovery and Delivery

The WINS Academy Security Certification Programme: The Route to Demonstrable Competence. Dr Roger Howsley, Executive Director

Getting Bits off Disks: Using open source tools to stabilize and prepare born-digital materials for long-term preservation

Main focus of the of the presentation

Scanning. Technology

CONTENTdm 4.3. Russ Hunt Product Specialist Barcelona October 30th 2007

Chapter 9 Section 3. Digital Imaging (Scanned) And Electronic (Born-Digital) Records Process And Formats

CREATING DIGITAL REPOSITORIES PRESENTED BY CHAMA MPUNDU MFULA CHIEF LIBRARIAN NATIONAL ASSEMBLY OF ZAMBIA

fi-7180/ fi-7160/ fi-7260/ fi-7280 Getting Started Image Scanner Checking the Components P3PC EN

Union catalogue models

RLG Model Request for Information (RFI) for Digital Imaging Services

Defining Economic Models for Digital Libraries :

Compound or complex object: a set of files with a hierarchical relationship, associated with a single descriptive metadata record.

PDF/A - The Basics. From the Understanding PDF White Papers PDF Tools AG

Encontros de outono. Nicholas Crofts. Chair ICOM CIDOC

Greenstone Publications

Data Curation Handbook Steps

SECTION E: DOCUMENT DIGITIZATION

Discovery to Delivery with WorldCat Local

From The European Library to The European Digital Library. Jill Cousins Inforum, Prague, May 2007

RECORDS MANAGEMENT Judith Read and Mary Lea Ginn

different types of archives

This document is a preview generated by EVS

HARMONIZATION OF NUCLEAR SAFEGUARDS INFRASTRUCTURE DEVELOPMENT

ABBYY FineReader 14 Full Feature List

Capacity building in the IAEA Action Plan on Nuclear Safety

Digital The Harold B. Lee Library

WEB-BASED COLLECTION MANAGEMENT FOR ARCHIVES

Rack2-Filer ( 1) (Exclusive to S1300 with Rack2-Filer)

Setup DVD-ROM. Carrier Sheet ( 1) Rack2-Filer ( 1) (Exclusive to S1500 with Rack2-Filer)

A paragraph that summarizes the important points of a given text.

DCH-RP and PREFORMA Two case studies on the digital preservation of cultural heritage

WEB-BASED COLLECTION MANAGEMENT FOR LIBRARIES

Digital preservation activities at the German National Library nestor / kopal

72,035 libraries in 170 countries

EPO INPADOC 44 years. Dr. Günther Vacek, EPO Patent Information Fair 2016, Tokyo. November 2016

ISO INTERNATIONAL STANDARD. Information and documentation Library performance indicators

Digital s ideal partner

Web-based workflow software to support book digitization and dissemination. The Mounting Books project

DLF ENVIRONMENTAL SCAN BY JEN MOHAN

When it comes to information archive strategy, digital has met its ideal partner.

Capture Pro Software. Getting Started A-61640

Digital Preservation DMFUG 2017

ABBYY FineReader 14. User s Guide ABBYY Production LLC. All rights reserved.

Assigns a persistent identifier that will always point to the object and/or its metadata.

INTERGOVERNMENTAL OCEANOGRAPHIC COMMISSION (of UNESCO)

State Government Digital Preservation Profiles

Information and documentation Library performance indicators

Institutional Repositories

Mitigating Preservation Threats: Standards and Practices in the National Digital Newspaper Program

ST600 Overhead Book Scanner. Kiosk Software Installation Guide

easy ntelligent convenient GlobalScan NX Server 5/ Server 32/Server 750 Capture & Distribution Solution Energize Critical Workflows

A Dublin Core Application Profile in the Agricultural Domain

EUDAT. Towards a pan-european Collaborative Data Infrastructure. Damien Lecarpentier CSC-IT Center for Science, Finland EUDAT User Forum, Barcelona

Protection of the National Cultural Heritage in Austria

General Disposal Authority 7

Frederick Zarndt Semblanza

Key principles: The more complete and accurate the information, the better the matching

Workshop 4.4: Lessons Learned and Best Practices from GI-SDI Projects II

Session Two: OAIS Model & Digital Curation Lifecycle Model

Practical Experiences with Ingesting Materials for Long-Term Preservation

Fast. Easy to use. Built for business. Two powerful new desktop document scanners join our award winning line-up ADS-2200 I ADS-2700W

The Swedish National Archives digital preservation. Mats Berggren, IT-department,

Global Nuclear Safety and Security Regime

Evaluation of Islandora & SobekCM

Two interrelated objectives of the ARIADNE project, are the. Training for Innovation: Data and Multimedia Visualization

International Atomic Energy Agency Meeting the Challenge of the Safety- Security Interface

Persistent identifiers, long-term access and the DiVA preservation strategy

Using PDF Files in CONTENTdm

ISO TC46/SC4/WG7 N ISO Information and documentation - Directories of libraries and related organizations

DIGITAL STEWARDSHIP SUPPLEMENTARY INFORMATION FORM

Invoice Automation for Microsoft NAV. Platform Requirements. 09 th August Invoice Automation for Microsoft NAV Platform Requirements 1

Getting Started with the Digital Commonwealth. Robin L. Dale Director of Digital & Preservation Services LYRASIS

PaperSave Cloud Deployment WhitePaper

ScandAll PRO V2.1 User's Guide

CERN Document Server Software

AMScan Viewer User Guide Version 2.1

Introduction to Digital Preservation. Danielle Mericle University of Oregon

TO SCAN OR NOT TO SCAN? Donna Read, CRM, CDIA+, ERMs, ECMp President DL Read LLC Dlreadllc.com

Alphabet Soup: A Metadata Overview Melanie Schlosser Metadata Librarian

Managing our Print Legacy: Factors influencing the success, sustainability and scalability of shared print management Constance Malpas, OCLC Research

NEDLIB LB5648 Mapping Functionality of Off-line Archiving and Provision Systems to OAIS

Transcription:

Grey Literature and Digital Preservation: Standards in Practice GL17 Pre-Conference Workshop 30 November 2015, Amsterdam Germain St-Pierre g.st-pierre@iaea.org Nuclear Information Section IAEA, Vienna 1

Scope of this Presentation Introduction: What is INIS? Overview of the INIS System Architecture Non-Conventional Literature (NCL) in the INIS Collection Preservation and Digitization of NCL (1970 to now) NCL Collection Management System Microfiche Digitization Project Other Digitization Activities Practical Session Conclusions 2

International Nuclear Information System (INIS) One of the world's largest collections of published information on the peaceful uses of nuclear science and technology Operated since 1970 by the IAEA in collaboration with 130 countries and 24 international organizations 45 years of successful international cooperation 3

INIS in 1970 INIS Atomindex on magnetic tape INIS Atomindex in hardcopy NCL on microfiche Handling of Nuclear Information STI/PUB/254 4

and NOW Free online access to a unique collection of nonconventional (grey) literature Google-based INIS Collection Search (ICS): inis.iaea.org/search 5

INIS Path through History 1970 First Atomindex in printed and electronic form. NCL distributed on microfiche 1971 INIS Thesaurus for document indexing 1991 INIS Database available on CD-ROM 1997 NCL full-texts on CD-ROM. Start of electronic document delivery service 1998 INIS Database on Internet (for subscribers only) 2004 Start of Computer-assisted indexing (CAI) 2009 Open Access of INIS Database on Internet 2011 Google-based INIS Collection Search (ICS) 2012 INIS Products made Unicode compliant 2013 Publicly accessible PDFs searchable through Google.com and Google Scholar 2014 Link to INSPIRE HEP, OCLC s WorldCat service, IAEA Library catalogue and NUCLEUS databases through ICS interface 2015 Over 190,000 page views and 230,000 PDF downloads in October 2015 INIS Milestones: https://www.iaea.org/inis/about-us/history.html 6

Non-Conventional Literature in INIS 3,86 million bibliographic records in the INIS Collection 27% (over 1 million records) are NCL > 750,000 NCL are Available from INIS > 515,000 full-texts in PDF > 371,000 full-texts are PUBLIC 7

1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 Non-Conventional Literature in INIS 1970-2015 35000,0 30000,0 25000,0 20000,0 15000,0 10000,0 5000,0,0 NCL available from INIS NCL available from other sources 8

INIS System Architecture 9

NCL Processing and Distribution (1970 now) Production System NCL Processing NCL Distribution 1970-1996 Microfiche-based (B&H, TDC Documate, Anacomp) Conversion from paper to microfiche (Note: NCL from U.S.A. and Japan received on microfiche) Diazo duplicates 1997-2003 INISIS Image-based (JouveScan) Conversion from paper to TIFF NCL CD-ROM with INISIR viewer 2003-2009 2010-2013 2014 to present INISIS2K (InputAccel) IDPS (Livelink) NCL Collection System (PixEdit + ABBYY FineReader) Conversion from paper to PDF + OCR Ingestion of PDF files NCL CD-ROM Online access (subscribers only) E-mail / FTP NCL DVD Free online access E-mail / FTP Free online access E-mail / FTP 10

INIS NCL Collection on Microfiche (1970-1996) 312,000 NCL reports 500,000 bibliographic records Conversion from paper to microfiche and diazo duplication after database production NCL Check ensured accuracy of the eye-readable microfiche header information Over 1 million master microfiches ( 17 million pages) Diazo duplicates sent to customers 11

INIS Imaging System (INISIS) (1997-2003) Image-based production system developed by Jouve Imaging started only after database production B/W scanning, image enhancement, link to bibliographic metadata using bar codes, validation and CD image creation Format supported: single-page TIFF Group 4 2 b/w Fujitsu production scanners, 5 workstations, 2 servers 12

INIS Imaging System 2000 (INISIS2K) (2003-2009) 32-bit modular Capture System (EMC InputAccel) B/W, Greyscale and Color Scanning, OCR, output to PDF Integration with Livelink-based INIS Data Processing System (IDPS) Bibliographic Control of NCL from PDF output 2 B/W Fujitsu production scanners, 1 Kodak i260 color scanner, dedicated workstations for image enhancement and OCR 13

Current Technical Infrastructure (2010 to present) Flexible tools to support different projects: PixEdit 8 (scanning, image enhancement) ABBYY FineReader Corporate Edition 12 (multilingual OCR) 2 scanners (Fujitsu fi-5750c with VRS, Kodak i1440) 2 Sunrise microfiche scanners 4 workstations with quad-core processors 14

NCL Collection Management System Strong web-based data management system PDF Metadata is collected during ingestion into the NCL Collection Helpful to detect problems, e.g. PDF files needing OCR, poor PDF creation tools 15

Microfiche Digitization Project Project started in 2002 as support to document delivery service In close collaboration with the INIS National Centres to avoid duplication of work Scanning part was outsourced to 4 different companies Image enhancement, OCR and output to PDF done in-house 16

Support / Take Advantage of Digitization Initiatives Win-Win strategy. Some examples: Digitization of old collections from Mexico, Sweden, the Netherlands and Serbia CEA, France: Ongoing digitization of old theses from hard copy Availability of full-texts in PDF adds new value to the INIS bibliographic records PDF scanned from paper may replace the version scanned from microfiche in our NCL Collection If gaps are identified, new INIS records are created US DoE: Digitized reports (from paper) available from SciTech are harvested to avoid duplication of work and speed up the digitization of our microfiche collection 17

Digitization of IAEA documents Agreement with IAEA Publications to digitize and make available on-line out-of-print books published before 2000: Proceedings Series, Safety Series, IAEA Bulletin Digitized old documents from the IAEA Board of Governors Digitized old documents of the International Nuclear Data Committee (INDC) for the IAEA Nuclear Data Section 18

Practical Session Examples Standards PDF PDF/A? 19

For more information INIS Newsletters: https://www.iaea.org/inis/productsservices/newsletter/ INIS Training Seminar Presentations: https://www.iaea.org/inis/events/training/ts- 2015/Presentations/index.html INIS Publications: https://www.iaea.org/inis/productsservices/publications/index.html 20

Conclusions Digitization projects require: serious planning substantial funds qualified staff (internal or external) awareness of standards well defined purpose Maintenance of high quality digitization process is a must Document preparation, selection of proper scanning techniques, type of equipment, and adherence to current standards, determine digitization success or failure Digitization should not be a goal in itself - its ultimate use and usefulness must always be taken into account Meaningful and searchable metadata should accompany any digitized collection, making online search and retrieval efficient Long term preservation needs to be considered in order to ensure future sustainability and availability of the digitized collection 21

Thank you! 22