Persistent identifiers, long-term access and the DiVA preservation strategy

Similar documents
DRI: Preservation Planning Case Study Getting Started in Digital Preservation Digital Preservation Coalition November 2013 Dublin, Ireland

Metadata and Encoding Standards for Digital Initiatives: An Introduction

Dutch View on URN:NBN and Related PID Services

Data Curation Handbook Steps

Institutional Repository using DSpace. Yatrik Patel Scientist D (CS)

Slide 1 & 2 Technical issues Slide 3 Technical expertise (continued...)

The Ohio State University's Knowledge Bank: An Institutional Repository in Practice

B2SAFE metadata management

Building a Digital Repository on a Shoestring Budget

Strategy for long term preservation of material collected for the Netarchive by the Royal Library and the State and University Library 2014

Long-term digital preservation of UNSWorks

Developing a Research Data Policy

ISO Self-Assessment at the British Library. Caylin Smith Repository

DIGITAL STEWARDSHIP SUPPLEMENTARY INFORMATION FORM

The OAIS Reference Model: current implementations

Swedish National Data Service, SND Checklist Data Management Plan Checklist for Data Management Plan

The e-depot in practice. Barbara Sierman Digital Preservation Officer Madrid,

Registry Interchange Format: Collections and Services (RIF-CS) explained

Metadata Workshop 3 March 2006 Part 1

DigitalHub Getting started: Submitting items

WORKSHOP 28 TH /29 TH APRIL Christine Staiger

Science Europe Consultation on Research Data Management

Data Exchange and Conversion Utilities and Tools (DExT)

BPMN Processes for machine-actionable DMPs

Assessment of product against OAIS compliance requirements

Lessons from the implementation

Digital Preservation DMFUG 2017

Data Management Plan Generic Template Zach S. Henderson Library

Copyright 2008, Paul Conway.

Building for the Future

Conducting a Self-Assessment of a Long-Term Archive for Interdisciplinary Scientific Data as a Trustworthy Digital Repository

Checklist and guidance for a Data Management Plan, v1.0

Agenda. Bibliography

1. General requirements

Data publication and discovery with Globus

Managing Data in the long term. 11 Feb 2016

EUDAT B2FIND A Cross-Discipline Metadata Service and Discovery Portal

Institutional repositories: description of VITAL as an example of a Fedora-based digital assets management system.

Handles at LC as of July 1999

SobekCM. Compiled for presentation to the Digital Library Working Group School of Oriental and African Studies

Hello, I'm Melanie Feltner-Reichert, director of Digital Library Initiatives at the University of Tennessee. My colleague. Linda Phillips, is going

Million Book Universal Library Project :Manual for Metadata Capture, Digitization, and OCR

A structured workflow for implementing digital archiving standards in an organisation

RDF and Digital Libraries

ACDH AUSTRIAN CENTRE FOR DIGITAL HUMANITIES

Writing a Data Management Plan A guide for the perplexed

Preservation. Policy number: PP th March Table of Contents

Florida Digital Archive (FDA) SIP Specification

The Semantic Institution: An Agenda for Publishing Authoritative Scholarly Facts. Leslie Carr

DAITSS Demo Virtual Machine Quick Start Guide

ProQuest Dissertations and Theses Overview. Austin McLean and Marlene Coles CGS Summer Workshop, July 2017

Dryad Curation Manual, Summer 2009

Certification. F. Genova (thanks to I. Dillo and Hervé L'Hours)

SciX Open, self organising repository for scientific information exchange. D15: Value Added Publications IST

What do you do when your file formats become obsolete? Lydia T. Motyka Florida Center for Library Automation USETDA 2011

Persistent identifiers in the national bibliography context

Managing Web Resources for Persistent Access

Summary of Bird and Simons Best Practices

Archives in a Networked Information Society: The Problem of Sustainability in the Digital Information Environment

Interdisciplinary Processes at the Digital Repository of Ireland

Persistent Identifiers for Audiovisual Archives and Cultural Heritage

Transfers and Preservation of E-archives at the National Archives of Sweden

Nuno Freire National Library of Portugal Lisbon, Portugal

Certification Efforts at Nestor Working Group and cooperation with Certification Efforts at RLG/OCLC to become an international ISO standard

A Repository of Metadata Crosswalks. Jean Godby, Devon Smith, Eric Childress, Jeffrey A. Young OCLC Online Computer Library Center Office of Research

THE BIG FLAME A MODEL FOR A UNIVERSAL FULL-TEXT ELECTRONIC LIBRARY OF RESEARCH

The Necessity of a New Culture of Electronic Publishing C A S L I N

FDA Affiliate's Guide to the FDA User Interface

Showing it all a new interface for finding all Norwegian research output

ISO 2146 INTERNATIONAL STANDARD. Information and documentation Registry services for libraries and related organizations

An overview of the OAIS and Representation Information

ORCA-Registry v2.4.1 Documentation

DRI: Dr Aileen O'Carroll Policy Manager Digital Repository of Ireland Royal Irish Academy

FLAT: A CLARIN-compatible repository solution based on Fedora Commons

Invenio: A Modern Digital Library for Grey Literature

Comparing Open Source Digital Library Software

Research Data Management: lessons learned - and still to learn

Towards a joint service catalogue for e-infrastructure services

Digital Preservation Efforts at UNLV Libraries

Pam Armstrong Library and Archives Canada Ottawa, Canada

DISCOVER THE POWER OF VITAL The Solution for Your Digital Collection Management

Archival Information Package (AIP) E-ARK AIP version 1.0

Appendix REPOX User Manual

Compound or complex object: a set of files with a hierarchical relationship, associated with a single descriptive metadata record.

Persistent identifiers: jnbn, a JEE application for the management of a national NBN infrastructure

2nd Technical Validation Questionnaire - interim results -

Electronic Submission to UMI using FTP

Igitur Archive: Institutional Repository Utrecht University. May , Martin Slabbertje

Susan Thomas, Project Manager. An overview of the project. Wellcome Library, 10 October

Data Management Checklist

It's All About The Metadata

Mass Digitisation Enabling Access, Use and Reuse

Session Two: OAIS Model & Digital Curation Lifecycle Model

re3data.org - Making research data repositories visible and discoverable

Digital repositories as research infrastructure: a UK perspective

The European project CASA (promoting Co-operative Action on Serials and Articles)

Implementation of the Data Seal of Approval

GEOSS Data Management Principles: Importance and Implementation

Florida Coastal Everglades LTER Program

Transcription:

Persistent identifiers, long-term access and the DiVA preservation strategy. Eva Müller, Electronic Publishing Centre, Uppsala University Library, http://publications.uu.se/epcentre/

Outline: the DiVA project and its objectives; the DiVA publishing system; persistent identifiers and their roles within the DiVA publishing system; conclusions and next steps.

DiVA project: started in 2000 at Uppsala University; by 2004, ten universities in three countries were participating.

DiVA - Academic Archive Online (Digitala Vetenskapliga Arkivet). Objectives of the DiVA project: technical solutions and workflows supporting full-text publishing, storage and dissemination of university research (theses, dissertations, working papers and research papers); exploring ways to ensure future access to, and use and understanding of, the digital objects in the archive.

The DiVA publishing system makes it possible to: reuse and enhance data from source documents originally created by authors, both for the metadata and for a digital master from which electronic and printed versions are produced; assign a persistent identifier; store and checksum all files in a local archive; and send a copy to the national library archive and to other interested parties.
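The "store and checksum all files" step can be illustrated with a minimal sketch. The slides do not say which checksum algorithm DiVA used; SHA-256 and the sha256sum-style manifest layout below are assumptions, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large digital masters need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(package_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to the package root) to its SHA-256 digest."""
    return {
        p.relative_to(package_dir).as_posix(): sha256_of(p)
        for p in sorted(package_dir.rglob("*"))
        if p.is_file()
    }


def write_manifest(manifest: dict[str, str], target: Path) -> None:
    """Write 'digest  path' lines, the layout the sha256sum tool reads and writes."""
    lines = [f"{digest}  {name}\n" for name, digest in manifest.items()]
    target.write_text("".join(lines), encoding="utf-8")
```

Build the manifest before writing it into the package directory; otherwise a later run would also checksum the manifest file itself.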

Long-term access and the DiVA preservation strategy. Issues: How can we ensure access to the documents we produce locally? How can we minimize the risk of data loss? What factors increase the potential for success? Can these factors be integrated into an automated, low-cost workflow?

How can we ensure access in the future? With a stable point of reference (a persistent identifier); by using a human-readable, non-proprietary storage format for the metadata and, if possible, also for the content (the published documents); and by storing copies in several locations.

How can we minimize the risk of data loss? By keeping multiple copies in different locations. What mechanism keeps track of those copies? And can all these factors be integrated into an automated, low-cost workflow?
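One possible mechanism for keeping track of copies is to compare the checksum of each copy and flag the ones that disagree. The sketch below assumes a majority vote across locations; the location names and the voting rule are illustrative assumptions, not part of the DiVA design.

```python
from collections import Counter
from typing import Optional


def audit_copies(digests_by_location: dict[str, str]) -> tuple[Optional[str], list[str]]:
    """Compare the digests of one file held at several locations.

    Returns the majority digest (None if no strict majority exists) and the
    sorted list of locations whose copy should be checked or replaced.
    """
    if not digests_by_location:
        raise ValueError("no copies to compare")
    counts = Counter(digests_by_location.values())
    digest, votes = counts.most_common(1)[0]
    if votes * 2 <= len(digests_by_location):
        # No strict majority: flag every location for manual review.
        return None, sorted(digests_by_location)
    suspect = sorted(loc for loc, d in digests_by_location.items() if d != digest)
    return digest, suspect
```

For example, `audit_copies({"local": "aa", "kb": "aa", "mirror": "bb"})` returns `("aa", ["mirror"])`, so only the hypothetical "mirror" copy needs attention.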

Long-term access stakeholders, producers: authors (discovery and dissemination of their intellectual output); universities and publishers (increased impact).

Long-term access stakeholders, consumers: authors (citation durability); readers (discovery, bibliographies); universities (tracking research output). Curators: national libraries (legal deposit); archives(?); other parties.

Some requirements for PIDs and their resolution: easy and reliable maintenance and administration; the potential to connect a preservation copy to the PID (to guarantee long-term access); the possibility of integration into automated, low-cost workflows.

Which PID and why? Cooperation with a trusted, public, non-profit organization; management of a resolution service, other metadata services and an archival copy within the same framework; the possibility of using the same PID for different manifestations of the same content; a non-proprietary solution.

Based on these requirements, the following decisions were made: to cooperate with the National Library of Sweden; to use XML as the primary storage format; to use URN:NBN as the primary persistent identifier; and to fit all needs into an automated workflow.

Assignment of the URN:NBN. The name-assigning authority, the Royal Library (the National Library of Sweden), assigns subdomains, and each subdomain is managed locally. Structure: URN:NBN:se:?:diva, where ? is the institution's subdomain; for Uppsala University this gives URN:NBN:se:uu:diva followed by a locally managed serial number. A URN:NBN is assigned to each item, where an item is a single publication regardless of format; the various formats of the item (with identical content) are its manifestations.
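A minimal sketch of minting and parsing identifiers in this structure. The slide only says "diva + locally managed serial number"; the hyphen separator used here is an assumption, and `make_urn`/`parse_urn` are hypothetical helpers.

```python
import re

# Assumption: a hyphen separates "diva" from the serial number.
URN_PATTERN = re.compile(
    r"^urn:nbn:se:(?P<inst>[a-z0-9]+):diva-(?P<serial>\d+)$", re.IGNORECASE
)


def make_urn(institution: str, serial: int) -> str:
    """Build a DiVA-style URN:NBN inside an institution's subdomain."""
    return f"URN:NBN:se:{institution}:diva-{serial}"


def parse_urn(urn: str) -> tuple[str, int]:
    """Split a DiVA-style URN:NBN into (institution subdomain, serial number)."""
    match = URN_PATTERN.match(urn)
    if match is None:
        raise ValueError(f"not a DiVA-style URN:NBN: {urn!r}")
    return match["inst"], int(match["serial"])
```

Because the serial number is managed locally, each institution only has to keep its own counter; uniqueness across Sweden follows from the subdomain assigned by the Royal Library.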

Implementation of the URN:NBN resolution service: version 2.00 was released in May, and a new version developed in cooperation among the Nordic countries is due in fall 2004. The service is implemented as a Java servlet and contains a harvester that can harvest URN-to-URL bindings from many different repositories.
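The harvesting idea can be sketched as merging the mapping lists returned by several repositories into one resolver table. The actual URN:NBN Register Format is not described in these slides, so the sketch assumes a hypothetical plain-text list with one "URN URL" pair per line; the HTTP fetch is left out so the example stays offline.

```python
def parse_mapping_list(text: str) -> dict[str, str]:
    """Parse a hypothetical 'URN URL' per-line list, skipping blanks and # comments."""
    mappings = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected 'URN URL', got {line!r}")
        urn, url = parts
        # Simplification for this sketch: keys are lower-cased for lookup.
        mappings[urn.lower()] = url
    return mappings


def harvest(sources: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Merge mapping lists from several repositories into one resolver table.

    `sources` maps a repository name to the text it returned. A URN bound to
    different URLs by different repositories is reported, not overwritten.
    """
    table: dict[str, str] = {}
    origin: dict[str, str] = {}
    conflicts: list[str] = []
    for repo, text in sources.items():
        for urn, url in parse_mapping_list(text).items():
            if urn in table and table[urn] != url:
                conflicts.append(f"{urn}: {origin[urn]} vs {repo}")
                continue
            table[urn] = url
            origin.setdefault(urn, repo)
    return table, conflicts
```

Reporting conflicts instead of silently overwriting keeps the first binding stable, which matters when the table is the only route from a citation to the document.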

Figure: URN:NBN resolution at the Royal Library. A user sends a request (of the form http://urn.kb.se/resolve?urn= followed by the URN) to the URN:NBN resolution service, which holds the URN:NBN:se-to-URL mappings and redirects the user to a URL. Guided by a resolution service configuration file, the service requests mappings from repositories (DiVA and others), which respond in the URN:NBN Register Format.
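The request-and-redirect step in the figure can be sketched as a pure function that turns a resolve request into an HTTP status and a redirect target. The choice of 302 and the lower-cased mapping keys are assumptions of this sketch; wiring it into an actual web server is omitted.

```python
from urllib.parse import parse_qs, urlsplit


def resolve(request_url: str, mappings: dict[str, str]) -> tuple[int, str]:
    """Return (HTTP status, Location URL or error message) for a resolve?urn=... request.

    `mappings` is assumed to use lower-cased URNs as keys.
    """
    query = parse_qs(urlsplit(request_url).query)
    urns = query.get("urn")
    if not urns:
        return 400, "missing urn parameter"
    target = mappings.get(urns[0].lower())
    if target is None:
        return 404, f"unknown URN: {urns[0]}"
    return 302, target  # the client is redirected to the current location
```

Usage: `resolve("http://urn.kb.se/resolve?urn=URN:NBN:se:uu:diva-1", {"urn:nbn:se:uu:diva-1": "http://example.org/diva-1"})` returns `(302, "http://example.org/diva-1")`.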

URN:NBN and its various roles within the DiVA system: as a unique identifier within the archive; as a naming convention for files, directories and archival packages; and as part of the disseminated metadata.

URN:NBN as a naming convention for files, directories and information packages

Figure: an information package (name: URN:NBN:se:[specific part]) containing metadata, content, stylesheets, schemas and checksums.
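A minimal sketch of using the URN:NBN as the naming convention for a package. The component names come from the figure; the exact directory names and the checksum file name are assumptions. Pure paths are used because colons, which every URN:NBN contains, are not allowed in Windows file names, so a real archive on such a system would need an escaping rule.

```python
from pathlib import PurePosixPath

# Components shown in the information-package figure; the exact
# directory names are an assumption made for this sketch.
COMPONENTS = ("metadata", "content", "stylesheets", "schemas")


def package_layout(urn: str) -> dict[str, PurePosixPath]:
    """Return hypothetical paths for an information package named after its URN:NBN."""
    root = PurePosixPath(urn)
    layout = {name: root / name for name in COMPONENTS}
    # Hypothetical: the checksum file also carries the URN:NBN in its name.
    layout["checksums"] = root / f"{urn}.checksums"
    return layout
```

With the URN as the root name, a package can be located from a citation alone, without consulting a separate index.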

URN:NBN as part of the disseminated metadata. Figure: the local repository, built around the author's word-processing format (template) and the DiVA Document Format, feeds metadata dissemination services (export formats: Dublin Core, MARC 21, TEI Header, EndNote, Reference Manager, URN:NBN Register Format), content dissemination services (export formats: PDF, DocBook, TEI, HTML, word-processor formats) and web services.
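As an illustration of the URN:NBN travelling with disseminated metadata, here is a sketch of a minimal Dublin Core record that carries it as `dc:identifier`. The Dublin Core element namespace is the standard one; the `record` wrapper element and the function name are hypothetical, and this is not DiVA's actual export.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(title: str, creators: list[str], urn: str) -> str:
    """Serialize a minimal Dublin Core record carrying the URN:NBN as dc:identifier."""
    root = ET.Element("record")  # hypothetical wrapper element
    ET.SubElement(root, f"{{{DC_NS}}}title").text = title
    for name in creators:
        ET.SubElement(root, f"{{{DC_NS}}}creator").text = name
    ET.SubElement(root, f"{{{DC_NS}}}identifier").text = urn
    return ET.tostring(root, encoding="unicode")
```

Any service that harvests this record, for example via OAI-PMH, then receives the persistent identifier rather than a location-bound URL.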

Figure: the local repository feeds three destinations. The central URN:NBN resolution service receives a list of URN:NBN-to-URL mappings (urn:nbn:se:... -> http://www...); the library catalogue receives MARC 21 metadata; and the long-term storage repository receives XML long-term storage packages (metadata and content), each identified by its URN:NBN.

Other IDs used within DiVA. Within the documents, identifiers serve as pointers to schemas, name authorities, authorized names (personal and institutional names), geographical places, and other registries and the entries in them. The DiVA Document Format supports this concept generically through Identifier elements. Currently there are no broadly agreed-upon recommendations in many of these fields.

DiVA Document Format: the Identifier component is identifier-agnostic. The identifier name is specified in a property element. Currently valid identifier names are internal, isbn, issn, local, uri, iso639-1 and iso3166-1.
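Because the Identifier component is identifier-agnostic, a processor has to check the property name against the list of valid names. The XML shape below (an `identifier` element with `property` and `value` children) is a hypothetical reading of "the identifier name is specified in a property element", not the published DiVA schema.

```python
import xml.etree.ElementTree as ET

VALID_IDENTIFIER_NAMES = {"internal", "isbn", "issn", "local", "uri", "iso639-1", "iso3166-1"}


def read_identifiers(xml_text: str) -> list[tuple[str, str]]:
    """Collect (name, value) pairs from hypothetical <identifier> elements.

    Raises ValueError for a name outside the currently valid set.
    """
    pairs = []
    for ident in ET.fromstring(xml_text).iter("identifier"):
        name = (ident.findtext("property") or "").strip()
        value = (ident.findtext("value") or "").strip()
        if name not in VALID_IDENTIFIER_NAMES:
            raise ValueError(f"unsupported identifier name: {name!r}")
        pairs.append((name, value))
    return pairs
```

Keeping the name as data rather than as an element type means new identifier schemes can be admitted by extending the set, without changing the document structure.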

Comprehensive identifiers for the document: identifiers specified here belong to all manifestations. The property internal links this document to other, external descriptions; the value with the property uri contains, for example, the document's URN:NBN.

Identifiers for the serial publication (implemented): the property issn is used for the ISSN, and the property internal links the serial publication to a more detailed external description.

Container element for organisation identifiers (partly implemented): the property internal links the organisation's name to a more detailed external description. Identifiers can, for example, link the organisation to an authority data register (identifier name not implemented yet).

Container element for person identifiers (not implemented): identifiers can be used to link the person to an authority data register (identifier name not implemented).

Archiving workflow to the national library. Infrastructure components: the local producer; the central archive; solutions and methods for addressing and identifying resources; methods for transmitting data (information packages); and a (temporary) file format registry.

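Figure: infrastructure overview. Producers maintain local archives (university, other) and deliver metadata, persistent identifiers and information packages. Consumers reach the metadata through local services, the Union Catalogue and OAI-based services, and reach documents through the URN:NBN resolution service. A decision node, "Available at local archive?" (Y/N), sits between the resolution service and the central archive (documents and metadata), which is supported by a format registry.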
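One reading of the "Available at local archive?" decision is a fallback: resolve to the local copy when it is reachable, otherwise to the central archive's copy. The sketch below implements that reading; the availability check is passed in as a function because the slides do not say how availability is determined.

```python
from typing import Callable, Optional


def choose_target(
    urn: str,
    local_urls: dict[str, str],
    archive_urls: dict[str, str],
    is_available: Callable[[str], bool],
) -> Optional[str]:
    """Prefer the local copy of a document; fall back to the central archive copy."""
    local = local_urls.get(urn)
    if local is not None and is_available(local):
        return local
    # None means neither the local archive nor the central archive knows this URN.
    return archive_urls.get(urn)
```

This is where connecting a preservation copy to the PID pays off: the identifier keeps resolving even after the local repository has moved or disappeared.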

Infrastructure: the producer. The local producer follows recommendations on metadata, storage formats, persistent identifiers and the organization of the local archive, and implements solutions and routines for storing the data and transmitting it to the central archive.

Infrastructure: the archive. The central archive sets requirements for the producer regarding the quality of the data delivered to the archive, and performs quality control on each delivered package at ingest.
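Quality control at ingest can be sketched as comparing a delivered package against its checksum manifest. The required components and the use of SHA-256 are assumptions of this sketch; files are passed in memory to keep it self-contained.

```python
import hashlib

REQUIRED_PREFIXES = ("metadata/", "content/")  # assumed package layout


def check_package(files: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """Return the problems found at ingest; an empty list means the package passes."""
    problems = []
    for prefix in REQUIRED_PREFIXES:
        if not any(name.startswith(prefix) for name in files):
            problems.append(f"missing component: {prefix}")
    for name, expected in sorted(manifest.items()):
        if name not in files:
            problems.append(f"listed but not delivered: {name}")
        elif hashlib.sha256(files[name]).hexdigest() != expected:
            problems.append(f"checksum mismatch: {name}")
    for name in sorted(set(files) - set(manifest)):
        problems.append(f"delivered but not listed: {name}")
    return problems
```

Returning a list of findings, rather than stopping at the first error, gives the producer a complete report to act on before redelivering.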

Infrastructure: methods for addressing and identifying resources provide the conditions for long-term access. Primary: URN:NBN and the URN:NBN resolution service. Secondary identifiers: for example Handle, DOI and ARK.
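Supporting secondary identifiers alongside URN:NBN means routing each identifier to a resolver for its type. The routing table below is an assumption: it uses the public doi.org, hdl.handle.net and n2t.net resolvers and sends only Swedish NBNs to urn.kb.se, which also shows why a resolver per identifier type falls short of a global mechanism.

```python
from typing import Optional

# Assumed routing table: one public resolver per identifier type.
RESOLVERS = {
    "urn:nbn:se:": "http://urn.kb.se/resolve?urn={id}",
    "doi:": "https://doi.org/{rest}",
    "hdl:": "https://hdl.handle.net/{rest}",
    "ark:": "https://n2t.net/{id}",
}


def to_resolver_url(identifier: str) -> Optional[str]:
    """Map an identifier to a resolver URL, or None if its type is not routed."""
    lowered = identifier.lower()
    for prefix, template in RESOLVERS.items():
        if lowered.startswith(prefix):
            # {id} keeps the full identifier; {rest} drops the scheme prefix.
            return template.format(id=identifier, rest=identifier[len(prefix):])
    return None
```

A URN:NBN from another country, for example, falls through to `None` here, because no single resolver in this table accepts it.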

Infrastructure: transmission of data (information packages) provides guarantees for access in the long term, through a verifiable agreement and quality control on both the producer side and the central archive side.

Infrastructure: the (temporary) file format registry provides additional information about the formats submitted to the archive. Methods: persistent identifiers for format information; populating format metadata on ingest. Using format registry information increases the probability that archived documents will remain usable, because more technical metadata is recorded in a uniform form. The registry should also be related to other format registry projects.
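Populating format metadata on ingest might start with signature ("magic byte") matching. The byte signatures below are the well-known ones for PDF, ZIP and XML; the registry identifiers are hypothetical placeholders for whatever format registry the archive adopts.

```python
from typing import Optional

# Hypothetical registry identifiers; a real deployment would use the
# persistent identifiers of the format registry the archive adopts.
SIGNATURES = [
    (b"%PDF-", "example-registry/pdf"),
    (b"PK\x03\x04", "example-registry/zip"),
    (b"<?xml", "example-registry/xml"),
]


def identify_format(head: bytes) -> Optional[str]:
    """Match the first bytes of a file against known signatures."""
    for magic, registry_id in SIGNATURES:
        if head.startswith(magic):
            return registry_id
    return None


def format_metadata(name: str, data: bytes) -> dict[str, object]:
    """Build the technical-metadata entry recorded for a file at ingest."""
    return {
        "file": name,
        "size": len(data),
        "format_id": identify_format(data[:16]),  # None: unknown, needs review
    }
```

Recording a registry identifier instead of a free-text format name is what makes the technical metadata uniform across producers.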

Pointer to a format registry/format dictionary (not yet implemented). Identifiers for the manifestation: identifiers pointing to a file format registry or dictionary can be specified here.

Next steps. At the national (Swedish) level: the 2003-2005 project Coordination of electronic academic publishing at Swedish universities, whose subproject Long-term access and preservation aims to develop and implement a generalized archiving workflow between a local repository and a national archive, taking into account the variety of publishing platforms and systems. At the Nordic level: the resolution service is being developed further as a cooperative effort among the Nordic countries within the NORDINFO-funded project Access to documents now and in the future, with further development of the URN:NBN resolution service as an international cooperative effort.

DiVA project experience. Conclusions: a low-cost system that supports a semi-automated workflow from the point of submission works well, including the automated creation of metadata and the workflow to the National Library archive. Using a harvesting model for updates to the mapping registry makes managing URN:NBNs simple, reliable and economical. Long-term access to institutional research can be assured through cooperation with national libraries.

But is international cooperation within the URN:NBN community enough? No! There is a need for a global resolution mechanism that can accommodate different types of identifiers.

More information:
Electronic Publishing Centre, Uppsala University: http://publications.uu.se/epcentre/
DiVA Academic Archive Online: http://www.diva-portal.org/about.xsql
SVEP (Coordination of electronic publishing at Swedish universities): http://www.svep-projekt.se/english/
NORDINFO-funded project Access to documents now and in the future: http://epc.ub.uu.se/niwiki/pmwiki.php/main/homepage