Promoting Open Standards for Digital Repository Infrastructures: case study examples and challenges

Transcription:

Promoting Open Standards for Digital Repository Infrastructures: case study examples and challenges
Flavia Donno, CERN; P. Fuhrmann, DESY; E. Ronchieri, INFN-CNAF
OGF-Europe Community Outreach Seminar: Digital Repositories - Interoperability Using Grid Technologies
OGF23, 5 June 2008, Barcelona, Spain


Digital Repositories
- The term "digital repository" is used with different meanings depending on the scope, the environment, and the community using it.
- Digital Repository Infrastructure: a managed (distributed) storage system with content deposited on a personal, departmental, institutional, national, regional, or consortia basis, providing services to designated communities (Joint Information Systems Committee (JISC) Digital Repositories Program, Digital Repository Review, February 2005).

Digital Repository Infrastructure: Importance of standards
- Modelling the international network of repositories independently of specific operational domains, and developing frameworks for interaction, is very important.
- It enables crossover of technologies, sharing of experience across domains, and collaborative development work.
- External service providers can exploit the combined network of institutional repositories on a global scale:
  - Cloud storage services (Nirvanix, Amazon S3)
  - Indexing of metadata and content (Google Book Search)
  - Data management and preservation services (RepliWeb/RMFT, file transformation and normalization, etc.)
- Best practices, open architectures and (de facto) standards enable integration, interoperability and collaborative work.

Digital Repository Infrastructure: Responsibilities
- Ingestion, access and presentation
  - Data acquisition
  - Data sharing
  - Transformation services
- Security
  - Authentication and authorization
  - Authenticity
  - Privacy
  - Vulnerability
- Data management
  - Metadata extraction, collection, manipulation
  - Content and metadata management
  - Workflow management
- Storage management
  - Space management
  - Copies
  - Lifecycle
  - Media management
  - Data mirroring and replication
- Preservation and curation
  - Extend the usable life of machine-readable computer files
  - Protect data from media failure, physical loss, and obsolescence

Open Archival Information System Model (ISO)
- It is a popular reference model.
- It allows for the definition of layered architectures, identifying the main services.
- Strong standardization efforts exist for Ingest, Access, Preservation, Administration, Sustainability.
- Data and Storage Management: standardization and best practices in distributed environments and the Grid can be improved.

Relevant Projects
- CASPAR (EU-FP6) - Dedicated to preservation issues in repositories. Its goal is to research, implement, and disseminate innovative solutions for digital preservation based on the OAIS reference model.
- DigitalPreservationEurope - Addresses the need to improve coordination, cooperation and consistency in current activities to secure effective preservation of digital materials, focusing on authenticity and auditing issues.
- SHERPA DP - Dedicated to preservation issues in repositories, developing strategy and business models for moving institutional repositories from project funding to sustainability.
- JISC Digital Repositories Program - Supports research on institutional repositories and digital preservation.
- Digital Curation Centre - Based in the UK; it is developing auditing and certification processes for trusted repositories.
- RLG, NARA - Digital Repository Certification Task Force, which has established general certification requirements applying to any type of digital repository.

Available standards for description and preservation
- OAIS - Open Archival Information System
  - AIP - archival information package
  - SIP - submission information package
  - DIP - dissemination information package
- ISO standard - Producer-Archive Submission
- METS - Metadata Encoding and Transmission Standard
- PREMIS Data Dictionary for Preservation Metadata
- Dublin Core Metadata Element Set - Its goal is interoperability and broad applicability.
- MODS - MODS metadata can be mapped into Dublin Core metadata.
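The remark that MODS metadata can be mapped into Dublin Core can be illustrated with a small crosswalk. The following is a minimal sketch, not an official mapping: the flattened MODS paths, the `MODS_TO_DC` table, the function names and the sample record are assumptions made for illustration. Only the Dublin Core element names and namespace URI are taken from the standard.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace

# Simplified crosswalk (assumption): flattened MODS paths -> Dublin Core elements.
MODS_TO_DC = {
    "titleInfo/title": "title",
    "name/namePart": "creator",
    "originInfo/dateIssued": "date",
    "abstract": "description",
    "subject/topic": "subject",
    "identifier": "identifier",
}

def mods_to_dc(mods: dict) -> dict:
    """Map a flattened MODS-like record to lists of Dublin Core values."""
    dc = {}
    for path, values in mods.items():
        element = MODS_TO_DC.get(path)
        if element is None:
            continue  # unmapped fields are dropped: the mapping is lossy
        dc.setdefault(element, []).extend(values)
    return dc

def dc_to_xml(dc: dict) -> str:
    """Serialize the Dublin Core values as namespaced XML elements."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")
    for element, values in dc.items():
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = value
    return ET.tostring(root, encoding="unicode")

# Hypothetical sample record, for illustration only.
sample = {
    "titleInfo/title": ["Sample report"],
    "name/namePart": ["Author One", "Author Two"],
    "physicalDescription/extent": ["12 pages"],  # has no entry in the crosswalk
}
print(dc_to_xml(mods_to_dc(sample)))
```

The mapping goes in one direction only: fields without a Dublin Core counterpart are dropped, which reflects the trade-off between broad applicability and descriptive detail.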

Available Frameworks with preservation/curation policies

Standards for Storage: SRM
The Storage Resource Manager (SRM/OGF-GSM) is an interface definition and a middleware component whose function is to provide dynamic space allocation and file management on shared storage components on the Grid.
- It offers storage-system-independent syntax and semantics.
- It supports storage quality negotiation.
- It allows for lifecycle management of storage resources and files.
- It defines unique storage file identifiers in a Grid.
- It provides authorization and authentication mechanisms.
- It provides support for privacy and data authenticity.
- It allows for the negotiation of file access/transfer protocols.
- It supports optimized replication between remote storage systems.
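Three of the SRM functions listed above (dynamic space allocation, lifetime management and transfer protocol negotiation) can be sketched with a toy in-memory model. This is an illustration only and not the SRM interface: the class `ToyStorageManager`, its method names, the token format and the rule of picking the first client-preferred protocol that the server supports are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class ToyStorageManager:
    capacity: int                # bytes that can be reserved in total
    protocols: tuple             # transfer protocols this server supports
    reservations: dict = field(default_factory=dict)
    next_token: int = 0

    def reserve_space(self, size: int, lifetime_s: float, now: float) -> str:
        """Reserve `size` bytes until `now + lifetime_s`; return a space token."""
        used = sum(r["size"] for r in self.reservations.values())
        if used + size > self.capacity:
            raise RuntimeError("not enough free space")
        token = f"space-{self.next_token}"
        self.next_token += 1
        self.reservations[token] = {"size": size, "expires": now + lifetime_s}
        return token

    def expire(self, now: float) -> list:
        """Release reservations whose lifetime has ended; return their tokens."""
        expired = [t for t, r in self.reservations.items() if r["expires"] <= now]
        for token in expired:
            del self.reservations[token]
        return expired

    def negotiate_protocol(self, client_protocols: list) -> str:
        """Pick the first client-preferred protocol the server also supports."""
        for protocol in client_protocols:
            if protocol in self.protocols:
                return protocol
        raise RuntimeError("no common transfer protocol")

# Hypothetical usage: times are plain numbers so the behaviour is deterministic.
manager = ToyStorageManager(capacity=100, protocols=("gsiftp", "dcap"))
token = manager.reserve_space(size=60, lifetime_s=10, now=0.0)
chosen = manager.negotiate_protocol(["rfio", "dcap", "gsiftp"])
```

Passing the current time explicitly keeps the sketch testable; a real service would use its own clock and persistent state.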

Available implementations
[Figure: SRM implementations and the sites or projects using them. Labels: dCache, CASTOR, Disk, BeStMan (Berkeley), DPM (MySQL DB), xrootd, SRB (iRODS); client (user/application); BNL, SLAC, LBNL, SDSC, SINICA, EGEE.]

SRM and the clouds
- SRM interfaces to business storage services are under development; StoRM is an example.
- It allows for easy and transparent integration of existing SRM-based environments with business providers, and for a dynamic expansion of existing infrastructures.
- http://storm.forge.cnaf.infn.it/

SRM-based DR
[Figure: layered architecture of an SRM-based digital repository. Layers: Content Management Layer, Storage Management Layer, Base Layer. Services: Collection Management Service, Content Management Service, Notification Service, Storage Management Service, Replication Management Service, Archive Import Service. Interfaces: JDBC, Cataloguing API (LFC), Storage API (SRM), GFAL. Data: Properties & Relations, Raw File Content. Base layer: any off-the-shelf DBMS, e.g. MySQL; any SRM-enabled SE, e.g. DPM.]
- Preservation policies enforced through replication
- Data curation will be addressed
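The idea of enforcing a preservation policy through replication can be sketched as a check over a replica catalogue: every file must have a minimum number of intact copies, and missing or corrupted copies are restored from an intact one. This is a minimal sketch under stated assumptions. The in-memory `catalogue` and `storage` dictionaries stand in for a real catalogue and real storage elements, and the function names and the SHA-256 fixity check are illustrative choices, not part of the architecture described in the slides.

```python
import hashlib

def checksum(data: bytes) -> str:
    """Fixity value used to decide whether a replica is intact."""
    return hashlib.sha256(data).hexdigest()

def plan_replication(catalogue: dict, storage: dict, min_replicas: int, all_sites: list) -> list:
    """Return (file_id, source_site, target_site) copy actions needed by the policy."""
    actions = []
    for file_id, entry in catalogue.items():
        # A replica counts only if it exists and matches the recorded checksum.
        healthy = [
            site for site in entry["replicas"]
            if file_id in storage[site]
            and checksum(storage[site][file_id]) == entry["sha256"]
        ]
        missing = min_replicas - len(healthy)
        if missing <= 0 or not healthy:
            continue  # policy satisfied, or no intact source left to copy from
        # Sites without an intact copy are candidates, in the given order.
        candidates = [site for site in all_sites if site not in healthy]
        for target in candidates[:missing]:
            actions.append((file_id, healthy[0], target))
    return actions

# Hypothetical state: siteB holds a corrupted copy, siteC holds nothing.
storage = {"siteA": {"f1": b"data"}, "siteB": {"f1": b"dat_"}, "siteC": {}}
catalogue = {"f1": {"sha256": checksum(b"data"), "replicas": ["siteA", "siteB"]}}
plan = plan_replication(catalogue, storage, 2, ["siteA", "siteB", "siteC"])
```

The function only plans the copies. Executing them, and reporting files that have no intact replica left, would be the job of the replication and notification services.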

Standards for Storage: NFSv4
- The NFSv4 protocol has been a standard since 2000.
- It is implemented by major file systems (IBM/GPFS, SUN/Lustre, ...) and operating systems, including Linux and Windows.
- Specialized storage systems also offer an NFSv4 implementation: dCache now, DPM in the future.
- Requirements for an improved (inter)net protocol:
  - Improved access and good performance on the Internet
  - Strong security, with security negotiation/delegation built into the protocol
  - Support for metadata content
  - Stateful sessions (lockd part of the protocol)
  - Enhanced cross-platform interoperability
  - Extensibility of the protocol

Attractive NFSv4 features
- Arbitrary subtrees of different hosts can be exported to clients, which may build their own view.
- Parts of the file system structure may therefore be moved to other hosts or be replicated for performance reasons.
[Figure: namespace tree with nodes A (not exported), B (host X), C (host Y), D (host Z1) and D' (host Z2). Annotations: hiding subtrees; the server may redirect to a different host; two servers may provide the same content (alternating).]
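The namespace behaviour described above (hidden subtrees, redirection to another host, and two servers providing the same content) can be sketched as a longest-prefix lookup over an export table. This is a toy model and not NFSv4 client or server code. The class `ToyNamespace`, the tree layout with C and D placed under B, and the round-robin choice between replicas are assumptions made for illustration.

```python
import itertools

class ToyNamespace:
    def __init__(self, exports: dict):
        # exports: path prefix -> list of hosts serving that subtree
        self.exports = {(p.rstrip("/") or "/"): hosts for p, hosts in exports.items()}
        # One round-robin iterator per subtree to alternate between replicas.
        self.cycles = {p: itertools.cycle(h) for p, h in self.exports.items()}

    def resolve(self, path: str) -> tuple:
        """Return (host, prefix) for the longest exported prefix of `path`."""
        parts = path.rstrip("/").split("/")
        for end in range(len(parts), 0, -1):
            prefix = "/".join(parts[:end]) or "/"
            if prefix in self.exports:
                return next(self.cycles[prefix]), prefix
        # Subtrees without an export entry stay hidden from clients.
        raise FileNotFoundError(f"{path} is not exported")

# Hypothetical layout: /A is not exported, /B/D is served by two hosts.
namespace = ToyNamespace({
    "/B": ["hostX"],
    "/B/C": ["hostY"],
    "/B/D": ["hostZ1", "hostZ2"],
})
host, prefix = namespace.resolve("/B/C/report.txt")
```

Moving a subtree to another host then amounts to changing one entry in the export table, while client paths stay the same.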

Standards for Storage: SNIA/XAM
- Metadata and content management are normally separated.
- XAM allows for the storage and management of content and related metadata information in storage devices and services.
- XAM specifies how metadata is represented, but not the actual metadata field names and values.
- Further work is needed to standardize metadata names and allowed values for application domains like email, health, and document management.
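The distinction between how metadata is represented and which field names and values are allowed can be made concrete with a small model. The sketch below is not the XAM API: `ToyObject`, `set_field`, `validate` and the example field names are hypothetical. It stores content together with typed metadata fields, and shows what a domain-level agreement on names and allowed values would add on top.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class ToyObject:
    content: bytes
    fields: dict = field(default_factory=dict)  # name -> (type name, value)
    object_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def set_field(self, name: str, value) -> None:
        """Store a metadata value with its type; any field name is accepted."""
        if not isinstance(value, (str, int, float, bool)):
            raise TypeError("unsupported metadata type")
        self.fields[name] = (type(value).__name__, value)

def validate(obj: ToyObject, schema: dict) -> list:
    """Check an object against a domain schema: name -> allowed values, or None for any."""
    errors = []
    for name, allowed in schema.items():
        if name not in obj.fields:
            errors.append(f"missing field: {name}")
        elif allowed is not None and obj.fields[name][1] not in allowed:
            errors.append(f"value not allowed for {name}")
    return errors

# Hypothetical document-management schema, for illustration only.
schema = {"retention.years": None, "document.class": {"invoice", "contract"}}
doc = ToyObject(content=b"%PDF...")
doc.set_field("retention.years", 7)
problems = validate(doc, schema)
```

The storage layer accepts any field name as long as the value has a supported type. The schema is the part that, according to the slide, still needs to be standardized per application domain.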

SNIA/XAM at work
[Figure: commercially available applications (Records & Documents, Disk Extender, RIM4DB, Photo Editor) access storage systems (HP RISS, EMC Centera, Sun) through the XAM interface.]

Challenges
- Coordination of discipline-specific initiatives to encourage the adoption of common practice for data archival, preservation, and curation
- Demonstration of federated specialized storage and computing infrastructures to support digital repositories; requirements and needed interfaces
- Integration and interoperability of existing repository infrastructures; data exchange and accessibility
- Common operational procedures for networked digital repositories
- Dissemination activities and reference fora for producers and administrators