Digital Curation and Preservation: Defining the Research Agenda for the Next Decade

Similar documents
Mitigating Risk of Data Loss in Preservation Environments

A Simple Mass Storage System for the SRB Data Grid

Knowledge-based Grids

Data Grids, Digital Libraries, and Persistent Archives

The International Journal of Digital Curation Issue 1, Volume

Implementing Trusted Digital Repositories

Data Grid Services: The Storage Resource Broker. Andrew A. Chien CSE 225, Spring 2004 May 26, Administrivia

Managing Large Scale Data for Earthquake Simulations

Transcontinental Persistent Archive Prototype

Scalable, Reliable Marshalling and Organization of Distributed Large Scale Data Onto Enterprise Storage Environments *

Cheshire 3 Framework White Paper: Implementing Support for Digital Repositories in a Data Grid Environment

Leveraging High Performance Computing Infrastructure for Trusted Digital Preservation

DSpace Fedora. Eprints Greenstone. Handle System

The International Journal of Digital Curation Issue 1, Volume

IRODS: the Integrated Rule- Oriented Data-Management System

Managing Large Distributed Data Sets using the Storage Resource Broker

Building the Archives of the Future: Self-Describing Records

DATA MANAGEMENT SYSTEMS FOR SCIENTIFIC APPLICATIONS

Data Sharing with Storage Resource Broker Enabling Collaboration in Complex Distributed Environments. White Paper

Policy Based Distributed Data Management Systems

Preserving Electronic Mailing Lists as Scholarly Resources: The H-Net Archives

SRB Logical Structure

An overview of the OAIS and Representation Information

Electronic Records Archives: Philadelphia Federal Executive Board

Metadata and Archival. System (MADRAS) Anne Gilliland Department of Information Studies University of California, Los Angeles

SAN, HPSS, Sam-QFS, and GPFS technology in use at SDSC

Policy-Driven Repository Interoperability: Enabling Integration Patterns for irods and Fedora

Building a Reference Implementation for Long-Term Preservation

This document contains information on fixed and known limitations for Test Data Management.

Constraint-based Knowledge Systems for Grids, Digital Libraries, and Persistent Archives: Final Report May 2007

Medici for Digital Cultural Heritage Libraries. George Tsouloupas, PhD The LinkSCEEM Project

Ing. José A. Mejía Villar M.Sc. Computing Center of the Alfred Wegener Institute for Polar and Marine Research

Promoting Open Standards for Digital Repository. case study examples and challenges

Building a Digital Repository on a Shoestring Budget

Collection-Based Persistent Digital Archives - Part 1

Digital Preservation DMFUG 2017

Enabling Interaction and Quality in a Distributed Data DRIS

An Introduction to Digital Preservation

Fedora. CS 431 April 17, 2006 Carl Lagoze Cornell University. Acknowledgements: Sandy Payette (Cornell)

I data set della ricerca ed il progetto EUDAT

Scaling a Global File System to the Greatest Possible Extent, Performance, Capacity, and Number of Users

T-Systems Solutions for Research. Data Management and Security. T-Systems Solutions for Research GmbH

Metadata Ingestion and Processinng

Distributed Data Management with Storage Resource Broker in the UK

Repository Software Survey, March 2009

Sessions 3/4: Member Node Breakouts. John Cobb Matt Jones Laura Moyers 7 July 2013 DataONE Users Group

The NCAR Community Data Portal

Infrastructure. San Diego Supercomputer Center Funding: NSF ITR / NARA

Requirements for data catalogues within facilities

Data Replication: Automated move and copy of data. PRACE Advanced Training Course on Data Staging and Data Movement Helsinki, September 10 th 2013

Scientific Workflow Tools. Daniel Crawl and Ilkay Altintas San Diego Supercomputer Center UC San Diego

A High-Level Distributed Execution Framework for Scientific Workflows

Introduction to Federico 2.0 and Fedora Commons

Session Two: OAIS Model & Digital Curation Lifecycle Model

BlackPearl Customer Created Clients Using Free & Open Source Tools

White Paper: National Data Infrastructure for Earth System Science

Metadata and Encoding Standards for Digital Initiatives: An Introduction

Persistent Identifiers for Audiovisual Archives and Cultural Heritage

Internet2 Meeting September 2005

Kepler and Grid Systems -- Early Efforts --

Fedora Relationships and Information Network Overlays. CS 431 April 19, 2006 Carl Lagoze Cornell University

Creating synergy through private cloud

The Advantages of PostgreSQL

Semantic Extensions to Defuddle: Inserting GRDDL into XML

The OAIS Reference Model: current implementations

Open source software for building open access repositories. Imma Subirats Coll knowledge and information management officer FAO of the United Nations

Sky Computing on FutureGrid and Grid 5000 with Nimbus. Pierre Riteau Université de Rennes 1, IRISA INRIA Rennes Bretagne Atlantique Rennes, France

Surveying the Digital Library Landscape

MetaData Management Control of Distributed Digital Objects using irods. Venkata Raviteja Vutukuri

EUDAT Training 2 nd EUDAT Conference, Rome October 28 th Introduction, Vision and Architecture. Giuseppe Fiameni CINECA Rob Baxter EPCC EUDAT members

Introduction to The Storage Resource Broker

EarthCube and Cyberinfrastructure for the Earth Sciences: Lessons and Perspective from OpenTopography

Transferring vital e-records to a trusted digital repository in Catalan public universities (the iarxiu platform)

Openness, Growth, Evolution, and Closure in Archival Information Systems

Grid Programming: Concepts and Challenges. Michael Rokitka CSE510B 10/2007

From business need to implementation Design the right information solution

DIGITAL ARCHIVES & PRESERVATION SYSTEMS

EUDAT Data Services & Tools for Researchers and Communities. Dr. Per Öster Director, Research Infrastructures CSC IT Center for Science Ltd

IBM Tivoli Identity Manager V5.1 Fundamentals

FLAT: A CLARIN-compatible repository solution based on Fedora Commons

Digital Preservation Workshop

Digital Preservation Workshop

The Canadian Information Network for Research in the Social Sciences and Humanities.

Planning the Future with Planets The Planets Interoperability Framework. Presented by Ross King Austrian Research Centers GmbH ARC

Index Introduction Setting up an account Searching and accessing Download Advanced features

Assessment of product against OAIS compliance requirements

Large Scale Sky Computing Applications with Nimbus

PROCESS HISTORY METADATA PEGGY GRIESINGER NATIONAL DIGITAL STEWARDSHIP RESIDENT MUSEUM OF MODERN ART DECEMBER 4 T H, 2014

Web Services in Cincom VisualWorks. WHITE PAPER Cincom In-depth Analysis and Review

Harnessing the Data Deluge

Uniform Resource Locator Wide Area Network World Climate Research Programme Coupled Model Intercomparison

WePreserve Conference October Nice, France Emulation: Bridging the Past to the Future without Altering the Object

DALA Project: Digital Archive System for Long Term Access

- C3Grid Stephan Kindermann, DKRZ. Martina Stockhause, MPI-M C3-Team

ACET s e-research Activities

Presented by Dr Joanne Evans, Centre for Organisational and Social informatics Faculty of IT, Monash University Designing for interoperability

SQL Developer Oracle Migration Workbench Taking Database Migration to the next level

Knowledge Discovery Services and Tools on Grids

Kepler Scientific Workflow and Climate Modeling

2/12/11. Addendum (different syntax, similar ideas): XML, JSON, Motivation: Why Scientific Workflows? Scientific Workflows

Transcription:

Storage Resource Broker Digital Curation and Preservation: Defining the Research Agenda for the Next Decade Reagan W. Moore moore@sdsc.edu http://www.sdsc.edu/srb

Background NARA research prototype persistent archive (SDSC, U Md, NARA) NSF National Science Digital Library persistent archive NHPRC Persistent Archive Testbed NSF/Library of Congress Digital Archive California Digital Library - Digital Preservation Repository UCSD Libraries image archive InterPARES VanMap - GIS preservation

Storage Resource Broker 3.3.1 Application C Library, Java Unix Shell Linux I/O C++ NT Browser, Kepler Actors DLL / Python, Perl, Windows HTTP, DSpace, OpenDAP, GridFTP OAI, WSDL, (WSRF) Federation Management Consistency & Metadata Management / Authorization, Authentication, Audit Logical Name Space Latency Management Data Transport Metadata Transport Database Abstraction Storage Repository Abstraction Databases - DB2, Oracle, Sybase, Postgres, mysql, Informix Archives - Tape, Sam-QFS, DMF, ORB HPSS, ADSM, UniTree, ADS File Systems Unix, NT, Mac OSX Databases - DB2, Oracle, Sybase, Postgres, mysql, Informix

Preservation Extract record from its creation environment Authenticity Assertions by record creator Static provenance information Import record into the preservation environment Integrity Assertions by archivist Dynamic information context, mapping from preservation environment to external changing world Infrastructure independence Assertion that the preservation environment has no dependencies upon a particular proprietary product, format, or protocol

State of the Art Grid - workflow virtualization Support execution of jobs (processes) across multiple compute servers Data grid - data virtualization Manage properties of a shared collection that is distributed across multiple storage servers Trust virtualization - manage authentication, authorization Semantic grid - information virtualization Reason across inferred attributes from multiple collections.

Preservation Environment Reference Model Preservation management virtualization Preservation trust virtualization Preservation workflow virtualization Preservation data virtualization Preservation knowledge virtualization

Management Virtualization Characterization of management policies independently of the implementation Validation policies Lifetime policies Access policies Federation policies Presentation policies Consistency policies

Trust Virtualization Management of ownership of records independently of storage systems Collection owned data At each remote storage system, an account ID is created under which the preservation environment stores files Management of roles for permitted operations Management of authentication of users Management of authorization

Workflow Virtualization Management of execution of preservation processes across distributed resources Management of execution state Management of relationships between jobs Management of interactions with remote schedulers

Data Virtualization Access Method Access Operations Data Grid Storage Operations Storage Protocol Map from the operations used by the access method to a standard set of operations used to interact with the storage system Storage System

Data Virtualization Data Access Methods (C library, Unix, Web Browser) Data Collection Storage Repository Storage location User name File name File context (creation date, ) Access constraints Data Grid Logical resource name space Logical user name space Logical file name space Logical context (metadata) Control/consistency constraints Data is organized as a shared collection

Federation Between Data Grids Data Access Methods (Web Browser, DSpace, OAI-PMH) Data Collection A Data Grid Logical resource name space Logical user name space Logical file name space Logical context (metadata) Data Collection B Data Grid Logical resource name space Logical user name space Logical file name space Logical context (metadata) Control/consistency constraints Control/consistency constraints Access controls and consistency constraints on cross registration of digital entities

Knowledge Virtualization Management of relationships between record components or between records independently of the storage systems Characterization of hierarchical metadata Logical relationships - OWL Spatial/structural relationships Temporal/procedural relationships Functional relationships

Research Persistent objects Characterize the structure of digital entities Migrate the characterization forward in time Parse the digital entity based on the characterization DFDL / OpenOffice / Multivalent Browser Separate behaviors from parsing Preservation workflow systems Management virtualization Automate application of preservation policies Automate migration between preservation environments

Research Preservation environment validation Consistency checking of semantics and syntax of authenticity and integrity metadata Validation of assertions made about distributed environment

Reference Models OAIS - record handling Preservation environment infrastructure Preservation management policies Preservation utilization properties Curation Implied knowledge

Evaluation Metric Migrate records into an alternate preservation environment while maintaining authenticity and integrity Generic infrastructure Community standards for semantics Community standards for formats Community standards for services

Synergy in Data Management Common requirements across: Data grids Digital libraries Persistent archives Collection building Analysis pipelines Real-time sensor streams

For More Information Reagan W. Moore San Diego Supercomputer Center moore@sdsc.edu http://www.sdsc.edu/srb/