The Storage Networking Industry Association (SNIA) Data Preservation and Metadata Projects. Bob Rogers, Application Matrix

Similar documents
extensible Access Method (XAM) - a new fixed content API Mark A Carlson, SNIA Technical Council, Sun Microsystems, Inc.

Storage Industry Resource Domain Model

The Need for a Terminology Bridge. May 2009

LTR TWG & the Cloud PRESENTATION TITLE GOES HERE

SNIA 100 Year Archive Survey 2017 Thomas Rivera, CISSP

Preservation DataStores: An architecture for Preservation Aware Storage

How to Participate in SNIA Standards & Software Development. Arnold Jones SNIA Technical Council Managing Director

XAM over OSD. Sami Iren Seagate Technology

Interoperable Cloud Storage with the CDMI Standard. Mark Carlson, SNIA TC and Oracle Chair, SNIA Cloud Storage TWG

EMC Centera CentraStar/SDK Compatibility with Centera ISV Applications

Session Two: OAIS Model & Digital Curation Lifecycle Model

An overview of the OAIS and Representation Information

Format Technologies for the Long Term Retention of Big Data

Combining SNIA Cloud, Tape and Container Format Technologies for the Long Term Retention of Big Data

Preservation DataStores: Architecture for Preservation Aware Storage

CoSA & Preservica Practical Digital Preservation 2015/16. Practical OAIS Digital Preservation Online Workshop Module 2

Cloud Archive and Long Term Preservation Challenges and Best Practices

Interoperable Cloud Storage with the CDMI Standard. Mark Carlson, SNIA TC and Oracle Co-Chair, SNIA Cloud Storage TWG

The Secret Sauce of ILM

Rocket Software Rocket Arkivio

Format Technologies for the Long Term Retention of Big Data. Roger Cummings, Antesignanus Co-Author: Simona Rabinovici-Cohen, IBM Research Haifa

Trends in Data Protection CDP and VTL

Format Technologies for the Long Term Retention of Big Data

Research Data Repository Interoperability Primer

SNIA/DMTF Work Register. Version 1.4

Agenda. Bibliography

Storage & Data Management Practices for the Long Term. Raymond A. Clarke Enterprise Storage Consultant Data Management Group Sun Microsystems, Inc.

More than a Lifetime of

Optim. Optim Solutions for Data Governance. R. Kudžma Information management technical sales

SNIA/DMTF Work Register. Version 1.3

The OAIS Reference Model: current implementations

ISO TC46/SC11 Archives/records management

Record Lifecycle Modeling Tasks

OSD-2 & XAM. Erik Riedel Seagate Technology May 2007

Promoting Open Standards for Digital Repository. case study examples and challenges

IBM TS4300 Tape Library

IBM Spectrum Protect Version Introduction to Data Protection Solutions IBM

CCSDS STANDARDS A Reference Model for an Open Archival Information System (OAIS)

Data Deduplication Methods for Achieving Data Efficiency

IBM Tivoli Storage Manager Version Introduction to Data Protection Solutions IBM

As storage networking technology

Assessment of product against OAIS compliance requirements

DATA Act Information Model Schema (DAIMS) Architecture. U.S. Department of the Treasury

Introduction to Digital Archiving and IBM archive storage options

University of British Columbia Library. Persistent Digital Collections Implementation Plan. Final project report Summary version

Facing an SSS Decision? SNIA Efforts to Evaluate SSS Performance. Ray Lucchesi Silverton Consulting, Inc.

Digital Preservation with Special Reference to the Open Archival Information System (OAIS) Reference Model: An Overview

Data Classification. The Foundation for Intelligent Information Management. Infostructure Associates Leveraging Information for Organizational Success

Content Addressed Storage (CAS)

Trends in Data Protection and Restoration Technologies. Jason Iehl, NetApp

OAIS: What is it and Where is it Going?

A structured workflow for implementing digital archiving standards in an organisation

The digital preservation technological context

Copyright 2013, Oracle and/or its affiliates. All rights reserved.

Self-contained Information Retention Format For Future Semantic Interoperability

Transfers and Preservation of E-archives at the National Archives of Sweden

Archive exchange Format AXF

einfrastructures Concertation Event

Metadata Management in the FAO Statistics Division (ESS) Overview of the FAOSTAT / CountrySTAT approach by Julia Stone

Aspects of Information Lifecycle Management

Reducing Consumer Uncertainty

The Ohio State University's Knowledge Bank: An Institutional Repository in Practice

GETTING STARTED WITH DIGITAL COMMONWEALTH

Health Information Exchange Content Model Architecture Building Block HISO

Accelerate GDPR compliance with the Microsoft Cloud

An Introduction to Digital Preservation

Interoperability & Archives in the European Commission

BlackPearl Customer Created Clients Using Free & Open Source Tools

Best Practices in Designing Cloud Storage based Archival solution Sreenidhi Iyangar & Jim Rice EMC Corporation

Selecting the Right Method

Presentation to Canadian Metadata Forum September 20, 2003

Cloud Storage Securing CDMI. Eric A. Hibbard, CISSP, CISA, ISSAP, ISSMP, ISSEP, SCSE Hitachi Data Systems

DRS Update. HL Digital Preservation Services & Library Technology Services Created 2/2017, Updated 4/2017

Driving Interoperability with CMIS

XML information Packaging Standards for Archives

Copyright 2011, TeraMedica, Inc.

Digital s ideal partner

Managing Born- Digital Documents.

The Making of PDF/A. 1st Intl. PDF/A Conference, Amsterdam Stephen P. Levenson. United States Federal Judiciary Washington DC USA

Long-term preservation for INSPIRE: a metadata framework and geo-portal implementation

lnteroperability of Standards to Support Application Integration

Addendum 1. APPENDIX B - Technical Information / Requirements Records Management Solution Capabilities. Questions. Number

Metadata and Encoding Standards for Digital Initiatives: An Introduction

National Data Sharing and Accessibility Policy-2012 (NDSAP-2012)

ISO INTERNATIONAL STANDARD. Information and documentation Managing metadata for records Part 2: Conceptual and implementation issues

An Introduction to GPFS

Different Aspects of Digital Preservation

Digital Preservation DMFUG 2017

NEW MICROSOFT FEATURES THAT WILL AFFECT EDISCOVERY IN THE FUTURE

ISO ARCHIVE STANDARDS: STATUS REPORT

[your name] [your job title]

Conch Appendix: Discovery Questionnaire. Questionnaire Summary

Getting Bits off Disks: Using open source tools to stabilize and prepare born-digital materials for long-term preservation

PDF/arkivering PDF/A. Per Haslev Adobe Systems Danmark Adobe Systems Incorporated. All Rights Reserved.

SGI Overview. HPC User Forum Dearborn, Michigan September 17 th, 2012

ADVANCED DEDUPLICATION CONCEPTS. Thomas Rivera, BlueArc Gene Nagle, Exar

Metadata Workshop 3 March 2006 Part 1

Document Title Ingest Guide for University Electronic Records

CDMI Support to Object Storage in Cloud K.M. Padmavathy Wipro Technologies

Scalable Platform Management Forum. Forum Status 10/30/2014

Transcription:

The Storage Networking Industry Association (SNIA) Data Preservation and Metadata Projects Bob Rogers, Application Matrix

Overview The Self Contained Information Retention Format Rationale & Objectives Requirements & Use Cases SNIA s extensible Access Method (XAM) The metadata issue The SNIA Resource Domain Model Applying policies to information management 2

What is the Self-Contained Information Retention Format (SIRF)? A logical container format for the storage subsystem appropriate for the long-term storage of digital information A logical data format of a mountable unit e.g. a filesystem, a block device, a stream device, an object store, a tape, etc. Includes a cluster of interpretable preservation objects that can be understood in the future Self-describing can be interpreted by different systems Self-contained all data needed for the preservation objects interpretation is contained within the preservation objects cluster If a mountable unit is damaged or lost, the effect is contained the information in the other mountable units is still valid! Need to define how and when external references are supported A work effort by SNIA s Long Term Retention Technical Work Group 3

Longest Retention Requirement Longest Retention Requirement Long-term needs are real: 68 % over 100 Years 83% over 50 Years Requirements vary by organization type, information type, and compliance rules/risk Leading Organizations: Education, gov., IT services, manufacturing, finance, insurance, pharma, & health-care Leading Info-Types: Source-files, government, history, customer, & database records http://www.snia.org/forums/dmf/registration 4% >10 Yrs beyond Project Life of Company 2% 1% Life of Product > 5 Yr >10 Yr >20 Yr >25 Yr 4% 4% 2% 3% 13% >50 Yr 15% >100 Yr 53% Permanent 0% 10% 20% 30% 40% 50% 60% Percent of Responses Source: 100 Year Archive Requirements Survey, SNIA 2007 N=104 4 4

SIRF Objectives Facilitate transparent logical and physical migration and movement in order to support long term preservation Media, subsystem or bitstream movement remove the mountable unit from system A and put it at system B. Transparent system A is not involved. All the information needed for system B to understand the mountable unit is self-described and selfcontained within the mountable unit. Long term 15 years and above (according to 100 years archive requirements survey). Preservation sustain the understandability and usability of the data and not just the bits. Considering multiple implementations of SIRF to utilize: the Open Archival Information System (OAIS) ISO standard SNIA s extensible Access Method (XAM) 5

Problem SIRF is Addressing Without SIRF Application A With SIRF Application B Application A Application B SIRF Interface Interface Preservation Retention Storage Subsystem Data Type Stop Data Type Preservation Retention Storage Subsystem Data Type Data Type Cannot move cluster of preservation objects between systems by itself Only the original application who wrote the preservation objects can read and interpret them Utilize export and import processes Preservation Objects cannot be sustained over the long-term Interface Preservation Retention Storage Subsystem SIRF SIRF Interface PASS SIRF Preservation Retention Storage Subsystem SIRF SIRF Can move cluster of preservation objects between systems by itself Any SIRF compliant application can read and interpret the preservation objects No need for export and import processes Preservation Objects can survive longer 6

Positioning SIRF in the Digital Information Layers Application Information Layer Get the interpretation of the data Example: render a Word 3.0 document Diverse applications and data types Preservation Information Layer Get the basic preservation objects Example: interprets OAIS AIPs Can it be standardized? SIRF Bit Layer Get the bits from the media Example: Linear Tape-Open (LTO) for tapes May depend on media type 7

What is a Preservation Object? The raw data to be preserved plus additional metadata needed to enable the preservation of the raw data for decades to come The preservation object may be subject to physical and logical migrations The preservation object may be dynamic and change over time The updated preservation object is a new version of the original preservation object. An example of a preservation object is OAIS Archival Information Package (AIP) An AIP includes recursive representation information that enables future interpretation of the raw data 8

SIRF Use Cases and Functional Requirements 9

SIRF Initial Requirements General Media agnostic Tape, disk, future media Direct random access and serial access Support mixture of storage technologies Required by: all use cases Vendor and Platform agnostic Required by: all use cases Support different standard storage technologies and interfaces e.g. NFS, CIFS, XAM Required by: subset of use cases Extensible Support additional information which may be added in the future Required by: subset of use cases 10

SIRF Initial Requirements Format Self-describing The amount of required information is small and can be acquired in stages Interpretable by both humans and machines Ability to do offline inspection Support self-contained data Include means to represent internal links and cross references Support different SIRF formats preserved in an external registry Interoperability Ability to migrate data between different systems without loss of information data should be interpretable after migrations Can be interpreted in the future Support methodology for verification of completeness and correctness 11

SIRF Initial Requirements Preservation Object Data Model Support different data models for preservation objects Support different object data models at one time Support complex data structures like collections of objects Support migrating objects from one data model to an alternative data model Can handle any proper data format for the raw data No restrictions on file formats Enable retention of multiple versions of the original preservation object with their relations References from new to existing preservation objects of the same version series There must be a persistent identifier for each preservation object Include additional external identifiers 12

SIRF Initial Requirements - Performance Performance Need to have good performance even for data that includes text and binaries Support large objects e.g. web archiving objects, database archiving objects Do not require complete scanning for access Required by: all use cases Enable parallel data migration Enable parallel reads and writes Required by: all use cases 13

SNIA s extensible Access Method (XAM) 14

A New Fixed Content API XAM Provides: Interoperability: Applications can work with any XAM conformant storage system; information can be migrated and shared Compliance: Integrated record retention and disposition metadata, ILM Practices: Framework for classification, policy, and implementation Migration: Ability to automate migration process to maintain long-term readability Discovery: Application-independent structured discovery avoids application obsolescence. XAM is a SNIA Architecture The XAM Architecture spec defines the normative semantics of the API for use by applications and implementation by storage systems XAM is an Application Programming Interface (API) The XAM Java and C API specs define bindings of the XAM Architecture to the Java and C Languages XAM is SNIA Software The XAM SDK provides a common library and reference implementation to promote widespread adoption of the standard 15

XAM Interface First interface to standardize system metadata for retention of data Implements basic capability to Read and Write Data (through Xstreams) Locates any XSet with a query or by supplying the XUID Allows Metadata to be added to the data and keeps both in an XSet object Uses and produces system metadata for each Xset For example Access and Commit times (Storage System Metadata) But it also uniquely specifies Data System Metadata for Retention Data Services XAM User metadata is un-interpretable by the system, but stored with the other data and is available for use in queries Given this we can see that XAM is a data storage interface that is used by both Storage and Data Services (functions) 16

The Resource Domain Model This model shows the logical layering of the different domains and the role of policies for each domain. The services in each domain play a different role, but leverage common, standard interfaces 17

Resource Domains Resource Domains are a way of classifying services into specific areas that each deal with a different aspect of the problem Data An information domain application creates data and associates MetaData with it MetaData Data Storage Interface Data User MetaData MetaData aware Data Services interpret Data System MetaData as the requirements for it s lifecycle and implement policies for retention, placement, lifecycle, etc. System MetaData Data Storage Interface Data User MetaData Certain Data Storage Interfaces can accommodate both Data and MetaData ( XAM, Filesystems with extended attributes) System MetaData Other Data Storage interfaces (based on blocks or objects) provide virtualized Containers for the Data bits and the management of those containers Storage services are employed to meet those requirements at this point in the data s lifecycle, however the storage services are unaware of the data s requirements 18

Summary Long Term Retention Technical Work Group was launched this year Preliminary work focuses on: Terminology SIRF Requirements SIRF Use Cases XAM SDK released this year Storage Resource Domain Model in review Contributors are welcome 19

Q&A / Feedback The DMF would like to thank the following individuals for their contributions to the development of this presentation: Michael Peterson Gary Zasman Simona Cohen Mark Carlson 20

For More Information SNIA Long Term Retention Technical Work Group http://www.snia.org/apps/org/workgroup/lttwg/ XAM Home http://www.snia.org/xam XAM Technical Work Group http://www.snia.org/apps/org/workgroup/fcastwg/ XAM Software Development Kit (SDK) TWG http://www.snia.org/apps/org/workgroup/xamsdktwg/ SNIA Community Forum http://community.snia-dmf.org/index.php 21