Storage Industry Resource Domain Model


Storage Industry Resource Domain Model: A Technical Proposal from the SNIA Technical Council

Topics
- Abstract
- Data Storage Interfaces
- Storage Resource Domain
- Data Resource Domain
- Information Resource Domain
- Management of Resources
- Definitions
- Resource Domain Model
- Mapping to existing products
- Future standards

Abstract
Today's IT environment is composed of various products that are intended to store, protect, secure and make available the information used by businesses and business processes. These products encompass elements used in both the data path and the control path between the user and the eventual location of that information. Standards exist and are emerging for interoperability between these elements; what is missing, however, is a comprehensive description of where interoperability is needed and where standards can best be applied. This paper sets out a model of these elements that describes a logical view of their functions and capabilities using a descriptive taxonomy. The purpose of this model is to form a basis upon which industry efforts can be organized, needed standards identified, and vendor products described in vendor-independent terminology.

Data Storage Interfaces

What is a Data Storage interface?
- The interface can be an Application Programming Interface (API), a network (channel) protocol, or both
- The interface is to a device and/or software that implements one or more services to store and retrieve the data, among other functions
- We propose a model where any number of services from different domains can sit behind such an interface
- The purpose of the model is to describe and categorize these services
- The application creating and using the data doesn't care (for the most part) about services per se
- But a model helps when we need to manage these services

XAM API: an example Data Storage Interface
- XAM is the first interface to standardize system metadata for retention of data
- XAM implements the basic capability to read and write data (through XStreams)
- XAM can locate any XSet with a query or by supplying the XUID
- XAM allows metadata to be added to the data and keeps both in an XSet object
- XAM uses and produces system metadata for each XSet, for example access and commit times (Storage System Metadata)
- XAM user metadata is not interpreted by the system, but is stored with the other data and is available for use in queries
- Given this, we can see that XAM is a data storage interface that is used by both Storage and Data Services (functions)
- But it also uniquely specifies Data System Metadata for Retention Data Services
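
As a rough illustration of the XSet idea on this slide, the following Python sketch models an object that keeps data streams and metadata together and can be located by identifier or by query. It is not the XAM API: the class names `XSetSketch` and `StoreSketch`, the metadata field names, and the predicate-based query are all hypothetical, chosen only to mirror the concepts listed above.

```python
import time
import uuid
from dataclasses import dataclass, field


@dataclass
class XSetSketch:
    """Hypothetical stand-in for an XSet: data streams plus metadata, kept together."""
    xuid: str
    streams: dict = field(default_factory=dict)          # name -> bytes (data path)
    user_metadata: dict = field(default_factory=dict)    # stored, not interpreted by the system
    system_metadata: dict = field(default_factory=dict)  # produced and used by the system


class StoreSketch:
    """Hypothetical in-memory store; not a real XAM implementation."""

    def __init__(self):
        self._xsets = {}

    def commit(self, streams, user_metadata, retention_days=None):
        xset = XSetSketch(xuid=uuid.uuid4().hex, streams=dict(streams),
                          user_metadata=dict(user_metadata))
        # Storage system metadata: set by the store when the object is committed.
        xset.system_metadata["commit_time"] = time.time()
        # Data system metadata for a retention data service (assumed field name).
        if retention_days is not None:
            xset.system_metadata["retention_days"] = retention_days
        self._xsets[xset.xuid] = xset
        return xset.xuid

    def open(self, xuid):
        xset = self._xsets[xuid]
        xset.system_metadata["access_time"] = time.time()
        return xset

    def query(self, predicate):
        """Return the identifiers of all objects whose metadata satisfies the predicate."""
        return [x.xuid for x in self._xsets.values() if predicate(x)]


store = StoreSketch()
xuid = store.commit({"body": b"scan-0001"}, {"department": "radiology"}, retention_days=3650)
hits = store.query(lambda x: x.user_metadata.get("department") == "radiology")
```

The point of the sketch is the separation of the three kinds of content: the streams, the user metadata that the store only carries, and the system metadata that the store itself writes and that a retention service could act on.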

Data Storage Interface Standards
- Other standard data storage APIs, such as POSIX filesystems, can deal with metadata as well
- POSIX specifies standard system metadata as part of the data storage interface: file times, permissions (including ACLs), owner, group, etc.
- This metadata is maintained and used (interpreted) by the storage services that implement the API
- Thus we call it storage system metadata
- The functions that are controlled by this metadata govern the storing and retrieval of the data through the interface
- These functions are described in the abstract as storage services
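
To make the notion of storage system metadata concrete, the short Python sketch below reads the POSIX-style attributes of a temporary file through `os.stat`. The dictionary key names are illustrative only; the values come from the standard `os` and `stat` modules.

```python
import os
import stat
import tempfile

# Create a small file so there is something to inspect.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"payload")
    path = f.name

info = os.stat(path)

# Metadata maintained and interpreted by the storage service behind the file API.
storage_system_metadata = {
    "size_bytes": info.st_size,
    "modified_time": info.st_mtime,
    "accessed_time": info.st_atime,
    "owner_uid": info.st_uid,
    "group_gid": info.st_gid,
    "permissions": stat.filemode(info.st_mode),
}

os.remove(path)
```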

Data vs. Control Path
- Data storage interfaces are used for a mixture of Data Path and Control Path functions
- Data Path functions are those which implement reading and writing data, locating (addressing) the data, and gaining access to the data
- Control Path functions are those which implement control over the underlying storage or data services
  - Can be in-band or out-of-band
- Control Path examples:
  - File System Metadata (permissions, owner, etc.) (in-band)
  - IOCTLs (in a filesystem interface) (in-band)
  - SCSI Mode Page commands (in-band)
  - SMI-S (out-of-band)
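
The distinction can be seen on an ordinary file interface. In this minimal sketch, which assumes a POSIX-like platform, the write and read calls are data path functions, while the permission change is an in-band control path function issued through the same interface.

```python
import os
import stat
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

# Data path: write and read the data itself.
with open(path, "wb") as f:
    f.write(b"hello")
with open(path, "rb") as f:
    data = f.read()

# Control path (in-band): change file system metadata through the same interface.
os.chmod(path, stat.S_IRUSR)  # owner read-only
mode = stat.S_IMODE(os.stat(path).st_mode)

# Restore write permission so that cleanup works on every platform.
os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
os.remove(path)
```

An out-of-band control path, such as SMI-S, would instead reach the storage service through a separate management interface and is not shown here.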

Some example data storage interfaces
- Block Interfaces: SCSI, ATA, IDE
- Local File Interfaces: POSIX, NTFS
- Network File Interfaces: NFS, CIFS, SMB2, AppleTalk, Novell, AFS
- Object Based: OSD, XAM
- Database: JDBC, ODBC

Storage Resource Domain

Storage Resource Domain
- Rather than talk about storage devices, we abstract the storage functions of those devices and talk about storage services instead
- This allows us to categorize the different services in these devices and understand (and standardize) the different points of interoperability needed
- This categorization of services of a particular type is called a resource domain
- Any given device or subsystem can implement services from any of the resource domains
- In particular, we are concerned with managing these resources interoperably

Layering of Storage Services
The SNIA Shared Storage Model shows a layering of storage services with associated data services.

Metadata and Data Storage Interfaces
Storage services may provide functions for metadata as part of the data storage interface. This is an important capability for managing Data Resources (as opposed to managing Storage Resources). The metadata may be managed by the storage service, managed by data services, or left uninterpreted by either. System metadata that is managed by storage services consists of those properties of a data element that pertain to the primary functions of storing and retrieving the data. We call this storage system metadata, as it is used and managed by storage services. Other system and user metadata may be preserved for individual data elements, but is not interpreted by the storage services.

Data Resource Domain

Data Resource Domain
- The data resource domain is the category of services (data services) that treat data absent any context, but whose primary purpose is not to store and retrieve the data itself
- This is a useful categorization because these services require a different view of how to manage them
- The interfaces to the data services are also quite different from those of the storage services
- In fact, data services can be deployed in a way that is totally transparent to the actual user of the data and to the consumer of the data storage interfaces
- Data services manage data in some manner, adding value over and above simple storage and retrieval
- Historically, data services have been viewed as point products
- Examples include backup/restore, archiving and security

Elements of a Data Service Interface
- Data services may, in fact, be consumers of one or more data storage interfaces in order to add this value
- Backup software, for example, adds value to the data being stored in a disk-based storage service by copying that data to another disk- or tape-based storage service and retrieving it when needed
- It performs this function by consuming the appropriate data storage interfaces
- A key concept that data services understand is a quantization of data
- They can apply differentiated value to individual data elements and groups of data elements
- The data element can be a block, volume, file, file system or object
- Data services may be able to group data elements and treat all members of the group in the same way
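
The backup example above can be sketched as a data service that consumes two storage interfaces. The class names `StorageServiceSketch` and `BackupServiceSketch` and their methods are hypothetical; a dictionary stands in for the disk- and tape-based storage services.

```python
class StorageServiceSketch:
    """Hypothetical storage service: stores and retrieves data elements by identifier."""

    def __init__(self):
        self._elements = {}

    def write(self, name, data):
        self._elements[name] = bytes(data)

    def read(self, name):
        return self._elements[name]

    def names(self):
        return list(self._elements)


class BackupServiceSketch:
    """Hypothetical data service: adds value by copying, without interpreting content."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def backup(self, group):
        # Treat every member of the group in the same way.
        for name in group:
            self.secondary.write(name, self.primary.read(name))

    def restore(self, name):
        self.primary.write(name, self.secondary.read(name))


disk = StorageServiceSketch()
tape = StorageServiceSketch()
disk.write("/vol1/a.txt", b"alpha")
disk.write("/vol1/b.txt", b"beta")

service = BackupServiceSketch(primary=disk, secondary=tape)
service.backup(group=disk.names())

# Simulate damage on the primary copy, then retrieve the protected copy.
disk.write("/vol1/a.txt", b"corrupted")
service.restore("/vol1/a.txt")
```

Note that the backup service never looks inside the bytes it copies; it only needs the data element identifiers and the read/write functions of the storage interface.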

Namespace of Data Elements
- A namespace is a context for identifiers, and thus a Data Namespace is a context for the data element identifiers
- Data services understand the namespace for the data elements they work with
- While the Data Namespace is logically part of the data storage interface location concept, a Data Service may make use of that namespace in its functions of enhancing the value of the data
- Namespace virtualization allows Data Services to differentiate the value they provide to data elements
- The namespace is really just a convenient handle for the application (or user) to find the data, although the growth of data will lead to search as a primary means of locating data

Metadata for Data Services
- Metadata available through the data storage interface may also be managed by data services
- This data service metadata can be used by data services to provide differentiated value to individual data elements
- The model or schema for data service metadata may be defined by each data service and may be standardized

Information Resource Domain

The Information Resource Domain
The information resource domain, then, is the category of services that treat data in a context. These services are able to treat the data, not just as an opaque set of bits, but also as information within a context. An information service may understand what application has generated the data, it may understand the format of the data, or it may understand the relationship of the data to other parts of the environment. An information service may examine the information in the environment, extract keywords from data elements, index the content and/or create metadata. Because information services are able to understand this context, only information services can be used to help classify data according to its requirements. This data classification can then be communicated to data services so that those requirements can be met. We differentiate this classification from the grouping of data elements that data services are capable of, but note that once classified, data can be treated as a group by the data services.
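
A minimal sketch of this hand-off, under assumed names: a hypothetical `classify` function plays the information service, inspecting content in context, and records the resulting class as data service metadata. The class labels, the `"class"` key and the keyword rules are invented for illustration.

```python
def classify(element_name, content):
    """Hypothetical information service: inspect content in context, return a class."""
    text = content.decode("utf-8", errors="ignore").lower()
    if element_name.endswith(".eml") and "contract" in text:
        return "legal-record"
    if "patient" in text:
        return "medical-record"
    return "general"


# Data service metadata keyed by data element; the key name "class" is an assumption.
data_service_metadata = {}

elements = {
    "mail/0001.eml": b"Subject: signed contract attached",
    "scans/0002.txt": b"Patient: J. Doe, follow-up in 6 weeks",
    "tmp/scratch.txt": b"todo list",
}

for name, content in elements.items():
    data_service_metadata[name] = {"class": classify(name, content)}

# A data service can now treat each class as a group without reading the content.
groups = {}
for name, meta in data_service_metadata.items():
    groups.setdefault(meta["class"], []).append(name)
```

Only the classification step reads the content; everything downstream works from the metadata alone, which is the division of labor the slide describes.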

Metadata and Information Services
The role of metadata in information services is as a communication mechanism with the underlying storage services and data services. Information services are primarily concerned with the data service system metadata as a means to convey the data's requirements to the underlying data services. An information service may also interpret user metadata for purposes of data classification. An information service can create its own user metadata, for its own use, that is not interpreted by the underlying services.

Management of Resources

Data Policies
Data policies are used to manage data services in support of achieving the data's requirements. In the absence of data classification, administrators apply data policies to groups of data that are the target of the policies' operation. The conditions of a data policy may be time-based and may involve the value of data service metadata. Events in the environment may also trigger data policies. The actions may be any functions of data services that are exposed through a management interface. Some example actions include data placement, data movement and data transformation (such as compression and encryption).
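
One way to picture a data policy is as a condition over data service metadata paired with an action. The sketch below is illustrative only: `DataPolicySketch`, the metadata keys (`last_access`, `tier`, `compressed`) and the thresholds are assumptions, and the actions merely record their effect in the metadata rather than moving or transforming real data.

```python
from dataclasses import dataclass
from typing import Callable

DAY = 86400.0  # seconds


@dataclass
class DataPolicySketch:
    """Hypothetical data policy: a condition over metadata plus a data service action."""
    name: str
    condition: Callable[[dict, float], bool]  # (metadata, now) -> bool
    action: Callable[[dict], None]            # records the effect in the metadata


def older_than(days):
    """Time-based condition on the (assumed) last_access metadata value."""
    return lambda meta, now: now - meta["last_access"] > days * DAY


def move_to(tier):
    def act(meta):
        meta["tier"] = tier  # stands in for data movement
    return act


def compress(meta):
    meta["compressed"] = True  # stands in for data transformation


policies = [
    DataPolicySketch("archive-cold-data", older_than(90), move_to("archive")),
    DataPolicySketch("compress-idle-data", older_than(30), compress),
]


def run_policies(group, now):
    """Apply every policy whose condition holds to each element in the target group."""
    for meta in group.values():
        for policy in policies:
            if policy.condition(meta, now):
                policy.action(meta)


now = 200 * DAY
group = {
    "a.dat": {"last_access": 195 * DAY, "tier": "primary"},
    "b.dat": {"last_access": 150 * DAY, "tier": "primary"},
    "c.dat": {"last_access": 10 * DAY, "tier": "primary"},
}
run_policies(group, now)
```

In this run, `a.dat` is untouched, `b.dat` is only compressed, and `c.dat` is both archived and compressed. An event-triggered policy would call `run_policies` from an event handler instead of on a schedule.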

Information Policies
Information policies are used to ensure that data is treated according to its importance to the organization. Information policies implement business processes regarding the information that applications generate and use. Information policy conditions depend on business-related properties that are available in the environment. These properties can include the position of employees in the organization (as stored in a corporate directory), properties derived from the information itself, business intelligence applications and business conditions available from financial applications. The actions of information policies can set data service metadata corresponding to the requirements of the data at this point in time as well as corresponding to the class of data.

Data Requirement Lifecycles
The data's requirements may change over the life of the data and may change based on events internal to the environment or events external to the business (such as a subpoena). Data policies can be used to manage the data according to pre-defined lifecycle steps for each different class of data. This is known as Data Lifecycle Management (DLM).
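
The idea of pre-defined lifecycle steps per class can be sketched as a small lookup table. The class names, ages and placements below are invented for illustration, and the `legal_hold` flag is one assumed way to represent an external event such as a subpoena.

```python
# Lifecycle steps per class: (minimum age in days, placement). Values are illustrative.
LIFECYCLES = {
    "medical-record": [(0, "primary"), (365, "nearline"), (3650, "delete")],
    "general": [(0, "primary"), (90, "delete")],
}


def current_step(data_class, age_days, legal_hold=False):
    """Return the lifecycle step for a data element of the given class and age."""
    if legal_hold:
        # An external event overrides the pre-defined steps.
        return "hold"
    step = None
    for min_age, placement in LIFECYCLES[data_class]:
        if age_days >= min_age:
            step = placement  # the last step whose minimum age has been reached
    return step
```

For example, `current_step("medical-record", 400)` falls in the second step of that class, while the same call with `legal_hold=True` returns the hold state regardless of age.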

Definitions

Data Related Terms
- Data: The digital representation of anything in any form.
- Data Service: A set of functions that treat data without any contextual interpretation. This treatment may, for example, involve copying, movement, security and/or protection, but not the actual storage of the data.
- Data Resource Domain: The category of resources that exclusively encompasses data services.

Information Related Terms
- Information: Data that is interpreted within a context such as an application or a process.
- Information Service: A set of functions that treat data within an interpretation context.
- Information Resource Domain: The category of resources that exclusively encompasses information services.

Storage Related Terms
- Storage: A function that records data and supports retrieval.
- Storage Service: A set of functions that provide data storage.
- Storage Resource Domain: The category of resources that exclusively encompasses storage services.

Resource Domain Model

The Resource Domain Model
This model shows the logical layering of the different domains and the role of policies for each domain. The services in each domain play a different role, but leverage common, standard interfaces.

Mapping to existing products

Product Mapping
- Backup software
- Array with snapshot and remote replication
- Database software

Backup Software
- Backup software may contain services from both the Information Resource Domain and the Data Resource Domain
- Information services classify the data to be backed up, and may use policies for classification
- Data services copy the data under the direction of backup policies
- Standard interfaces are used for storage system metadata, location and read/write of data
- Data classification is typically not marked as Data System Metadata, but is instead conveyed through internal APIs today
- Information and Data Policies are in a proprietary format, typically modified through a custom user interface
- Points of interoperability that might be standardized

Array with Snapshot and Replication
- Storage arrays may contain services from both the Data Resource Domain and the Storage Resource Domain
- The snapshot data service keeps a consistent, virtual copy of data
- Remote replication copies data to a remote site
- Standard interfaces are used for location and read/write of data
- There is no per-volume metadata that can be used to convey requirements
- Configuration and control can, however, be done through a standard SMI-S interface
- Storage Policies can be externalized
- Not much traction within SMI-S for this so far

Database software
- Database software may contain services from any or all of the resource domains
- Information services to classify the data in the database instances
- Data services to protect, secure and ensure data availability
- Storage services to contain the data and query for its location
- Columns in the database tables can be used as metadata to express requirements of each row of data
- Similar metadata could be created for each table
- Policies are typically driven by various administrative interfaces
- But no de jure standard exists today
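
The use of columns as per-row requirement metadata can be illustrated with Python's built-in `sqlite3` module. The table and column names (`invoices`, `retention_years`, `encrypt_at_rest`) are assumptions made for this sketch; as the slide notes, no de jure standard for such metadata exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE invoices (
        id INTEGER PRIMARY KEY,
        amount REAL,
        -- Requirement metadata carried per row (column names are assumptions).
        retention_years INTEGER NOT NULL,
        encrypt_at_rest INTEGER NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO invoices (amount, retention_years, encrypt_at_rest) VALUES (?, ?, ?)",
    [(120.0, 7, 1), (15.5, 1, 0), (980.0, 10, 1)],
)

# A data service could select the rows it must protect from the metadata alone.
long_retention_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM invoices WHERE retention_years >= 7 ORDER BY id"
    )
]
conn.close()
```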

Future Standardization

Future Standardization
- It's clear from the Model that metadata can play a key role in interoperability between the Information Resource Domain and the Data Resource Domain
- Standardizing Data System Metadata for different types of data requirements and their implementing data services will allow for interoperability between these domains
- The standards can apply equally to Data Storage Interfaces and to data formats themselves
- XAM has both: XSet properties in the API, and an export format that encapsulates these properties
- The Long Term Retention TWG is developing a data format with associated data system metadata