Cloud Archive and Long Term Preservation Challenges and Best Practices

Similar documents
Multi-Cloud Storage: Addressing the Need for Portability and Interoperability

Interoperable Cloud Storage with the CDMI Standard. Mark Carlson, SNIA TC and Oracle Co-Chair, SNIA Cloud Storage TWG

Interoperable Cloud Storage with the CDMI Standard. Mark Carlson, SNIA TC and Oracle Chair, SNIA Cloud Storage TWG

Deploying Public, Private, and Hybrid. Storage Cloud Environments

ADVANCED DEDUPLICATION CONCEPTS. Thomas Rivera, BlueArc Gene Nagle, Exar

A Promise Kept: Understanding the Monetary and Technical Benefits of STaaS Implementation. Mark Kaufman, Iron Mountain

ADVANCED DATA REDUCTION CONCEPTS

Application Recovery. Andreas Schwegmann / HP

Trends in Data Protection and Restoration Technologies. Mike Fishman, EMC 2 Corporation

Tom Sas HP. Author: SNIA - Data Protection & Capacity Optimization (DPCO) Committee

LTFS Bulk Transfer Standard PRESENTATION TITLE GOES HERE

Mobile and Secure Healthcare: Encrypted Objects and Access Control Delegation

Data Deduplication Methods for Achieving Data Efficiency

PRESENTATION TITLE GOES HERE. Understanding Architectural Trade-offs in Object Storage Technologies

Restoration Technologies

Tutorial. A New Standard for IP Based Drive Management. Mark Carlson SNIA Technical Council Co-Chair

An Introduction to Key Management for Secure Storage. Walt Hubis, LSI Corporation

Format Technologies for the Long Term Retention of Big Data. Roger Cummings, Antesignanus Co-Author: Simona Rabinovici-Cohen, IBM Research Haifa

in Transition to the Cloud David A. Chapa, CTE EVault, a Seagate Company

Virtualization Practices:

The Role of WAN Optimization in Cloud Infrastructures. Josh Tseng, Riverbed

Cloud Storage Securing CDMI. Eric A. Hibbard, CISSP, CISA, ISSAP, ISSMP, ISSEP, SCSE Hitachi Data Systems

Facing an SSS Decision? SNIA Efforts to Evaluate SSS Performance. Ray Lucchesi Silverton Consulting, Inc.

Trends in Data Protection and Restoration Technologies. Jason Iehl, NetApp

Notes & Lessons Learned from a Field Engineer. Robert M. Smith, Microsoft

SRM: Can You Get What You Want? John Webster, Evaluator Group.

Storage in combined service/product data infrastructures. Craig Dunwoody CTO, GraphStream Incorporated

Format Technologies for the Long Term Retention of Big Data

Planning For Persistent Memory In The Data Center. Sarah Jelinek/Intel Corporation

The Evolution of File Systems

Ramin Elahi, Adjunct Faculty UC Santa Cruz Silicon Valley

ASPECTS OF DEDUPLICATION. Dominic Kay, Oracle Mark Maybee, Oracle

Blockchain Beyond Bitcoin. Mark O Connell

Effective Storage Tiering for Databases

LEVERAGING FLASH MEMORY in ENTERPRISE STORAGE

SRM: Can You Get What You Want? John Webster Principal IT Advisor, Illuminata

Storage Clouds. Marty Stogsdill Oracle

Everything You Wanted To Know About Storage (But Were Too Proud To Ask) The Basics

EVERYTHING YOU WANTED TO KNOW ABOUT STORAGE, BUT WERE TOO PROUD TO ASK Part Cyan Storage Management. September 28, :00 am PT

Create a Smarter and More Economic Cloud Storage Architecture. November 7, 2018

Tiered File System without Tiers. Laura Shepard, Isilon

SNIA 100 Year Archive Survey 2017 Thomas Rivera, CISSP

Virtualization Practices: Providing a Complete Virtual Solution in a Box

Mark Rogov, Dell EMC Chris Conniff, Dell EMC. Feb 14, 2018

WHAT HAPPENS WHEN THE FLASH INDUSTRY GOES TO TLC? Luanne M. Dauber, Pure Storage

The Benefits of Solid State in Enterprise Storage Systems. David Dale, NetApp

Storage Performance Management Overview. Brett Allison, IntelliMagic, Inc.

Scaling Data Center Application Infrastructure. Gary Orenstein, Gear6

A Vendor Agnostic Overview. Walt Hubis Hubis Technical Associates

Adrian Proctor Vice President, Marketing Viking Technology

Use Cases for iscsi and FCoE: Where Each Makes Sense

Expert Insights: Expanding the Data Center with FCoE

The File Systems Evolution. Christian Bandulet, Sun Microsystems

Ron Emerick, Oracle Corporation

Apples to Apples, Pears to Pears in SSS performance Benchmarking. Esther Spanjer, SMART Modular

SAS: Today s Fast and Flexible Storage Fabric. Rick Kutcipal President, SCSI Trade Association Product Planning and Architecture, Broadcom Limited

OpenStack Manila An Overview of Manila Liberty & Mitaka

SNIA Cloud Storage TWG

Storage Virtualization II Effective Use of Virtualization - focusing on block virtualization -

Trends in Worldwide Media and Entertainment Storage

extensible Access Method (XAM) - a new fixed content API Mark A Carlson, SNIA Technical Council, Sun Microsystems, Inc.

High Availability Using Fault Tolerance in the SAN. Wendy Betts, IBM Mark Fleming, IBM

The Storage Networking Industry Association (SNIA) Data Preservation and Metadata Projects. Bob Rogers, Application Matrix

Format Technologies for the Long Term Retention of Big Data

Felix Xavier CloudByte Inc.

Jeff Dodson / Avago Technologies

Overview and Current Topics in Solid State Storage

Felix Xavier CloudByte Inc.

Data Protection for Virtualized Environments

pnfs, parallel storage for grid and enterprise computing Joshua Konkle, NetApp, Inc.

How to create a synthetic workload test. Eden Kim, CEO Calypso Systems, Inc.

SNIA Tutorial 1 A CASE FOR FLASH STORAGE HOW TO CHOOSE FLASH STORAGE FOR YOUR APPLICATIONS

SAS: Today s Fast and Flexible Storage Fabric

Trends in Data Protection CDP and VTL

Architectural Principles for Networked Solid State Storage Access

LTR TWG & the Cloud PRESENTATION TITLE GOES HERE

Storage Virtualization II. - focusing on block virtualization -

Overview and Current Topics in Solid State Storage

Software Defined Storage. Mark Carlson, Alan Yoder, Leah Schoeb, Don Deel, Carlos Pratt, Chris Lionetti, Doug Voigt

Analysis and Optimization. Carl Waldspurger Irfan Ahmad CloudPhysics, Inc.

The Secret Sauce of ILM

Testing. Michael Ault, IBM Oracle FlashSystem Consulting Manager

The Need for a Terminology Bridge. May 2009

Evolution of Fibre Channel. Mark Jones FCIA / Emulex Corporation

SCSI Security Nuts and Bolts. Ralph Weber, ENDL Texas

Strategies for Single Instance Storage. Michael Fahey Hitachi Data Systems

Best Practices in Designing Cloud Storage based Archival solution Sreenidhi Iyangar & Jim Rice EMC Corporation

Single Instance Storage Strategies

Design and Implementations of FCoE for the DataCenter. Mike Frase, Cisco Systems

Executive Summary SOLE SOURCE JUSTIFICATION. Microsoft Integration

An Introduction to Key Management for Secure Storage. Walt Hubis, LSI Corporation

Exam4Tests. Latest exam questions & answers help you to pass IT exam test easily

Massively Scalable File Storage. Philippe Nicolas, KerStor

SNIA Tutorial 3 EVERYTHING YOU WANTED TO KNOW ABOUT STORAGE: Part Teal Queues, Caches and Buffers

1. Name of Course: Oracle Database 12c: Backup and Recovery Workshop

CDMI Support to Object Storage in Cloud K.M. Padmavathy Wipro Technologies

Transforming Data Protection with HPE: A Unified Backup and Recovery June 16, Copyright 2016 Vivit Worldwide

iscsi : A loss-less Ethernet fabric with DCB Jason Blosil, NetApp Gary Gumanow, Dell

Thomas Rivera / Hitachi Data Systems. Author: SNIA - Data Protection & Capacity Optimization (DPCO) Committee

Extending RDMA for Persistent Memory over Fabrics. Live Webcast October 25, 2018

Transcription:

Cloud Archive and Long Term Preservation Challenges and Best Practices Chad Thibodeau, Cleversafe Inc. Sebastian Zangaro, HP Author: Chad Thibodeau, Cleversafe Inc.

SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted. Member companies and individual members may use this material in presentations and literature under the following conditions: Any slide or slides used must be reproduced in their entirety without modification The SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations. This presentation is a project of the SNIA Education Committee. Neither the author nor the presenter is an attorney and nothing in this presentation is intended to be, or should be construed as legal advice or an opinion of counsel. If you need legal advice or a legal opinion please contact your attorney. The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved. The author, the presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information. NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK. 2

Abstract Cloud Archive Challenges and Best Practices This session will appeal to Storage Vendors, Datacenter Managers, Developers, and those seeking a basic understanding of SNIA s Cloud Archive and Preservation Special Interest Group and it s work towards promoting the use of the cloud for archival purposes. This session will examine current challenges within the Public Cloud Storage Industry, delve into some specific services profiles, and address some best practices for utilizing cloud storage for archive and preservation needs. 3

Agenda Cloud Archive and Preservation Definitions Introduce SNIA Cloud Archive and Preservation SIG Challenges within Public Cloud Storage Solutions from data to information 4

CA&P Taxonomy 5

Definitions Cloud Digital Archive Service: A cloud-base service providing a specialized online storage repository for the purposes of compliance, litigation support, and/or retention for extended periods of time, not including longterm. Can be utilized as a component of a complete digital preservation service. Does not necessarily provide adequate services to accomplish digital preservation. 6

Definitions (cont.) Cloud Digital Preservation Service A cloud service providing digital preservation of information and data. A digital preservation service includes a comprehensive management and curation function that controls: Supporting Infrastructure Information Data Storage Services 7

Definitions (cont.) Digital Preservation Object A special type of a digital information object consisting of indexes, fixity, audit logs, data files, reference information, and metadata wrapped into a single/compound digital container. A preservation object provides the functionality required to use, secure, interpret and verify authenticity of the metadata, information and data in the container and is the foundational element for digital preservation of information. CDMI is the foundation for a Cloud-intelligent Preservation Object 8

Digital Preservation Framework 9

Introduction to the SNIA Cloud Archive and Preservation (CA&P) Special Interest Group 10

CA&P Mission & Charter Advance the use of public, private and hybrid clouds for archival services and long term retention CDMI Market Education Best Practices Services Profiles Standards Promotion Industry Liaison Interoperability Demonstrations/Certifications and Plugfests Implementation Reference Model

Digital Preservation Market Source: Mpeterson www.ltdprm.com 12

CA&P Participation Participating companies: BlueArc, Cleversafe, Computer Associates, EMC, HP, Hitachi Data Systems, IMERGE Consulting, Iron Mountain, NetApp, Novell, Oracle, SNIA, Spectra Logic New Potential members: Hardware vendors Content management software developers Professional services Cloud services providers Archival infrastructure providers End-User organizations

Current challenges with the Public Cloud Storage Industry 14

Public Cloud Challenges Lack of uniform semantics and standard interfaces Interoperability between public cloud providers Managing data format changes over time Authenticity verification Compliance and Governance Risk Management & Litigation 15

A new class of challenges? Cloud A Cloud B Data over WAN via vendor specific API s 16

Public Cloud Challenges Authenticity Verification Talk about digital auditing Migration Compliance & Governance Risk and Litigation 17

Public Cloud Challenges Data Migration Every time a data object is moved it runs the risk of corruption due to error rates migration technologies. Even data at rest can be corrupted via bit-rot, malicious attack, or user error Bit-integrity preservation Data management and curation can monitor the integrity of data stored in a Digital Archive. Often checksums are used to verify that there is no variance between two copies of the same piece of data over time. A Cloud Archive must ensure that data migration does not affect the integrity of the data object stored. 18

Public Cloud Challenges Governance and Compliance 19

Solutions from Data to Information 20

21

What is already standardized? SNIA s Cloud Data Management Standard (CDMI) Standardized Data Path (Access) to the Cloud Standardized metadata to express the Archive requirement for the Data put in the cloud Requirements for Archiving: Multiple primary copies distributed geographically Secondary copies (backup) if the data changes Hash of the data to check for changes and corruption Immutability in some cases 22

How does this work in CDMI? Cloud Client needs to discover what archiving capabilities are provided by the cloud CDMI does this though Capabilities a type of resource that acts like a service catalog for the functions that the cloud offers customers If the cloud offers the capability, the customer marks the data objects and containers with metadata (Data System Metadata) that specifies the requirements Lastly the Cloud provider has a way of expressing what is actually being provided through metadata also What does this metadata look like? 23

Addressing the requirements Section 16.4 of CDMI addresses Data System Metadata A standardized means to express the archive requirements for a single data object or an entire container Multiple Primary Copies. cdmi_data_redundancy - Contains the desired number of complete copies of the data item to be maintained. This number determines the minimum number of primary copies of the data that the cloud shall maintain. Additional primary copies may be made to satisfy demand for the value. 24

Addressing the requirements distributed geographically cdmi_data_dispersion - Contains the desired distance (km) between the infrastructures supporting the multiple copies of data. This metadata is used to separate the (cdmi_infrastructure_redundancy number of) infrastructures by a minimum geographic distance to prevent data loss due to site disasters. cdmi_infrastructure_redundancy - Contains the number of desired independent storage infrastructures supporting the data. This metadata is used to convey that, of the primary copies specified in cdmi_data_redundancy, these copies shall be stored on this many separate infrastructures. Any two infrastructures may not share common elements, such as a network or power source. 25

Addressing the requirements Secondary copies cdmi_rpo - Contains the largest acceptable duration in time between an update and when the update may be recovered, specified in seconds. This metadata is used to indicate the desired backup frequency from the primary copy(s) of the data to the secondary copy(s). cdmi_rto - Contains the largest acceptable duration in time to restore data, specified in seconds. This metadata is used to indicate the desired maximum acceptable duration to restore the primary copy(s) of the data from a secondary backup copy(s). 26

Addressing the requirements Hash cdmi_value_hash - If present, this metadata lists the hash algorithm/ lengths supported. If absent, objects shall not present a hash value as system metadata. Values are in the form of "Algorithm Length, for example, "SHA256". When a CDMI implementation supports hashing, the system-wide capability of cdmi_security_data_integrity shall be set to true. Otherwise, it shall not be present, indicating that there is no hashing support. 27

Addressing the requirements Immutability cdmi_retention_period - Contains an ISO-8601 time interval during which the object is under retention. Only the end-date may be altered when updated. If an object is under retention, the object may not be deleted and its value may not be altered. After the retention date has passed, the object may be deleted. 28

Other Requirements Archived data may have other requirements besides those related to archiving i.e. access latency, throughput, encryption, geographic placement, etc. All standardized by CDMI! But, need implementation by Cloud Archiving Providers Ask for these from your favorite storage cloud 29

Solution Benefits of Industry standards: Allows storage vendors and developers to easily integrate with any cloud infrastructure. Allows Data Object Migration between heterogeneous systems: End User site to Public Cloud Public Cloud A to Public Cloud B From Public Cloud back to the End User Standards already exist such as CDMI (The Cloud Data Management Interface) 30

Example Services Profiles Digital Cloud Archive Digital Preservation Cloud Backup Cloud Retention for extended periods of time Retention for extended periods of time (indefinitely) Variable retention periods (may not be long-term) Permanent Deletion Permanent Deletion Incremental Deletion Long-term information Long-term information Data recovery Availability Interpretation Device specific Interpretation Authentication Policies Authentication Data protection Security Security Security Data Integrity Information Integrity Data Integrity Define Profiles using CDMI JSON attributes Certify compliance at Plugfests 31

Q&A / Feedback Please send any questions or comments on this presentation to SNIA: tracktutorials@snia.org Many thanks to the following individuals for their contributions to this tutorial. SNIA Cloud Archive and Preservation SIG Chris Marsh Michael Peterson Don Post Thomas Rivera Mark Carlson Ray Clark Bob Rogers 32