ADVANCED DATA REDUCTION CONCEPTS


Authors:
- Thomas Rivera, Hitachi Data Systems
- Gene Nagle, BridgeSTOR

SNIA Legal Notice
The material contained in this tutorial is copyrighted by the SNIA. Member companies and individual members may use this material in presentations and literature under the following conditions:
- Any slide or slides used must be reproduced in their entirety without modification.
- The SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations.
This presentation is a project of the SNIA Education Committee.
Neither the author nor the presenter is an attorney and nothing in this presentation is intended to be, or should be construed as legal advice or an opinion of counsel. If you need legal advice or a legal opinion please contact your attorney.
The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved. The author, the presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information.
NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK.

About the SNIA DPCO
- This tutorial has been developed, reviewed and approved by members of the Data Protection and Capacity Optimization (DPCO) Committee, which any SNIA member can join for free.
- The mission of the DPCO is to foster the growth and success of the market for data protection and capacity optimization technologies.
- 2012 goals include educating the vendor and user communities, market outreach, and advocacy and support of any technical work associated with data protection and capacity optimization.
Check out these SNIA Tutorials:
- Understanding Data Deduplication
- Deduplication's Role in Disaster Recovery
- Managing Big Data Hands-On Lab HOL

Abstract
Since arriving on the scene ~20 years ago, data reduction has been widely adopted throughout the storage and data protection community. This tutorial assumes a basic understanding of data reduction techniques and covers topics that attendees will find helpful in understanding today's expanded use of this technology. Topics will include:
- Trends in data reduction design and usage
- Practical data reduction of primary storage
- Using data reduction techniques to reduce storage network traffic
- Pervasive data reduction across storage tiers

Terminology
- Capacity Optimization Methods [Storage System]: Methods which reduce the consumption of space required to store a data set, such as compression, data deduplication, thin provisioning, and delta snapshots.
- Data Deduplication [Storage System]: The replacement of multiple copies of data at variable levels of granularity with references to a shared copy in order to save storage space and/or bandwidth.
- Compression [General]: The process of encoding data to reduce its size. Lossy compression (i.e., compression using a technique in which a portion of the original information is lost) is acceptable for some forms of data (e.g., digital images) in some applications, but for most IT applications, lossless compression (i.e., compression using a technique that preserves the entire content of the original data, and from which the original data can be reconstructed exactly) is required.
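The two definitions above can be illustrated with a minimal Python sketch. It assumes fixed 4 KiB blocks and SHA-256 fingerprints, and the names `dedupe` and `rehydrate` are hypothetical; the tutorial itself does not prescribe a block size, hash, or API.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


def dedupe(data: bytes, block_size: int = BLOCK_SIZE):
    """Replace repeated blocks with references to one shared copy."""
    store = {}        # fingerprint -> the single shared copy of a block
    references = []   # ordered fingerprints that describe the original data
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).hexdigest()
        store.setdefault(fingerprint, block)  # keep only the first copy
        references.append(fingerprint)
    return store, references


def rehydrate(store, references) -> bytes:
    """Rebuild the original data by following the references."""
    return b"".join(store[fingerprint] for fingerprint in references)


# Three identical blocks followed by one different block.
data = (b"A" * BLOCK_SIZE) * 3 + (b"B" * BLOCK_SIZE)
store, refs = dedupe(data)
assert rehydrate(store, refs) == data
print(len(refs), "blocks referenced,", len(store), "blocks stored")

# Lossless compression: the original is reconstructed exactly.
compressed = zlib.compress(data)
assert zlib.decompress(compressed) == data
print(len(data), "bytes before compression,", len(compressed), "bytes after")
```

In this sketch deduplication stores two blocks for four references, and the zlib round trip shows what "lossless" means in practice.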

Data Reduction Trends
Original value and justification has not changed:
- Satisfy ROI/TCO requirements
- Manage data growth
- Increase efficiency of storage and backup
- Reduce overall cost of storage
- Reduce network bandwidth requirements
- Reduce operational costs including:
  - Infrastructure costs: space, power and cooling (movement toward a greener data center)
  - Reduce administrative costs
New OSes and file systems are now integrating deduplication (e.g., Windows 8 Server ReFS, ZFS)

Data Reduction Techniques
- Compression
- Deduplication
  - File (Single Instance Storage)
  - Block (Hash-Based or Delta Block)
  - Application aware
  - In-line vs. post process
- Thin Provisioning
Note: Some techniques may be combined
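The difference between file-level (single instance) and block-level (hash-based) deduplication can be sketched as follows. The 4 KiB block size, the SHA-256 fingerprints, the file names, and both function names are assumptions made for this illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for the block-level case


def file_level_stored_bytes(files: dict) -> int:
    """Single Instance Storage: keep one copy per identical whole file."""
    unique = {hashlib.sha256(content).digest(): len(content)
              for content in files.values()}
    return sum(unique.values())


def block_level_stored_bytes(files: dict, block_size: int = BLOCK_SIZE) -> int:
    """Hash-based block dedupe: keep one copy per identical block."""
    unique = {}
    for content in files.values():
        for offset in range(0, len(content), block_size):
            block = content[offset:offset + block_size]
            unique[hashlib.sha256(block).digest()] = len(block)
    return sum(unique.values())


# A 16 KiB file made of four distinct blocks, an exact copy,
# and a version whose final byte was edited.
base = b"".join(bytes([i]) * BLOCK_SIZE for i in range(4))
edited = base[:-1] + b"\xff"
files = {"report_v1": base, "report_v1_copy": base, "report_v2": edited}

raw = sum(len(content) for content in files.values())
print("raw bytes:        ", raw)
print("file-level bytes: ", file_level_stored_bytes(files))
print("block-level bytes:", block_level_stored_bytes(files))
```

File-level deduplication removes only the exact copy, while block-level deduplication also shares the three unchanged blocks of the edited version.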

Deduplication and Compression
Dedupe and compression are similar:
- Both are dependent on data patterns
- Both consume system resources
- Both can optimize required storage capacity
Dedupe and compression are different:
- Some data can only be optimized via dedupe
- Some data can only be optimized via compression
- Some data can be optimized via dedupe and compression
- Some data cannot be optimized at all
Dedupe and compression are complementary:
- But some knowledge about the data pattern is required
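The four cases above can be reproduced with synthetic data. This is a sketch under stated assumptions: 4 KiB fixed blocks for dedupe, zlib (whose match window is 32 KiB) for compression, and made-up sample data; real workloads will behave differently.

```python
import hashlib
import random
import zlib

BLOCK_SIZE = 4096  # assumed dedupe granularity


def dedupe_ratio(data: bytes) -> float:
    """Total blocks divided by unique blocks (1.0 means no savings)."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    unique = {hashlib.sha256(block).digest() for block in blocks}
    return len(blocks) / len(unique)


def compression_ratio(data: bytes) -> float:
    """Original size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data))


rng = random.Random(0)  # seeded so the sample data is repeatable


def random_bytes(count: int) -> bytes:
    return bytes(rng.getrandbits(8) for _ in range(count))


# Low-entropy text in which every 4 KiB block is still unique.
log_text = b"".join(b"record %08d status=ok\n" % i for i in range(10000))

samples = {
    # Random 64 KiB repeated: repeats are too far apart for zlib's window.
    "dedupe only": random_bytes(64 * 1024) * 4,
    "compression only": log_text,
    "both": log_text[:64 * 1024] * 4,
    "neither": random_bytes(256 * 1024),
}

for name, data in samples.items():
    print(f"{name:17s} dedupe {dedupe_ratio(data):4.1f}:1  "
          f"compression {compression_ratio(data):4.1f}:1")
```

Running an estimate like this on a sample of real data is one way to gain the "knowledge about the data pattern" that the slide calls for.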

Data Reduction Scope
The scope of data reduction is broadening:
- Primary Storage: Reduced physical capacity for storage of active data
- Replication: Reduced capacity for disaster recovery and business continuity
- Data Protection: Reduced capacity for backup with longer retention periods
- Archival: Reduced capacity for data retention and preservation
- Movement/Migration of data: Reduced bandwidth requirements for data-in-transit

Data Reduction in Primary Storage
- Effective for specific workloads
- Acceptable performance is a factor for compression or deduplication
  - Post-processing
  - Inline ingestion
  - Network-based
- Deduplication works best with applications with high data redundancy
  - Virtual servers and desktops
  - Collaborative file sharing
  - Email (Software SIS replacement)
[Figure: application server connected over LAN and SAN to primary storage holding deduplicated data]
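The trade-off between post-processing and inline ingestion can be sketched with a toy block store. Everything here is hypothetical (the `BlockStore` class, its staging area, and the workload); it only shows that post-processing needs temporary capacity for raw data, while inline dedupe does its fingerprinting in the write path.

```python
import hashlib


class BlockStore:
    """Toy store that supports inline and post-process deduplication."""

    def __init__(self):
        self.unique = {}      # fingerprint -> block (deduplicated area)
        self.staging = []     # raw blocks waiting for post-processing
        self.peak_blocks = 0  # highest number of blocks held at any time

    def _track_peak(self):
        held = len(self.unique) + len(self.staging)
        self.peak_blocks = max(self.peak_blocks, held)

    def write_inline(self, block: bytes):
        # Fingerprint in the write path: duplicates never land on disk.
        self.unique.setdefault(hashlib.sha256(block).digest(), block)
        self._track_peak()

    def write_staged(self, block: bytes):
        # Fast write path: store the raw block now, dedupe later.
        self.staging.append(block)
        self._track_peak()

    def post_process(self):
        # Background pass that folds staged blocks into the unique set.
        while self.staging:
            block = self.staging.pop()
            self.unique.setdefault(hashlib.sha256(block).digest(), block)


# Assumed workload: ten clones of the same four blocks.
workload = [bytes([i]) * 4096 for i in range(4)] * 10

inline_store = BlockStore()
for block in workload:
    inline_store.write_inline(block)

staged_store = BlockStore()
for block in workload:
    staged_store.write_staged(block)
staged_store.post_process()

print("inline:       peak", inline_store.peak_blocks,
      "final", len(inline_store.unique))
print("post-process: peak", staged_store.peak_blocks,
      "final", len(staged_store.unique))
```

Both paths end with the same four stored blocks; the difference is the peak capacity consumed before the post-process pass runs and where the hashing cost is paid.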

Deduplication Savings Expectation
Deduplication savings depend on use case and time:
- Primary/replicated storage has less duplicate data
- Periodic archives have moderate duplicate data
- Repeated backups have significant duplicate data
[Chart: logical capacity and savings over time, with curves for dedupe of primary, archive, and backup data]
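Why repeated backups accumulate savings over time can be shown with simple arithmetic. The model below is an assumption for illustration only: every backup is a full backup of the same size, and a fixed fraction of the data is new between consecutive backups. The 10 TB size and 5% change rate are made-up inputs.

```python
def backup_dedupe_estimate(full_size_tb: float, change_rate: float, backups: int):
    """Estimate logical vs. physical capacity for repeated full backups.

    Assumes each full backup after the first contributes only
    change_rate * full_size_tb of new, unique data.
    """
    logical = full_size_tb * backups
    physical = full_size_tb + (backups - 1) * change_rate * full_size_tb
    ratio = logical / physical
    savings = 1 - physical / logical
    return logical, physical, ratio, savings


# Assumed inputs: 10 TB full backups with 5% change between backups.
for count in (1, 4, 12, 52):
    logical, physical, ratio, savings = backup_dedupe_estimate(10, 0.05, count)
    print(f"{count:2d} backups: logical {logical:6.1f} TB, "
          f"physical {physical:5.1f} TB, "
          f"ratio {ratio:4.1f}:1, savings {savings:5.1%}")
```

With one backup there is nothing to deduplicate against (ratio 1:1); with twelve backups the same inputs give 120 TB logical against 15.5 TB physical, or roughly 7.7:1. Primary storage rarely holds that many near-identical copies, which is consistent with the lower expectation stated above.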

Primary Storage Array Cache/SSD
Array Cache
- Intelligent cache can be dedupe-aware
- Hot data is cached with dedupe attributes
- Reduces rotating media latencies
- Example: Virtual Desktop boot storms
Solid State Drives
- Deduplication/compression helps offset the higher cost/GB of SSDs
- High performance applications with highly redundant data
- The random nature of (block-level) deduplicated data I/O is a good match for SSD
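A dedupe-aware cache can be sketched as a cache keyed by block fingerprint rather than by logical address, so that identical blocks from many volumes share one cache entry. The `DedupeAwareCache` class, the LRU policy, and the boot-storm numbers below are assumptions for illustration, not a description of any particular array.

```python
import hashlib
from collections import OrderedDict


class DedupeAwareCache:
    """LRU cache indexed by content fingerprint instead of logical address."""

    def __init__(self, capacity, block_map, backend):
        self.capacity = capacity
        self.block_map = block_map  # (volume, lba) -> fingerprint
        self.backend = backend      # fingerprint -> block on slow media
        self.cache = OrderedDict()  # fingerprint -> cached block
        self.backend_reads = 0

    def read(self, volume, lba) -> bytes:
        fingerprint = self.block_map[(volume, lba)]
        if fingerprint in self.cache:
            self.cache.move_to_end(fingerprint)  # mark as recently used
            return self.cache[fingerprint]
        self.backend_reads += 1                  # miss: go to rotating media
        block = self.backend[fingerprint]
        self.cache[fingerprint] = block
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)       # evict least recently used
        return block


# Assumed boot storm: 100 desktops cloned from one 8-block OS image.
os_blocks = [bytes([i]) * 4096 for i in range(8)]
backend = {hashlib.sha256(b).hexdigest(): b for b in os_blocks}
block_map = {(desktop, lba): hashlib.sha256(os_blocks[lba]).hexdigest()
             for desktop in range(100) for lba in range(8)}

cache = DedupeAwareCache(capacity=8, block_map=block_map, backend=backend)
for desktop in range(100):
    for lba in range(8):
        cache.read(desktop, lba)

print("logical reads: 800, backend reads:", cache.backend_reads)
```

Because all 100 desktops reference the same eight fingerprints, only the first desktop's reads reach the backend in this sketch.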

Primary Storage Considerations
- Balance the tradeoff between cost savings and performance impact
- Some workloads lend themselves better to data reduction
- Walk before you run
  - Use estimation tools
  - Perform POCs
  - Implement one workload at a time
[Figure: VM clone copies, each with its own APP, DATA, and OS layers, compared with the VMs after deduplication, where duplicate data is eliminated]

Data Reduction and Replication
[Figure: remote sites, a primary data center, and a secondary data center exchanging optimized data]
- Can be one-way, two-way, multi-node, or cascade
- Optimized location(s) can be determined by users
- Data reduction makes replication more affordable
- Data reduction enables replication on constrained networks

Replication Considerations
Focus on your Service Level Agreements (SLAs) first:
- Needs to meet window for replication
- Needs to meet SLA for system recovery or data restore
Is it necessary to optimize all data?
- Mission-critical applications
- May have regulatory issues for some data
- Some data types not conducive to data reduction
- Replicate incremental changes only, without other optimization
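Whether a replication window can be met is a matter of arithmetic on changed data, link speed, and the achieved reduction ratio. The figures below (2 TB of daily change, a 100 Mbit/s link, a 5:1 reduction ratio, a 24-hour window) are assumptions chosen only to show the calculation, and the estimate ignores protocol overhead and link contention.

```python
def replication_hours(changed_bytes: float,
                      link_bits_per_second: float,
                      reduction_ratio: float = 1.0) -> float:
    """Hours needed to send the changed data over the link.

    reduction_ratio is the assumed combined dedupe/compression ratio
    (5.0 means five bytes of source data per byte on the wire).
    """
    bytes_on_wire = changed_bytes / reduction_ratio
    seconds = bytes_on_wire * 8 / link_bits_per_second
    return seconds / 3600


WINDOW_HOURS = 24      # assumed replication window from the SLA
DAILY_CHANGE = 2e12    # assumed 2 TB of changed data per day
LINK_SPEED = 100e6     # assumed 100 Mbit/s WAN link

raw_hours = replication_hours(DAILY_CHANGE, LINK_SPEED)
reduced_hours = replication_hours(DAILY_CHANGE, LINK_SPEED, reduction_ratio=5.0)

print(f"without reduction: {raw_hours:.1f} h (window {WINDOW_HOURS} h)")
print(f"with 5:1 reduction: {reduced_hours:.1f} h (window {WINDOW_HOURS} h)")
```

Under these assumptions the unreduced transfer needs about 44 hours and misses the window, while the reduced transfer needs about 9 hours.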

Data Reduction and Network Traffic
[Figure: data in transit over SAN/LAN/WAN between remote office servers, application servers, production storage, DR/backup storage, and archival storage]
Increased SAN/LAN/WAN efficiency:
- Compression/deduplication for data in flight
- Transfer data references instead of data objects
- Shorten data transfer times by sending less data
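Transferring references instead of data objects can be sketched as a two-step exchange: the sender offers chunk fingerprints, the receiver answers with the ones it lacks, and only those chunks cross the network. The `Receiver` class, the `send` function, and the 4 KiB chunk size are hypothetical names and values for this illustration, not a real replication protocol.

```python
import hashlib


def fingerprint(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()


class Receiver:
    """Remote side: stores chunks by fingerprint."""

    def __init__(self):
        self.chunks = {}

    def missing(self, fingerprints):
        # Report each unknown fingerprint once, in the order offered.
        return [f for f in dict.fromkeys(fingerprints) if f not in self.chunks]

    def store(self, chunks):
        for chunk in chunks:
            self.chunks[fingerprint(chunk)] = chunk

    def assemble(self, recipe):
        return b"".join(self.chunks[f] for f in recipe)


def send(data: bytes, receiver: Receiver, chunk_size: int = 4096):
    """Send data as references plus only the chunks the receiver lacks."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    recipe = [fingerprint(chunk) for chunk in chunks]
    wanted = set(receiver.missing(recipe))
    payload = []
    for chunk, fp in zip(chunks, recipe):
        if fp in wanted:
            payload.append(chunk)
            wanted.discard(fp)  # send each missing chunk only once
    receiver.store(payload)
    return recipe, sum(len(chunk) for chunk in payload)


# Day 1: eight distinct blocks. Day 2: the same data with one block changed.
day1 = b"".join(bytes([i]) * 4096 for i in range(8))
day2 = day1[:4096] + bytes([200]) * 4096 + day1[8192:]

recv = Receiver()
recipe1, sent1 = send(day1, recv)
recipe2, sent2 = send(day2, recv)

assert recv.assemble(recipe2) == day2
print("day 1 chunk bytes sent:", sent1)
print("day 2 chunk bytes sent:", sent2)
```

The second transfer carries one 4 KiB chunk instead of the full 32 KiB; the fingerprint list itself is a small additional cost that this sketch does not count.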

Deduplication and Backups
The Original Promise
- Faster data recovery from disk
- Reduction in D2D cost per terabyte stored
- Reduction in D2D backup storage footprint
- Less network bandwidth required for D2D backups
- Makes longer retention possible
What's New?
- Wide use as part of backup software
- Scalability of deduplication appliances
- Deduplication across appliances
- Deduplication to tape

Backup Considerations
- Hardware or software deduplication?
- Source or target deduplication?
- Variable or fixed-length deduplication?
- File or sub-file deduplication?
- Compression WITH deduplication?
Answers depend on the problem you are trying to solve
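The variable versus fixed-length question can be made concrete with a small experiment: insert one byte at the front of a data stream and count how many chunks still match. The sketch below uses a simplified gear-style rolling hash for the variable-length (content-defined) case; the hash, the 10-bit boundary mask, the 4 KiB fixed size, and the random test data are all assumptions for illustration, and production chunkers add minimum and maximum chunk sizes.

```python
import hashlib
import random

MASK64 = (1 << 64) - 1
rng = random.Random(42)  # seeded so the run is repeatable
GEAR = [rng.getrandbits(64) for _ in range(256)]  # per-byte random values


def fixed_chunks(data: bytes, size: int = 4096):
    return [data[i:i + size] for i in range(0, len(data), size)]


def cdc_chunks(data: bytes, bits: int = 10):
    """Content-defined chunking: cut where the rolling hash's top bits are 0.

    The hash depends only on the most recent 64 bytes, so boundaries move
    with the content instead of staying at fixed offsets.
    """
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & MASK64
        if h >> (64 - bits) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def shared(chunks_a, chunks_b) -> int:
    """Number of distinct chunk fingerprints present in both lists."""
    fps_a = {hashlib.sha256(c).digest() for c in chunks_a}
    fps_b = {hashlib.sha256(c).digest() for c in chunks_b}
    return len(fps_a & fps_b)


data = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
shifted = b"X" + data  # one byte inserted at the front

assert b"".join(cdc_chunks(data)) == data

print("fixed-length chunks still shared:  ",
      shared(fixed_chunks(data), fixed_chunks(shifted)),
      "of", len(fixed_chunks(data)))
print("variable-length chunks still shared:",
      shared(cdc_chunks(data), cdc_chunks(shifted)),
      "of", len(cdc_chunks(data)))
```

With fixed-length chunks the insertion shifts every boundary, so no chunk matches; with content-defined chunks the boundaries resynchronize shortly after the inserted byte. The price is extra hashing work per byte, which is part of why the answer depends on the problem being solved.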

Data Reduction and Archival
[Figure: application servers and primary storage feeding an optimized archive]
- Data reduction can reduce the cost of online archive repositories
- These repositories are often required for regulatory compliance
- No standard exists today for approved use of data reduction techniques with regulatory data
- Vendors should provide assurances that the ability to retrieve data in its original form is not impaired

Consideration Matrix for Data Reduction
[Matrix of data reduction locations against the goals listed below]
Location:
- Application-specific protocol
- CIFS, NFS, FC, FCoE, iSCSI, VTL
- WAN
- Network Gateway
- Backup Software Agent
- Virtual Gateway
- Primary Storage
- DR/Backup Storage
- Archival Storage
Goals:
- Reduce Network Traffic
- Reduce Physical Capacity
- Reduce Backup Time
- Reduce Recovery Time
- Reduce Replication Time
- Reduce Media Latency

Summary
- The primary value of data reduction is in reducing costs and helping manage data growth
- The scope of data reduction is broadening
  - All storage tiers
  - Various levels of granularity
  - Bandwidth reduction
  - Data subject to regulatory rules
- New use cases and new technologies bring new challenges
  - And new opportunities!

Q&A / Feedback
Please send any questions or comments on this presentation to SNIA: tracktutorials@snia.org
Many thanks to the following individuals for their contributions to this tutorial.
- SNIA Education Committee
Data Protection and Capacity Optimization (DPCO) Committee:
- Mike Dutch
- Larry Freeman
- Tom McNeal
- Gene Nagle
- Ron Pagani
- Thomas Rivera
- Tom Sas
- SW Worth
It's easy to get involved with the DPCO!
- Join us
- Find a passion
- Gain knowledge & influence
- Make a difference
www.snia.org/dpco