Understanding Primary Storage Optimization Options Jered Floyd Permabit Technology Corp.

Similar documents
TCO REPORT. NAS File Tiering. Economic advantages of enterprise file management

HPE SimpliVity. The new powerhouse in hyperconvergence. Boštjan Dolinar HPE. Maribor Lancom

WHITE PAPER. Permabit Albireo Data Optimization Software. Virtual Data Optimizer (VDO) for Linux. August Permabit Technology Corporation

Introducing Tegile. Company Overview. Product Overview. Solutions & Use Cases. Partnering with Tegile

StorageCraft OneXafe and Veeam 9.5

Deduplication Storage System

XTREMIO: TRANSFORMING APPLICATIONS, ENABLING THE AGILE DATA CENTER

Modernize Without. Compromise. Modernize Without Compromise- All Flash. All-Flash Portfolio. Haider Aziz. System Engineering Manger- Primary Storage

Renovating your storage infrastructure for Cloud era

Symantec NetBackup 7 for VMware

STATE OF STORAGE IN VIRTUALIZED ENVIRONMENTS INSIGHTS FROM THE MIDMARKET

Flashed-Optimized VPSA. Always Aligned with your Changing World

Hyper-converged Secondary Storage for Backup with Deduplication Q & A. The impact of data deduplication on the backup process

Design Tradeoffs for Data Deduplication Performance in Backup Workloads

SoftNAS Cloud Data Management Products for AWS Add Breakthrough NAS Performance, Protection, Flexibility

If you knew then...what you know now. The Why, What and Who of scale-out storage

THE EMC ISILON STORY. Big Data In The Enterprise. Deya Bassiouni Isilon Regional Sales Manager Emerging Africa, Egypt & Lebanon.

ADVANCED DATA REDUCTION CONCEPTS

Maximizing Data Efficiency: Benefits of Global Deduplication

HYCU and ExaGrid Hyper-converged Backup for Nutanix

Optimizing the Data Center with an End to End Solutions Approach

Deduplication File System & Course Review

StorageCraft OneBlox and Veeam 9.5 Expert Deployment Guide

ADVANCED DEDUPLICATION CONCEPTS. Thomas Rivera, BlueArc Gene Nagle, Exar

Market Report. Scale-out 2.0: Simple, Scalable, Services- Oriented Storage. Scale-out Storage Meets the Enterprise. June 2010.

Converged Platforms and Solutions. Business Update and Portfolio Overview

São Paulo. August,

Technical White Paper: IntelliFlash Architecture

Flash Decisions: Which Solution is Right for You?

Copyright 2012, Oracle and/or its affiliates. All rights reserved.

ChunkStash: Speeding Up Storage Deduplication using Flash Memory

Veeam with Cohesity Data Platform

50 TB. Traditional Storage + Data Protection Architecture. StorSimple Cloud-integrated Storage. Traditional CapEx: $375K Support: $75K per Year

April 2010 Rosen Shingle Creek Resort Orlando, Florida

How To Get The Most Out Of Flash Deployments

Paper. Delivering Strong Security in a Hyperconverged Data Center Environment

A Comparative Survey on Big Data Deduplication Techniques for Efficient Storage System

The Business Case to deploy NetBackup Appliances and Deduplication

Copyright 2010 EMC Corporation. Do not Copy - All Rights Reserved.

Atlantis Computing Adds the Ability to Address Classic Server Workloads

HP Storage Summit 2015 Transform Now.

FOCUS ON THE FACTS: SOFTWARE-DEFINED STORAGE

Maximizing Availability With Hyper-Converged Infrastructure

Saving capacity by using Thin provisioning, Deduplication, and Compression In Qsan Unified Storage. U400Q Series U600Q Series

De-dupe: It s not a question of if, rather where and when! What to Look for and What to Avoid

Why Datrium DVX is Best for VDI

UNLEASH YOUR APPLICATIONS

The What, Why and How of the Pure Storage Enterprise Flash Array. Ethan L. Miller (and a cast of dozens at Pure Storage)

Discover the all-flash storage company for the on-demand world

NEC Express5800 R320f Fault Tolerant Servers & NEC ExpressCluster Software

VMware Horizon View. VMware Horizon View with Tintri VMstore. TECHNICAL SOLUTION OVERVIEW, Revision 1.1, January 2013

Scale-out Data Deduplication Architecture

Scale-out Object Store for PB/hr Backups and Long Term Archive April 24, 2014

The End of Storage. Craig Nunes. HP Storage Marketing Worldwide Hewlett-Packard

Hitachi Virtual Storage Platform Family

Enterprise-class desktop virtualization with NComputing. Clear the hurdles that block you from getting ahead. Whitepaper

Certeon s acelera Virtual Appliance for Acceleration

VMware Virtual SAN Technology

BUYING SERVER HARDWARE FOR A SCALABLE VIRTUAL INFRASTRUCTURE

Isilon: Raising The Bar On Performance & Archive Use Cases. John Har Solutions Product Manager Unstructured Data Storage Team

Hyper-Converged Infrastructure: Providing New Opportunities for Improved Availability

ECONOMICAL, STORAGE PURPOSE-BUILT FOR THE EMERGING DATA CENTERS. By George Crump

DELL EMC DATA DOMAIN SISL SCALING ARCHITECTURE

IBM System Storage DCS3700

DAHA AKILLI BĐR DÜNYA ĐÇĐN BĐLGĐ ALTYAPILARIMIZI DEĞĐŞTĐRECEĞĐZ

THE SUMMARY. CLUSTER SERIES - pg. 3. ULTRA SERIES - pg. 5. EXTREME SERIES - pg. 9

DDN. DDN Updates. DataDirect Neworks Japan, Inc Nobu Hashizume. DDN Storage 2018 DDN Storage 1

Open Hybrid Cloud & Red Hat Products Announcements

Hyperconvergence and Medical Imaging

Why Converged Infrastructure?

The 5 Keys to Virtual Backup Excellence

The Effectiveness of Deduplication on Virtual Machine Disk Images

Four Steps to Unleashing The Full Potential of Your Database

EMC Virtual Infrastructure for Microsoft Exchange 2010 Enabled by EMC Symmetrix VMAX, VMware vsphere 4, and Replication Manager

VEXATA FOR ORACLE. Digital Business Demands Performance and Scale. Solution Brief

Deduplication and Incremental Accelleration in Bacula with NetApp Technologies. Peter Buschman EMEA PS Consultant September 25th, 2012

"Software-defined storage Crossing the right bridge"

MODERNISE WITH ALL-FLASH. Intel Inside. Powerful Data Centre Outside.

Dell Fluid Data solutions. Powerful self-optimized enterprise storage. Dell Compellent Storage Center: Designed for business results

Single Instance Storage Strategies

Deep Dive on SimpliVity s OmniStack A Technical Whitepaper

HP StoreOnce: reinventing data deduplication

Turning Object. Storage into Virtual Machine Storage. White Papers

Scale-Out Architectures for Secondary Storage

SolidFire and Ceph Architectural Comparison

Understanding Virtual System Data Protection

Modernize with all-flash

Modernize with All Flash

Tegile Enters the All-Flash Array Market with Super Density Offering

Don t Get Duped By Dedupe: Introducing Adaptive Deduplication

Next Generation Storage for The Software-Defned World

The storage challenges of virtualized environments

Data Reduction Meets Reality What to Expect From Data Reduction

The Expanding Role of Backup and Recovery in the Hybrid Enterprise

CASE STUDY: ASHLAND FOOD CO-OP

SoftNAS Cloud Performance Evaluation on AWS

DDN. DDN Updates. Data DirectNeworks Japan, Inc Shuichi Ihara. DDN Storage 2017 DDN Storage

New HPE 3PAR StoreServ 8000 and series Optimized for Flash

See what s new: Data Domain Global Deduplication Array, DD Boost and more. Copyright 2010 EMC Corporation. All rights reserved.

COS 318: Operating Systems. NSF, Snapshot, Dedup and Review

Transcription:

Understanding Primary Storage Optimization Options Jered Floyd Permabit Technology Corp.

Primary Storage Optimization Technologies that let you store more data on the same storage Thin provisioning Copy-on-write snapshots Compression Deduplication 2

Data Growth is Accelerating This explosive growth means that by 2020, our Digital Universe will be 44 TIMES AS BIG as it was in 2009 ~ IDC 1 ZB = 1 Trillion Gigabytes Source: IDC Digital Universe Study, May 2010 3

Requirements for Primary Storage Optimization To be broadly adopted, any technology must: Support block, file and unified architectures Have no impact on performance Have no impact on reliability Scale to support storage capacity deployed Implement within existing architecture Older methods (e.g. thin provisioning) meet these requirements today Newer methods are beginning to emerge 4

Compression and Deduplication For that reason, we ll focus on compression and deduplication Both have been in backup forever Compression 40 years Deduplication 10 years Compression for primary storage, while available for many years, has never really been enabled Deduplication is relatively new to primary storage 5

2011 is The Year of Primary Storage Optimization Musings on the future of data dedupe One thing is clear: In 2011, the focus will shift from deduplication for nearline / secondary storage to deduplication for primary storage. ~Dave Simpson Data storage trends 2011: Predictions of hot data storage technologies The other one that I think is really important, and we're just beginning to see this come out now, is data deduplication and compression for primary storage. ~Tony Asaro Hot technologies for 2011 Primary storage data reduction is back from our 2010 Hot Technologies list. In 2011, we'll see a lot more of primary data reduction in shipping products. Storage Industry Consolidation And Deduplication In short customers want a single deduplication method that works across platforms. It s the only logical way to leverage deduplication so its full advantages can be realized. ~George Crump 6

Data Compression Techniques to reduce the size of stored data Lossy vs. lossless Generic vs. content aware Identifying repeated bytes Identifying duplicate bytes Identifying similar bytes or objects Discarding irrelevant data Operate on a single file / object at a time 7

Data Compression Benefits Mature, generic algorithms Big savings on low entropy data (e.g. text) Big savings on rich media Broad hardware support for specific technologies Low memory requirements 8

Data Compression Challenges High processor requirements No cross-object savings Complex licensing on media formats Always in the data read path Modifies byte stream on storage Savings constant regardless of data scale 9

Data Deduplication Conceptually a sort of compression Techniques to eliminate data being stored Single-instance vs. sub-file Fixed block vs. variable block Generic vs. content aware Always lossless Operates across a file system, LUN, or entire storage pool 10

Data Deduplication Benefits Big savings on redundant data (e.g. VM, database) Lower CPU requirements than compression No impact on data read Underlying data isn t modified Savings scale with more data stored 11

Data Duplication Challenges Higher memory requirements Limited applicability for media files No standardization for software implementations Limited scale in most solutions 12

Deduplication: Backup vs. Primary Primary workflows require massive scale and high performance Backup data model allows for simplifications not applicable to primary storage Bloom filters are unfeasibly computationally expensive for random-access deletion Differencing methods require larger blocks than primary dedupe allows Buffering for large look-back window Locality knowledge to individual sources Large block similarities Backup dedupe doesn t adapt to primary storage use case Backup Primary Data Flow Stream-Oriented Random Access Latency Critical No Yes Typical Chunk Size 1 MB and up 4 KB to 64 KB Index Lookups Thousands/sec Millions/sec # Objects 100s Millions 100s Billions Unique Data 1 to 60 TB 1 TB to PBs 13

Gartner Priority Matrix for Storage Technologies Big Data and Extreme Information Processing and Management Data Deduplication Enterprise-Grade Solid- State Drives Thin Provisioning Deduplication identified as a transformational storage technology Source: Gartner Hype Cycle for Storage Technologies, 2011 26 July 2011 ID:G00214638 14

IDC 2010 Trends in Storage and Virtualized Environments 55% IDC 2010 Trends in Storage and Virtualized Environments Survey Noemi Greyzdorf, Benjamin Woo Nov 2010 Doc #225059 15

Deduplication Impact 1 ZB = 1 Trillion Gigabytes 16

Compression vs. Deduplication Compression Deduplication Impact on Read High None Impact on Write High Moderate Savings on VM Low High Savings on Media High (some formats) Low CPU Requirements High Moderate Memory Requirements Low High Scalability Unlimited Varies by Implementation Impact on Reliability Moderate Low 17

Compression and Deduplication Compression identifies "micro" duplicates Dedupe identifies "macro" duplicates Dedupe then compress (easiest), or Compress then dedupe (requires compressed format segmentation) File Data Segment and Dedupe Compress File Data Segment and Compress Dedupe 18

Compression and Deduplication Compression (e.g. LZ) identifies "micro" duplicates Deduplication identifies "macro" duplicates 35.1 Compression Only Dedupe Only 10.1 Data Reduction Ratio 21.3 12.1 18.1 6.4 Dedupe + Compression 6.8 6.7 2.3 1.6 1.1 1.1 2.5 2.8 2.1 1.4 1.5 1.5 VMware Database Log files Office 2007 User Directories Exchange 19

Primary Data Efficiency Impact For 1PB Enterprise Primary environment Conclusion: Compression is good Dedupe is better: >3x data reduction over wider data set range Dedupe + Compression is best of breed 20

Deploying Primary Optimization Where does it run? Integrated into Storage Intermediary Appliance Host Software When Inline Post-process Parallel 21

Intermediary Appliance Storage optimization runs on a separate hardware device All data passes through the appliance on read and write Benefits: Brings storage optimization to legacy platforms Challenges: Additional hardware expense Introduces bottleneck to all I/O operations Appliance can mask functionality Failure can affect availability Data lock-in to optimization appliance technology 22

Host Software Storage optimization takes place on application host All data passes through the software on read and write Benefits: Brings storage optimization to legacy platforms No additional hardware cost or complexity Challenges Difficult to implement with shared storage Consumes host CPU and memory resources Data lock-in to specific optimization technology 23

Deploying Deduplication In Write Path, Out of Read Path In Write Path, In Read Path Primary Optimization Primary Optimization Out of Write Path, Out of Read Path Out of Write Path, Out of Read Path Primary Optimization Primary Optimization 24

Conclusion Data continues to grow exponentially Primary Storage Optimization technologies save you money (This includes thin provisioning if you re not doing that yet) Compression + Deduplication is best Different integration models have different tradeoffs Cost Performance Data Savings Operational impact (availability, reliability, etc.) In the end, optimization will move into the storage 25