Scale-out Data Deduplication Architecture

Similar documents
Scale-out Object Store for PB/hr Backups and Long Term Archive April 24, 2014

Milestone Solution Partner IT Infrastructure Components Certification Report

White Paper Simplified Backup and Reliable Recovery

Maximizing Data Efficiency: Benefits of Global Deduplication

Tom Sas HP. Author: SNIA - Data Protection & Capacity Optimization (DPCO) Committee

Data Deduplication Methods for Achieving Data Efficiency

Scale-Out Architectures for Secondary Storage

HYDRAstor: a Scalable Secondary Storage

NEC Express5800 R320f Fault Tolerant Servers & NEC ExpressCluster Software

NEC Sets High Bar in Scale-Out Secondary Storage Market

Mainframe Backup Modernization Disk Library for mainframe

ADVANCED DATA REDUCTION CONCEPTS

HOW DATA DEDUPLICATION WORKS A WHITE PAPER

The storage challenges of virtualized environments

The Data-Protection Playbook for All-flash Storage KEY CONSIDERATIONS FOR FLASH-OPTIMIZED DATA PROTECTION

Converged Platforms and Solutions. Business Update and Portfolio Overview

ADVANCED DEDUPLICATION CONCEPTS. Thomas Rivera, BlueArc Gene Nagle, Exar

HPE SimpliVity. The new powerhouse in hyperconvergence. Boštjan Dolinar HPE. Maribor Lancom

Verron Martina vspecialist. Copyright 2012 EMC Corporation. All rights reserved.

How to solve your backup problems with HP StoreOnce

Dell Storage Point of View: Optimize your data everywhere

Nutanix Tech Note. Virtualizing Microsoft Applications on Web-Scale Infrastructure

Cohesity Flash Protect for Pure FlashBlade: Simple, Scalable Data Protection

The World s Fastest Backup Systems

DAHA AKILLI BĐR DÜNYA ĐÇĐN BĐLGĐ ALTYAPILARIMIZI DEĞĐŞTĐRECEĞĐZ

Cohesity SpanFS and SnapTree. White Paper

Copyright 2010 EMC Corporation. Do not Copy - All Rights Reserved.

Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Virtualization Selling with IBM Tape

Zero Data Loss Recovery Appliance DOAG Konferenz 2014, Nürnberg

Archiving, Backup, and Recovery for Complete the Promise of Virtualisation Unified information management for enterprise Windows environments

IBM ProtecTIER and Netbackup OpenStorage (OST)

Oracle Zero Data Loss Recovery Appliance (ZDLRA)

Cat Herding. Why It s Time for a Millennial Approach to Storage. Cloud Expo East Western Digital Corporation All rights reserved 01/25/2016

Discover the all-flash storage company for the on-demand world

Veeam with Cohesity Data Platform

Choosing the Right Deduplication Solution for Your Organization

PRESENTATION TITLE GOES HERE

NEC HYDRAstor Date: September, 2009 Author: Terri McClure, Senior Analyst, and Lauren Whitehouse, Senior Analyst

2012 Enterprise Strategy Group. Enterprise Strategy Group Getting to the bigger truth. TM

HPE SimpliVity 380. Simplyfying Hybrid IT with HPE Wolfgang Privas Storage Category Manager

See what s new: Data Domain Global Deduplication Array, DD Boost and more. Copyright 2010 EMC Corporation. All rights reserved.

DATACENTER AS A SERVICE. We unburden you at the level you desire

50 TB. Traditional Storage + Data Protection Architecture. StorSimple Cloud-integrated Storage. Traditional CapEx: $375K Support: $75K per Year

Trends in Data Protection CDP and VTL

Buy vs Build: Converged Platforms are the New End Game. Johannes Sieben Dell EMC CPSD varchitect Hyper-Converged

Renovating your storage infrastructure for Cloud era

Business Resiliency in the Cloud: Reality or Hype?

April 2010 Rosen Shingle Creek Resort Orlando, Florida

De-dupe: It s not a question of if, rather where and when! What to Look for and What to Avoid

Enterprise Cloud Data Protection With Storage Director and IBM Cloud Object Storage October 24, 2017

EMC DATA DOMAIN PRODUCT OvERvIEW

The Technology Behind Datrium Cloud DVX

Business Benefits of Policy Based Data De-Duplication Data Footprint Reduction with Quality of Service (QoS) for Data Protection

It s More Than Just Backup. Sean Craig

Reducing Costs in the Data Center Comparing Costs and Benefits of Leading Data Protection Technologies

VMworld 2018 Content: Not for publication or distribution

Cloud Meets Big Data For VMware Environments

Hedvig as backup target for Veeam

IM B32: What s New in NetBackup: The Vision and Roadmap

Microsoft Applications on Nutanix

Protect enterprise data, achieve long-term data retention

Veritas NetBackup Appliance Family OVERVIEW BROCHURE

HYCU and ExaGrid Hyper-converged Backup for Nutanix

Simple Data Protection for the Cloud Era

IBM řešení pro větší efektivitu ve správě dat - Store more with less

HYDRAstor: a Scalable Secondary Storage

Overview Brochure. Veritas NetBackup TM Appliance Family

TITLE. the IT Landscape

Elevate the Conversation: Put IT Resilience into Practice for Cloud Service Providers

Scality RING on Cisco UCS: Store File, Object, and OpenStack Data at Scale

Why Datrium DVX is Best for VDI

If you knew then...what you know now. The Why, What and Who of scale-out storage

The Microsoft Large Mailbox Vision

São Paulo. August,

Solution Brief: Commvault HyperScale Software

Business Continuity and Disaster Recovery. Ed Crowley Ch 12

DELL EMC ISILON ONEFS OPERATING SYSTEM

Dell Fluid Data solutions. Powerful self-optimized enterprise storage. Dell Compellent Storage Center: Designed for business results

The Business Case to deploy NetBackup Appliances and Deduplication

Balakrishnan Nair. Senior Technology Consultant Back Up & Recovery Systems South Gulf. Copyright 2011 EMC Corporation. All rights reserved.

Ten things hyperconvergence can do for you

The Impact of Hyper- converged Infrastructure on the IT Landscape

Deduplication and Its Application to Corporate Data

Nutanix White Paper. Hyper-Converged Infrastructure for Enterprise Applications. Version 1.0 March Enterprise Applications on Nutanix

Top Trends in DBMS & DW

Availability in the Modern Datacenter

Modernize Your Backup and DR Using Actifio in AWS

IBM Storage Software Strategy

Modernize Your IT with Dell EMC Storage and Data Protection Solutions

WHITE PAPER: GREEN IT. Using Software to Reduce Power Consumption. By Bruce Naegel & Jose Iglesias. June 1, 2008

IBM Spectrum Protect Version Introduction to Data Protection Solutions IBM

Case study: Building bi-directional DR. Joep Piscaer, VMware vexpert, VCDX #101

DEMYSTIFYING DATA DEDUPLICATION A WHITE PAPER

NetVault Backup Client and Server Sizing Guide 2.1

Storage Solutions for VMware: InfiniBox. White Paper

The Impact of Hyper- converged Infrastructure on the IT Landscape

Backup and archiving need not to create headaches new pain relievers are around

HPE Synergy HPE SimpliVity 380

The Data Protection Rule and Hybrid Cloud Backup

Transcription:

Scale-out Data Deduplication Architecture Gideon Senderov Product Management & Technical Marketing NEC Corporation of America

Outline Data Growth and Retention Deduplication Methods Legacy Architecture Limitations Adaptive Platform for Long-term Data Enhanced Scale-out Attributes Scalability Performance Resiliency Page 2

Long-term Data According to SNIA, 80% of files are dormant, unchanged. According to Contoural, average file shares grow 40% annually. Non-mission critical data filling costly Tier 1 storage (EB) 90 80 70 60 50 40 30 20 10 0 Consumption of Enterprise Disk Capacity by Type 88% 2008 2009 2010 2011 2012 2013 2014 Content Depots & Cloud Unstructured data Replicated data Structured data CAGR 75.6% 54.8% 24.2% 23.6% Page 3 Source: IDC 2010

Data Outlasts Its Container MIGRATE DATA MIGRATE DATA DATA MINING MIGRATE DATA WWW DATA MINING FILE SHARING FILE SHARING Months Years Decades Centuries Increasing Enterprise Size Page 4

Data Migration Costs In 2011 there will be 1775 Exabytes of information Archive data outlasts a storage container, data migrate costs $1K $1.8K per TB 15% of Storage Admin s time spent on data migration Source: IDC, Gartner, Contoural Consulting Page 5

Deduplication Granularity File (SIS) Data deduplication performed at a file or object granularity Sub-file (Block) Data deduplication performed at a sub-file granularity Fixed size chunk Variable size chunk Sub-file variable size chunk deduplication benefits Greater efficiency by deduplicating data at finer granularity Automatic adjustments ( slide ) across inserted data Page 6

Inline vs. Post-process Inline Data deduplication performed before writing the deduplicated data Post-Process Data deduplication performed after the data to be deduplicated has been initially stored Inline deduplication benefits Eliminate need for disk buffer for un-deduplicated data Eliminate need for subsequent idle time for processing Immediate replication and shorter RTO for DR Page 7

Generic vs. Application-aware Generic Deduplication with same algorithm against all data streams Application-aware Custom deduplication for specific data streams to account for metadata inserted by the corresponding applications Application-aware deduplication benefits Greater efficiency by eliminating negative impact of application metadata on deduplication ratio Cross-application deduplication for greater scope and higher deduplication efficiency Page 8

Local vs. Global Local Deduplication across a single node or sub-node, requiring separate deduplication repositories for multiple nodes Global Deduplication spanning multiple nodes with a single shared deduplication repository across all nodes Global deduplication benefits Greater scalability of a single deduplication repository Greater efficiency with a broader scope spanning multiple nodes Page 9

Legacy Scalability Limitations Inadequate scalability of capacity & performance Cannot scale performance to keep up with growth Multiple products with different architectures More siloed capacity to manage Limited deduplication scope Limited scalability proliferates duplicate data across appliances Lower deduplication ratio for large environments Page 10

Local vs. Global Scope Single deduplication repository across entire solution Data deduplication across ALL data from ALL nodes Cross-node dedupe for greater efficiency Cross-application dedupe with application-awareness Local Volume Local Node Global Dedupe Repository Dedupe Repository Dedupe Repository Dedupe Repository Dedupe Repository Dedupe Repository Page 11

Legacy Resiliency Limitations Day 1: Full Day 2: Incremental Day 3: Incremental Day 7: Full 1 7 1 4 4 6 2 1 1 6 6 3 5 1 1 6 4 7 1 6 1 4 7 8 Q: What data can be restored if block #1 lost? A: NONE! 1 2 3 4 5 6 7 8 Traditional RAID is not sufficient for deduped data Page 12

Adaptive Scale-out Architecture One system becomes Greener, Faster, Denser Non-disruptive Remove Non-disruptive Add, Upgrade V1 Grid + V1 + V2 + Vx = 1 System Concurrently support multiple HW generations Non-disruptive resource scaling Zero provisioning, dynamic self-management Automatically balance system workload In-place technology refresh No migration Page 13

Performance Independent Linear Scalability Uniquely Scale Performance and Capacity to Meet Current and Future Needs Capacity Customized performance and capacity per application Dynamically adjust to changing data growth Page 14

Enhanced Scale-out Attributes Criteria Scope Performance Protection Attributes GLOBAL DEDUPE SCALABLE DEDUPE SAFE DEDUPE Page 15

Scope Multiple applications and data sources across a single scalable deduplication repository Broader scope leads to greater efficiencies Higher dedupe ratios Improved capacity utilization Easier management Longer retention periods Lower costs Single scalable system vs. multiple disparate silos Cross-application deduplication for greater efficiency Global deduplication for entire environment Page 16

Performance Linear performance scalability to keep up with data growth Scalability Linear scalability vs. high degradation Independent performance scalability Beyond the physical boundaries of the system Inline dedupe vs. post-process Keeping up with the workload Effects of deduplication rates Higher performance with higher dedupe Page 17

Scale-out Dedupe Architecture Multi-node architecture Inline data routing Processing and memory scales with capacity Distributed hash table Linear performance scalability Global deduplication across ALL nodes Page 18

Page 19 Linear Performance Scalability

Resiliency Enhanced data resiliency, especially with fewer copies due to deduplication The greater the dedupe ratio, the greater the exposure Maximum protection against multiple failures Component failures Appliance or node failures Upgradeability and technology refresh In-place technology refresh vs. forklift upgrade Online versus downtime Configurable resiliency for different applications Investment protection Page 20

Erasure-coded Resiliency User dialable disk/node protection Intermix of multiple resiliency levels Dynamically allocated protection Greater protection with less overhead No idle spare drives Faster healing while maximizing I/O performance Only data is reconstructed Reconstruction across multiple spindles/processors Page 21

Long-term Evolution Functionality Application-specific requirements (Security, Resiliency) Regulatory compliance requirements Performance Linear scalability throughput HW/SW enhancements Multiple I/O profiles Attributes Flexibility Efficiency Granularity Page 22

Storage Spending Long-term Data Protection Clones/Snapshots WAN-Optimized Replication Erasure-Coded Resiliency Inline Global Deduplication Automatic Load Balancing Dynamic Provisioning Massive Scalability Features drive savings Months Years Decades Centuries Time Page 23