Business Benefits of Policy Based Data De-Duplication
Data Footprint Reduction with Quality of Service (QoS) for Data Protection

By Greg Schulz
Founder and Senior Analyst, the StorageIO Group
Author, The Green and Virtual Data Center (Auerbach)

October 28th, 2008

This Industry Trends and Perspectives Paper is Compliments of: www.emc.com

Introduction

Organizations of all sizes are faced with the demands of storing more data, including multiple copies of the same or similar data, for longer periods of time. The result is an expanding data footprint and, with it, increased IT resource management costs to support and sustain given levels of application and information Quality of Service (QoS) delivery. Data de-duplication (de-dupe) is an approach to relieving the pressures, costs and complexities of protecting an expanding data footprint, including reducing the amount of data transmitted over wide area network (WAN) links.

[Figure 1 - Data Footprint Reduction with De-Dupe]

Data de-dupe (Figure 1) reduces the impact of an expanding data footprint [1] by optimizing the storage and network bandwidth capacity used to store protected data, enabling improved utilization and allowing more information to be processed in a given time frame.

Background and Issues

No two IT organizations are alike. All have different recovery point objectives (RPO) and recovery time objectives (RTO) to meet different application and business QoS, service level agreement (SLA) and cost objectives. For some organizations the emphasis will be on performance optimization, to reduce or eliminate any negative impact on application availability as a result of data protection operations. For other organizations, or even for different applications within the same organization, the priority will be optimizing storage capacity rather than performance.

Backup and data protection environments change and evolve over time to reflect different business requirements and service objectives. Technology evolution and innovation have been leveraged to keep up with the growing demand to protect more data in less time, using less physical storage capacity, in a cost- and energy-efficient manner. To meet the diverse needs of environments of various sizes and focus, new products and techniques need to be scalable in both performance and capacity, resilient yet adaptable, able to co-exist with and preserve backwards compatibility for investment protection while leveraging new technologies to move forward.

Data Footprint Reduction Using Data De-Duplication

The impact of an expanding data footprint extends further when a document or file gets backed up to some other medium, perhaps even in multiple copies, some of which are sent to alternate sites as part of a business continuance (BC) or disaster recovery (DR) strategy. As new versions of documents and files are updated, circulated for review and revised, similar versions of the document are stored with similar and duplicated data. In addition to operating, capital and management costs, other aspects of the footprint to consider are the power, cooling, floor-space and environmental (PCFE), or green, impact.

Data de-duplication eliminates the impact and overhead of protecting and managing an expanding data footprint by preserving an image or snapshot of the original document or file while saving only the changes or differences unique to each version or copy. The result is that significantly less information - specifically, fewer duplicate copies of data - is stored, addressing the cost and complexity of storing an expanding data footprint.

Fundamentally, data footprint reduction and space optimization using de-dupe can be accomplished either at the source where the data being protected resides (e.g. the host or application server) or at the target destination where the data is being backed up. The balance of this paper looks at target based de-dupe, a popular location for performing de-dupe given its relatively low barrier to entry and ease of deployment: it can co-exist with currently installed backup and data protection software, procedures and policies on a local (LAN) as well as a remote (WAN) basis.

As is the case with any data footprint reduction technology or strategy, there is a balancing act between optimizing performance and optimizing capacity. With data de-duplication, the balancing act is between the quest to reduce data footprint by boosting storage capacity utilization and the need to meet QoS and SLA requirements without performance gaps or bottlenecks [2]. These balancing acts and requirements lead to industry and vendor debates on the pros and cons of different approaches, debates that often miss the key point: shifting the focus away from the architecture and toward the QoS requirements of different applications across different sized deployments.

Most industry discussions have centered on in-line, immediate or in-band de-dupe architectures, and more recently on these versus scheduled, deferred or post-processing forms of de-dupe. Vendor driven de-dupe debates typically center on the architecture and capabilities of different solutions, forcing a decision either to perform the data reduction in-line, with a focus on algorithms, architecture and data reduction ratios to optimize space, or to optimize post-processing performance with a secondary focus on de-dupe ratios. As a result, most vendor driven de-dupe discussions involve detailed architectural comparisons and data reduction ratios rather than a focus on effective performance for different customer applications, needs and requirements. Policy based data de-dupe gives IT organizations the flexibility to select the applicable policy: for example, immediate data footprint reduction to optimize storage capacity, or scheduled data footprint reduction to meet various QoS and SLA requirements.

[1] See the Business Benefits of Data Footprint Reduction white paper at www.storageio.com/xreports.html
[2] See Data Center I/O and Performance Impacts at www.storageio.com/xreports.html
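De-dupe implementations differ in their chunking and indexing details, but the underlying mechanism described above - storing unique data once and replacing duplicates with references - can be sketched in a few lines. The following Python sketch is a hypothetical, simplified fixed-block illustration (real products typically use variable-length or content-aware chunking and far more robust indexing); the block size and sample data are assumptions for demonstration only.

    import hashlib

    BLOCK_SIZE = 4096  # assumed fixed block size; real solutions often chunk variably

    def dedupe(data, store):
        """Split data into blocks, keep each unique block once, and
        return a 'recipe' of fingerprints for later reconstruction."""
        recipe = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            fp = hashlib.sha256(block).hexdigest()
            if fp not in store:  # only new, unique blocks consume capacity
                store[fp] = block
            recipe.append(fp)
        return recipe

    def restore(recipe, store):
        """Rebuild the original data from its recipe of fingerprints."""
        return b"".join(store[fp] for fp in recipe)

    # Two similar versions of a file share most of their blocks, so the
    # second version adds almost nothing to the store.
    store = {}
    v1 = b"A" * 8192 + b"original draft"
    v2 = b"A" * 8192 + b"revised draft!"
    r1, r2 = dedupe(v1, store), dedupe(v2, store)
    assert restore(r1, store) == v1 and restore(r2, store) == v2
    print("logical bytes:", len(v1) + len(v2), "unique blocks stored:", len(store))

Restoring a given version walks its recipe of fingerprints, which is one reason read (restore) performance, and not just the reduction ratio, matters in the mode discussions that follow.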

In the quest to reduce data footprint impacts and optimize storage capacity, the investment in existing backup infrastructure - hardware and software as well as skill sets and experience - can be compromised by introducing new technologies that are not compatible or able to co-exist with current policies, procedures, or QoS and SLA requirements. This can result in a step backwards instead of a step forward. Consequently, when looking at space optimization solutions such as de-dupe, IT organizations need to assess, on an application by application basis, what tradeoffs in QoS can be made to balance performance and capacity optimization. Put another way: can you afford to degrade backup performance for key applications in the effort to boost de-dupe ratios and storage capacity optimization, at the risk of missed SLA objectives? Or would taking the time to align the most applicable de-dupe or space optimization approach to the specific QoS needs of different applications in your environment be the better approach?

De-Dupe Architecture Debates

As data de-dupe continues to evolve as a technology and feature function, it will become more common as a capability within storage solutions, as opposed to the current separate, standalone first generation solutions. With this, and with the advent of adaptive and flexible policy based de-dupe able to support both immediate and deferred de-dupe, the focus will shift from debates over one architecture versus another to where and what approach to use for which application. Granted, solutions that can only perform de-dupe using an immediate architecture, or solutions whose architectures can only support deferred de-dupe, will continue to debate the merits of their approach. In general, however, as has been the case with other technology features - including WAN optimization, RAID levels, and synchronous and asynchronous data replication, among others - as a technology matures and evolves, conversations shift from how it is architected to when and where to deploy it. For example, conversations pertaining to de-dupe are shifting toward which policy based mode or configuration to use, when and where, to meet different QoS and SLA objectives.

Different Data De-dupe Modes

Real-time or in-line de-dupe (aka on-the-fly, in-band, immediate):
- De-dupe occurs as data is ingested
- Optimizes storage capacity
- Use when performance is not the top priority

Deferred or post-processing de-dupe (aka scheduled or time delayed):
- De-dupe occurs at a later point in time
- Optimizes performance
- Use when performance is the top priority

Many discussions about data de-duplication focus on reduction ratios as opposed to de-duplication rates. While effective ratios need to be considered, so does the rate at which data can be de-duplicated, i.e. the de-duplication rate. The importance of the rate is the ability to gauge whether data is being processed within available timeframes, such as backup windows. There is also growing awareness of how much data is effectively reduced across a broader or larger capacity basis, as well as increased focus on effective read and write - that is, backup and restore - performance.
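The ratio-versus-rate distinction above reduces to simple arithmetic. This sketch, using made-up figures purely for illustration, checks whether a quoted de-dupe ingest rate can process a nightly backup within its window, and what a claimed reduction ratio means in effective capacity:

    # Hypothetical sizing check: all figures are assumptions for illustration.
    backup_set_tb = 10.0         # data to protect each night (TB)
    window_hours = 8.0           # available backup window
    ingest_rate_tb_per_hr = 1.0  # quoted de-dupe (ingest) rate

    required = backup_set_tb / window_hours
    print(f"required {required:.2f} TB/hr vs. available {ingest_rate_tb_per_hr:.2f} TB/hr")
    if ingest_rate_tb_per_hr >= required:
        print("Immediate (in-line) de-dupe can keep up with the window.")
    else:
        print("Ingest would bottleneck the backup; a deferred policy may fit better.")

    # A reduction ratio only matters across the capacity it actually applies to.
    logical_tb, ratio = 100.0, 10.0  # assumed protected capacity and claimed 10:1 ratio
    print(f"effective stored capacity: {logical_tb / ratio:.1f} TB "
          f"({logical_tb - logical_tb / ratio:.1f} TB reduced)")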

The benefit of immediate de-duplication is that data is reduced on the fly, resulting in a smaller initial footprint on disk. The caveat with immediate is that the de-duplication process (e.g. the ingestion rate) must keep up with the backup or data protection process in order to ingest the data without causing a bottleneck and performance degradation. The inverse of immediate data de-duplication is deferred, where no performance degradation occurs and there is no negative impact on server based applications or data protection processes. Deferred data de-duplication solutions should be able to ingest or receive data as fast as it arrives, assuming no solution based bottlenecks exist, such as a slow or non-intelligent disk storage system. For environments where performance and time sensitive processing are important, also look at de-duplication or data compression rates as an indicator of how much data can be processed in a given timeframe, such as a constrained backup window. The trade-off is some extra disk space to maintain performance for time sensitive applications or processing.

In-line vs. Immediate

Both in-line and immediate are real-time modes, with data de-duplication occurring while the data is being ingested. With EMC DL3D immediate mode, data de-dupe starts as soon as the first 256MB of data is ingested; the de-dupe process then runs continuously until the backup is completed. Look at the overall performance required for meeting backup windows when comparing in-line or immediate based de-dupe solutions.

The Changing De-dupe Landscape

In general, data de-dupe is a function and not a product, although it may be presented as such. Data de-dupe, like RAID and other common storage technologies, is a feature that can be brought together with other storage features to enable IT organizations to select the modes and configuration options needed to meet specific QoS and availability requirements. A flexible data de-duplication solution, in the form of policy based de-dupe, is now appearing as a selectable option that can be tailored to meet service requirements, similar to how different RAID levels, types of snapshots and replication, or even types of disk drives can be used in intelligent storage systems to meet various needs and requirements.

Similar to aligning the appropriate WAN optimization, replication mode or RAID level to meet performance, availability, capacity and energy needs, policy-based de-dupe (Table 1) enables IT organizations to align and adapt the technology to their specific application QoS and SLA requirements.

Table 1 - Immediate vs. Deferred vs. Policy-based De-dupe

    Mode behavior                             Immediate/In-line   Deferred/Scheduled   Policy-Based Adaptive
    De-dupe as data is ingested; optimizes
    storage capacity; use when performance
    is not the top priority                   Primary focus       -                    Selectable
    De-dupe at a later point in time;
    optimizes performance; use when
    performance is the top priority           -                   Primary focus        Selectable

With policy-based de-dupe, IT organizations can shift their time and focus to learning how and where to deploy the most applicable mode or approach to meet different application QoS needs in their environments.
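One way to picture the selectable modes in Table 1 is as a per-application policy table. The sketch below is a hypothetical illustration of that idea, not any vendor's actual interface; the application names and policy choices are assumptions chosen to mirror the QoS alignment described above.

    from enum import Enum

    class DedupePolicy(Enum):
        IMMEDIATE = "immediate"  # in-line: optimize capacity when performance is not top priority
        DEFERRED = "deferred"    # scheduled: protect the backup window, reclaim space later
        DISABLED = "disabled"    # skip de-dupe where it provides no benefit

    # Hypothetical per-application assignments illustrating policy based de-dupe.
    policies = {
        "oltp-database": DedupePolicy.DEFERRED,   # time sensitive backup window
        "file-shares":   DedupePolicy.IMMEDIATE,  # capacity savings first
        "video-archive": DedupePolicy.DISABLED,   # pre-compressed data de-dupes poorly
    }

    def select_policy(app):
        """Return the configured policy, defaulting to immediate space savings."""
        return policies.get(app, DedupePolicy.IMMEDIATE)

    for app in ("oltp-database", "file-shares", "video-archive", "new-app"):
        print(app, "->", select_policy(app).value)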

For example, with a policy-based de-dupe solution that enables both immediate and deferred (e.g. scheduled) modes of operation (Figure 2), IT organizations can assign the applicable approach to each application's specific requirements.

[Figure 2 - Selectable Policy and QoS Based Data De-Dupe]

Benefits of Policy-Based De-duplication:
- Flexibility to select different de-dupe modes to meet various application QoS needs
- Immediate mode for space savings, combined with replication for BC/DR
- Deferred or scheduled mode to meet performance needs with deferred space savings
- Disable de-dupe for data and applications where it does not provide a benefit
- Support more performance when needed to meet backup windows or move more data
- Enable effective BC and DR capabilities by moving more data with fewer resources (a rough sketch follows this section)
- Enable enterprise scalability with stability (performance, availability, capacity)

By selecting performance as a priority, de-dupe of data is deferred in order to meet time sensitive processing requirements, such as copying or ingesting all data within a given backup or data protection window. Another policy option is to optimize storage capacity as the primary objective when performance is not a priority, by enabling immediate de-dupe of data. The result is that less space is initially needed to store the de-duped data; however, this comes at the expense of performance, for example extending a backup window during data ingestion.

EMC Policy Based De-duplication

Rather than being forced into one model or another, or into selecting a target-based solution based on whether it is in-line or post-processing, EMC offers with the DL3D a solution with the flexibility of policy based de-dupe: the ability to tailor the solution to particular needs and to adjust to changing workloads, service level objectives and data protection requirements. Benefits of EMC DL3D policy-based de-dupe include:
- Approximately the same performance as other in-line architectures
- Scheduled or deferred mode maximizes performance for time sensitive backup operations
- Provides the best of both worlds: immediate and deferred
- Adaptability and flexibility to co-exist with existing technologies for investment protection
- Scalable in terms of performance, availability and capacity to meet diverse needs
- Policy based de-dupe is included in the purchase price of the EMC DL3D, eliminating the need to decide between immediate or deferred mode based architectures
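The BC/DR benefit noted above - moving more data with fewer resources - comes down to bandwidth arithmetic: if only unique, de-duped data crosses the WAN, transfer time shrinks roughly in proportion to the reduction achieved. A rough sketch, with every figure assumed for illustration and protocol overhead ignored:

    # Rough WAN replication estimate; every figure here is an assumption.
    daily_change_gb = 500.0  # changed data to replicate for BC/DR each night
    dedupe_ratio = 5.0       # reduction achieved on the change set (5:1)
    wan_mbps = 100.0         # available WAN bandwidth (megabits/sec)

    def hours_to_send(gb, mbps):
        """Transfer time in hours (GB -> megabits, then seconds -> hours)."""
        return gb * 8 * 1000 / mbps / 3600

    print(f"raw replication:      {hours_to_send(daily_change_gb, wan_mbps):.1f} hours")
    print(f"de-duped replication: {hours_to_send(daily_change_gb / dedupe_ratio, wan_mbps):.1f} hours")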

Call to Action and Summary

It is important to keep performance, and the resulting impact on applications, in perspective when looking at data footprint reduction solutions, particularly "heavy thinking" approaches, including de-dupe, that add latency to time sensitive processing such as backup and restore. Look beyond data de-dupe ratios. Instead, consider the overall effective amount of data footprint capacity that is reduced: for example, how applicable and achievable are high ratios for your environment, applications, data and data protection processes? Most environments will need a mix and balance of performance and capacity optimized capabilities to meet different application needs, which is what policy-based de-dupe addresses.

Bottom line: get solution providers to start talking about how their solution will adapt to your requirements, instead of how you will adapt your environment, processing, policies and procedures to their solution's capabilities. And if solution providers want to continue the de-dupe debate, make sure that you include policy-based de-dupe in the discussion.

Learn more about data footprint reduction techniques and technologies, along with other storage and data protection management related topics, at www.storageio.com.

About the author

Greg Schulz is founder of the StorageIO Group, an IT industry analyst and consultancy firm, as well as author of the books Resilient Storage Networks (Elsevier) and The Green and Virtual Data Center (Auerbach). See www.thegreenandvirtualdatacenter.com and www.storageio.com.

All trademarks are the property of their respective companies and owners. The StorageIO Group makes no expressed or implied warranties in this document relating to the use or operation of the products and techniques described herein. The StorageIO Group shall in no event be liable for any indirect, consequential, special, incidental or other damages arising out of or associated with any aspect of this document, its use, reliance upon the information or recommendations, or inadvertent errors contained herein. Information, opinions and recommendations made by the StorageIO Group are based upon public information believed to be accurate and reliable, and are subject to change.

Copyright 2008 StorageIO. All Rights Reserved.