Inside Track Research Note. In association with. Everyone s talking about Dedupe Where does dedupe and data compression make sense?

Similar documents
The power management skills gap

Inside Track Research Note. In association with. Will hyper-converged infrastructure kill traditional storage? Whoa - not so fast!

Why In-Place Copy Data Management is The Better Choice

Business Benefits of Policy Based Data De-Duplication Data Footprint Reduction with Quality of Service (QoS) for Data Protection

Inside Track Research Note. In association with. Protecting data in your hyper-converged environment. Keep it safe, keep it simple

Inside Track Research Note. Sponsored by. Understanding Software-Defined Storage. How it works and why its time has come

Inside Track Research Note. In association with. Data Protection and Management in a Box Appliances that take the pain out of backup and archiving

Rethinking VDI: The Role of Client-Hosted Virtual Desktops. White Paper Virtual Computer, Inc. All Rights Reserved.

White paper ETERNUS CS800 Data Deduplication Background

Inside Track Research Note. In association with. All-in-one Data Protection Taking the pain out of information management

COMMVAULT. Enabling high-speed WAN backups with PORTrockIT

FUJITSU Backup as a Service Rapid Recovery Appliance

Understanding Virtual System Data Protection

De-dupe: It s not a question of if, rather where and when! What to Look for and What to Avoid

ADVANCED DEDUPLICATION CONCEPTS. Thomas Rivera, BlueArc Gene Nagle, Exar

WHITE PAPER. How Deduplication Benefits Companies of All Sizes An Acronis White Paper

Why Datrium DVX is Best for VDI

DEDUPLICATION BASICS

ADVANCED DATA REDUCTION CONCEPTS

Flash Decisions: Which Solution is Right for You?

SYSTEM UPGRADE, INC Making Good Computers Better. System Upgrade Teaches RAID

Inside Track Research Note. In association with. Thin Client Devices Revisited Technology best forgotten, or time for a renaissance?

Specifying Storage Servers for IP security applications

NetApp SolidFire and Pure Storage Architectural Comparison A SOLIDFIRE COMPETITIVE COMPARISON

Go Cloud. VMware vcloud Datacenter Services by BIOS

SECURITY THAT FOLLOWS YOUR FILES ANYWHERE

Evaluator Group Inc. Executive Editor: Randy Kerns

It s Time to Move Your Critical Data to SSDs Introduction

The Data-Protection Playbook for All-flash Storage KEY CONSIDERATIONS FOR FLASH-OPTIMIZED DATA PROTECTION

Technology Insight Series

Hybrid WAN Operations: Extend Network Monitoring Across SD-WAN and Legacy WAN Infrastructure

The case for cloud-based data backup

Guide. A small business guide to data storage and backup

Discover the all-flash storage company for the on-demand world

Atlantis Computing Adds the Ability to Address Classic Server Workloads

Trends in Data Protection and Restoration Technologies. Mike Fishman, EMC 2 Corporation

Optimizing and Managing File Storage in Windows Environments

FROM A RIGID ECOSYSTEM TO A LOGICAL AND FLEXIBLE ENTITY: THE SOFTWARE- DEFINED DATA CENTRE

Information Lifecycle Management for Business Data. An Oracle White Paper September 2005

Ensuring business continuity with comprehensive and cost-effective disaster recovery service.

SolidFire and Pure Storage Architectural Comparison

White Paper Features and Benefits of Fujitsu All-Flash Arrays for Virtualization and Consolidation ETERNUS AF S2 series

STORAGE CONSOLIDATION AND THE SUN ZFS STORAGE APPLIANCE

In association with. The Impact of Automation on IT Operations. Are you ready for the software-defined datacentre?

Drive Space Efficiency Using the Deduplication/Compression Function of the FUJITSU Storage ETERNUS AF series and ETERNUS DX S4/S3 series

IBM Real-time Compression and ProtecTIER Deduplication

Not all SD-WANs are Created Equal: Performance Matters

Storage Performance Validation for Panzura

Hyper-converged Secondary Storage for Backup with Deduplication Q & A. The impact of data deduplication on the backup process

Global Headquarters: 5 Speen Street Framingham, MA USA P F

Ensuring a Successful DCIM Project. What You Need to Know. Paul Goodison CEO, Cormant Inc.

Rio-2 Hybrid Backup Server

BIG DATA INDUSTRY PAPER

Symantec NetBackup 7 for VMware

Unstructured data: A management overview

Accelerate Your Enterprise Private Cloud Initiative

The Microsoft Large Mailbox Vision

HP StorageWorks D2D Backup Systems and StoreOnce

Hedvig as backup target for Veeam

Technology Insight Series

LTO and Magnetic Tapes Lively Roadmap With a roadmap like this, how can tape be dead?

Optimisation drives digital transformation

Keeping the lid on storage

Efficient Data Center Virtualization Requires All-flash Storage

EXCHANGE 2016 EDITIONS, LICENSING AND CAPABILITIES. Md Shaifullah Mozide Palash Mobile

Chapter 1. Storage Concepts. CommVault Concepts & Design Strategies:

Deduplication and Its Application to Corporate Data

A SERVICE ORGANIZATION S GUIDE SOC 1, 2, & 3 REPORTS

ECONOMICAL, STORAGE PURPOSE-BUILT FOR THE EMERGING DATA CENTERS. By George Crump

ADVANCED SECURITY MECHANISMS TO PROTECT ASSETS AND NETWORKS: SOFTWARE-DEFINED SECURITY

It s Time for WAN Optimization to Evolve to Meet the Needs of File Collaboration

Tom Sas HP. Author: SNIA - Data Protection & Capacity Optimization (DPCO) Committee

Availability and the Always-on Enterprise: Why Backup is Dead

New Zealand Government IBM Infrastructure as a Service

White paper: Agentless Backup is Not a Myth. Agentless Backup is Not a Myth

whitepaper: Whitelisting Without The Complexity

Ultra-Low Latency Down to Microseconds SSDs Make It. Possible

HP Dynamic Deduplication achieving a 50:1 ratio

SECURE CLOUD BACKUP AND RECOVERY

Data center interconnect for the enterprise hybrid cloud

Red Hat Enterprise Virtualization (RHEV) Backups by SEP

Evolved Backup and Recovery for the Enterprise

Protect enterprise data, achieve long-term data retention

Microsoft Virtualization Delivers More Capabilities, Better Value than VMware

vsan 6.6 Performance Improvements First Published On: Last Updated On:

A Ready Business rises above infrastructure limitations. Vodacom Power to you

Abstract. The Challenges. ESG Lab Review InterSystems IRIS Data Platform: A Unified, Efficient Data Platform for Fast Business Insight

COMPREHENSIVE RETENTION COMPLIANCE: HOW KEEPITSAFE ONLINE BACKUP CAN HELP YOUR BUSINESS

Moving From Reactive to Proactive Storage Management with an On-demand Cloud Solution

PORTrockIT. IBM Spectrum Protect : faster WAN replication and backups with PORTrockIT

Security Enhancements

THE STATE OF CLOUD & DATA PROTECTION 2018

Total Cost of Ownership: Benefits of the OpenText Cloud

PANASAS TIERED PARITY ARCHITECTURE

DESIGNING AN ACTIVE ARCHIVE

StarWind Virtual SAN Free

Notes & Lessons Learned from a Field Engineer. Robert M. Smith, Microsoft

Better backups with Data Protection Manager 2007

TOP REASONS TO CHOOSE DELL EMC OVER VEEAM

Don t Get Duped By Dedupe: Introducing Adaptive Deduplication

Transcription:

Research Note In association with Everyone s talking about Dedupe Where does dedupe and data compression make sense? April 2016

In a nutshell About this The insights presented in this document are derived from independent research conducted by Freeform Dynamics. Inputs into this include in-depth discussions with IT vendors and service providers on the latest technology developments, along with intelligence gathered from mainstream enterprises during broader market studies. Is the use of data compression and deduplication technologies a good fit for all enterprise storage use cases? From a practical perspective, data deduplication and data compression are IT solutions designed to minimise the total amount of physical data written to disk. Storage is a very hot topic in every data centre and computer room. Data volumes keep rising and users demand ever better performance and availability, all without pushing storage costs any higher. Over the past few years a new form of storage has enjoyed great success shaking up what was previously a very stable ecosystem. Flash storage and SSDs offer great performance compared to traditional spinning disks, but until recently they also came with a very high cost. In fact the costs were high enough that the use of flash storage could often only be justified for workloads that demanded the very best performance and lowest latency. In order to expand the potential range of use cases, vendors began to offer flash storage solutions that employed special functionality to limit the total data that had to be stored. Data deduplication, usually truncated to dedupe, and data compression are now positioned as simple ways to limit storage expenditure. In fact such functionality is now so widely marketed that data compression and dedupe have become standard fixtures on RFP s for many storage solutions, not just the new flash based systems but also those using traditional spinning disks. This begs the question, is the use of data compression and deduplication technologies something that is a good fit for all enterprise storage use cases? More importantly, should you be applying these techniques to everything by default? This paper considers the advantages and challenges associated with data compression and deduplication solutions and discusses what to think about when looking at their use in the acquisition of new storage. What happens during dedupe and data compression? From a practical perspective, data deduplication and data compression are IT solutions designed to minimise the total amount of physical data written to disk, regardless of the natural size of what s being stored. Some people even regard deduplication to be a specialised case of more general data compression technologies. There are many different architectural approaches involved in making the magic happen, the details of which are too complex to discuss here in detail, but let s consider some of the basic principles. On writing data to storage, fundamentally, data compression uses complex mathematics to reduce the actual number of bits required to hold a file or block of data. Data deduplication, on the other hand, essentially looks for data patterns that match others already held on the system, so it can store pointers to those rather than storing the same data over again. Both processes take place automatically and invisibly to the user or requesting application. When reading back data these processes are reversed, and the data requested is recreated from either it s compressed or deduplicated form. The rehydration process is again automatic and invisible. Hopefully our description here has made sense so far, but to clarify further, take a look at our visual schematic of compression and deduplication in action, which will help to visualise one of the key principles we have been discussing (Figure 1). Copyright 2016, Freeform Dynamics Ltd Page 2

Figure 1 Data deduplication and compression basics it s all about reducing storage volumes Practical considerations Dedupe and compression rates vary considerably depending on the type of data, but in many cases the amount of physical storage required to hold data can be substantially reduced. Data compression and deduplication inevitably place overheads on both read and write operations. There are a number of practical considerations to bear in mind when looking at the use of dedupe and compression in your environment, some positive, others you need to be wary of. Storage volume Dedupe and compression rates vary considerably depending on the type of data, but in many cases the amount of physical storage required to hold data can be substantially reduced, thereby saving acquisition costs, as well as optimising certain consequential operational costs. The latter can include the power consumed by the storage system, but other softer benefits can be even more valuable. One commonly referenced is the reduction in time it takes to protect data during backup operations. Another is a reduction in the volume of traffic that needs to be transported over a WAN to replicate data. Performance of writes and reads The very processes associated with data compression and deduplication inevitably place overheads on both read and write operations. The magnitude of the overhead induced will vary depending upon many factors, including how the data is stored and the architecture of the storage platforms. In particular the latency induced in both read and write operations is normally too great to be considered practical for use with systems built using hard disks. Depending on the type of data being written, in systems where dedupe and compression are always on, performance impacts may be visible without providing significant storage reduction benefits. Material changes to stored data Data compression clearly changes the data stored as you are not writing it in its original, natural form. In many business use cases this may not be an issue, but some Copyright 2016, Freeform Dynamics Ltd Page 3

While both compression and deduplication technology is generally very reliable, it is not totally fool proof. It is clear that using dedupe and compression capabilities may make sense for some use cases, but not others. data sets may have external regulatory or audit requirements that make it inappropriate to use the kind of techniques we have been discussing without creating a compliance risk. Other concerns While both compression and deduplication technology is generally very reliable, it is not totally fool proof. Applying advanced mathematics on the fly, which is essentially what s going on, can occasionally lead to anomalies and even loss of data. It is always worth checking with suppliers and potentially regulators and auditors to seek assurances that there are no significant risks given the types of data you store. Use cases for and against dedupe and compression As dedupe and compression offer both potential benefits and disadvantages, it is clear that using systems with these capabilities may make sense for some use cases, but not others. The exact nature of the storage reduction benefits achieved depend entirely on the data sets concerned. In backup environments where data is frequently stored multiple times, although it s difficult to generalise, compression ratios can be very dramatic. Gains in relation to live data stored in production environments, however, are generally much less, with volume reduction ratios typically in the range of 3:1 to 5:1. There are also many important scenarios where data deduplication may have no significant impact on data volumes stored but can add to storage latency and hence to service performance degradation. The computational overhead and increase in latency on both writes and reads of data usually makes compression and deduplication difficult on storage systems built using hard disks, unless such systems are designed explicitly to cater for these functions. The inherently lower latency found in flash based storage systems can make data dedupe and/or compression a lot more viable, provided the other concerns mentioned above have been addressed. The naturally conservative nature of IT professionals looking after corporate data has meant that up until now, data dedupe and compression have only been utilised extensively in a few scenarios. These factors, along with the naturally conservative nature of IT professionals looking after corporate data has meant that up until now, data dedupe and compression have only been utilised extensively in a few scenarios. Based on some of these factors, let s take a closer look at where the opportunities for effective use of dedupe and compression technologies lie, and where they might be less applicable. Where dedupe and compression make most sense Several generic use cases have been shown to make effective use of data deduplication and compression with virtual environments? Examples include virtual servers and desktop virtualisation scenarios where it is likely common data sets will be utilised in virtual system after virtual system, e.g. operating system files etc. The same applies to backup systems where in many architectures the same file may be stored as data is protected repeatedly, time after time. Let s look at a few typical use cases in more depth: Copyright 2016, Freeform Dynamics Ltd Page 4

It is very easy to clone virtual servers, thus the opportunity for deduplication to reduce data volumes stored is very clear. Dedupe in particular has achieved good adoption in relation to long term data protection systems. Large scale web services Very large web services, especially those used in consumer markets, can achieve great benefits in terms of reduced data volumes, and hence cost avoidance, by employing compression and data deduplication. Server and desktop virtualisation It is very easy to clone virtual servers, thus in many organisations it is possible that several copies, in some scenarios potentially hundreds or even thousands, of certain virtual machines may need to be stored at any one time. The opportunity for deduplication to reduce data volumes stored is very clear. Equally, in situations where desktop virtualisation is deployed there may be scenarios where many desktops are essentially identical, once again providing opportunities for the effective use of data deduplication technologies. Backup and recovery An area where dedupe in particular can be very effective and in which it has achieved good adoption is in long term data protection systems. The reason for this is simple depending on the information being protected and how often it changes many identical copies of the same data can be stored. As a consequence the application of data deduplication can result in very high data reduction rates. Indeed, it is possible for many data sets to be reduced in size by very high factors, with results of compression ratios of 30 to 1 or more not uncommon depending on how frequently backups are performed. The reduction in data volumes can also have benefits in the time required to protect data, reducing the overall backup window. It is also a task where latency is not usually a sensitive matter hence making the compression and dedupe overhead far easier to entertain. However, it should be noted that while backup windows can be significantly reduced, the time to restore data from deduped sets can be extended due to the extra processing required to retrieve information, and this consideration must be borne in mind when looking at recovery time objectives. Archiving Similar arguments also apply to the use of compression and dedupe in long term archiving. The amount of information that needs to be archived continues to grow rapidly, and the cost benefits achieved by reducing data volumes via dedupe and compression can be substantial. Once again, the nature of archiving makes any latency associated with the deduplication and compression processes straightforward to absorb. However, dedupe ratios tend to be lower than for backup solutions as data is written only once. The inappropriate application of dedupe and / or compression technologies can lead to performance degradation with no benefits to offset this disadvantage. Where data deduplication and compression may prove ineffective There are certain areas where the application of data compression and deduplication technologies may deliver little in the way of storage volume reductions, but given the processing overhead inherent in such solutions they still add latency. Compressed data Some data sets, for example video and other media content, are natively stored in very compact formats making them unlikely to benefit significantly from using deduplication and data compression solutions unless multiple copies of the same content are stored time after time. Thus the inappropriate application of such Copyright 2016, Freeform Dynamics Ltd Page 5

capabilities can, once again, lead to performance degradation with no benefits to offset this disadvantage. It is usually impossible to use data compression and deduplication on data sets that are encrypted. In situations where absolute performance is key the use of deduplication and compression technologies is unlikely to be considered. To date, the overhead of data reduction technologies has limited their use in the storage of active data. Undermined by encryption Equally, it is usually impossible to use data compression and deduplication on data sets that are encrypted. After all, the very purpose of encrypting data is to make it unreadable and essentially this involves removing any patterns and repetitions in the data stream that could provide a clue to the contents. This, in turn, makes it impossible to compress and deduplicate. Databases The way that many database management systems store data can mean that data deduplication and compression solutions have little effect on data volumes. The precise benefit gain in data storage varies between different database systems, but it is always the case that the latency overhead inherent in such systems will reduce effective service performance on both data writes and reads. As a consequence, up to the time of writing data deduplication solutions have not been applied widely to such systems. High performance systems Today any system which requires the best possible performance is almost certain to utilise all flash array storage. In situations where absolute performance is key, the use of deduplication and compression technologies is unlikely to be considered as these incur additional latency, however small. If the storage is planned to support very high performance systems along with other enterprise services, an all flash array should be capable of selectively utilising dedupe / compression only for workloads where it makes sense. Use case evolution To date, the overhead of data reduction technologies has limited their use in the storage of active data, i.e. the data that is required by applications and users in everyday business processes. This is certainly the case in the context of hard disk based solutions, where the latency overhead induced can be difficult, certainly expensive, to overcome. Until recently the additional cost associated with flash based storage systems has restricted their use to meet the requirements of applications and services where performance is an absolute. As a consequence, while some business cases have used projected data compression and deduplication factors to reduce the apparent cost per terabyte of flash storage solutions when compared with traditional spinning disks, the real world implementation of these technologies in environments where consistent, lowest latency performance is key has been sporadic. As the price of flash falls and as the technologies mature flash systems are being used to support more mainstream enterprise applications and services. In these scenarios it is likely that the use of dedupe and compression will increase, especially if these capabilities can be selectively applied to specific data sets rather than as a binary on / off selection as is the case in some current offerings. However, on a pure cost per terabyte basis, the incentive to use deduplication and compression will equally be reduced for performance sensitive data sets. Copyright 2016, Freeform Dynamics Ltd Page 6

The bottom line Data dedupe and compression are frequently promoted as the holy grail for storage cost savings. But experience has shown that it is essential the organisation consider the specific usage scenarios. When considering whether to look for data compression and deduplication functionality in new storage solutions there are a few questions that need to be addressed. Despite all the marketing activity at the moment, do not assume that dedupe is broadly applicable, desirable or legal in your environment. Data dedupe and compression are frequently promoted as the holy grail for storage cost savings. But experience has shown that it is essential the organisation consider the specific usage scenarios they need to address to avoid deploying such technologies ineffectively, thus gaining little in the way of data reduction but still experiencing the performance penalty such systems entail. Dedupe and compression may be great for use in your backup environment, but only if you have also considered the potential impact on your recovery times. Equally, data dedupe may offer little, if any benefit to certain other systems, notably database systems and anything involving the storage of already encrypted data. Thus, when considering whether to look for data compression and deduplication functionality in new storage solutions there are a few questions that need to be addressed. In essence these are no different from the standard questions that should be applied to any acquisition of a storage system. The first item, as always, is to understand exactly what are the service capabilities required by the workloads to be supported on the system. Naturally enough these should include an accurate idea of the service levels to be delivered, especially in terms of application / service performance. Clearly the budget available for the projected acquisition will help shape what solutions can be considered. But it is then essential to consider whether there are any specific compliance or regulatory issues governing how the data can be stored, essentially identifying if data compression and / or deduplication is permitted by the demands of data governance. If data compression and dedupe is permitted for some data sets but not others or if different data sets need different performance qualities it is essential to determine which potential solutions provide enough management granularity. Where dedupe and / or compression do make sense for a particular workload, it is important to understand there are several, significantly different, architectural approaches used in solutions. Each of these architectures have different advantages and constraints, so if dedupe and / or compression are required, additional research will be necessary to identify which approach best fits the needs of the services to be supported. Despite all the marketing activity at the moment, do not assume that dedupe is broadly applicable, desirable or legal in your environment. By all means include it in your RFP, but make sure you can manage where and how it is applied within the system. This may cost more than systems having compression and dedupe on all the time (with no ability to control or manage), but in a mixed workload, shared storage environment, lack of control could be a problem. As always, the key to making the right decision is to understand your requirements and constraints, and make sure the technology you are considering will meet your needs without creating any undesirable side-effects. Copyright 2016, Freeform Dynamics Ltd Page 7

About Freeform Dynamics Freeform Dynamics is an IT industry analyst firm. Through our research and insights, we aim to help busy IT and business professionals get up to speed on the latest technology developments, and make better informed investment decisions. For more information, and access to our library of free research, please visit www.freeformdynamics.com. About Fujitsu Fujitsu is the leading Japanese information and communication technology (ICT) company offering a full range of technology products, solutions and services. Approximately 162,000 Fujitsu people support customers in more than 100 countries. We use our experience and the power of ICT to shape the future of society with our customers. Business-centric storage Under the ETERNUS storage solution brand, Fujitsu provides disk systems and data protection appliances. ETERNUS excels in "Business-centric Storage" by synchronizing storage resources with business priorities whilst reducing costs. For more information, please see: www.fujitsu.com/eternus-dx www.fujitsu.com/global/products/computing/storage/disk/eternus-dx/dx200f/ www.fujitsu.com/eternus_cs Terms of use This document is Copyright 2016 Freeform Dynamics Ltd. It may be freely duplicated and distributed in its entirety on an individual one to one basis, either electronically or in hard copy form. It may not, however, be disassembled or modified in any way as part of the duplication process. Hosting of the entire report for download and/or mass distribution by any means is prohibited unless express permission is obtained from Freeform Dynamics Ltd or Fujitsu. The contents contained herein are provided for your general information and use only, and neither Freeform Dynamics Ltd nor any third party provide any warranty or guarantee as to its suitability for any particular purpose. Copyright 2016, Freeform Dynamics Ltd Page 8