Maximizing Your Storage Capacity: Data reduction techniques and performance metrics


E-Guide

Data reduction techniques and performance metrics

Data reduction technologies include anything that reduces the footprint of your data on disk. In primary storage, there are three types of data reduction techniques in use: compression, file-level deduplication and sub-file-level deduplication. This e-guide explores the challenges of data reduction, the three data reduction techniques and how to choose the best technique for your data storage environment.

Sponsored by IBM

Table of Contents

- Data reduction techniques for primary data storage systems
- Performance metrics: Evaluating your data storage efficiency
- Resources from IBM

Data reduction techniques for primary data storage systems

By W. Curtis Preston

The No. 1 rule to keep in mind when introducing a change in your primary data storage system is primum non nocere, or "First, do no harm." Data reduction techniques can save money on disk systems and on power and cooling, but if introducing these technologies degrades the user experience, the benefits of data reduction will seem far less attractive.

The next challenge for data reduction in primary data storage is the expectation that space-saving ratios will be comparable to those achieved with data deduplication for backups. They won't. Most backup software creates enormous amounts of duplicate data, with multiple copies stored in multiple places. Although there are exceptions, that's not typically the case in primary storage. Many people feel that any reduction beyond 50% (a 2:1 reduction ratio) should be considered gravy. This is why most vendors of primary data reduction systems don't talk much about ratios; rather, they're more likely to cite reduction percentages. (For example, a 75% reduction in storage sounds a whole lot better than a 4:1 reduction ratio.)

If you're considering implementing data reduction technologies in primary data storage, the bottom line is this: Compared to deploying deduplication in a backup environment, the job is harder and the rewards are fewer. That's not to suggest you shouldn't consider primary storage data reduction technologies, but rather that you need to set expectations properly before making a commitment.
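Percentages and ratios are two ways of quoting the same savings, and they convert directly into each other. Here is a minimal sketch of the arithmetic; the percentages are the examples used above, not product claims:

```python
def ratio_from_percent(percent_saved: float) -> float:
    """Convert a space-savings percentage (e.g., 75) into an N:1 reduction ratio."""
    return 100.0 / (100.0 - percent_saved)

def percent_from_ratio(ratio: float) -> float:
    """Convert an N:1 reduction ratio (e.g., 4.0) into a space-savings percentage."""
    return 100.0 * (1.0 - 1.0 / ratio)

assert ratio_from_percent(50.0) == 2.0   # 50% saved is "only" 2:1
assert ratio_from_percent(75.0) == 4.0   # 75% saved is 4:1
assert percent_from_ratio(2.0) == 50.0
```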

Primary storage data reduction technologies

The following are the three primary storage data reduction technologies:

Compression. Compression technologies have been around for decades, but compression is typically applied to data that isn't accessed very much. That's because compressing and uncompressing data can be a very CPU-intensive process that tends to slow down access to the data. However, backup is one area of the data center where compression is widely used. Every modern tape drive is able to dynamically compress data during backups and uncompress data during restores. Not only does compression not slow down backups, it actually speeds them up. How is that possible? The secret is that the drives use a chip that can compress and uncompress at line speed. By compressing the data by approximately 50%, the drive essentially halves the amount of data the tape head has to write. Because the tape head is the bottleneck, compression actually increases the effective speed of the drive.

Compression systems for primary data storage use the same concept. Products such as Ocarina Networks' ECOsystem appliances and Storwize Inc.'s STN-2100 and STN-6000 appliances compress data as it's being stored and then uncompress it as it's being read. If they can do this at line speed, it shouldn't slow down write or read performance. They should also be able to reduce the amount of disk necessary to store files by between 30% and 75%, depending on the algorithms they use and the type of data they're compressing. The advantage of compression is that it's a very mature and well-understood technology. The disadvantage is that it only finds patterns within a file, not between files, which limits its ability to reduce the size of data.

File-level deduplication. A system employing file-level deduplication examines the file system to see if two files are exactly identical. If it finds two identical files, one of them is replaced with a link to the other file. The advantage of this technique is that there should be no change in access times, as the file doesn't need to be decompressed or reassembled before being presented to the requester; there are simply two different links to the same data. The disadvantage of this approach is that it obviously won't achieve the same reduction rates as compression or sub-file-level deduplication.

Sub-file-level deduplication. Sub-file-level deduplication is very similar to the technology used in hash-based data deduplication systems for backup. It breaks all files down into segments, or chunks, and then runs those chunks through a cryptographic hashing algorithm to create a numeric value that's compared to the numeric value of every other chunk the deduplication system has ever seen. If the hashes from two different chunks are the same, one of the chunks is discarded and replaced with a pointer to the other, identical chunk.

Depending on the type of data, a sub-file-level deduplication system can reduce its size quite a bit. The most dramatic results are achieved with virtual system images, and especially virtual desktop images; reductions of 75% to 90% are not uncommon in such environments. In other environments, the amount of reduction depends on the degree to which users create duplicates of their own data. Some users, for example, save multiple versions of their files in their home directories. They get to a "good point" and save the file, then save it a second time with a new name. That way, they know that no matter what they do, they can always revert to the previous version. But this practice can result in many versions of an individual file -- and users rarely go back and remove older versions. In addition, many users download the same files as their coworkers and store them in their home directories. These habits are why sub-file-level deduplication works even within a typical user home directory.

The advantage of sub-file-level deduplication is that it finds duplicate patterns all over the place, no matter how the data has been saved. The disadvantage is that it works at the macro level, whereas compression works at the micro level. It might identify a redundant 8 KB segment of data, for example, but a good compression algorithm might reduce the size of that segment to 4 KB. That's why some data reduction systems use compression in conjunction with some type of data deduplication.
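To make the mechanics concrete, here is a minimal sketch of hash-based, sub-file deduplication combined with compression of the unique chunks. The fixed 8 KB chunk size, SHA-256 hash and zlib compressor are illustrative assumptions, not what any particular product uses; production systems often use variable-size chunking and must also account for hash collisions. Treating an entire file as a single chunk degenerates into file-level deduplication.

```python
import hashlib
import zlib

CHUNK_SIZE = 8 * 1024  # fixed 8 KB chunks; real products often chunk differently

def dedupe(streams, compress=True):
    """Deduplicate an iterable of byte strings into a chunk store plus recipes."""
    chunk_store = {}   # SHA-256 digest -> stored (optionally compressed) chunk
    recipes = []       # for each input stream, the ordered list of chunk digests
    raw = stored = 0
    for data in streams:
        recipe = []
        for off in range(0, len(data), CHUNK_SIZE):
            chunk = data[off:off + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            raw += len(chunk)
            if digest not in chunk_store:      # first occurrence: store the chunk
                blob = zlib.compress(chunk) if compress else chunk
                chunk_store[digest] = blob
                stored += len(blob)
            recipe.append(digest)              # duplicates cost only a pointer
        recipes.append(recipe)
    print(f"{raw} bytes in, {stored} bytes stored "
          f"({100 * (1 - stored / raw):.0f}% reduction, {raw / stored:.1f}:1)")
    return chunk_store, recipes

def rebuild(chunk_store, recipe, compressed=True):
    """Reassemble one stream from its recipe of chunk digests."""
    return b"".join(
        zlib.decompress(chunk_store[d]) if compressed else chunk_store[d]
        for d in recipe
    )

# Two "files" that are mostly identical, like a document saved twice under
# different names: deduplication stores the shared chunks only once.
original = bytes(range(256)) * 256            # 64 KB of repeating data
edited = original + b"a few appended bytes"
store, recipes = dedupe([original, edited])
assert rebuild(store, recipes[0]) == original
assert rebuild(store, recipes[1]) == edited
```

On these two mostly identical inputs, the shared 8 KB chunks are stored (and compressed) once, so the second copy costs little more than its recipe of pointers.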

Overall, each primary data storage reduction technique has its pros and cons, and none is better than the others. Deciding which technique is right for you comes down to your individual data storage environment and how these reduction techniques will fit in.

About this author: W. Curtis Preston (a.k.a. "Mr. Backup"), Executive Editor and Independent Backup Expert, has been singularly focused on data backup and recovery for more than 15 years. From starting as a backup admin at a $35 billion credit card company to being one of the most sought-after consultants, writers and speakers in this space, it's hard to find someone more focused on recovering lost data. He is the webmaster of BackupCentral.com, the author of hundreds of articles, and the books "Backup and Recovery" and "Using SANs and NAS."

Building the engines of a Smarter Planet: How midsize businesses get more from their data, while paying less to store it.

On a smarter planet, information doesn't just grow, it evolves. That's why midsize businesses need a storage system designed to grow with both their business and their increasingly complex information. Enter the IBM Storwize V7000, a compact midrange disk system designed and priced for midsize companies. The IBM Storwize V7000 includes advanced features like storage virtualization, thin provisioning and automated tiering at no additional cost, helping midsize companies store their data in a way that's simple, flexible and affordable. Here's how:

1. Improve application throughput by up to 200%.¹ Automated tiering moves frequently used information to faster drives, which can provide quicker search results and lower costs for storing data.

2. Maximize the potential of your infrastructure. With essential technologies like virtualization and thin provisioning, you can maximize storage potential without having to choose between performance and efficiency.

3. Simplify your storage management. A graphical user interface can simplify configuration, provisioning, tiering and upgrades, making users more productive, resources better utilized and growth easier to manage.

IBM Storwize V7000: a compact midrange disk system designed and priced for the growing needs of midsize companies. Starting at $1,250 per month for 36 months.

Midsize businesses are the engines of a Smarter Planet. To learn more about products like the IBM Storwize V7000, connect with an IBM Business Partner today. Call 1-877-IBM-ACCESS or visit ibm.com/engines/storage

1. Based on IBM internal study. Actual results may be different based on storage, server and database configuration. Prices subject to change and valid in the U.S. only. Actual costs will vary depending on individual customer configurations and environment. IBM Global Financing offerings are provided through IBM Credit LLC in the United States and other IBM subsidiaries and divisions worldwide to qualified commercial and government customers. Rates are based on a customer's credit rating, financing terms, offering type, equipment type and options, and may vary by country. Other restrictions may apply. Rates and offerings are subject to change, extension or withdrawal without notice. IBM, the IBM logo, ibm.com, Smarter Planet, the planet icon and Storwize are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at www.ibm.com/legal/copytrade.shtml. © International Business Machines Corporation 2010.

Performance metrics: Evaluating your data storage efficiency

By Greg Schulz

Performance metrics can help data storage pros judge the effectiveness of their enterprise data storage resources. For example, data storage efficiency can be measured in terms of capacity utilization or productivity (such as performance). Likewise, quality of service (QoS) can indicate compliance with data protection and other application service requirements.

Examples of metrics and measurements for storage efficiency and optimization include the following:

- Macro (e.g., facility-level measures such as power usage effectiveness) and micro (device- or component-level) metrics
- Time (performance or activity) vs. availability vs. space (capacity)
- Performance metrics, including IOPS, bandwidth, and response time or latency
- Additional performance metrics, including reads, writes, random vs. sequential access, and I/O size
- Storage capacity metrics, including percent utilization as well as reduction ratios
- Other capacity metrics, including raw, formatted, free, allocated or allocated-but-unused capacity

Metrics can be obtained from in-house, third-party, or operating system and application-specific tools. Other metrics can be estimated or simulated; for example, benchmarks running specific workloads such as those from the Transaction Processing Performance Council (TPC), Storage Performance Council (SPC), Standard Performance Evaluation Corporation (SPEC) or Microsoft Exchange Solution Reviewed Program (ESRP).

Compound metrics, those made up of multiple metrics, include cost per GB and cost per IOP, along with capacity per watt or activity per watt, such as IOPS or bandwidth per watt of energy used.
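As a back-of-the-envelope illustration, the sketch below derives the compound metrics just mentioned from four raw inputs. Every number is invented for illustration, not a measurement of any product:

```python
# Raw inputs for one hypothetical storage system -- illustrative numbers only.
cost_usd  = 150_000.0   # acquisition cost
usable_gb = 100_000.0   # usable capacity, GB
iops      = 50_000.0    # sustained I/O operations per second
watts     = 2_500.0     # average power draw

print(f"cost per GB:   ${cost_usd / usable_gb:.2f}")   # capacity economics
print(f"cost per IOP:  ${cost_usd / iops:.2f}")        # activity economics
print(f"GB per watt:   {usable_gb / watts:.1f}")       # space efficiency
print(f"IOPS per watt: {iops / watts:.1f}")            # work efficiency
```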

A list of common storage performance metrics

Here is a list of common storage performance metrics:

- IOPS: I/O operations per second, where the I/Os can be of various sizes
- Latency: Response time, where lower is better for time-sensitive applications
- MTBF: Mean time between failures, an indicator of reliability or availability
- MTTR: Mean time to repair or replace a failed component or storage device
- Quality of service (QoS): Refers to performance, availability or the general service experience
- Recovery point objective (RPO): To what point in time data is saved or lost
- Recovery time objective (RTO): How quickly data or applications can be made available
- SPC: Storage Performance Council workload comparisons (IOPS, bandwidth and others)
- TPC: Transaction Processing Performance Council workload comparisons

Other metrics include uptime, planned or unplanned downtime, errors or defects, and missed windows for data protection or other infrastructure resource management tasks.

Remember to keep idle and active modes of operation in perspective when comparing tiered storage. Applications that rely on performance or fast data access should be compared on an activity basis, while applications and data focused more on retention should be compared on a cost-per-capacity basis. For example, active, online and primary data that needs to deliver performance should be evaluated in terms of activity per watt per footprint cost, while inactive or idle data should be evaluated on a capacity per watt per footprint cost basis.

Given that productivity is also a tenet of storage efficiency, metrics that shed light on how effectively resources are being used are important. For example, QoS, performance, transactions, IOPS, files serviced or other activity-based metrics should be examined to determine how effective and productive storage resources are.

Tips for using data storage resource metrics

Here are three other storage efficiency tips to remember (see the sketch after this list):

- Look beyond cost-per-capacity comparisons.
- Remember that "per watt" can refer to capacity (GB per watt) or to activity (IOPS or bandwidth per watt).
- While hit rates may indicate good utilization, they don't necessarily mean effective performance.

It's easy to end up with an apples-to-oranges comparison. A storage product optimized for idle or low-activity data may have good capacity per watt but poor performance and low IOPS or bandwidth per watt. Likewise, a high-performance storage system may have good IOPS or bandwidth per watt but look less attractive when compared on a capacity basis.

Remember that, going forward, more information will have to be processed, stored and protected in more locations and at a lower cost. Performance efficiency can therefore enable more effective storage capacity at a given QoS level, for both active and idle storage.
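The apples-to-oranges trap is easy to see with two hypothetical systems; the numbers below are invented for illustration, not benchmark results:

```python
# (name, usable GB, sustained IOPS, average watts) -- illustrative numbers only
systems = [
    ("capacity-optimized",    500_000.0,  10_000.0, 2_000.0),
    ("performance-optimized",  50_000.0, 200_000.0, 2_000.0),
]

for name, gb, iops, watts in systems:
    print(f"{name}: {gb / watts:.1f} GB/watt, {iops / watts:.1f} IOPS/watt")

# capacity-optimized: 250.0 GB/watt, 5.0 IOPS/watt
# performance-optimized: 25.0 GB/watt, 100.0 IOPS/watt
# Rank by GB/watt and the first wins; rank by IOPS/watt and the second does.
# Neither figure alone says which system fits a given tier of data.
```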

Resources from IBM

- IBM System Storage: Hardware, Software and Services Solutions
- IBM Real-time Compression: Storage efficiency solutions for primary, active data
- IBM ProtecTIER Deduplication Solutions

About IBM

At IBM, we strive to lead in the creation, development and manufacture of the industry's most advanced information technologies, including computer systems, software, networking systems, storage devices and microelectronics. We translate these advanced technologies into value for our customers through our professional solutions and services businesses worldwide.