Copyright 2010 EMC Corporation. Do not Copy - All Rights Reserved.


Transcription:


Using patented high-speed inline deduplication technology, Data Domain systems identify redundant data as it is being stored, creating a storage footprint that is on average 10x to 30x smaller than the original dataset and reducing the WAN bandwidth needed for replication by up to 99%. Originally an ideal solution for backup and disaster recovery applications, Data Domain deduplication storage is now being deployed more broadly as a storage tier, including near-line file storage, backup, disaster recovery (DR), and long-term retention of enterprise data for reference, litigation support, and regulatory compliance. The Data Domain product family ranges from the low-end DD140 system to the high-end Global Deduplication Array.

A Data Domain appliance is a storage system with shelves of disks and a controller. It is optimized first for backup and second for archive applications, and it supports most of the industry-leading backup and archiving applications. The list on the slide is composed primarily of leading backup applications: not only EMC's own NetWorker, but also Symantec, CommVault, and so on. On the way into the storage system, data can arrive over either Ethernet or Fibre Channel. With Ethernet it can use standard protocols such as NFS or CIFS; it can also use optimized protocols such as Data Domain Boost, a custom integration with leading backup applications. Data is deduplicated as it is stored, and once stored it can be replicated for disaster recovery, sending only the compressed, deduplicated, unique data segments that were filtered out during the write process to the target tier. Within the hardware, best-in-class approaches use commodity hardware to maximum effect, and Data Domain protects the disks with a RAID 6 implementation.

The end result of identifying duplicate segments and compressing the data before storing it is a significant reduction in the data stored on disk. The overall reduction is viewed as compression, and it is sometimes discussed in two parts: global and local. Global compression refers to the deduplication process that compares received data to data already stored on disk; data that is new is then locally compressed before being written. To see how the effect of global compression increases over time, consider a backup stream from a first full backup that contains five segments: A, B, C, another copy of B, and D. This is stored on disk as A, B, C, D, plus a reference to B instead of a second copy. Global compression at this point is the ratio of the five segments received (A+B+C+B+D) to the four segments stored on disk (A+B+C+D). If the next backup is an incremental that includes copies of A and B as well as a new segment E, only E needs to be stored; A and B are already on disk, so the system simply creates references to the previously stored segments. Global compression for this backup is quite good, since it is the ratio of the three received segments (A+B+E) to the single stored segment E. The second full backup is when the savings from global compression become very large: A, B, C, D, and E are recognized as duplicates from the previous two backups, and only the new segment F is stored. Global compression of this second full backup is therefore very high, with six segments coming in but only one new segment being stored. Taken over all three backups, global compression is the ratio of all 14 segments sent by the backup software to the 6 segments actually stored to represent all the data received over time. Local compression further reduces the space needed for those 6 stored segments, by as much as another 2:1.
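The arithmetic above can be sketched in a few lines. This is an illustrative simulation only, with segment names standing in for fixed-size data chunks; it is not the Data Domain segmenting algorithm.

```python
# Illustrative sketch of the global compression arithmetic from the
# three-backup example above. Segment names stand in for data chunks.

def dedupe(backups):
    """Store each unique segment once; return (received, stored) counts."""
    stored = set()              # segments already on disk (the global index)
    received = 0
    for backup in backups:
        for segment in backup:
            received += 1
            if segment not in stored:
                stored.add(segment)     # new segment: written to disk
            # duplicate: only a reference is recorded, nothing new stored
    return received, len(stored)

# First full backup, an incremental, then a second full backup
backups = [
    ["A", "B", "C", "B", "D"],          # first full: B is an internal duplicate
    ["A", "B", "E"],                    # incremental: only E is new
    ["A", "B", "C", "D", "E", "F"],     # second full: only F is new
]

received, stored = dedupe(backups)
print(f"global compression: {received}:{stored}")          # 14:6
print(f"with 2:1 local compression: {received / (stored / 2):.1f}:1")
```

Running this reproduces the 14:6 global ratio from the text; adding the assumed 2:1 local compression brings the overall reduction to roughly 4.7:1, and the ratio keeps improving as further full backups arrive.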

In the post-process architecture, data is written to disk before deduplication. After it is stored, it is read back internally, deduplicated, and written again to a different area. This approach may sound appealing because it seems as if it would allow faster backups using fewer resources, but post-process deduplication actually requires many more disks, both to hold the multiple pools of data and to sustain performance. In the inline approach, all data is filtered before it is stored to disk, which improves overall performance.

The Data Domain operating system (DD OS) is purpose-built for data protection, and its design elements form an architecture whose goal is data invulnerability. Since every component of a storage system can introduce errors, an end-to-end test is the simplest path to ensuring data integrity. End-to-end verification means reading data after it is written and comparing it to what it is supposed to be, proving that it is reachable through the file system all the way to disk. When DD OS receives a write request from backup software, it computes a checksum for the data. After analyzing the data for redundancy, it stores the new data segments and all of the checksums. After the backup is complete and all the data has been synchronized to disk, DD OS verifies that it can read the entire file from the disk platters through the Data Domain file system, and that the checksums of the data read back match the checksums of the data written. This ensures that the data on the disks is readable and correct, and that the file system metadata structures used to find the data are also readable and correct; the data is correct and recoverable from every level of the system. If there is a problem anywhere along the way, for example a bit flipped on a disk drive, it will be caught. In most cases it can be corrected through the self-healing features; if for any reason it cannot be corrected, it is reported immediately, and the backup can be repeated while the data is still valid on the primary store. Conventional, performance-optimized storage systems cannot afford such rigorous verification; the tremendous data reduction achieved by Data Domain Global Compression shrinks the amount of data that needs to be verified and makes it possible.
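The core of the end-to-end idea, checksum on ingest and then read back and re-verify, can be sketched as follows. This is a minimal illustration under stated assumptions: a plain dictionary stands in for the Data Domain file system, and SHA-256 stands in for whatever checksum DD OS actually uses internally.

```python
import hashlib

# Minimal sketch of end-to-end verification: compute a checksum when data
# arrives, store it alongside the data, then read the data back and
# recompute. Any corruption along the path shows up as a mismatch.

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def write_segment(store, seg_id, data):
    store[seg_id] = (data, checksum(data))       # keep data with its checksum

def verify_segment(store, seg_id) -> bool:
    """Read the segment back and compare checksums end to end."""
    data, stored_sum = store[seg_id]
    return checksum(data) == stored_sum

store = {}
write_segment(store, "seg-1", b"backup payload")
print(verify_segment(store, "seg-1"))            # True: clean read-back

# Simulate a flipped bit on disk: the mismatch is caught, so the backup
# could be repeated while the data is still valid on the primary store.
data, stored_sum = store["seg-1"]
store["seg-1"] = (bytes([data[0] ^ 1]) + data[1:], stored_sum)
print(verify_segment(store, "seg-1"))            # False: corruption detected
```

The point of the read-back step is that it exercises the same path a future restore will use, so a segment that verifies today is known to be restorable, not merely written.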

Once data is stored in a Data Domain system, there are a variety of replication options for moving the compressed, deduplicated changes to a secondary or tertiary site, so that restores are possible in multiple locations for disaster recovery. Collection replication performs whole-system mirroring in a one-to-one topology, continuously transferring changes in the underlying collection, including all of the logical directories and files of the Data Domain file system. The most popular option, however, is a directory- or tape-pool-oriented approach that lets you select part of the file system, or a virtual tape library or tape pool, and replicate only that. A single system can therefore serve as both a backup target and a replica for another Data Domain system. This graphic shows a number of smaller sites all replicating into one hub site. In this configuration, each source system asks the hub whether it already has a given segment of data; if the hub does not, the source sends the segment, and if it does, the source does not have to send the segment again. With multiple systems replicating to one system in a many-to-one configuration, this provides cross-site deduplication, further reducing both the WAN bandwidth required and the cost.
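The "do you already have this segment?" exchange can be sketched as below. This is a conceptual model only, with invented names; the real replication protocol is internal to DD OS.

```python
# Conceptual sketch of many-to-one replication with cross-site dedup:
# the hub keeps a fingerprint index, and each source sends only segments
# the hub does not already hold.

class HubSite:
    def __init__(self):
        self.segments = {}                       # fingerprint -> data

    def has(self, fingerprint) -> bool:
        return fingerprint in self.segments

    def receive(self, fingerprint, data):
        self.segments[fingerprint] = data

def replicate(source_segments, hub):
    """Return how many bytes actually cross the WAN to the hub."""
    bytes_sent = 0
    for fp, data in source_segments.items():
        if not hub.has(fp):                      # one small query per segment
            hub.receive(fp, data)
            bytes_sent += len(data)
    return bytes_sent

hub = HubSite()
site_a = {"f1": b"AAAA", "f2": b"BBBB"}
site_b = {"f2": b"BBBB", "f3": b"CCCC"}          # f2 is shared with site A

print(replicate(site_a, hub))                    # 8: both segments cross the WAN
print(replicate(site_b, hub))                    # 4: only f3 crosses the WAN
```

Because site B's shared segment is already at the hub after site A replicates, only the fingerprint query travels for it, which is exactly how the many-to-one topology reduces WAN bandwidth across sites.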

EMC Data Domain Boost software distributes parts of the deduplication process to the DD Boost library that runs on backup servers. Traditional backup is a three-tier system: a backup client, a backup server, and a storage array. The whole stream of backup data from the client has to pass through the backup server, across two LAN hops, to the storage device. Traditionally with Data Domain, since all of the deduplication occurs on the array, the network and each system along the way must carry the entire dataset over both hops of the backup LAN. DD Boost distributes some of the deduplication processing to the backup server, so the last hop carries only deduplicated, compressed data. This makes the backup network more efficient, makes Data Domain systems up to 50% faster, and makes the whole environment more manageable. It works across the entire Data Domain product line.
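The effect on the last hop can be illustrated with a rough sketch, under stated assumptions: the backup server segments and fingerprints the stream locally and ships only segments the target does not already hold (local compression of the shipped segments is omitted for brevity). The helper names and the fixed segment size are invented for the example; they are not the DD Boost API.

```python
import hashlib

# Rough model of the DD Boost idea: fingerprint segments on the backup
# server, so only new segments cross the final network hop.

def segment(data: bytes, size: int = 4):
    """Split the stream into fixed-size segments (toy segmenting scheme)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def boost_send(data: bytes, target_index: set) -> int:
    """Return the number of payload bytes that cross the final hop."""
    sent = 0
    for seg in segment(data):
        fp = hashlib.sha256(seg).hexdigest()
        if fp not in target_index:               # only new segments travel
            target_index.add(fp)
            sent += len(seg)
    return sent

full_backup = b"ABCDABCDEFGH"                    # 12 bytes with a repeat
print(len(full_backup))                          # 12: traditional path ships it all
print(boost_send(full_backup, set()))            # 8: DD Boost-style path
```

On a repeated full backup against the same persistent index, nearly nothing new would cross the last hop, which is where the real-world efficiency gain comes from.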

Today's IT environments face the combined challenge of data growth and shrinking backup windows. Recovery time objectives (RTOs) and recovery point objectives (RPOs) are also becoming more stringent, increasing the importance of a highly reliable, high-performance backup environment. As a complement to tape for long-term, offsite storage, backup-to-disk products such as the EMC Disk Library have emerged as powerful solutions. Customers who want the advanced virtual tape library (VTL) functionality of the Disk Library as well as the ROI benefits of deduplication can combine a Disk Library deployment with Data Domain. This lets them move data to Data Domain deduplication storage systems for longer-term retention and network-efficient replication. The figure on the slide shows a Disk Library with Data Domain deployment scenario. In this deployment, data on the Disk Library's virtual tape cartridges is migrated or copied to the Data Domain system, where it is deduplicated to remove redundancies, enabling longer data retention than a stand-alone Disk Library. The Data Domain system does not need to be dedicated to the Disk Library: while operations run from the Disk Library to the Data Domain system, concurrent NAS or VTL jobs can run in parallel on the Data Domain system.

The most common scenarios for using the Disk Library with the Data Domain system are shown on the slide.

1. Copying data from the Disk Library to the Data Domain system: In this scenario, either one or two engines write data to the Data Domain system. Data is migrated from the Disk Library (using tape caching) or copied (using the embedded media managers) to the Data Domain system. With the Automated Tape Caching feature, the backup application sees the local copy of the data, and data access is through the Disk Library. With the embedded storage node or embedded media server, the backup application is aware of both copies of the data, and data access is through the backup application.

2. Copying data from the Disk Library to Data Domain and to a physical tape library: In this scenario, data is copied to the Data Domain system and to a physical tape library via the embedded storage node/media server. In this configuration, the data can reside on each of the three units for different retention periods. Each engine must see the Data Domain system and the physical tape library, since the data is seen by each engine individually. Multiple engines can be used in a dual-engine configuration, with each writing to its own Data Domain system and physical tape unit.

3. Copying data to the Data Domain system and replicating to another Data Domain system: In this scenario, data is written to the Data Domain system and then replicated to another Data Domain system. Data is either migrated from the Disk Library (using tape caching) or copied (using the embedded media managers) to the Data Domain system, and is then automatically replicated to the second Data Domain system. A dedicated Disk Library on the target side is not required, although in some tape caching environments a Disk Library on the target side may be needed.