WAN Optimized Replication of Backup Datasets Using Stream-Informed Delta Compression

Size: px
Start display at page:

Download "WAN Optimized Replication of Backup Datasets Using Stream-Informed Delta Compression"

Transcription

1 WAN Optimized Replication of Backup Datasets Using Stream-Informed Delta Compression Philip Shilane, Mark Huang, Grant Wallace, & Windsor Hsu Backup Recovery Systems Division EMC Corporation

2 Introduction 80% of our customers replicate most of their data off-site for disaster recovery WAN bandwidth often limits throughput Data reduction techniques increase effective throughput Deduplication and local compression are effective Delta compression with stream-informed caching adds 2X additional compression Remote Office Backup System Central Office Backup System

3 Example Of Deduplication And Delta Compression Chunk fp Sketches based on Broder [97 & 00]

4 Example Of Deduplication And Delta Compression Chunk fp sk Maximal Value 1 Maximal Value 2 Maximal Value 3 Maximal Value 4 Sketches based on Broder [97 & 00] super_feature = Rabin_fp(feature 1 feature 4 ) sketch is one or more super_features

5 Example Of Deduplication And Delta Compression Chunk fp sk Chunk fp (duplicate of earlier chunk) Sketches based on Broder [97 & 00] super_feature = Rabin_fp(feature 1 feature 4 ) sketch is one or more super_features

6 Example Of Deduplication And Delta Compression Chunk fp sk Chunk fp sk (similar to earlier chunk) Regions of difference Maximal Value 1 Maximal Value 2 Maximal Value 3 Maximal Value 4 Sketches based on Broder [97 & 00] super_feature = Rabin_fp(feature 1 feature 4 ) sketch is one or more super_features

7 Example Of Deduplication And Delta Compression Chunk fp sk Chunk fp sk (similar to earlier chunk) Regions of difference Transmit fp and differences Sketches based on Broder [97 & 00]

8 Sketch Index Options Full index (simple idea) Requires IO Difficult to update Finds all similarity matches 256 TB capacity 8 KB chunks 16 byte record 0.5 TB index per super-feature

9 Sketch Index Options Full index (simple idea) Requires IO Difficult to update Finds all similarity matches Partial index (slightly better?) Load and evict with LRU policy Not persistent Must be as large as full backup 256 TB capacity 8 KB chunks 16 byte record 0.5 TB index per super-feature

10 Sketch Index Options Partial index would have to hold a full backup 256 to TB be capacity effective 8 KB chunks 16 byte record Full index (simple idea) Requires IO Difficult to update Finds all similarity matches Partial index (slightly better?) Load and evict with LRU policy Not persistent Must be as large as full backup 0.5 TB index per super-feature Stream-informed cache (our contribution) Experimentally demonstrate that delta locality closely matches deduplication locality for backup datasets Updates handled by fingerprint system Little extra memory Finds most similarity matches

11 Sketch Index Options Full index (simple idea) Requires IO Difficult to update Finds all similarity matches Partial index (slightly better?) Load and evict with LRU policy Not persistent Must be as large as full backup 256 TB capacity 8 KB chunks 16 byte record 0.5 TB index per super-feature Partial index has to be large enough to index entire primary storage system

12 Sketch Index Options Full index (simple idea) Requires IO Difficult to update Finds all similarity matches Partial index (slightly better?) Load and evict with LRU policy Not persistent Must be as large as full backup 256 TB capacity 8 KB chunks 16 byte record 0.5 TB index per super-feature Partial index has to be large enough to index entire primary storage system Stream-informed cache (our contribution) Experimentally demonstrate that delta locality closely matches deduplication locality for backup datasets Updates handled by fingerprint system Little extra memory Finds most similarity matches

13 Stream-Informed Locality Similarity search can leverage deduplication locality Sketch cache loaded based on fingerprint cache Fingerprint and Sketch Cache

14 Stream-Informed Locality Similarity search can leverage deduplication locality Sketch cache loaded based on fingerprint cache Full 1 A B C Fingerprint and Sketch Cache fp A fp B fp C sk A sk B sk C

15 Stream-Informed Locality Similarity search can leverage deduplication locality Sketch cache loaded based on fingerprint cache Full 1 A B C Fingerprint and Sketch Cache Container 1 fp A fp B fp C sk A sk B sk C

16 Stream-Informed Locality Similarity search can leverage deduplication locality Sketch cache loaded based on fingerprint cache Full 1 A B C D E F Fingerprint and Sketch Cache Container 1 fp A sk A fp D sk D fp B sk B fp E sk E fp C sk C fp F sk F

17 Stream-Informed Locality Similarity search can leverage deduplication locality Sketch cache loaded based on fingerprint cache Full 1 A B C D E F Fingerprint and Sketch Cache Container 1 Container 2 fp A sk A fp D sk D fp B sk B fp E sk E fp C sk C fp F sk F

18 Stream-Informed Locality Similarity search can leverage deduplication locality Sketch cache loaded based on fingerprint cache Full 1 A B C D E F Store containers to disk Fingerprint and Sketch Cache Container 1 Container 2 fp A sk A fp D sk D fp B sk B fp E sk E fp C sk C fp F sk F

19 Stream-Informed Locality Similarity search can leverage deduplication locality Sketch cache loaded based on fingerprint cache Full 1 Full 2 A A B C D E F Fingerprint and Sketch Cache

20 Stream-Informed Locality Similarity search can leverage deduplication locality Sketch cache loaded based on fingerprint cache Full 1 Full 2 A A B C D E F Load container from disk Fingerprint and Sketch Cache

21 Stream-Informed Locality Similarity search can leverage deduplication locality Sketch cache loaded based on fingerprint cache Full 1 A B C D E F Full 2 A Load container from disk Fingerprint and Sketch Cache Container 1 fp A fp B fp C sk A sk B sk C

22 Stream-Informed Locality Similarity search can leverage deduplication locality Sketch cache loaded based on fingerprint cache Full 1 A B C D E F Full 2 A B C Fingerprint and Sketch Cache Container 1 fp A fp B fp C sk A sk B sk C

23 Stream-Informed Locality Similarity search can leverage deduplication locality Sketch cache loaded based on fingerprint cache Full 1 A B C D E F Full 2 A B C D Load container from disk Fingerprint and Sketch Cache Container 1 fp A fp B fp C sk A sk B sk C

24 Stream-Informed Locality Similarity search can leverage deduplication locality Sketch cache loaded based on fingerprint cache Full 1 A B C D E F Full 2 A B C D E F Fingerprint and Sketch Cache Container 1 Container 2 E is similar to previous chunk E and is a sketch match fp A sk A fp D sk D fp B sk B fp E sk E fp C sk C fp F sk F

25 Replication With Deduplication Remote Office Central Office

26 Replication With Deduplication Send chunk fingerprints Remote Office fp() fp() fp() Central Office

27 Replication With Deduplication Send chunk fingerprints Remote Office fp() fp() fp() Reply with list of missing fingerprints fp() fp() Central Office

28 Replication With Deduplication Send chunk fingerprints Remote Office fp() fp() fp() Reply with list of missing fingerprints fp() fp() Central Office Send missing chunks

29 Replication With Deduplication Send chunk fingerprints Remote Office fp() fp() fp() Reply with list of missing fingerprints fp() fp() Central Office Send missing chunks

30 Replication With Deduplication And Delta Remote Office Central Office m

31 Replication With Deduplication And Delta Remote Office Send chunk fingerprints fp() fp() fp() fp() fp(m) Central Office m

32 Replication With Deduplication And Delta Remote Office Reply with list of missing fingerprints fp() fp(m) Central Office m

33 Replication With Deduplication And Delta Remote Office Central Office Send sketches m sk() sk(m)

34 Replication With Deduplication And Delta Ø m Remote Office Central Office Send sketches m sk() sk(m)

35 Replication With Deduplication And Delta Ø m Remote Office Central Office m Reply with list of fingerprints for similar chunks Ø m

36 Replication With Deduplication And Delta Remote Office Central Office m Send missing chunks and delta encodings () +m

37 Replication With Deduplication And Delta Remote Office Central Office m Send missing chunks and delta encodings m () +m

38 Replication With Deduplication And Delta Remote Office Central Office m Send missing chunks and delta encodings m () +m

39 Replication With Deduplication And Delta Remote Office Central Office m Send missing chunks and delta encodings () +m () +m

40 Replication With Deduplication And Delta Remote Office Central Office m Send missing chunks and delta encodings () +m () () +m +m

41 Replication With Deduplication And Delta Remote Office Central Office m Send missing chunks and delta encodings () +m () () +m +m

42 Replication With Deduplication And Delta Remote Office Central Office m Send missing chunks and delta encodings () +m () +m m

43 Replication With Deduplication And Delta Remote Office Central Office m m Send missing chunks and delta encodings () +m () +m m

44 Properties Of Stream-Informed Delta Compression Pros: Eliminates on-disk sketch index Improves performance fewer disk reads than using a sketch index Small memory footprint Improves compression Cons: Dependent on stream locality and caching to find similar chunks Requires read IO and CPU to process delta chunks

45 Datasets Dataset Type Backup Policy TB Months Source Code Version control repository Weekly full Daily incremental Workstations 16 desktops Weekly full Daily incremental MS Exchange server Daily full System Logs Server s /var directory Weekly full Daily incremental Home Directories Engineers home directories Weekly full Daily incremental

46 Index VS Stream-Informed Cache Cache sized at 12 MB per stream Deduplication 1 super-feature 2 super-features 3 super-features 4 super-features

47 Index VS Stream-Informed Cache Cache sized at 12 MB per stream For one super-feature, compression is within 14% of using an index Deduplication 1 super-feature 2 super-features 3 super-features 4 super-features

48 Index VS Stream-Informed Cache Cache sized at 12 MB per stream For one super-feature, compression is within 14% of using an index Two super-features in a cache is better than an index with one Deduplication 1 super-feature 2 super-features 3 super-features 4 super-features

49 Index VS Stream-Informed Cache Cache sized at 12 MB per stream For one super-feature, compression is within 14% of using an index Two super-features in a cache is better than an index with one Deduplication 1 super-feature 2 super-features 3 super-features 4 super-features

50 Index VS Stream-Informed Cache Cache sized at 12 MB per stream For one super-feature, compression is within 14% of using an index Two super-features in a cache is better than an index with one Deduplication 1 super-feature 2 super-features 3 super-features 4 super-features

51 Index VS Stream-Informed Cache Cache sized at 12 MB per stream For one super-feature, compression is within 14% of using an index Two super-features in a cache is better than an index with one Deduplication 1 super-feature 2 super-features 3 super-features 4 super-features

52 Index VS Stream-Informed Cache Cache sized at 12 MB per stream For one super-feature, compression is within 14% of using an index Two super-features in a cache is better than an index with one 1. Deduplication is slightly different because of different implementations 2. Delta compression in a cache was higher than a full index because of better delta encoding with a candidate in the cache Deduplication 1 super-feature 2 super-features 3 super-features 4 super-features

53 Delta Compression Typical delta improvement is 2X beyond deduplication and GZ compression Choose GZ or Delta with GZ Dataset Deduplication GZ Delta w/ GZ Delta Improv. Source Code 24.9X 7.2X 14.9X 2.1X Workstations 5.7X 2.8X 8.8X 3.1X 6.9X 3.1X 5.8X 1.9X System Logs 57.9X 4.6X 10.2X 2.2X Home Directories 31.7X 3.1X 5.5X 1.8X Compression factors are presented after first week of seeding.

54 Network Throughput Effective throughput is 1-2 orders of magnitude faster

55 Overheads And Limitations Sketches take up about 20 bytes per non-duplicate chunk Uses read IO and CPU on source and destination Sketching is a 20% slowdown on writes, but only for non-duplicates Scales linearly at destination with number of streams

56 Overheads And Limitations Sketches take up about 20 bytes per non-duplicate chunk Resource utilization during replication Uses read IO and CPU on source and destination Sketching is a 20% slowdown on writes, but only for non-duplicates Scales linearly at destination with number of streams Shared sketch cache affects compression System sized to handle 20 streams Number of systems replicating to a single destination With 25 streams compression loss of 0-12% With 50 streams compression loss of 0-27%

57 Overheads And Limitations Sketches take up about 20 bytes per non-duplicate chunk Uses read IO and CPU on source and destination Sketching is a 20% slowdown on writes, but only for non-duplicates Scales linearly at destination with number of streams Shared sketch cache affects compression System sized to handle 20 streams With 25 streams compression loss of 0-12% With 50 streams compression loss of 0-27%

58 Customer Results Median customer has 2X delta compression beyond deduplication

59 Related Work Optimized network transfer Spring00, Muthitacharoen01, Eshghi07, and Park07 File synchronization Tridgell00, Suel04 Delta compression Burns97, Mogul97, Hunt98, Chan99, MacDonald00, Suel02, Trendafilov02, and Chen04 Similarity detection Brin94, Manber94, Broder[97 & 00], Douglis03, Kulkarni04, You04, Jain05, and Aronovich09 Deduplicated storage Policroniades04, and Bobbarjung06 Stream-informed deduplication Zhu08, Bhagwat09, Lillibridge09, Min10, Guo11, and Xia11

60 Conclusion Delta locality closely matches deduplication locality for backup datasets Good scalability Stream-informed delta compression is effective with a small cache CPU and IO utilization is low Product allows customers to replicate and protect twice as much data across a WAN

61 Questions?

62

Delta Compressed and Deduplicated Storage Using Stream-Informed Locality

Delta Compressed and Deduplicated Storage Using Stream-Informed Locality Delta Compressed and Deduplicated Storage Using Stream-Informed Locality Philip Shilane, Grant Wallace, Mark Huang, and Windsor Hsu Backup Recovery Systems Division EMC Corporation Abstract For backup

More information

Deduplication Storage System

Deduplication Storage System Deduplication Storage System Kai Li Charles Fitzmorris Professor, Princeton University & Chief Scientist and Co-Founder, Data Domain, Inc. 03/11/09 The World Is Becoming Data-Centric CERN Tier 0 Business

More information

MIGRATORY COMPRESSION Coarse-grained Data Reordering to Improve Compressibility

MIGRATORY COMPRESSION Coarse-grained Data Reordering to Improve Compressibility MIGRATORY COMPRESSION Coarse-grained Data Reordering to Improve Compressibility Xing Lin *, Guanlin Lu, Fred Douglis, Philip Shilane, Grant Wallace * University of Utah EMC Corporation Data Protection

More information

The Logic of Physical Garbage Collection in Deduplicating Storage

The Logic of Physical Garbage Collection in Deduplicating Storage The Logic of Physical Garbage Collection in Deduplicating Storage Fred Douglis Abhinav Duggal Philip Shilane Tony Wong Dell EMC Shiqin Yan University of Chicago Fabiano Botelho Rubrik 1 Deduplication in

More information

Reducing Replication Bandwidth for Distributed Document Databases

Reducing Replication Bandwidth for Distributed Document Databases Reducing Replication Bandwidth for Distributed Document Databases Lianghong Xu 1, Andy Pavlo 1, Sudipta Sengupta 2 Jin Li 2, Greg Ganger 1 Carnegie Mellon University 1, Microsoft Research 2 Document-oriented

More information

Deduplication File System & Course Review

Deduplication File System & Course Review Deduplication File System & Course Review Kai Li 12/13/13 Topics u Deduplication File System u Review 12/13/13 2 Storage Tiers of A Tradi/onal Data Center $$$$ Mirrored storage $$$ Dedicated Fibre Clients

More information

Sparse Indexing: Large-Scale, Inline Deduplication Using Sampling and Locality

Sparse Indexing: Large-Scale, Inline Deduplication Using Sampling and Locality Sparse Indexing: Large-Scale, Inline Deduplication Using Sampling and Locality Mark Lillibridge, Kave Eshghi, Deepavali Bhagwat, Vinay Deolalikar, Greg Trezise, and Peter Camble Work done at Hewlett-Packard

More information

A DEDUPLICATION-INSPIRED FAST DELTA COMPRESSION APPROACH W EN XIA, HONG JIANG, DA N FENG, LEI T I A N, M I N FU, YUKUN Z HOU

A DEDUPLICATION-INSPIRED FAST DELTA COMPRESSION APPROACH W EN XIA, HONG JIANG, DA N FENG, LEI T I A N, M I N FU, YUKUN Z HOU A DEDUPLICATION-INSPIRED FAST DELTA COMPRESSION APPROACH W EN XIA, HONG JIANG, DA N FENG, LEI T I A N, M I N FU, YUKUN Z HOU PRESENTED BY ROMAN SHOR Overview Technics of data reduction in storage systems:

More information

ChunkStash: Speeding Up Storage Deduplication using Flash Memory

ChunkStash: Speeding Up Storage Deduplication using Flash Memory ChunkStash: Speeding Up Storage Deduplication using Flash Memory Biplob Debnath +, Sudipta Sengupta *, Jin Li * * Microsoft Research, Redmond (USA) + Univ. of Minnesota, Twin Cities (USA) Deduplication

More information

Accelerating Restore and Garbage Collection in Deduplication-based Backup Systems via Exploiting Historical Information

Accelerating Restore and Garbage Collection in Deduplication-based Backup Systems via Exploiting Historical Information Accelerating Restore and Garbage Collection in Deduplication-based Backup Systems via Exploiting Historical Information Min Fu, Dan Feng, Yu Hua, Xubin He, Zuoning Chen *, Wen Xia, Fangting Huang, Qing

More information

Characteristics of Backup Workloads in Production Systems

Characteristics of Backup Workloads in Production Systems Characteristics of Backup Workloads in Production Systems Grant Wallace Fred Douglis Hangwei Qian Philip Shilane Stephen Smaldone Mark Chamness Windsor Hsu Backup Recovery Systems Division EMC Corporation

More information

DEC: An Efficient Deduplication-Enhanced Compression Approach

DEC: An Efficient Deduplication-Enhanced Compression Approach 2016 IEEE 22nd International Conference on Parallel and Distributed Systems DEC: An Efficient Deduplication-Enhanced Compression Approach Zijin Han, Wen Xia, Yuchong Hu *, Dan Feng, Yucheng Zhang, Yukun

More information

DELL EMC DATA DOMAIN SISL SCALING ARCHITECTURE

DELL EMC DATA DOMAIN SISL SCALING ARCHITECTURE WHITEPAPER DELL EMC DATA DOMAIN SISL SCALING ARCHITECTURE A Detailed Review ABSTRACT While tape has been the dominant storage medium for data protection for decades because of its low cost, it is steadily

More information

A Survey and Classification of Storage Deduplication Systems

A Survey and Classification of Storage Deduplication Systems 11 A Survey and Classification of Storage Deduplication Systems JOÃO PAULO and JOSÉ PEREIRA, High-Assurance Software Lab (HASLab), INESC TEC & University of Minho The automatic elimination of duplicate

More information

Design Tradeoffs for Data Deduplication Performance in Backup Workloads

Design Tradeoffs for Data Deduplication Performance in Backup Workloads Design Tradeoffs for Data Deduplication Performance in Backup Workloads Min Fu,DanFeng,YuHua,XubinHe, Zuoning Chen *, Wen Xia,YuchengZhang,YujuanTan Huazhong University of Science and Technology Virginia

More information

EMC DATA DOMAIN OPERATING SYSTEM

EMC DATA DOMAIN OPERATING SYSTEM EMC DATA DOMAIN OPERATING SYSTEM Powering EMC Protection Storage ESSENTIALS High-Speed, Scalable Deduplication Up to 31 TB/hr performance Reduces requirements for backup storage by 10 to 30x and archive

More information

Business Benefits of Policy Based Data De-Duplication Data Footprint Reduction with Quality of Service (QoS) for Data Protection

Business Benefits of Policy Based Data De-Duplication Data Footprint Reduction with Quality of Service (QoS) for Data Protection Data Footprint Reduction with Quality of Service (QoS) for Data Protection By Greg Schulz Founder and Senior Analyst, the StorageIO Group Author The Green and Virtual Data Center (Auerbach) October 28th,

More information

Cost-Efficient Remote Backup Services for Enterprise Clouds

Cost-Efficient Remote Backup Services for Enterprise Clouds 1650 IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 12, NO. 5, OCTOBER 2016 Cost-Efficient Remote Backup Services for Enterprise Clouds Yu Hua, Senior Member, IEEE, Xue Liu, Member, IEEE, and Dan Feng,

More information

White paper ETERNUS CS800 Data Deduplication Background

White paper ETERNUS CS800 Data Deduplication Background White paper ETERNUS CS800 - Data Deduplication Background This paper describes the process of Data Deduplication inside of ETERNUS CS800 in detail. The target group consists of presales, administrators,

More information

EMC DATA DOMAIN PRODUCT OvERvIEW

EMC DATA DOMAIN PRODUCT OvERvIEW EMC DATA DOMAIN PRODUCT OvERvIEW Deduplication storage for next-generation backup and archive Essentials Scalable Deduplication Fast, inline deduplication Provides up to 65 PBs of logical storage for long-term

More information

An Application Awareness Local Source and Global Source De-Duplication with Security in resource constraint based Cloud backup services

An Application Awareness Local Source and Global Source De-Duplication with Security in resource constraint based Cloud backup services An Application Awareness Local Source and Global Source De-Duplication with Security in resource constraint based Cloud backup services S.Meghana Assistant Professor, Dept. of IT, Vignana Bharathi Institute

More information

DELL EMC DATA DOMAIN EXTENDED RETENTION SOFTWARE

DELL EMC DATA DOMAIN EXTENDED RETENTION SOFTWARE WHITEPAPER DELL EMC DATA DOMAIN EXTENDED RETENTION SOFTWARE A Detailed Review ABSTRACT This white paper introduces Dell EMC Data Domain Extended Retention software that increases the storage scalability

More information

Technology Insight Series

Technology Insight Series EMC Avamar for NAS - Accelerating NDMP Backup Performance John Webster June, 2011 Technology Insight Series Evaluator Group Copyright 2011 Evaluator Group, Inc. All rights reserved. Page 1 of 7 Introduction/Executive

More information

Veeam and HP: Meet your backup data protection goals

Veeam and HP: Meet your backup data protection goals Sponsored by Veeam and HP: Meet your backup data protection goals Eric Machabert Сonsultant and virtualization expert Introduction With virtualization systems becoming mainstream in recent years, backups

More information

Copyright 2010 EMC Corporation. Do not Copy - All Rights Reserved.

Copyright 2010 EMC Corporation. Do not Copy - All Rights Reserved. 1 Using patented high-speed inline deduplication technology, Data Domain systems identify redundant data as they are being stored, creating a storage foot print that is 10X 30X smaller on average than

More information

Tintri & Veeam VM Backup & Replication Best Practices. John Phillips Strategic Alliances and Technical Marketing Ryan Post Systems Engineer

Tintri & Veeam VM Backup & Replication Best Practices. John Phillips Strategic Alliances and Technical Marketing Ryan Post Systems Engineer Tintri & Veeam VM Backup & Replication Best Practices John Phillips Strategic Alliances and Technical Marketing Ryan Post Systems Engineer 1 VM-aware Storage from Tintri Stores VMs and vdisks (only!) No

More information

White Paper Simplified Backup and Reliable Recovery

White Paper Simplified Backup and Reliable Recovery Simplified Backup and Reliable Recovery NEC Corporation of America necam.com Overview Amanda Enterprise from Zmanda - A Carbonite company, is a backup and recovery solution that offers fast installation,

More information

See what s new: Data Domain Global Deduplication Array, DD Boost and more. Copyright 2010 EMC Corporation. All rights reserved.

See what s new: Data Domain Global Deduplication Array, DD Boost and more. Copyright 2010 EMC Corporation. All rights reserved. See what s new: Data Domain Global Deduplication Array, DD Boost and more 2010 1 EMC Backup Recovery Systems (BRS) Division EMC Competitor Competitor Competitor Competitor Competitor Competitor Competitor

More information

DASH COPY GUIDE. Published On: 11/19/2013 V10 Service Pack 4A Page 1 of 31

DASH COPY GUIDE. Published On: 11/19/2013 V10 Service Pack 4A Page 1 of 31 DASH COPY GUIDE Published On: 11/19/2013 V10 Service Pack 4A Page 1 of 31 DASH Copy Guide TABLE OF CONTENTS OVERVIEW GETTING STARTED ADVANCED BEST PRACTICES FAQ TROUBLESHOOTING DASH COPY PERFORMANCE TUNING

More information

EMC Data Domain for Archiving Are You Kidding?

EMC Data Domain for Archiving Are You Kidding? EMC Data Domain for Archiving Are You Kidding? Bill Roth / Bob Spurzem EMC EMC 1 Agenda EMC Introduction Data Domain Enterprise Vault Integration Data Domain NetBackup Integration Q & A EMC 2 EMC Introduction

More information

Protect enterprise data, achieve long-term data retention

Protect enterprise data, achieve long-term data retention Technical white paper Protect enterprise data, achieve long-term data retention HP StoreOnce Catalyst and Symantec NetBackup OpenStorage Table of contents Introduction 2 Technology overview 3 HP StoreOnce

More information

Online Version Only. Book made by this file is ILLEGAL. Design and Implementation of Binary File Similarity Evaluation System. 1.

Online Version Only. Book made by this file is ILLEGAL. Design and Implementation of Binary File Similarity Evaluation System. 1. , pp.1-10 http://dx.doi.org/10.14257/ijmue.2014.9.1.01 Design and Implementation of Binary File Similarity Evaluation System Sun-Jung Kim 2, Young Jun Yoo, Jungmin So 1, Jeong Gun Lee 1, Jin Kim 1 and

More information

Reducing The De-linearization of Data Placement to Improve Deduplication Performance

Reducing The De-linearization of Data Placement to Improve Deduplication Performance Reducing The De-linearization of Data Placement to Improve Deduplication Performance Yujuan Tan 1, Zhichao Yan 2, Dan Feng 2, E. H.-M. Sha 1,3 1 School of Computer Science & Technology, Chongqing University

More information

RecoverPoint Operations

RecoverPoint Operations This chapter contains the following sections: RecoverPoint Appliance Clusters, page 1 Consistency Groups, page 2 Replication Sets, page 17 Group Sets, page 20 Assigning a Policy to a RecoverPoint Task,

More information

Speeding Up Cloud/Server Applications Using Flash Memory

Speeding Up Cloud/Server Applications Using Flash Memory Speeding Up Cloud/Server Applications Using Flash Memory Sudipta Sengupta and Jin Li Microsoft Research, Redmond, WA, USA Contains work that is joint with Biplob Debnath (Univ. of Minnesota) Flash Memory

More information

Rhinoback Online Backup. In-File Delta

Rhinoback Online Backup. In-File Delta December 2006 Table of Content 1 Introduction... 3 1.1 Differential Delta Mode... 3 1.2 Incremental Delta Mode... 3 2 Delta Generation... 4 3 Block Size Setting... 4 4 During Backup... 5 5 During Restore...

More information

Setting Up the DR Series System on Veeam

Setting Up the DR Series System on Veeam Setting Up the DR Series System on Veeam Quest Engineering June 2017 A Quest Technical White Paper Revisions Date January 2014 May 2014 July 2014 April 2015 June 2015 November 2015 April 2016 Description

More information

StorageCraft OneXafe and Veeam 9.5

StorageCraft OneXafe and Veeam 9.5 TECHNICAL DEPLOYMENT GUIDE NOV 2018 StorageCraft OneXafe and Veeam 9.5 Expert Deployment Guide Overview StorageCraft, with its scale-out storage solution OneXafe, compliments Veeam to create a differentiated

More information

The Google File System

The Google File System October 13, 2010 Based on: S. Ghemawat, H. Gobioff, and S.-T. Leung: The Google file system, in Proceedings ACM SOSP 2003, Lake George, NY, USA, October 2003. 1 Assumptions Interface Architecture Single

More information

Technology Insight Series

Technology Insight Series IBM ProtecTIER Deduplication for z/os John Webster March 04, 2010 Technology Insight Series Evaluator Group Copyright 2010 Evaluator Group, Inc. All rights reserved. Announcement Summary The many data

More information

Oracle Advanced Compression. An Oracle White Paper June 2007

Oracle Advanced Compression. An Oracle White Paper June 2007 Oracle Advanced Compression An Oracle White Paper June 2007 Note: The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated

More information

File Systems Management and Examples

File Systems Management and Examples File Systems Management and Examples Today! Efficiency, performance, recovery! Examples Next! Distributed systems Disk space management! Once decided to store a file as sequence of blocks What s the size

More information

CSE 124: Networked Services Lecture-16

CSE 124: Networked Services Lecture-16 Fall 2010 CSE 124: Networked Services Lecture-16 Instructor: B. S. Manoj, Ph.D http://cseweb.ucsd.edu/classes/fa10/cse124 11/23/2010 CSE 124 Networked Services Fall 2010 1 Updates PlanetLab experiments

More information

PracticeDump. Free Practice Dumps - Unlimited Free Access of practice exam

PracticeDump.   Free Practice Dumps - Unlimited Free Access of practice exam PracticeDump http://www.practicedump.com Free Practice Dumps - Unlimited Free Access of practice exam Exam : E22-280 Title : Avamar Backup and Data Deduplication Exam Vendors : EMC Version : DEMO Get Latest

More information

dedupv1: Improving Deduplication Throughput using Solid State Drives (SSD)

dedupv1: Improving Deduplication Throughput using Solid State Drives (SSD) University Paderborn Paderborn Center for Parallel Computing Technical Report dedupv1: Improving Deduplication Throughput using Solid State Drives (SSD) Dirk Meister Paderborn Center for Parallel Computing

More information

Chapter 4 Data Movement Process

Chapter 4 Data Movement Process Chapter 4 Data Movement Process 46 - Data Movement Process Understanding how CommVault software moves data within the production and protected environment is essential to understanding how to configure

More information

A study of practical deduplication

A study of practical deduplication A study of practical deduplication Dutch T. Meyer University of British Columbia Microsoft Research Intern William Bolosky Microsoft Research Why Dutch is Not Here A study of practical deduplication Dutch

More information

Setting Up the Dell DR Series System on Veeam

Setting Up the Dell DR Series System on Veeam Setting Up the Dell DR Series System on Veeam Dell Engineering April 2016 A Dell Technical White Paper Revisions Date January 2014 May 2014 July 2014 April 2015 June 2015 November 2015 April 2016 Description

More information

The World s Fastest Backup Systems

The World s Fastest Backup Systems 3 The World s Fastest Backup Systems Erwin Freisleben BRS Presales Austria 4 EMC Data Domain: Leadership and Innovation A history of industry firsts 2003 2004 2005 2006 2007 2008 2009 2010 2011 First deduplication

More information

Data Deduplication Methods for Achieving Data Efficiency

Data Deduplication Methods for Achieving Data Efficiency Data Deduplication Methods for Achieving Data Efficiency Matthew Brisse, Quantum Gideon Senderov, NEC... SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA. Member companies

More information

Kevin Russell ExaGrid Systems. Platinum sponsors:

Kevin Russell ExaGrid Systems. Platinum sponsors: Kevin Russell ExaGrid Systems Platinum sponsors: Established in 2002 Largest independent vendor in the space 100% focused on disk-based backup Winner of the most industry awards Patented technology Large

More information

Deduplication and Its Application to Corporate Data

Deduplication and Its Application to Corporate Data White Paper Deduplication and Its Application to Corporate Data Lorem ipsum ganus metronique elit quesal norit parique et salomin taren ilat mugatoque This whitepaper explains deduplication techniques

More information

Cheetah: An Efficient Flat Addressing Scheme for Fast Query Services in Cloud Computing

Cheetah: An Efficient Flat Addressing Scheme for Fast Query Services in Cloud Computing IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications Cheetah: An Efficient Flat Addressing Scheme for Fast Query Services in Cloud Computing Yu Hua Wuhan National

More information

COS 318: Operating Systems. NSF, Snapshot, Dedup and Review

COS 318: Operating Systems. NSF, Snapshot, Dedup and Review COS 318: Operating Systems NSF, Snapshot, Dedup and Review Topics! NFS! Case Study: NetApp File System! Deduplication storage system! Course review 2 Network File System! Sun introduced NFS v2 in early

More information

Balakrishnan Nair. Senior Technology Consultant Back Up & Recovery Systems South Gulf. Copyright 2011 EMC Corporation. All rights reserved.

Balakrishnan Nair. Senior Technology Consultant Back Up & Recovery Systems South Gulf. Copyright 2011 EMC Corporation. All rights reserved. Balakrishnan Nair Senior Technology Consultant Back Up & Recovery Systems South Gulf 1 Thinking Fast: The World s Fastest Backup Now Does Archive Too Introducing the New EMC Backup and Recovery Solutions

More information

IBM Tivoli Storage Manager Version Introduction to Data Protection Solutions IBM

IBM Tivoli Storage Manager Version Introduction to Data Protection Solutions IBM IBM Tivoli Storage Manager Version 7.1.6 Introduction to Data Protection Solutions IBM IBM Tivoli Storage Manager Version 7.1.6 Introduction to Data Protection Solutions IBM Note: Before you use this

More information

Configuring Short RPO with Actifio StreamSnap and Dedup-Async Replication

Configuring Short RPO with Actifio StreamSnap and Dedup-Async Replication CDS and Sky Tech Brief Configuring Short RPO with Actifio StreamSnap and Dedup-Async Replication Actifio recommends using Dedup-Async Replication (DAR) for RPO of 4 hours or more and using StreamSnap for

More information

EMC RecoverPoint. EMC RecoverPoint Support

EMC RecoverPoint. EMC RecoverPoint Support Support, page 1 Adding an Account, page 2 RecoverPoint Appliance Clusters, page 3 Replication Through Consistency Groups, page 4 Group Sets, page 22 System Tasks, page 24 Support protects storage array

More information

StorageCraft OneBlox and Veeam 9.5 Expert Deployment Guide

StorageCraft OneBlox and Veeam 9.5 Expert Deployment Guide TECHNICAL DEPLOYMENT GUIDE StorageCraft OneBlox and Veeam 9.5 Expert Deployment Guide Overview StorageCraft, with its scale-out storage solution OneBlox, compliments Veeam to create a differentiated diskbased

More information

Scale-out Object Store for PB/hr Backups and Long Term Archive April 24, 2014

Scale-out Object Store for PB/hr Backups and Long Term Archive April 24, 2014 Scale-out Object Store for PB/hr Backups and Long Term Archive April 24, 2014 Gideon Senderov Director, Advanced Storage Products NEC Corporation of America Long-Term Data in the Data Center (EB) 140 120

More information

Can t We All Get Along? Redesigning Protection Storage for Modern Workloads

Can t We All Get Along? Redesigning Protection Storage for Modern Workloads Can t We All Get Along? Redesigning Protection Storage for Modern Workloads Yamini Allu, Fred Douglis, Mahesh Kamat, Ramya Prabhakar, Philip Shilane, and Rahul Ugale, Dell EMC https://www.usenix.org/conference/atc18/presentation/allu

More information

VCS-276.exam. Number: VCS-276 Passing Score: 800 Time Limit: 120 min File Version: VCS-276

VCS-276.exam. Number: VCS-276 Passing Score: 800 Time Limit: 120 min File Version: VCS-276 VCS-276.exam Number: VCS-276 Passing Score: 800 Time Limit: 120 min File Version: 1.0 VCS-276 Administration of Veritas NetBackup 8.0 Version 1.0 Exam A QUESTION 1 A NetBackup policy is configured to back

More information

Case study: Building bi-directional DR. Joep Piscaer, VMware vexpert, VCDX #101

Case study: Building bi-directional DR. Joep Piscaer, VMware vexpert, VCDX #101 Case study: Building bi-directional DR Joep Piscaer, VMware vexpert, VCDX #101 Agenda Introduction Project description, goals, requirements, constraints High level overview: product and component overview

More information

Virtualization Technique For Replica Synchronization

Virtualization Technique For Replica Synchronization Virtualization Technique For Replica Synchronization By : Ashwin G.Sancheti Email:ashwin@cs.jhu.edu Instructor : Prof.Randal Burns Date : 19 th Feb 2008 Roadmap Motivation/Goals What is Virtualization?

More information

Chapter 3 `How a Storage Policy Works

Chapter 3 `How a Storage Policy Works Chapter 3 `How a Storage Policy Works 32 - How a Storage Policy Works A Storage Policy defines the lifecycle management rules for all protected data. In its most basic form, a storage policy can be thought

More information

PASS4TEST. IT Certification Guaranteed, The Easy Way! We offer free update service for one year

PASS4TEST. IT Certification Guaranteed, The Easy Way!   We offer free update service for one year PASS4TEST \ http://www.pass4test.com We offer free update service for one year Exam : E20-329 Title : Technology Architect Backup and Recovery Solutions Design Exam Vendor : EMC Version : DEMO Get Latest

More information

IBM Spectrum Protect Version Introduction to Data Protection Solutions IBM

IBM Spectrum Protect Version Introduction to Data Protection Solutions IBM IBM Spectrum Protect Version 8.1.2 Introduction to Data Protection Solutions IBM IBM Spectrum Protect Version 8.1.2 Introduction to Data Protection Solutions IBM Note: Before you use this information

More information

Scale-out Data Deduplication Architecture

Scale-out Data Deduplication Architecture Scale-out Data Deduplication Architecture Gideon Senderov Product Management & Technical Marketing NEC Corporation of America Outline Data Growth and Retention Deduplication Methods Legacy Architecture

More information

EaSync: A Transparent File Synchronization Service across Multiple Machines

EaSync: A Transparent File Synchronization Service across Multiple Machines EaSync: A Transparent File Synchronization Service across Multiple Machines Huajian Mao 1,2, Hang Zhang 1,2, Xianqiang Bao 1,2, Nong Xiao 1,2, Weisong Shi 3, and Yutong Lu 1,2 1 State Key Laboratory of

More information

Frequency Based Chunking for Data De-Duplication

Frequency Based Chunking for Data De-Duplication 2010 18th Annual IEEE/ACM International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems Frequency Based Chunking for Data De-Duplication Guanlin Lu, Yu Jin, and

More information

VMware vsphere Data Protection 5.8 TECHNICAL OVERVIEW REVISED AUGUST 2014

VMware vsphere Data Protection 5.8 TECHNICAL OVERVIEW REVISED AUGUST 2014 VMware vsphere Data Protection 5.8 TECHNICAL OVERVIEW REVISED AUGUST 2014 Table of Contents Introduction.... 3 Features and Benefits of vsphere Data Protection... 3 Additional Features and Benefits of

More information

HP Designing and Implementing HP Enterprise Backup Solutions. Download Full Version :

HP Designing and Implementing HP Enterprise Backup Solutions. Download Full Version : HP HP0-771 Designing and Implementing HP Enterprise Backup Solutions Download Full Version : http://killexams.com/pass4sure/exam-detail/hp0-771 A. copy backup B. normal backup C. differential backup D.

More information

GFS: The Google File System. Dr. Yingwu Zhu

GFS: The Google File System. Dr. Yingwu Zhu GFS: The Google File System Dr. Yingwu Zhu Motivating Application: Google Crawl the whole web Store it all on one big disk Process users searches on one big CPU More storage, CPU required than one PC can

More information

ECE 2300 Digital Logic & Computer Organization. More Caches

ECE 2300 Digital Logic & Computer Organization. More Caches ECE 23 Digital Logic & Computer Organization Spring 217 More Caches 1 Prelim 2 stats High: 9 (out of 9) Mean: 7.2, Median: 73 Announcements Prelab 5(C) due tomorrow 2 Example: Direct Mapped (DM) Cache

More information

Dell DR4100. Disk based Data Protection and Disaster Recovery. April 3, Birger Ferber, Enterprise Technologist Storage EMEA

Dell DR4100. Disk based Data Protection and Disaster Recovery. April 3, Birger Ferber, Enterprise Technologist Storage EMEA Dell DR4100 Disk based Data Protection and Disaster Recovery April 3, 2013 Birger Ferber, Enterprise Technologist Storage EMEA Agenda DR4100 Technical Review DR4100 New Features DR4100 Use Cases DR4100

More information

Construct a High Efficiency VM Disaster Recovery Solution. Best choice for protecting virtual environments

Construct a High Efficiency VM Disaster Recovery Solution. Best choice for protecting virtual environments Construct a High Efficiency VM Disaster Recovery Solution Best choice for protecting virtual environments About NAKIVO Established in the USA since 2012 Provides data protection solutions for VMware, Hyper-V

More information

Parallelizing Inline Data Reduction Operations for Primary Storage Systems

Parallelizing Inline Data Reduction Operations for Primary Storage Systems Parallelizing Inline Data Reduction Operations for Primary Storage Systems Jeonghyeon Ma ( ) and Chanik Park Department of Computer Science and Engineering, POSTECH, Pohang, South Korea {doitnow0415,cipark}@postech.ac.kr

More information

Global Headquarters: 5 Speen Street Framingham, MA USA P F

Global Headquarters: 5 Speen Street Framingham, MA USA P F B U Y E R C A S E S T U D Y V M w a r e I m p r o v e s N e t w o r k U t i l i z a t i o n a n d B a c k u p P e r f o r m a n c e U s i n g A v a m a r ' s C l i e n t - S i d e D e d u p l i c a t i

More information

HP StoreOnce: reinventing data deduplication

HP StoreOnce: reinventing data deduplication HP : reinventing data deduplication Reduce the impact of explosive data growth with HP StorageWorks D2D Backup Systems Technical white paper Table of contents Executive summary... 2 Introduction to data

More information

Backup for VMware ESX Environments with Veritas NetBackup PureDisk from Symantec

Backup for VMware ESX Environments with Veritas NetBackup PureDisk from Symantec Backup for VMware ESX Environments with Veritas NetBackup PureDisk from Symantec A comparison of three different methods of backing up ESX guests using deduplication Mark Salmon, Sr. Principal Systems

More information

Lazy Exact Deduplication

Lazy Exact Deduplication Lazy Exact Deduplication JINGWEI MA, REBECCA J. STONES, YUXIANG MA, JINGUI WANG, JUNJIE REN, GANG WANG, and XIAOGUANG LIU, College of Computer and Control Engineering, Nankai University 11 Deduplication

More information

HYDRAstor: a Scalable Secondary Storage

HYDRAstor: a Scalable Secondary Storage HYDRAstor: a Scalable Secondary Storage 7th TF-Storage Meeting September 9 th 00 Łukasz Heldt Largest Japanese IT company $4 Billion in annual revenue 4,000 staff www.nec.com Polish R&D company 50 engineers

More information

Data Reduction Meets Reality What to Expect From Data Reduction

Data Reduction Meets Reality What to Expect From Data Reduction Data Reduction Meets Reality What to Expect From Data Reduction Doug Barbian and Martin Murrey Oracle Corporation Thursday August 11, 2011 9961: Data Reduction Meets Reality Introduction Data deduplication

More information

DELL EMC DATA DOMAIN OPERATING SYSTEM

DELL EMC DATA DOMAIN OPERATING SYSTEM DATA SHEET DD OS Essentials High-speed, scalable deduplication Up to 68 TB/hr performance Reduces protection storage requirements by 10 to 30x CPU-centric scalability Data invulnerability architecture

More information

The 5 Keys to Virtual Backup Excellence

The 5 Keys to Virtual Backup Excellence The 5 Keys to Virtual Backup Excellence The Challenges 51 87 Percent of IT pros say virtualized server backups do not meet or only partially meet business goals for restores and recovery Percent of CIOs

More information

Hyper-converged Secondary Storage for Backup with Deduplication Q & A. The impact of data deduplication on the backup process

Hyper-converged Secondary Storage for Backup with Deduplication Q & A. The impact of data deduplication on the backup process Hyper-converged Secondary Storage for Backup with Deduplication Q & A The impact of data deduplication on the backup process Table of Contents Introduction... 3 What is data deduplication?... 3 Is all

More information

WHITE PAPER. DATA DEDUPLICATION BACKGROUND: A Technical White Paper

WHITE PAPER. DATA DEDUPLICATION BACKGROUND: A Technical White Paper WHITE PAPER DATA DEDUPLICATION BACKGROUND: A Technical White Paper CONTENTS Data Deduplication Multiple Data Sets from a Common Storage Pool.......................3 Fixed-Length Blocks vs. Variable-Length

More information

TransferSoft Launches HyperTransfer for OEMs Most Comprehensive and Efficient Enterprisee Data Transfer Software Suite in the Industry

TransferSoft Launches HyperTransfer for OEMs Most Comprehensive and Efficient Enterprisee Data Transfer Software Suite in the Industry Broadcast Beat Magazine NAB 2015 Edition Author: Curtis Chan, Senior Editor TransferSoft Launches HyperTransfer for OEMs Most Comprehensive and Efficient Enterprisee Data Transfer Software Suite in the

More information

HEAD HardwarE Accelerated Deduplication

HEAD HardwarE Accelerated Deduplication HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee Executive Summary A-Z development of deduplication SW version

More information

GFS: The Google File System

GFS: The Google File System GFS: The Google File System Brad Karp UCL Computer Science CS GZ03 / M030 24 th October 2014 Motivating Application: Google Crawl the whole web Store it all on one big disk Process users searches on one

More information

The Google File System

The Google File System The Google File System By Ghemawat, Gobioff and Leung Outline Overview Assumption Design of GFS System Interactions Master Operations Fault Tolerance Measurements Overview GFS: Scalable distributed file

More information

Setting Up Quest QoreStor with Veeam Backup & Replication. Technical White Paper

Setting Up Quest QoreStor with Veeam Backup & Replication. Technical White Paper Setting Up Quest QoreStor with Veeam Backup & Replication Technical White Paper Quest Engineering August 2018 2018 Quest Software Inc. ALL RIGHTS RESERVED. THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES

More information

Google File System. Arun Sundaram Operating Systems

Google File System. Arun Sundaram Operating Systems Arun Sundaram Operating Systems 1 Assumptions GFS built with commodity hardware GFS stores a modest number of large files A few million files, each typically 100MB or larger (Multi-GB files are common)

More information

The Google File System (GFS)

The Google File System (GFS) 1 The Google File System (GFS) CS60002: Distributed Systems Antonio Bruto da Costa Ph.D. Student, Formal Methods Lab, Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur 2 Design constraints

More information

DELL EMC DATA DOMAIN OPERATING SYSTEM

DELL EMC DATA DOMAIN OPERATING SYSTEM DATA SHEET DD OS Essentials High-speed, scalable deduplication Reduces protection storage requirements by up to 55x Up to 3x restore performance CPU-centric scalability Data invulnerability architecture

More information

VMware vsphere Data Protection Evaluation Guide REVISED APRIL 2015

VMware vsphere Data Protection Evaluation Guide REVISED APRIL 2015 VMware vsphere Data Protection REVISED APRIL 2015 Table of Contents Introduction.... 3 Features and Benefits of vsphere Data Protection... 3 Requirements.... 4 Evaluation Workflow... 5 Overview.... 5 Evaluation

More information

TIBX NEXT-GENERATION ARCHIVE FORMAT IN ACRONIS BACKUP CLOUD

TIBX NEXT-GENERATION ARCHIVE FORMAT IN ACRONIS BACKUP CLOUD TIBX NEXT-GENERATION ARCHIVE FORMAT IN ACRONIS BACKUP CLOUD 1 Backup Speed and Reliability Are the Top Data Protection Mandates What are the top data protection mandates from your organization s IT leadership?

More information

The Google File System

The Google File System The Google File System Sanjay Ghemawat, Howard Gobioff and Shun Tak Leung Google* Shivesh Kumar Sharma fl4164@wayne.edu Fall 2015 004395771 Overview Google file system is a scalable distributed file system

More information

Opendedupe & Veritas NetBackup ARCHITECTURE OVERVIEW AND USE CASES

Opendedupe & Veritas NetBackup ARCHITECTURE OVERVIEW AND USE CASES Opendedupe & Veritas NetBackup ARCHITECTURE OVERVIEW AND USE CASES May, 2017 Contents Introduction... 2 Overview... 2 Architecture... 2 SDFS File System Service... 3 Data Writes... 3 Data Reads... 3 De-duplication

More information

Your World is Hybrid: Protecting your VMs with Veeam and HPE Storage. Federico Venier HPE Storage Technical marketing

Your World is Hybrid: Protecting your VMs with Veeam and HPE Storage. Federico Venier HPE Storage Technical marketing Your World is Hybrid: Protecting your s with and HPE Storage Federico Venier HPE Storage Technical marketing federico.venier@hpe.com HPE & : Integration summary SQL Applications HPE + License Reselling

More information