Sparse Indexing: Large-Scale, Inline Deduplication Using Sampling and Locality


1 Sparse Indexing: Large-Scale, Inline Deduplication Using Sampling and Locality. Mark Lillibridge, Kave Eshghi, Deepavali Bhagwat, Vinay Deolalikar, Greg Trezise, and Peter Camble. Work done at Hewlett-Packard Laboratories. © 2009 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

2 The Problem: deduplication at scale for disk-to-disk backup

3 A Disk-to-Disk Backup Scenario: backup data is streamed to a D2D server (a fake tape library). Each tape: up to 400 GB. Total: 100 TB to 10 PB. 3 February 26, 2009

4 Example backup streams. Monday: file B, file D, file E. Tuesday: file D, file E. Wednesday: file X, file D, file E.

5 Little changes from day to day [figure: the example backup streams]

6 After ideal deduplication [figure: the example backup streams]

7 Chunk-based deduplication [figure: the example backup streams]

8 Chunk-based deduplication [figure: the example backup streams, continued]

9 Chunk-based deduplication [figure: the example backup streams, continued]
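Chunk-based deduplication begins by cutting the byte stream into chunks. The talk's 4 KB mean chunk size implies variable-size chunks. The sketch below is one common content-defined chunker, built on a Gear-style rolling hash. That hash is our choice for illustration and is not necessarily the chunking algorithm used in this work. `gear_chunks`, `GEAR`, and the size parameters are hypothetical.

```python
import random

# Hypothetical Gear table: 256 random 64-bit values, seeded for reproducibility.
_rng = random.Random(42)
GEAR = [_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1


def gear_chunks(data: bytes, mask_bits: int = 12, min_size: int = 1024,
                max_size: int = 16384) -> list[bytes]:
    """Split data into content-defined chunks.

    A boundary falls after a byte where the top `mask_bits` bits of the
    rolling hash are zero (probability about 1 / 2**mask_bits per byte),
    but never before `min_size` bytes; a chunk is cut at `max_size` bytes
    regardless of content.
    """
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Shift-and-add: a byte's contribution leaves the 64-bit hash after 64 steps.
        h = ((h << 1) + GEAR[byte]) & MASK64
        size = i - start + 1
        at_boundary = size >= min_size and (h >> (64 - mask_bits)) == 0
        if at_boundary or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


data = random.Random(7).randbytes(200_000)
chunks = gear_chunks(data)
assert b"".join(chunks) == data  # chunking is lossless
```

Boundaries depend only on nearby content, not on absolute offsets. As a result, an insertion early in a file tends to disturb only the chunks around it, which is what lets later backups share most of their chunks with earlier ones.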

10 The standard implementation [figure: an index (RAM) holding chunk hashes H(B), H(A), H(C); a chunk store (disk) holding chunks A, B, C]
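Below is a minimal in-memory sketch of the standard implementation, assuming SHA-1 chunk hashes. A Python dict stands in for the RAM index and a list for the on-disk chunk store. Purely for illustration, each file from the example streams is treated as a single chunk. `FullIndexStore` is a hypothetical name.

```python
import hashlib


class FullIndexStore:
    """Toy chunk store with a full chunk index: hash -> chunk location."""

    def __init__(self) -> None:
        self.index: dict[bytes, int] = {}  # H(chunk) -> position in chunk store
        self.chunks: list[bytes] = []      # stands in for the on-disk chunk store

    def add(self, chunk: bytes) -> int:
        """Store a chunk unless an identical one exists; return its location."""
        h = hashlib.sha1(chunk).digest()
        loc = self.index.get(h)            # one index lookup per incoming chunk
        if loc is None:
            loc = len(self.chunks)
            self.chunks.append(chunk)
            self.index[h] = loc
        return loc

    def backup(self, chunks: list[bytes]) -> list[int]:
        """Deduplicate a backup stream; the returned locations form its recipe."""
        return [self.add(c) for c in chunks]


store = FullIndexStore()
monday = store.backup([b"file B", b"file D", b"file E"])
tuesday = store.backup([b"file D", b"file E"])
print(monday, tuesday, len(store.chunks))  # [0, 1, 2] [1, 2] 3
```

Every incoming chunk costs one index lookup. That lookup is cheap only while the whole index fits in RAM.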

11 The standard implementation (same figure), now annotated: chunk-lookup disk bottleneck. At this scale the index no longer fits in RAM, so it must live on disk and each incoming chunk's lookup can cost a random disk access.
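A rough sizing shows why the index leaves RAM at the scale of slide 3. The sketch makes three assumptions: 4 KB mean chunks, the slide's totals taken as the amount of chunk data being indexed, and a 32-byte index entry. `full_index_bytes` is a hypothetical helper.

```python
def full_index_bytes(store_bytes: float, mean_chunk_bytes: float,
                     entry_bytes: int = 32) -> float:
    """Rough RAM for a full chunk index: one entry per stored chunk.

    entry_bytes = 32 is an assumed entry size (e.g. a 20-byte SHA-1 hash
    plus location metadata), not a figure from the talk.
    """
    return store_bytes / mean_chunk_bytes * entry_bytes


TB = 10**12
for total in (100 * TB, 10_000 * TB):  # the slide's 100 TB and 10 PB
    ram_gb = full_index_bytes(total, 4096) / 10**9
    print(f"{total / TB:,.0f} TB of chunks -> {ram_gb:,.0f} GB of index")
```

Under these assumptions it prints about 781 GB of index for 100 TB and about 78,125 GB (roughly 78 TB) for 10 PB.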

12 One existing solution: "Avoiding the Disk Bottleneck in the Data Domain Deduplication File System," Benjamin Zhu (Data Domain, Inc.), Kai Li (Data Domain, Inc., and Princeton University), and Hugo Patterson (Data Domain, Inc.), FAST '08. Today: a new approach that uses significantly less RAM and provides a guaranteed minimum throughput.

13 Our Approach: Sparse indexing

14 Sparse indexing. Key ideas: chunk locality; sampling.

15 No temporal locality [figure: the example backup streams]. A chunk typically recurs only in the next backup, after a large amount of other data, so caching recently seen chunks does not help.

16 Large sections of data reappear mostly intact: chunk locality [figure: the example backup streams]

17 Exploiting chunk locality [figure: file B, file D; file X, file D]

18 Divide into segments [figure: file B, file D; new: file X, file D]. Chunks not shown; real segments are much longer.

19 Deduplicate one segment at a time, against a few carefully chosen champion segments [figure: file B, file D; new: file X, file D]

20 Champion #1: the most similar segment [figure: file B, file D; new: file X, file D]

21 Champion #2: the segment most similar to the remainder [figure: file B, file D; new: file X, file D]
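The greedy choice on slides 20 and 21 can be sketched as a cover problem over samples. First pick the stored segment that shares the most samples with the incoming segment. Then pick the one sharing the most of the samples still uncovered, and continue up to M champions. This is a minimal sketch that assumes similarity is counted in shared samples. `choose_champions`, the segment IDs, and the sample names are hypothetical.

```python
def choose_champions(incoming: set[str], candidates: dict[str, set[str]],
                     max_champions: int) -> list[str]:
    """Greedily pick up to max_champions candidate segment IDs.

    incoming:   samples of the segment being deduplicated
    candidates: stored segment ID -> samples it shares with the store lookup
    """
    remaining = set(incoming)          # samples not yet covered by a champion
    champions: list[str] = []
    while remaining and len(champions) < max_champions:
        # Score each unchosen candidate by how many uncovered samples it has.
        best_id, best_score = None, 0
        for seg_id, samples in sorted(candidates.items()):
            if seg_id in champions:
                continue
            score = len(samples & remaining)
            if score > best_score:
                best_id, best_score = seg_id, score
        if best_id is None:            # nobody covers anything new: stop early
            break
        champions.append(best_id)
        remaining -= candidates[best_id]
    return champions


incoming = {"h1", "h2", "h3", "h4", "h5"}
candidates = {
    "s1": {"h1", "h2", "h3"},   # most similar overall -> champion #1
    "s2": {"h2", "h3"},         # adds nothing once s1 is chosen
    "s3": {"h4", "h5", "h9"},   # most similar to the remainder -> champion #2
}
print(choose_champions(incoming, candidates, max_champions=10))  # ['s1', 's3']
```

Candidate s2 is never chosen, because once s1 is a champion it would add nothing new. This is why the second pick is scored against the remainder rather than against the whole segment.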

22 Finding similar segments by sampling [figure: file B, file D; file X, file D]. Sparse index: maps each sample to the segment(s) containing it.
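Below is a minimal sparse index sketch. It assumes a chunk is sampled when the top bits of its SHA-1 hash are zero, which gives a sampling rate of 1/2**bits (1/64 here). The rule, `SparseIndex`, and its methods are illustrative. Following the methodology slide, the index keeps at most one segment ID per sample; keeping the newest one is our choice.

```python
import hashlib


def chunk_hash(chunk: bytes) -> int:
    """160-bit SHA-1 hash of a chunk, as an integer."""
    return int.from_bytes(hashlib.sha1(chunk).digest(), "big")


def is_sample(h: int, bits: int) -> bool:
    """Assumed sampling rule: keep hashes whose top `bits` bits are zero."""
    return (h >> (160 - bits)) == 0


class SparseIndex:
    """RAM-resident map: sample hash -> ID of one stored segment containing it."""

    def __init__(self, bits: int = 6) -> None:  # bits=6 -> sampling rate 1/64
        self.bits = bits
        self.table: dict[int, int] = {}

    def samples(self, chunks: list[bytes]) -> set[int]:
        return {h for h in map(chunk_hash, chunks) if is_sample(h, self.bits)}

    def add_segment(self, seg_id: int, chunks: list[bytes]) -> None:
        for s in self.samples(chunks):
            self.table[s] = seg_id  # at most one segment ID per sample: the newest

    def candidates(self, chunks: list[bytes]) -> dict[int, set[int]]:
        """Stored segment ID -> samples it shares with an incoming segment."""
        found: dict[int, set[int]] = {}
        for s in self.samples(chunks):
            if s in self.table:
                found.setdefault(self.table[s], set()).add(s)
        return found


idx = SparseIndex(bits=6)
old = [f"chunk {i}".encode() for i in range(2000)]
idx.add_segment(0, old)
new = old[:1500] + [f"new chunk {i}".encode() for i in range(500)]
print(len(idx.table), "index entries for", len(old), "chunks")
print(idx.candidates(new).keys())
```

Only about one chunk in 64 gets an entry, which is where the RAM saving comes from. `candidates()` returns the map from segment ID to shared samples that a champion chooser would consume.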

23 A few details. We also keep segment recipes: lists of pointers to each segment's chunks. We actually deduplicate against the champions' recipes. It works better with variable-sized segments: boundaries based on landmarks ("superchunks") reduce the number of champions required.
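The sketch below covers the two details on this slide: landmark-based variable-size segmentation, and deduplication against champion recipes rather than a full index. The landmark rule (hash divisible by `landmark_mod`), the size bounds, and all names are assumptions for illustration.

```python
import hashlib


def h(chunk: bytes) -> bytes:
    return hashlib.sha1(chunk).digest()


def segment_by_landmarks(chunks, landmark_mod=64, min_chunks=16, max_chunks=512):
    """Group chunks into variable-size segments (hypothetical landmark rule).

    A chunk is a landmark if its hash, as an integer, is divisible by
    landmark_mod. A segment ends at a landmark once it holds at least
    min_chunks chunks, and is cut at max_chunks regardless.
    """
    segments, current = [], []
    for c in chunks:
        current.append(c)
        landmark = int.from_bytes(h(c), "big") % landmark_mod == 0
        if (landmark and len(current) >= min_chunks) or len(current) >= max_chunks:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


def dedupe_segment(segment, champion_recipes, chunk_store):
    """Deduplicate one segment against its champions' recipes only.

    A recipe is a list of (chunk hash, location) pairs, one per chunk, in order.
    chunk_store is a list standing in for the on-disk chunk data files.
    Returns the new segment's recipe.
    """
    known = {}
    for recipe in champion_recipes:
        known.update(recipe)  # hash -> location, taken from champion recipes only
    new_recipe = []
    for c in segment:
        ch = h(c)
        if ch not in known:   # in no champion: store a new copy (later repeats reuse it)
            known[ch] = len(chunk_store)
            chunk_store.append(c)
        new_recipe.append((ch, known[ch]))
    return new_recipe


chunks = [f"c{i}".encode() for i in range(300)]
segments = segment_by_landmarks(chunks)
store = []
r0 = dedupe_segment(chunks[:100], [], store)               # nothing to match: 100 stored
r1 = dedupe_segment(chunks[:100] + [b"new"], [r0], store)  # only b"new" is stored
print(len(segments), len(store))
```

`dedupe_segment` consults only the champions' recipes. A duplicate chunk that appears in no champion is therefore stored again. That missed deduplication is the price this design pays for its small RAM footprint.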

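24 Putting it all together [diagram]. The Chunker splits the byte stream into chunks, and the Segmenter groups the chunks into segments. The Champion chooser looks up each segment's samples in the sparse index, which returns segment IDs, and selects champion IDs. The Deduplicator loads the champion recipes from disk, writes new recipes and new (compressed) chunks, and sends updates to the sparse index. Disk storage holds the segment recipes and the chunk data files.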
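Here is an end-to-end toy version of the pipeline in the diagram. To keep it short, it assumes fixed-size chunks and fixed-size segments (the talk uses variable-size ones), a hash-modulo sampling rule, and Python lists standing in for disk. Compression is omitted. `SparseIndexingStore` and its methods are hypothetical.

```python
import hashlib
import random


def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


class SparseIndexingStore:
    """Toy pipeline: chunker -> segmenter -> champion chooser -> deduplicator."""

    def __init__(self, chunk_size=4096, seg_chunks=256, sample_mod=8, max_champions=10):
        self.chunk_size = chunk_size        # simplification: fixed-size chunks
        self.seg_chunks = seg_chunks        # simplification: fixed-size segments
        self.sample_mod = sample_mod        # assumed rule: sample if hash % sample_mod == 0
        self.max_champions = max_champions  # the slides' M
        self.sparse_index = {}              # sample hash -> segment ID (in RAM)
        self.recipes = []                   # segment ID -> recipe (stands in for disk)
        self.chunk_data = []                # stands in for the chunk data files

    def ingest(self, stream: bytes) -> list:
        size = self.chunk_size
        chunks = [stream[i:i + size] for i in range(0, len(stream), size)]
        return [self._dedupe(chunks[i:i + self.seg_chunks])
                for i in range(0, len(chunks), self.seg_chunks)]

    def _dedupe(self, segment: list) -> int:
        hashes = [sha1(c) for c in segment]
        samples = {x for x in hashes if int.from_bytes(x, "big") % self.sample_mod == 0}
        # Champion chooser: which stored segments share samples with this one?
        hits = {}
        for s in samples:
            if s in self.sparse_index:
                hits.setdefault(self.sparse_index[s], set()).add(s)
        remaining, champions = set(samples), []
        while remaining and hits and len(champions) < self.max_champions:
            best = max(hits, key=lambda sid: len(hits[sid] & remaining))
            if not hits[best] & remaining:
                break                       # no candidate covers anything new
            champions.append(best)
            remaining -= hits.pop(best)
        # Deduplicator: load champion recipes (a disk read each in a real system).
        known = {}
        for sid in champions:
            known.update(self.recipes[sid])
        recipe = []
        for c, x in zip(segment, hashes):
            if x not in known:
                known[x] = len(self.chunk_data)
                self.chunk_data.append(c)   # new chunk (compression omitted)
            recipe.append((x, known[x]))
        seg_id = len(self.recipes)
        self.recipes.append(recipe)
        for s in samples:                   # sparse index updates
            self.sparse_index[s] = seg_id
        return seg_id

    def restore(self, seg_ids: list) -> bytes:
        return b"".join(self.chunk_data[loc] for sid in seg_ids for _, loc in self.recipes[sid])


store = SparseIndexingStore()
monday = random.Random(1).randbytes(4096 * 1000)
tuesday = monday[:4096 * 900] + random.Random(2).randbytes(4096 * 100)
store.ingest(monday)
tuesday_ids = store.ingest(tuesday)
assert store.restore(tuesday_ids) == tuesday
print(len(store.chunk_data), "chunks stored for 2000 ingested")
```

With these inputs, the second ingest should store only about the 100 new chunks, provided each repeated segment contains at least one sample. `restore()` rebuilds a stream from its segment recipes.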

25 Results

26 Methodology: built a simulator. Fixed parameters: 4 KB mean chunk size; variable-size segments; at most 1 segment ID kept per sample. Varying parameters: mean segment size; sampling rate; maximum number of champions per segment (M).

27 The data sets. Workgroup [this talk]: 3.8 TB; backups of 20 desktop PCs belonging to engineers; semi-regular backups over 3 months via tar; 154 full backups and 392 incremental backups; end-of-week full backups are synthetic. SMB [see paper]: backups of a server with real Oracle data and synthetic Microsoft Exchange data; two weeks; 0.6 TB.

28 Chunk locality exists

29 Sampling can exploit most of it (10 MB mean segment size)

30 Deduplication with at most 10 champions

31 Deduplication depends primarily on

32 Index RAM usage [figure: sampling rates 1/128, 1/64, 1/32]
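Because only sampled chunks receive an entry, sparse index RAM scales linearly with the sampling rate. The following is a back-of-the-envelope sketch. It assumes 100 TB of chunk data, 4 KB mean chunks, and a 32-byte entry; these are illustrative numbers, not the paper's measurements.

```python
def index_bytes(store_bytes: float, sampling_rate: float = 1.0,
                mean_chunk_bytes: float = 4096, entry_bytes: int = 32) -> float:
    """Rough index RAM: one entry (assumed 32 bytes) per sampled chunk."""
    return store_bytes / mean_chunk_bytes * sampling_rate * entry_bytes


TB = 10**12
full = index_bytes(100 * TB)  # every chunk indexed, as in the standard implementation
for denom in (128, 64, 32):
    sparse = index_bytes(100 * TB, sampling_rate=1 / denom)
    print(f"rate 1/{denom}: {sparse / 10**9:.1f} GB, {full / sparse:.0f}x less than a full index")
```

Under these assumptions, the rates 1/128, 1/64, and 1/32 need about 6.1, 12.2, and 24.4 GB, compared with about 781 GB for a full index.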

33 Comparison with Zhu et al. Their chunk lookup: a Bloom filter (might the store have a copy?); a cache of chunk container indexes; the full on-disk index.

34 Comparison with Zhu et al. (continued). When chunk locality is poor, their deduplication quality remains constant but throughput degrades. They find all duplicate chunks, but at a larger chunk size.
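For contrast, here is a simplified sketch of the lookup order that slide 33 attributes to Zhu et al. It is not Data Domain's implementation. The Bloom filter sizing, the unbounded cache, the dict standing in for the on-disk index, and all names are assumptions.

```python
import hashlib


class BloomFilter:
    def __init__(self, m_bits: int = 1 << 20, k: int = 4) -> None:
        self.m, self.k = m_bits, k
        self.bits = bytearray(m_bits // 8)

    def _positions(self, key: bytes):
        d = hashlib.sha256(key).digest()
        for i in range(self.k):                # k bit positions from 4-byte slices
            yield int.from_bytes(d[4 * i:4 * i + 4], "big") % self.m

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: bytes) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class ZhuStyleLookup:
    """Lookup order: Bloom filter, then cached container indexes, then disk index."""

    def __init__(self) -> None:
        self.bloom = BloomFilter()
        self.disk_index: dict[bytes, int] = {}        # chunk hash -> container ID
        self.containers: dict[int, set[bytes]] = {}   # container ID -> its chunk hashes
        self.cache: dict[int, set[bytes]] = {}        # cached container indexes (unbounded)
        self.disk_lookups = 0

    def store(self, h: bytes, container: int) -> None:
        self.bloom.add(h)
        self.disk_index[h] = container
        self.containers.setdefault(container, set()).add(h)

    def is_duplicate(self, h: bytes) -> bool:
        if not self.bloom.might_contain(h):
            return False                  # definitely new: no disk access
        if any(h in idx for idx in self.cache.values()):
            return True                   # hit in a cached container index
        self.disk_lookups += 1            # fall back to the full on-disk index
        container = self.disk_index.get(h)
        if container is None:
            return False                  # Bloom filter false positive
        # Prefetch the whole container's index: its neighbours are likely next.
        self.cache[container] = self.containers[container]
        return True


z = ZhuStyleLookup()
hashes = [hashlib.sha1(f"c{i}".encode()).digest() for i in range(100)]
for i, h in enumerate(hashes):
    z.store(h, container=i // 50)
print(sum(z.is_duplicate(h) for h in hashes), z.disk_lookups)  # 100 2
```

With good chunk locality, one disk lookup prefetches a whole container's index and the duplicates that follow hit the cache. With poor locality, more lookups fall through to disk, so throughput drops while every duplicate is still found.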

35 RAM usage comparison [figure: sparse indexing at sampling rates 1/128, 1/64, 1/32; Zhu et al. at chunk sizes 4 KB, 8 KB, 16 KB, 32 KB]

36 What about all those disk accesses? They are infrequent due to batch processing. Example: load at most 10 champions per 10 MB segment. An average of 1.7 champions per 10 MB segment = 0.17 champions/MB, or about 1 seek per 5 MB (1/0.17 ≈ 5.9 MB, rounded down). I/O burden: 20 ms to load a champion recipe (~100 KB), so 1 drive can do about 50 loads/s and handle a > 250 MB/s ingestion rate (50 loads/s × 5 MB).
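The arithmetic on slide 36, spelled out. Like the slide, it assumes that champion recipe loads are the only disk I/O that counts. The variable names are ours.

```python
champions_per_segment = 1.7   # average, from the slide
segment_mb = 10               # mean segment size, MB
load_ms = 20                  # time to load one ~100 KB champion recipe

champions_per_mb = champions_per_segment / segment_mb   # 0.17
mb_per_seek = 1 / champions_per_mb                       # about 5.9 MB
loads_per_second = 1000 / load_ms                        # 50 recipe loads per drive

print(f"{mb_per_seek:.1f} MB per seek, {loads_per_second * mb_per_seek:.0f} MB/s")
print(f"conservative (5 MB per seek): {loads_per_second * 5:.0f} MB/s")
```

This prints 5.9 MB per seek and about 294 MB/s. Rounding down to 5 MB per seek gives the slide's conservative 250 MB/s.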

37 Thank You


More information

See what s new: Data Domain Global Deduplication Array, DD Boost and more. Copyright 2010 EMC Corporation. All rights reserved.

See what s new: Data Domain Global Deduplication Array, DD Boost and more. Copyright 2010 EMC Corporation. All rights reserved. See what s new: Data Domain Global Deduplication Array, DD Boost and more 2010 1 EMC Backup Recovery Systems (BRS) Division EMC Competitor Competitor Competitor Competitor Competitor Competitor Competitor

More information

Oracle Zero Data Loss Recovery Appliance (ZDLRA)

Oracle Zero Data Loss Recovery Appliance (ZDLRA) Oracle Zero Data Loss Recovery Appliance (ZDLRA) Overview Attila Mester Principal Sales Consultant Data Protection Copyright 2015, Oracle and/or its affiliates. All rights reserved. Safe Harbor Statement

More information

QuickSpecs. PCIe Solid State Drives for HP Workstations

QuickSpecs. PCIe Solid State Drives for HP Workstations Overview Introduction Storage technology with NAND media is outgrowing the bandwidth limitations of the SATA bus. New high performance Storage solutions will connect directly to the PCIe bus for revolutionary

More information

CS 550 Operating Systems Spring File System

CS 550 Operating Systems Spring File System 1 CS 550 Operating Systems Spring 2018 File System 2 OS Abstractions Process: virtualization of CPU Address space: virtualization of memory The above to allow a program to run as if it is in its own private,

More information

Efficient Hybrid Inline and Out-of-Line Deduplication for Backup Storage

Efficient Hybrid Inline and Out-of-Line Deduplication for Backup Storage Efficient Hybrid Inline and Out-of-Line Deduplication for Backup Storage YAN-KIT LI, MIN XU, CHUN-HO NG, and PATRICK P. C. LEE, The Chinese University of Hong Kong 2 Backup storage systems often remove

More information

When failure is not an option HP NonStop BackBox VTC

When failure is not an option HP NonStop BackBox VTC When failure is not an option HP NonStop BackBox VTC Sylvain Tétreault ETI-NET April 29 Th, 2015 Agenda Data Protection is not optional HP NonStop BackBox VTC How HP NonStop BackBox VTC works Today s world

More information

SMORE: A Cold Data Object Store for SMR Drives

SMORE: A Cold Data Object Store for SMR Drives SMORE: A Cold Data Object Store for SMR Drives Peter Macko, Xiongzi Ge, John Haskins Jr.*, James Kelley, David Slik, Keith A. Smith, and Maxim G. Smith Advanced Technology Group NetApp, Inc. * Qualcomm

More information

Copyright 2010 EMC Corporation. Do not Copy - All Rights Reserved.

Copyright 2010 EMC Corporation. Do not Copy - All Rights Reserved. 1 Using patented high-speed inline deduplication technology, Data Domain systems identify redundant data as they are being stored, creating a storage foot print that is 10X 30X smaller on average than

More information

SPECIFICATION FOR NETWORK ATTACHED STORAGE (NAS) TO BE FILLED BY BIDDER. NAS Controller Should be rack mounted with a form factor of not more than 2U

SPECIFICATION FOR NETWORK ATTACHED STORAGE (NAS) TO BE FILLED BY BIDDER. NAS Controller Should be rack mounted with a form factor of not more than 2U SPECIFICATION FOR NETWORK ATTACHED STORAGE (NAS) TO BE FILLED BY BIDDER S.No. Features Qualifying Minimum Requirements No. of Storage 1 Units 2 Make Offered 3 Model Offered 4 Rack mount 5 Processor 6 Memory

More information

TCO REPORT. NAS File Tiering. Economic advantages of enterprise file management

TCO REPORT. NAS File Tiering. Economic advantages of enterprise file management TCO REPORT NAS File Tiering Economic advantages of enterprise file management Executive Summary Every organization is under pressure to meet the exponential growth in demand for file storage capacity.

More information

STOREONCE OVERVIEW. Neil Fleming Mid-Market Storage Development Manager. Copyright 2010 Hewlett-Packard Development Company, L.P.

STOREONCE OVERVIEW. Neil Fleming Mid-Market Storage Development Manager. Copyright 2010 Hewlett-Packard Development Company, L.P. STOREONCE OVERVIEW Neil Fleming Neil.Fleming@HP.com Mid-Market Development Manager 1 DETERMINE YOUR RECOVERY NEEDS Wks Days Hrs Mins Secs Secs Mins Hrs Days Wks TAPE D2D Recovery Point SNAP SNAP Recovery

More information

Iomega REV Drive Data Transfer Performance

Iomega REV Drive Data Transfer Performance Technical White Paper March 2004 Iomega REV Drive Data Transfer Performance Understanding Potential Transfer Rates and Factors Affecting Throughput Introduction Maximum Sustained Transfer Rate Burst Transfer

More information

HP StorageWorks VLS and D2D solutions guide

HP StorageWorks VLS and D2D solutions guide HP StorageWorks VLS and D2D solutions guide Design guidelines for virtual tape libraries with deduplication and replication This document describes the HP StorageWorks VLS and D2D systems and their concepts,

More information

Pharmacy college.. Assist.Prof. Dr. Abdullah A. Abdullah

Pharmacy college.. Assist.Prof. Dr. Abdullah A. Abdullah The kinds of memory:- 1. RAM(Random Access Memory):- The main memory in the computer, it s the location where data and programs are stored (temporally). RAM is volatile means that the data is only there

More information

Backup Exec 20.1 Tuning and Performance Guide

Backup Exec 20.1 Tuning and Performance Guide Backup Exec 20.1 Tuning and Performance Guide Documentation version: Backup Exec 20.1 Legal Notice Copyright 2018 Veritas Technologies LLC. All rights reserved. Veritas and the Veritas Logo are trademarks

More information

HPDedup: A Hybrid Prioritized Data Deduplication Mechanism for Primary Storage in the Cloud

HPDedup: A Hybrid Prioritized Data Deduplication Mechanism for Primary Storage in the Cloud HPDedup: A Hybrid Prioritized Data Deduplication Mechanism for Primary Storage in the Cloud Huijun Wu 1,4, Chen Wang 2, Yinjin Fu 3, Sherif Sakr 1, Liming Zhu 1,2 and Kai Lu 4 The University of New South

More information

DUE to the explosive growth of the digital data, data

DUE to the explosive growth of the digital data, data 1162 IEEE TRANSACTIONS ON COMPUTERS, VOL. 64, NO. 4, APRIL 2015 Similarity and Locality Based Indexing for High Performance Data Deduplication Wen Xia, Hong Jiang, Senior Member, IEEE, Dan Feng, Member,

More information

Optimizing Parallel Access to the BaBar Database System Using CORBA Servers

Optimizing Parallel Access to the BaBar Database System Using CORBA Servers SLAC-PUB-9176 September 2001 Optimizing Parallel Access to the BaBar Database System Using CORBA Servers Jacek Becla 1, Igor Gaponenko 2 1 Stanford Linear Accelerator Center Stanford University, Stanford,

More information

CS252 S05. CMSC 411 Computer Systems Architecture Lecture 18 Storage Systems 2. I/O performance measures. I/O performance measures

CS252 S05. CMSC 411 Computer Systems Architecture Lecture 18 Storage Systems 2. I/O performance measures. I/O performance measures CMSC 411 Computer Systems Architecture Lecture 18 Storage Systems 2 I/O performance measures I/O performance measures diversity: which I/O devices can connect to the system? capacity: how many I/O devices

More information

Virtualization Selling with IBM Tape

Virtualization Selling with IBM Tape Virtualization Selling with IBM Tape Thepvitoon Kultumyotin (thepvito@th.ibm.com) Agenda Introduction IBM Tape Portfolio Virtual Tape Virtual Tape Concepts IBM TS7530 Product Overview IBM De-Duplication

More information

IBM System Storage LTO Ultrium 6 Tape Drive Performance White Paper

IBM System Storage LTO Ultrium 6 Tape Drive Performance White Paper IBM System Storage September, 212 IBM System Storage LTO Ultrium 6 Tape Drive Performance White Paper By Rogelio Rivera, Tape Performance Gustavo Vargas, Tape Performance Marco Vázquez, Tape Performance

More information

Design Tradeoffs for Data Deduplication Performance in Backup Workloads

Design Tradeoffs for Data Deduplication Performance in Backup Workloads Design Tradeoffs for Data Deduplication Performance in Backup Workloads Min Fu, Dan Feng, and Yu Hua, Huazhong University of Science and Technology; Xubin He, Virginia Commonwealth University; Zuoning

More information

Map-Reduce. Marco Mura 2010 March, 31th

Map-Reduce. Marco Mura 2010 March, 31th Map-Reduce Marco Mura (mura@di.unipi.it) 2010 March, 31th This paper is a note from the 2009-2010 course Strumenti di programmazione per sistemi paralleli e distribuiti and it s based by the lessons of

More information