Design Tradeoffs for Data Deduplication Performance in Backup Workloads

Similar documents
Design Tradeoffs for Data Deduplication Performance in Backup Workloads

Accelerating Restore and Garbage Collection in Deduplication-based Backup Systems via Exploiting Historical Information

Reducing The De-linearization of Data Placement to Improve Deduplication Performance

FGDEFRAG: A Fine-Grained Defragmentation Approach to Improve Restore Performance

ChunkStash: Speeding Up Storage Deduplication using Flash Memory

A DEDUPLICATION-INSPIRED FAST DELTA COMPRESSION APPROACH W EN XIA, HONG JIANG, DA N FENG, LEI T I A N, M I N FU, YUKUN Z HOU

DELL EMC DATA DOMAIN SISL SCALING ARCHITECTURE

Parallelizing Inline Data Reduction Operations for Primary Storage Systems

Understanding Primary Storage Optimization Options Jered Floyd Permabit Technology Corp.

HPDedup: A Hybrid Prioritized Data Deduplication Mechanism for Primary Storage in the Cloud

FGDEFRAG: A Fine-Grained Defragmentation Approach to Improve Restore Performance

In-line Deduplication for Cloud storage to Reduce Fragmentation by using Historical Knowledge

MIGRATORY COMPRESSION Coarse-grained Data Reordering to Improve Compressibility

DUE to the explosive growth of the digital data, data

Speeding Up Cloud/Server Applications Using Flash Memory

A study of practical deduplication

WAN Optimized Replication of Backup Datasets Using Stream-Informed Delta Compression

Computer Architecture and System Software Lecture 09: Memory Hierarchy. Instructor: Rob Bergen Applied Computer Science University of Winnipeg

A Scalable Inline Cluster Deduplication Framework for Big Data Protection

DEC: An Efficient Deduplication-Enhanced Compression Approach

ZBD: Using Transparent Compression at the Block Level to Increase Storage Space Efficiency

Using Transparent Compression to Improve SSD-based I/O Caches

Course Outline. Processes CPU Scheduling Synchronization & Deadlock Memory Management File Systems & I/O Distributed Systems

Introduction to OpenMP. Lecture 10: Caches

Deploying De-Duplication on Ext4 File System

Delta Compressed and Deduplicated Storage Using Stream-Informed Locality

Zero-Chunk: An Efficient Cache Algorithm to Accelerate the I/O Processing of Data Deduplication

Compression and Decompression of Virtual Disk Using Deduplication

Cheetah: An Efficient Flat Addressing Scheme for Fast Query Services in Cloud Computing

Efficient Hybrid Inline and Out-of-Line Deduplication for Backup Storage

COS 318: Operating Systems. NSF, Snapshot, Dedup and Review

Cray XE6 Performance Workshop

Can t We All Get Along? Redesigning Protection Storage for Modern Workloads

Flash-Conscious Cache Population for Enterprise Database Workloads

Chapter 8 & Chapter 9 Main Memory & Virtual Memory

Functional Partitioning to Optimize End-to-End Performance on Many-core Architectures

vsan All Flash Features First Published On: Last Updated On:

GearDB: A GC-free Key-Value Store on HM-SMR Drives with Gear Compaction

Recall from Tuesday. Our solution to fragmentation is to split up a process s address space into smaller chunks. Physical Memory OS.

Column Stores vs. Row Stores How Different Are They Really?

Optimizing for Recovery

The What, Why and How of the Pure Storage Enterprise Flash Array. Ethan L. Miller (and a cast of dozens at Pure Storage)

Caches and Memory Hierarchy: Review. UCSB CS240A, Winter 2016

CS 5523 Operating Systems: Memory Management (SGG-8)

An Application Awareness Local Source and Global Source De-Duplication with Security in resource constraint based Cloud backup services

Operating Systems. Designed and Presented by Dr. Ayman Elshenawy Elsefy

Database Architecture 2 & Storage. Instructor: Matei Zaharia cs245.stanford.edu

UCS Invicta: A New Generation of Storage Performance. Mazen Abou Najm DC Consulting Systems Engineer

Don t stack your Log on my Log

Reducing Replication Bandwidth for Distributed Document Databases

arxiv: v3 [cs.dc] 27 Jun 2013

CS 31: Intro to Systems Virtual Memory. Kevin Webb Swarthmore College November 15, 2018

Assignment 1 due Mon (Feb 4pm

Operating Systems and Computer Networks. Memory Management. Dr.-Ing. Pascal A. Klein

The Logic of Physical Garbage Collection in Deduplicating Storage

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)

Chapter 9: Virtual Memory. Operating System Concepts 9 th Edition

DEBAR: A Scalable High-Performance Deduplication Storage System for Backup and Archiving

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)

Basic Memory Management. Basic Memory Management. Address Binding. Running a user program. Operating Systems 10/14/2018 CSC 256/456 1

Memory Management Topics. CS 537 Lecture 11 Memory. Virtualizing Resources

Distributed File Systems II

Main-Memory Databases 1 / 25

Caches and Memory Hierarchy: Review. UCSB CS240A, Fall 2017

Sparse Indexing: Large-Scale, Inline Deduplication Using Sampling and Locality

Chapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY

Introducing Tegile. Company Overview. Product Overview. Solutions & Use Cases. Partnering with Tegile

Flashed-Optimized VPSA. Always Aligned with your Changing World

Scale-out Data Deduplication Architecture

CS 426 Parallel Computing. Parallel Computing Platforms

Recall: Address Space Map. 13: Memory Management. Let s be reasonable. Processes Address Space. Send it to disk. Freeing up System Memory

Database Systems II. Secondary Storage

C has been and will always remain on top for performancecritical

Cluster and Single-Node Analysis of Long-Term Deduplication Paerns

Virtual Memory 1. Virtual Memory

Virtual Memory 1. Virtual Memory

SmartMD: A High Performance Deduplication Engine with Mixed Pages

EFFICIENTLY ENABLING CONVENTIONAL BLOCK SIZES FOR VERY LARGE DIE- STACKED DRAM CACHES

vsansparse Tech Note First Published On: Last Updated On:

HEAD HardwarE Accelerated Deduplication

Main Memory (Part II)

dedupv1: Improving Deduplication Throughput using Solid State Drives (SSD)

[537] Fast File System. Tyler Harter

Deduplication Storage System

HYDRAstor: a Scalable Secondary Storage

Chapter 8: Virtual Memory. Operating System Concepts

Locality and The Fast File System. Dongkun Shin, SKKU

The Fusion Distributed File System

How to Reduce Data Capacity in Objectbased Storage: Dedup and More

Correlation based File Prefetching Approach for Hadoop

Towards Weakly Consistent Local Storage Systems

Cumulus: Filesystem Backup to the Cloud

Demand Code Paging for NAND Flash in MMU-less Embedded Systems. Jose Baiocchi and Bruce Childers

The Memory Hierarchy & Cache Review of Memory Hierarchy & Cache Basics (from 350):

PebblesDB: Building Key-Value Stores using Fragmented Log Structured Merge Trees

The Google File System (GFS)

IBM N Series. Store the maximum amount of data for the lowest possible cost. Matthias Rettl Systems Engineer NetApp Austria GmbH IBM Corporation

Operating Systems Memory Management. Mathieu Delalandre University of Tours, Tours city, France

BIG DATA AND HADOOP ON THE ZFS STORAGE APPLIANCE

Rethinking Deduplication Scalability

Transcription:

Design Tradeoffs for Data Deduplication Performance in Backup Workloads Min Fu,DanFeng,YuHua,XubinHe, Zuoning Chen *, Wen Xia,YuchengZhang,YujuanTan Huazhong University of Science and Technology Virginia Commonwealth University * National Engineering Research Center for Parallel Computer Chongqing University Feb. 19, 2015 Min Fu,DanFeng, Yu Hua, Xubin He,ZuoningChen Design Tradeoffs *, for Wen Data Xia Deduplication,YuchengZhang Performance,YujuanTan in Backup ( Feb. Huazhong Workloads 19, 2015 University 1 / of 32Sc

Background In big data era, we had 4.4 ZB of data in 2013, and expectedly to grow by 10-fold in 2020 [IDC 2014]; data redundancy widely exists in real-world applications. Data deduplication is a scalable compression technology non-overlapping chunking no byte-by-byte comparison (fingerprinting) A significantly lower computation overhead than traditional compression technologies faster due to coarse-grained compression higher compression ratio since it looks for duplicate chunks in a larger scope. (The entire system VS. a limited compression window) Min Fu,DanFeng, Yu Hua, Xubin He,ZuoningChen Design Tradeoffs *, for Wen Data Xia Deduplication,YuchengZhang Performance,YujuanTan in Backup ( Feb. Huazhong Workloads 19, 2015 University 2 / of 32Sc

An Overview of a Typical Deduplication System Min Fu,DanFeng, Yu Hua, Xubin He,ZuoningChen Design Tradeoffs *, for Wen Data Xia Deduplication,YuchengZhang Performance,YujuanTan in Backup ( Feb. Huazhong Workloads 19, 2015 University 3 / of 32Sc

Motivations Challenges to design an efficient deduplication system Chunking Indexing Defragmenting Restoring Garbage collecting... We have a huge number of papers, solutions, and design choices Which one is better? Better in what? backup performance, restore performance, deduplication ratio, or memory footprint? How about their interplays?. Min Fu,DanFeng, Yu Hua, Xubin He,ZuoningChen Design Tradeoffs *, for Wen Data Xia Deduplication,YuchengZhang Performance,YujuanTan in Backup ( Feb. Huazhong Workloads 19, 2015 University 4 / of 32Sc

...1 The Parameter Space...2 The DeFrame Framework...3 Evaluation...4 Conclusions Min Fu,DanFeng, Yu Hua, Xubin He,ZuoningChen Design Tradeoffs *, for Wen Data Xia Deduplication,YuchengZhang Performance,YujuanTan in Backup ( Feb. Huazhong Workloads 19, 2015 University 5 / of 32Sc

The Parameter Space In this paper, we mainly explore fingerprint indexing, rewriting algorithm (in-line defragmenting), restore algorithm (cache replacement), and their interplays. Parameter Space Descriptions Key-value mapping Mapping fingerprints to their prefetching units Fingerprint cache In-memory fingerprint cache Indexing Sampling Selecting representative fingerprints Segmenting Splitting the unit of logical locality Segment selection Selecting segments to be prefetched Segment prefetching Exploiting segment-level locality Defragmenting (rewriting) Reducing fragmentation Restoring Designing restore cache/algorithm Table: The major parameters we discuss. Min Fu,DanFeng, Yu Hua, Xubin He,ZuoningChen Design Tradeoffs *, for Wen Data Xia Deduplication,YuchengZhang Performance,YujuanTan in Backup ( Feb. Huazhong Workloads 19, 2015 University 6 / of 32Sc

Indexing Bottleneck A deduplication system requires a huge key-value store to identify duplicates An in-memory key-value store is not cost-efficient: Amazon.com: a 1 TB HDD costs $60, and an 8 GB DRAM costs $80 suppose 4KB-sized chunks and 32-byte-sized key-value pair indexing 1 TB unique date requires an 8 GB-sized key-value store An HDD-based key-value store easily becomes the performance bottleneck, compared to the fast CDC chunking (400 MB/s and 102,400 chunks per sec under commercial CPUs) Modern fingerprint indexes exploit workload characteristics (locality) of backup systems to prefetch and cache fingerprints Hence, a fingerprint index consists of two major components: key-value store fingerprint prefetching/caching module Min Fu,DanFeng, Yu Hua, Xubin He,ZuoningChen Design Tradeoffs *, for Wen Data Xia Deduplication,YuchengZhang Performance,YujuanTan in Backup ( Feb. Huazhong Workloads 19, 2015 University 7 / of 32Sc

The Fingerprint Index Taxonomy Figure: Categories of existing work. Classification according to the use of key-value store Exact Deduplication (ED): fully indexing stored fingerprints Near-exact Deduplication (ND): partially indexing stored fingerprints Classification according to the prefetching policy Logical Locality (LL): the chunk sequence before deduplication Physical Locality (PL): the physical layout of stored chunks Min Fu,DanFeng, Yu Hua, Xubin He,ZuoningChen Design Tradeoffs *, for Wen Data Xia Deduplication,YuchengZhang Performance,YujuanTan in Backup ( Feb. Huazhong Workloads 19, 2015 University 8 / of 32Sc

Exact vs. Near-exact Deduplication Exact Deduplication (ED) indexes all stored fingerprints a huge key-value store on disks fingerprint prefetching/caching to improve backup throughput Near-exact Deduplication (ND) indexes only sampled (representative) fingerprints asmallkey-valuestoreindram fingerprint prefetching/caching to improve deduplication ratio ND trades deduplication ratio for higher backup/restore performance and lower memory footprint Does a lower memory footprint indicate a lower financial cost? To avoid an increase of the storage cost, ND needs to achieve 97% of the deduplication ratio of ED. Min Fu,DanFeng, Yu Hua, Xubin He,ZuoningChen Design Tradeoffs *, for Wen Data Xia Deduplication,YuchengZhang Performance,YujuanTan in Backup ( Feb. Huazhong Workloads 19, 2015 University 9 / of 32Sc

Exploiting Physical Locality The key-value store maps a fingerprint to its physical location, i.e., a container. Weakness: the prefetching efficiency decreases over time due to the fragmentation problem. Older containers have many useless fingerprints for new backups. For Near-exact Deduplication, how to select (sample) representative fingerprints in each container? Selects the fingerprints that mod R = 0 in a container, or Selects the first fingerprint every R fingerprints in a container. Min Fu,DanFeng, Yu Hua, Xubin He,ZuoningChen Design Tradeoffs *, for Wen Data Xia Deduplication,YuchengZhang Performance,YujuanTan in Backup Feb. ( Huazhong Workloads 19, 2015University 10 / of 32Sc

Exploiting Logical Locality The key-value store maps a fingerprint to its logical location, i.e., a segment in a recipe. The segment serves as the prefetching unit Advantage: no fragmentation problem Weakness: extremely high update overhead to the key-value store Even duplicate fingerprints have new logical locations (in new recipes and new segments). Optimization: only update sampled duplicate fingerprints. How to segmenting and sampling? Min Fu,DanFeng, Yu Hua, Xubin He,ZuoningChen Design Tradeoffs *, for Wen Data Xia Deduplication,YuchengZhang Performance,YujuanTan in Backup Feb. ( Huazhong Workloads 19, 2015University 11 / of 32Sc

Design Choices for Exploiting Logical Locality BLC Extreme Binning Sparse Indexing SiLo Exact deduplication Yes No No No Segmenting method FSS FDS CDS FSS & FDS Sampling method N/A Minimum Random Minimum Segment selection Base Top-all Top-k Top-1 Segment prefetching Yes No No Yes Key-value mapping relationship 1:1 1:1 Varied 1:1 Table: Existing work exploiting logical locality. Segmenting: Fixed-Sized Segmenting (FSS), File-Defined Segmenting (FDS), and Content-Defined Segmenting (CDS) Sampling: Uniform, Random, and Minimum. Segment selection: Base, Top-k, and Mix. Segment prefetching: exploiting segment-level locality. Key-value mapping: each representative fingerprint can refer to a varied number of logical locations. Min Fu,DanFeng, Yu Hua, Xubin He,ZuoningChen Design Tradeoffs *, for Wen Data Xia Deduplication,YuchengZhang Performance,YujuanTan in Backup Feb. ( Huazhong Workloads 19, 2015University 12 / of 32Sc

Defragmenting and Rewriting Algorithm The rewriting algorithm is an emerging dimension to reduce fragmentation. What is the fragmentation? The deviation between the logical locality and physical locality. The fragmentation hurts the restore (read) performance, and the backup performance of the fingerprint index exploiting physical locality. Min Fu,DanFeng, Yu Hua, Xubin He,ZuoningChen Design Tradeoffs *, for Wen Data Xia Deduplication,YuchengZhang Performance,YujuanTan in Backup Feb. ( Huazhong Workloads 19, 2015University 13 / of 32Sc

Existing Rewriting Algorithms Buffer-based algorithm CFL-based Selective Deduplication [Nam 2012] Context-Based Rewriting [Kaczmarczyk 2012] Capping [Lillibridge 2013] History-aware algorithm History-Aware Rewriting [Fu 2014] How about their interplays with the state-of-the-art fingerprint indexes? How does the rewriting algorithm improve the fingerprint index exploiting physical locality? How do the different prefetching schemes affect the efficiency of the rewriting algorithm? Min Fu,DanFeng, Yu Hua, Xubin He,ZuoningChen Design Tradeoffs *, for Wen Data Xia Deduplication,YuchengZhang Performance,YujuanTan in Backup Feb. ( Huazhong Workloads 19, 2015University 14 / of 32Sc

The Restore Algorithm While the rewriting algorithm determines the chunk layout, the restore algorithm improves restore performance under limited memory. How to write, and then how to read. Existing restore algorithms: traditional LRU cache Belady s optimal replacement cache rolling forward assembly area [Lillibridge 2013] How about their interplays with the rewriting algorithm? Min Fu,DanFeng, Yu Hua, Xubin He,ZuoningChen Design Tradeoffs *, for Wen Data Xia Deduplication,YuchengZhang Performance,YujuanTan in Backup Feb. ( Huazhong Workloads 19, 2015University / of 32Sc

The DeFrame Architecture Fingerprint index: duplicate identification key-value store fingerprint prefetching/caching Container store: container management (physical locality) Recipe store: recipe management (logical locality) Min Fu,DanFeng, Yu Hua, Xubin He,ZuoningChen Design Tradeoffs *, for Wen Data Xia Deduplication,YuchengZhang Performance,YujuanTan in Backup Feb. ( Huazhong Workloads 19, 2015University 16 / of 32Sc

The Backup Pipeline Dedup phase identifies duplicate/unique chunks Rewrite phase identifies fragmented duplicate chunks Filter phase determines whether write a chunk Advantage: we can implement a new rewriting algorithm without the need to modify the fingerprint index, and vice versa. Min Fu,DanFeng, Yu Hua, Xubin He,ZuoningChen Design Tradeoffs *, for Wen Data Xia Deduplication,YuchengZhang Performance,YujuanTan in Backup Feb. ( Huazhong Workloads 19, 2015University 17 / of 32Sc

The Restore Pipeline Read recipe phase reads the recipe and output fingerprints Restore algorithm phase receives fingerprints and fetch chunks from the container store Reconstruct file phase receives the chunks and reconstruct files Min Fu,DanFeng, Yu Hua, Xubin He,ZuoningChen Design Tradeoffs *, for Wen Data Xia Deduplication,YuchengZhang Performance,YujuanTan in Backup Feb. ( Huazhong Workloads 19, 2015University 18 / of 32Sc

Garbage Collection Users can set a retention time for the backups. All expired backups will be deleted automatically by DeFrame. How to reclaim the invalid chunks becomes a major challenge. We develop History-Aware Rewriting algorithm to aggregate valid chunks into fewer containers We develop Container-Marker Algorithm to reclaim invalid containers. More details can be found in our ATC 14 paper. Min Fu,DanFeng, Yu Hua, Xubin He,ZuoningChen Design Tradeoffs *, for Wen Data Xia Deduplication,YuchengZhang Performance,YujuanTan in Backup Feb. ( Huazhong Workloads 19, 2015University 19 / of 32Sc

Experimental Setups Dataset name Kernel VMDK RDB Total size 104 GB 1.89 TB 1.12 TB # of versions 258 127 212 Deduplication ratio 45.28 27.36 39.1 Avg. chunk size 5.29 KB 5.25 KB 4.5 KB Self-reference < 1% 15-20% 0 Fragmentation Severe Moderate Severe Table: The characteristics of our datasets. Kernel: downloaded from kernel.org VMDK: 127 consecutive snapshots of a virtual machine disk image RDB: 212 consecutive snapshots of a Redis database Min Fu,DanFeng, Yu Hua, Xubin He,ZuoningChen Design Tradeoffs *, for Wen Data Xia Deduplication,YuchengZhang Performance,YujuanTan in Backup Feb. ( Huazhong Workloads 19, 2015University / of 32Sc

Metrics and Our Goal. Quantitative Metrics.. Deduplication ratio: the original backup data size divided by the size of stored data.... Memory footprint: the runtime DRAM consumption. Storage cost: the total cost of HDDs and DRAM for stored chunks and the fingerprint index. Lookup/update request per GB: the number of lookup/update requests to the key-value store to deduplicate 1 GB of data. Restore speed: 1 divided by mean containers read per MB of restored data. It is practically impossible to find a solution that performs the best in all metrics. We aim to find a solution with the following properties: sustained, high backup performance as the top priority. reasonable tradeoffs in the remaining metrics. Min Fu,DanFeng, Yu Hua, Xubin He,ZuoningChen Design Tradeoffs *, for Wen Data Xia Deduplication,YuchengZhang Performance,YujuanTan in Backup Feb. ( Huazhong Workloads 19, 2015University 21 / of 32Sc

EDPL vs. EDLL EDPL EDLL(R=16) EDLL(R=32) EDLL(R=64) EDLL(R=128) EDLL(R=256) EDPL EDLL(R=16) EDLL(R=32) EDLL(R=64) EDLL(R=128) EDLL(R=256) lookup requests per GB 1400 1200 1000 800 600 400 200 0 0 50 100 150 200 backup version (a) Lookup overhead update requests per GB 100000 10000 1000 50 100 150 200 backup version (b) Update overhead Figure: Comparisons between EDPL and EDLL in terms of lookup and update overheads. R = 256 indicates a sampling ratio of 256:1. Results come from RDB. EDPL suffers from the ever-increasing lookup overhead. For EDLL, the sampling optimization is efficient. Min Fu,DanFeng, Yu Hua, Xubin He,ZuoningChen Design Tradeoffs *, for Wen Data Xia Deduplication,YuchengZhang Performance,YujuanTan in Backup Feb. ( Huazhong Workloads 19, 2015University 22 / of 32Sc

NDPL vs. NDLL (a) NDPL (b) NDLL Figure: Comparing NDPL and NDLL under different cache sizes. The Y-axis shows the relative deduplication ratio to exact deduplication. NDLL performs better in Kernel and RDB, but worse in VMDK than NDPL. Min Fu,DanFeng, Yu Hua, Xubin He,ZuoningChen Design Tradeoffs *, for Wen Data Xia Deduplication,YuchengZhang Performance,YujuanTan in Backup Feb. ( Huazhong Workloads 19, 2015University 23 / of 32Sc

The Interplays Between Fingerprint Index and Rewriting Algorithm lookup requests per GB 1800 1600 1400 1200 1000 800 600 400 200 0 EDPL EDLL EDPL + HAR EDLL + HAR 0 50 100 150 200 250 backup versions (a) relative deduplication ratio 100% 80% 60% 40% 20% 0% EDPL NDPL EDLL NDLL Kernel RDB VMDK (b) Figure: (a) How does HAR improve EDPL in terms of lookup overhead in Kernel? (b) How does fingerprint index affect HAR? The Y-axis shows the relative deduplication ratio to that of exact deduplication without rewriting. Exact Deduplication exploiting Physical Locality (EDPL) has the best interplays with the rewriting algorithm (HAR). Min Fu,DanFeng, Yu Hua, Xubin He,ZuoningChen Design Tradeoffs *, for Wen Data Xia Deduplication,YuchengZhang Performance,YujuanTan in Backup Feb. ( Huazhong Workloads 19, 2015University 24 / of 32Sc

The Interplays Between Rewriting and Restore Algorithm LRU w/o HAR ASM w/o HAR OPT w/o HAR LRU + HAR ASM + HAR OPT + HAR restore speed 4 3.5 3 2.5 2 1.5 1 0.5 0 0 50 100 150 200 250 backup version Figure: EDPL is used as the fingerprint index When HAR is used, the optimal cache is better; otherwise, the rolling forward assembly area is better. Min Fu,DanFeng, Yu Hua, Xubin He,ZuoningChen Design Tradeoffs *, for Wen Data Xia Deduplication,YuchengZhang Performance,YujuanTan in Backup Feb. ( Huazhong Workloads 19, 2015University 25 / of 32Sc

Conclusions We propose a taxonomy to understand the parameter space of data deduplication. We design and implement a framework to evaluate the parameter space. We present our experimental results, and draw the following recommendations. Subspace Recommended parameter settings Advantages EDLL NDPL NDLL EDPL content-defined segmenting random sampling uniform sampling content-defined segmenting similarity detection & segment prefetching an efficient rewriting algorithm lowest storage cost sustained backup performance lowest memory footprint simplest logical frame lowest memory footprint high deduplication ratio sustained high restore performance good interplays with the rewriting algorithm Table: How to choose a reasonable solution according to required tradeoff. Min Fu,DanFeng, Yu Hua, Xubin He,ZuoningChen Design Tradeoffs *, for Wen Data Xia Deduplication,YuchengZhang Performance,YujuanTan in Backup Feb. ( Huazhong Workloads 19, 2015University 26 / of 32Sc

Thank You! Q&A DeFrame is released at www.github.com/fomy/destor Min Fu,DanFeng, Yu Hua, Xubin He,ZuoningChen Design Tradeoffs *, for Wen Data Xia Deduplication,YuchengZhang Performance,YujuanTan in Backup Feb. ( Huazhong Workloads 19, 2015University 27 / of 32Sc

Exploiting Similarity for NDLL Figure: This figure shows the workflow of the Top-k similarity detection. Observations: NDLLworksbetterthanNDPLindatasetswhere self-references are rare, but worse in datasets where self-references are common. Self-references are duplicates in a single file (backup). We could exploit similarity to improve deduplication ratio. Advantage: higher deduplication ratio than the Base procedure. Weakness: more complicated procedure and an additional buffer, compared to the Base procedure. Min Fu,DanFeng, Yu Hua, Xubin He,ZuoningChen Design Tradeoffs *, for Wen Data Xia Deduplication,YuchengZhang Performance,YujuanTan in Backup Feb. ( Huazhong Workloads 19, 2015University 28 / of 32Sc

The efficiency of similarity detection Figure: Impacts of the segment selection (s), segment prefetching (p), and mapping relationship (v) on deduplication ratio. On the X-axis, we have parameters in the format (s, p, v). s could be Base and Top-k (k varies from 1 to 4). p varies from 1 to 4. v varies from 1 to 4. They finally achieve 90% of the deduplication ratio of exact deduplication. Min Fu,DanFeng, Yu Hua, Xubin He,ZuoningChen Design Tradeoffs *, for Wen Data Xia Deduplication,YuchengZhang Performance,YujuanTan in Backup Feb. ( Huazhong Workloads 19, 2015University 29 / of 32Sc

The efficiency of similarity detection Figure: Impacts of the segment selection (s), segment prefetching (p), and mapping relationship (v) on segments read. The segment prefetching is complementary with Top-k. Min Fu,DanFeng, Yu Hua, Xubin He,ZuoningChen Design Tradeoffs *, for Wen Data Xia Deduplication,YuchengZhang Performance,YujuanTan in Backup Feb. ( Huazhong Workloads 19, 2015University 30 / of 32Sc

Storage cost Dataset Fraction EDPL/EDLL NDPL-64 NDPL-128 NDPL-256 NDLL-64 NDLL-128 NDLL-256 DRAM 1.33% 0.83% 0.49% 0.31% 0.66% 0.34% 0.16% Kernel HDD 57.34% 65.01% 70.56% 77.58% 59.03% 59.83% 60.23% Total 58.67% 65.84% 71.04% 77.89% 59.69% 60.17% 60.39% DRAM 1.40% 0.83% 0.48% 0.31% 0.70% 0.35% 0.17% RDB HDD 55.15% 61.25% 66.08% 73.58% 55.27% 55.34% 55.65% Total 56.55% 62.07% 66.56% 73.89% 55.97% 55.69% 55.82% DRAM 1.41% 0.82% 0.45% 0.27% 0.71% 0.35% 0.18% VMDK HDD 54.86% 60.32% 63.16% 67.10% 59.79% 62.92% 71.24% Total 56.27% 61.14% 63.61% 67.36% 60.49% 63.27% 71.42% Table: The storage costs relative to the baseline which indexes all fingerprints in DRAM. NDPL-128 is NDPL of a 128:1 uniform sampling ratio. Via exploiting locality, the storage cost reduces by about 40%. Near-exact deduplication reduces the memory footprint, however it generally increases the total storage cost. Min Fu,DanFeng, Yu Hua, Xubin He,ZuoningChen Design Tradeoffs *, for Wen Data Xia Deduplication,YuchengZhang Performance,YujuanTan in Backup Feb. ( Huazhong Workloads 19, 2015University 31 / of 32Sc

Choosing Sampling Method in NDPL memory footprint (KB) 1e+06 100000 10000 uniform random 1000 0 5 10 15 20 25 30 deduplication ratio Figure: Impacts of varying sampling method on NDPL. Points in each line are of different sampling ratios, which are 256, 128, 64, 32, 16, and 1 from left to right. The uniform sampling achieves significantly better deduplication ratio than the random sampling. Min Fu,DanFeng, Yu Hua, Xubin He,ZuoningChen Design Tradeoffs *, for Wen Data Xia Deduplication,YuchengZhang Performance,YujuanTan in Backup Feb. ( Huazhong Workloads 19, 2015University 32 / of 32Sc