A DEDUPLICATION-INSPIRED FAST DELTA COMPRESSION APPROACH
WEN XIA, HONG JIANG, DAN FENG, LEI TIAN, MIN FU, YUKUN ZHOU
1 A DEDUPLICATION-INSPIRED FAST DELTA COMPRESSION APPROACH. WEN XIA, HONG JIANG, DAN FENG, LEI TIAN, MIN FU, YUKUN ZHOU. PRESENTED BY ROMAN SHOR
2 Overview
Techniques of data reduction in storage systems:
- Traditional compression: Huffman coding and dictionary coding (e.g. GZIP)
- Data deduplication: eliminates redundancy at the chunk/file level
- Delta compression: removes redundancy among non-duplicate but very similar data files and chunks
3 Spot the difference [Figure: two near-identical items labeled A, and their delta]
4 Delta compression
- Delta: Source + Target → Δ
- Reverse delta: Δ + Source → Target
5 Outline
- Background and motivation
- Design and implementation
- Performance evaluation
- Conclusions
6 Motivation
Delta compression vs. deduplication:
- Target: similar data (delta compression) vs. duplicate data (deduplication)
- Granularity: string vs. chunk/file
- Scalability: weak vs. strong
Delta compression can eliminate more redundancy among non-duplicate but similar chunks (about 2-3X).
Uses of delta compression:
- Dropbox reduces bandwidth requirements by sending only the delta updates.
- I-CASH saves space and enlarges the logical space of SSD caches.
- Difference Engine saves memory by sub-page-level sharing.
7 Delta algorithms
- Insert/delete delta algorithms use a longest common subsequence (LCS) algorithm to compute an edit script that modifies the source version into the target version.
- Copy/insert delta algorithms locate matching offsets in the source and target, then emit a sequence of copy instructions for each matching range and insert instructions to cover the unmatched regions.
Example: Source = "proxy cache", Target = "cache proxy"
- Insert/delete: insert("cache "), retain(0,5), delete(5,6)
- Copy/insert: copy(6,5), insert(" "), copy(0,5)
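To make the copy/insert example concrete, here is a minimal Python sketch that applies such a delta to the source. The tuple encoding of the instructions and the name `apply_delta` are assumptions made for this illustration; they are not the on-disk format of Xdelta, Zdelta, or Ddelta.

```python
from typing import List, Tuple, Union

# Hypothetical instruction encoding used only in this sketch:
#   ("copy", offset, length) -> copy `length` bytes from the source at `offset`
#   ("insert", data)         -> emit literal bytes
Instruction = Union[Tuple[str, int, int], Tuple[str, bytes]]


def apply_delta(source: bytes, delta: List[Instruction]) -> bytes:
    """Rebuild the target from the source and a copy/insert delta."""
    out = bytearray()
    for ins in delta:
        if ins[0] == "copy":
            _, offset, length = ins
            out += source[offset:offset + length]
        elif ins[0] == "insert":
            out += ins[1]
        else:
            raise ValueError(f"unknown instruction: {ins[0]!r}")
    return bytes(out)


source = b"proxy cache"
delta = [("copy", 6, 5), ("insert", b" "), ("copy", 0, 5)]
target = apply_delta(source, delta)
print(target)  # b'cache proxy'
```

Decoding needs no searching at all, only slicing and appending, which is one reason copy/insert deltas tend to decode quickly.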
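8 Copy/insert delta: building the index [Figure: hash table over the source mapping string hashes to offsets, e.g. 9dc6 → 8, b9b7 → 6]
9 Copy/insert delta: searching for matches [Figure: the hash of a target string (9dc6) is looked up in the index and matches source offset 8]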
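The two steps above (index the source, then search it while sliding over the target) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the encoder of any specific tool: the seed length `WINDOW = 4` is arbitrary, and the index is keyed by the raw window bytes in a Python dict rather than by a short hash value as on the slides.

```python
from typing import Dict, List, Tuple

WINDOW = 4  # assumed seed length for this sketch


def build_index(source: bytes) -> Dict[bytes, int]:
    """Map every WINDOW-byte substring of the source to its first offset."""
    index: Dict[bytes, int] = {}
    for off in range(len(source) - WINDOW + 1):
        index.setdefault(source[off:off + WINDOW], off)
    return index


def encode(source: bytes, target: bytes) -> List[Tuple]:
    """Slide over the target byte by byte, emitting copy/insert instructions."""
    index = build_index(source)
    delta: List[Tuple] = []
    pending = bytearray()  # unmatched bytes waiting to be emitted as an insert
    pos = 0
    while pos < len(target):
        off = index.get(target[pos:pos + WINDOW])
        if off is None:
            pending.append(target[pos])
            pos += 1
            continue
        # Extend the seed match forward for as long as the bytes agree.
        length = WINDOW
        while (off + length < len(source) and pos + length < len(target)
               and source[off + length] == target[pos + length]):
            length += 1
        if pending:
            delta.append(("insert", bytes(pending)))
            pending.clear()
        delta.append(("copy", off, length))
        pos += length
    if pending:
        delta.append(("insert", bytes(pending)))
    return delta


print(encode(b"proxy cache", b"cache proxy"))
# [('copy', 6, 5), ('insert', b' '), ('copy', 0, 5)]
```

The per-byte lookup in the `while` loop is the byte-wise sliding window that the next slide identifies as the main cost.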
10 Challenges
- Locating duplicate and similar data chunks, and calculating the differences among similar data chunks.
- Using a byte-wise sliding window to identify matched strings is very time-consuming.
- The average delta encoding speed of similar chunks falls in the range of 30-90 MB/s.
11 Outline
- Background and motivation
- Design and implementation
- Performance evaluation
- Conclusions
12 Approach
- Content-Defined Chunking (CDC): divide the base and input chunks into smaller independent strings, then detect duplicates among these strings.
- Locality of redundant data: regions immediately adjacent to the confirmed duplicate strings may contain duplicate content.
13 Design
- Gear-based chunking: fast chunking using a lookup table.
- Spooky fingerprinting: identify duplicates among strings.
- Greedy byte-wise scanning: search the areas adjacent to duplicate strings to find more redundancy.
- Encoding: encode the duplicate and non-duplicate strings as Copy and Insert instructions, respectively.
14 Gear-based chunking
Gear hash: $H_i = (H_{i-1} \ll 1) + \mathrm{GearTable}[B_i]$
- GearTable: array of 256 random 32-bit integers
- Total: 1 ADD, 1 SHIFT, 1 ARRAY LOOKUP
Rabin hash: $H_i = ((H_{i-1} \oplus U[B_{i-n}]) \ll 8) \mid B_i \oplus T[H_{i-1} \gg N]$
- U, T: predefined arrays for finite field multiplication
- Total: 1 OR, 2 XORs, 2 SHIFTs, 2 ARRAY LOOKUPs
15 Gear-based chunking [Figure: worked example of the rolling update $H_{i+1} = (H_i \ll 1) + \mathrm{GearTable}[B_{i+1}]$ over the input bytes 2, 7, 5, 9, 2 with a sample GearTable; the values shown for $H_{i+1}$ to $H_{i+4}$ are 377, 974, 259, and 774]
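A minimal Python sketch of content-defined chunking with the Gear hash follows. The update rule (one shift, one add, one table lookup over a table of 256 random 32-bit integers) is taken from the slides; everything else is an assumption made for this sketch: the fixed seed, the boundary test on the top 8 bits (`mask`), and the `min_size`/`max_size` values.

```python
import random
from typing import List

MASK32 = 0xFFFFFFFF

# 256 random 32-bit integers. The fixed seed only makes this sketch reproducible.
_rng = random.Random(2014)
GEAR_TABLE = [_rng.getrandbits(32) for _ in range(256)]


def gear_chunks(data: bytes, mask: int = 0xFF000000,
                min_size: int = 32, max_size: int = 1024) -> List[bytes]:
    """Split data at content-defined boundaries using the Gear rolling hash."""
    chunks: List[bytes] = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # One shift, one add, one table lookup per input byte.
        h = ((h << 1) + GEAR_TABLE[byte]) & MASK32
        size = i - start + 1
        # Assumed boundary rule: the selected high bits of the hash are all zero,
        # or the chunk has reached the maximum size.
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks
```

Because each step shifts the hash left by one bit, a byte's contribution leaves a 32-bit hash after 32 further bytes, so the hash behaves like a rolling hash over the most recent bytes without an explicit "remove oldest byte" step. Testing the high bits rather than the low bits is a design choice of this sketch: the low bits depend only on the last few bytes.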
16 Spooky fingerprinting
- Uses a 64-bit Spooky hash instead of the time-consuming SHA-1.
- The content of fingerprint-matched strings is compared byte-wise (memcmp() in C) to confirm duplicates.
- The comparison overhead is negligible relative to chunking and fingerprinting.
- Other fast hash approaches such as Murmur and xxHash can also be employed.
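The "weak fingerprint plus byte-wise confirmation" idea can be sketched in Python as below. Note the loud assumption: Python's standard library has no Spooky hash, so this sketch substitutes BLAKE2b truncated to 64 bits as a stand-in. The function names and the dict-based index are also hypothetical.

```python
import hashlib
from typing import Dict, Optional


def fingerprint64(data: bytes) -> bytes:
    """64-bit fingerprint. BLAKE2b is a stand-in here, NOT Spooky hash."""
    return hashlib.blake2b(data, digest_size=8).digest()


def find_duplicate(string: bytes, base_index: Dict[bytes, bytes]) -> Optional[bytes]:
    """Return the matching base string, or None if there is no true duplicate."""
    candidate = base_index.get(fingerprint64(string))
    # A 64-bit fingerprint can collide, so confirm with a byte-wise comparison
    # (the role memcmp() plays in C).
    if candidate is not None and candidate == string:
        return candidate
    return None


base_strings = [b"alpha", b"beta", b"gamma"]
base_index = {fingerprint64(s): s for s in base_strings}

print(find_duplicate(b"beta", base_index))   # b'beta'
print(find_duplicate(b"delta", base_index))  # None
```

The byte-wise check is what makes a short, fast fingerprint safe to use: a collision costs one extra comparison instead of a corrupted delta.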
17 Greedy byte-wise scanning
- A CDC-based approach alone cannot accurately find the boundary between changed and duplicate regions.
- Exploits data-stream content locality.
- Chunk-level search for resemblance-detected chunks.
- String-level search in the duplicate-adjacent areas.
18 Ddelta workflow
- Step 1: Scanning from both ends
- Step 2: Identifying duplicate strings
19 Ddelta workflow
- Step 3: Scanning areas adjacent to duplicates
- Step 4: Encoding the delta chunk (C = Copy, I = Insert)
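Steps 1 and 3 are both plain byte-wise scans. The Python sketch below illustrates them under assumptions of its own: the helper names and signatures are hypothetical, and the slides do not specify how Ddelta bounds or orders these scans.

```python
from typing import Tuple


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Step 1, front end: number of matching bytes at the start of a and b."""
    n = 0
    limit = min(len(a), len(b))
    while n < limit and a[n] == b[n]:
        n += 1
    return n


def common_suffix_len(a: bytes, b: bytes, skip: int = 0) -> int:
    """Step 1, back end: matching bytes at the end, never overlapping the
    first `skip` bytes already claimed by the prefix scan."""
    n = 0
    limit = min(len(a), len(b)) - skip
    while n < limit and a[len(a) - 1 - n] == b[len(b) - 1 - n]:
        n += 1
    return n


def extend_match(base: bytes, inp: bytes, b_off: int, i_off: int,
                 length: int) -> Tuple[int, int, int]:
    """Step 3: greedily grow a confirmed duplicate into the adjacent bytes."""
    # Scan backward from the start of the match.
    while b_off > 0 and i_off > 0 and base[b_off - 1] == inp[i_off - 1]:
        b_off -= 1
        i_off -= 1
        length += 1
    # Scan forward from the end of the match.
    while (b_off + length < len(base) and i_off + length < len(inp)
           and base[b_off + length] == inp[i_off + length]):
        length += 1
    return b_off, i_off, length


base = b"AAAAhelloBBBBworldCCCC"
inp = b"AAAAhelloXXBBBBworldCCCC"
prefix = common_prefix_len(base, inp)
suffix = common_suffix_len(base, inp, skip=prefix)
print(prefix, suffix)  # 9 13
```

In this example the two scans from the ends already cover all of `base`, leaving only the inserted `XX` to be encoded as an Insert instruction.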
20 Post-deduplication workflow [Figure: system overview]
21 Similarity detection
- Compute fingerprints over the chunk/file and select the N smallest values (here, 6 fingerprints per chunk).
- Combine those into super-fingerprints (here, 2 super-fingerprints of 3 fingerprints each).
- Search the index for a match of any super-fingerprint.
- Choose a BestFit or FirstFit strategy (FirstFit in our case).
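The steps above can be sketched in Python as follows. The counts (6 fingerprints, 2 super-fingerprints of 3 each, FirstFit) come from the slide; the window size, the use of truncated BLAKE2b as the hash, the way each group is combined, and the `SimilarityIndex` class are all assumptions made for this illustration.

```python
import hashlib
from typing import Dict, Optional, Tuple

WINDOW = 16      # assumed window size for feature hashing
N_FEATURES = 6   # fingerprints selected per chunk (from the slide)
GROUP = 3        # fingerprints per super-fingerprint (from the slide)


def _h64(data: bytes) -> int:
    # BLAKE2b truncated to 64 bits is a stand-in hash chosen for this sketch.
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def super_fingerprints(chunk: bytes) -> Tuple[int, ...]:
    """Hash every window, keep the N smallest values, then hash them in groups."""
    n_windows = max(1, len(chunk) - WINDOW + 1)
    hashes = {_h64(chunk[i:i + WINDOW]) for i in range(n_windows)}
    features = sorted(hashes)[:N_FEATURES]
    sfs = []
    for g in range(0, len(features), GROUP):
        group = features[g:g + GROUP]
        sfs.append(_h64(b"".join(f.to_bytes(8, "big") for f in group)))
    return tuple(sfs)


class SimilarityIndex:
    """Hypothetical in-memory index from super-fingerprint to chunk id."""

    def __init__(self) -> None:
        self._by_sf: Dict[int, str] = {}

    def add(self, chunk_id: str, chunk: bytes) -> None:
        for sf in super_fingerprints(chunk):
            self._by_sf.setdefault(sf, chunk_id)

    def first_fit(self, chunk: bytes) -> Optional[str]:
        """FirstFit: return the first indexed chunk sharing any super-fingerprint."""
        for sf in super_fingerprints(chunk):
            if sf in self._by_sf:
                return self._by_sf[sf]
        return None
```

A BestFit variant would instead collect all candidates and pick the one sharing the most super-fingerprints; FirstFit trades some match quality for a cheaper lookup.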
22 Outline
- Background and motivation
- Design and implementation
- Performance evaluation
- Conclusions
23 Evaluation datasets
- GCC and Linux: represent workloads of typical large software source code.
- VM-A: VM images of different OS release versions, with a low dedup factor.
- VM-B: 177 backups of an Ubuntu VM in use, a common real-world use case for data reduction.
- RDB: 211 backups of a Redis key-value store database, a typical database workload for data reduction.
- Bench: generated from snapshots of a personal cloud storage benchmark.
24 Experimental setup
- Data deduplication chunk sizes: average 8 KB, maximum 64 KB, minimum 2 KB
- Xdelta [1] and Zdelta [2] are used as the delta compression baselines
- Metrics:
  - Compression ratio (CR): percentage of data reduced
  - Compression factor (CF): ratio of data sizes before and after data reduction
- Platform: Ubuntu OS; quad-core Intel i7 processor at 2.8 GHz with 16 GB RAM; 2 x 1 TB 7200 RPM hard disks; 120 GB SSD
[1] J. MacDonald, "File system support for delta compression," Master's thesis, Department of EE and CS, University of California at Berkeley.
[2] D. Trendafilov, N. Memon, T. Suel, "Zdelta: An efficient delta compression tool," Technical report, Department of CS, Polytechnic University.
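The two metrics defined above are related by simple arithmetic. The sketch below reads "percentage of data reduced" as the fraction of the original size removed, which is an interpretation of the slide's wording; the function names are hypothetical.

```python
def compression_ratio(before: int, after: int) -> float:
    """CR: percentage of data reduced, read here as (1 - after/before) * 100."""
    return (1 - after / before) * 100.0


def compression_factor(before: int, after: int) -> float:
    """CF: ratio of the data size before reduction to the size after."""
    return before / after


print(compression_ratio(1000, 250))   # 75.0
print(compression_factor(1000, 250))  # 4.0
```

Under this reading, CR = (1 - 1/CF) x 100, so a CF of 4 corresponds to a CR of 75%.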
25 Gear hash evaluation [Figure: hash function distribution]
26 Gear hash evaluation [Figure: chunk-size distribution on RDB]
27 Gear hash evaluation [Figures: chunking speed; compression performance]
28 Ddelta evaluation
- Post-deduplication data reduction system that implements delta and GZ compression on the non-duplicate chunks.
- Case study I: delta compression of resemblance-detected similar chunks
- Case study II: delta compression for updated tarred files
29 Ddelta evaluation [Figures: CR achieved by the three duplicate-identification steps of Ddelta on different workloads; CR as a function of the average string size on the Linux dataset]
30 Ddelta evaluation [Figures: encoding speed as a function of the average string size on the Linux dataset; evaluation of combinations of chunking schemes with fingerprinting schemes]
31 Ddelta evaluation [Figures: encoding speed as a function of the average string size on the Linux dataset; evaluation of combinations of chunking schemes with fingerprinting schemes]
32 Ddelta evaluation [Figure: CR of post-deduplication data reduction schemes]
33 Ddelta evaluation [Figure: compression throughput]
34 Ddelta evaluation [Figure: decompression throughput]
35 Ddelta evaluation 2 [Figure: delta compression performance on the updated similar tarred files]
36 Ddelta evaluation 2 [Figures: CR of Ddelta, Xdelta, and deduplication on similar tarred datasets; encoding speed; decoding speed]
37 Outline
- Background and motivation
- Design and implementation
- Performance evaluation
- Conclusions
38 Conclusions
- A delta compression scheme can be fast:
  - Encoding speedup of 2.5x to 8x
  - Decoding speedup of 2x to 20x
- It uses deduplication principles without sacrificing compression ratio.
- Gear-based chunking speeds up the Rabin-based Content-Defined Chunking process by a factor of about 2.1.
39 Ongoing questions
- Similarity detection? DARE: A Deduplication-Aware Resemblance Detection and Elimination Scheme for Data Reduction with Low Overheads.
- How will GC manage delta-compressed files?
- Inline or offline?
- Write/read throughput?
More informationJPEG decoding using end of block markers to concurrently partition channels on a GPU. Patrick Chieppe (u ) Supervisor: Dr.
JPEG decoding using end of block markers to concurrently partition channels on a GPU Patrick Chieppe (u5333226) Supervisor: Dr. Eric McCreath JPEG Lossy compression Widespread image format Introduction
More informationEfficient Deduplication Techniques for Modern Backup Operation
824 IEEE TRANSACTIONS ON COMPUTERS, VOL. 60, NO. 6, JUNE 2011 Efficient Deduplication Techniques for Modern Backup Operation Jaehong Min, Daeyoung Yoon, and Youjip Won Abstract In this work, we focus on
More informationDEDUPLICATION OF VM MEMORY PAGES USING MAPREDUCE IN LIVE MIGRATION
DEDUPLICATION OF VM MEMORY PAGES USING MAPREDUCE IN LIVE MIGRATION TYJ Naga Malleswari 1 and Vadivu G 2 1 Department of CSE, Sri Ramaswami Memorial University, Chennai, India 2 Department of Information
More informationCOS 318: Operating Systems. NSF, Snapshot, Dedup and Review
COS 318: Operating Systems NSF, Snapshot, Dedup and Review Topics! NFS! Case Study: NetApp File System! Deduplication storage system! Course review 2 Network File System! Sun introduced NFS v2 in early
More informationHedvig as backup target for Veeam
Hedvig as backup target for Veeam Solution Whitepaper Version 1.0 April 2018 Table of contents Executive overview... 3 Introduction... 3 Solution components... 4 Hedvig... 4 Hedvig Virtual Disk (vdisk)...
More informationE-Store: Fine-Grained Elastic Partitioning for Distributed Transaction Processing Systems
E-Store: Fine-Grained Elastic Partitioning for Distributed Transaction Processing Systems Rebecca Taft, Essam Mansour, Marco Serafini, Jennie Duggan, Aaron J. Elmore, Ashraf Aboulnaga, Andrew Pavlo, Michael
More informationPresented by: Nafiseh Mahmoudi Spring 2017
Presented by: Nafiseh Mahmoudi Spring 2017 Authors: Publication: Type: ACM Transactions on Storage (TOS), 2016 Research Paper 2 High speed data processing demands high storage I/O performance. Flash memory
More informationUnderstanding Primary Storage Optimization Options Jered Floyd Permabit Technology Corp.
Understanding Primary Storage Optimization Options Jered Floyd Permabit Technology Corp. Primary Storage Optimization Technologies that let you store more data on the same storage Thin provisioning Copy-on-write
More informationIn-line Deduplication for Cloud storage to Reduce Fragmentation by using Historical Knowledge
In-line Deduplication for Cloud storage to Reduce Fragmentation by using Historical Knowledge Smitha.M. S, Prof. Janardhan Singh Mtech Computer Networking, Associate Professor Department of CSE, Cambridge
More informationByte Index Chunking Approach for Data Compression
Ider Lkhagvasuren 1, Jung Min So 1, Jeong Gun Lee 1, Chuck Yoo 2, Young Woong Ko 1 1 Dept. of Computer Engineering, Hallym University Chuncheon, Korea {Ider555, jso, jeonggun.lee, yuko}@hallym.ac.kr 2
More information