MIGRATORY COMPRESSION Coarse-grained Data Reordering to Improve Compressibility
1 MIGRATORY COMPRESSION: Coarse-grained Data Reordering to Improve Compressibility
Xing Lin (University of Utah); Guanlin Lu, Fred Douglis, Philip Shilane, Grant Wallace (EMC Corporation, Data Protection and Availability Division)
2 Overview
- Compression finds redundancy among strings within a certain distance (the window size).
- Problem: window sizes are small, so similarity across a larger distance is not identified.
- Migratory compression (MC): coarse-grained reorganization that groups similar blocks to improve compressibility
  - A generic pre-processing stage for standard compressors
  - In many cases, improves both compressibility and throughput
  - Effective for improving compression for archival storage
3-6 Background
- Compression Factor (CF) = original size / compressed size
- Throughput = original size / (de)compression time

Compressor  Max window size  Techniques
gzip        64 KB            LZ; Huffman coding
bzip2       900 KB           Run-length encoding; Burrows-Wheeler Transform; Huffman coding
7z          1 GB             LZ; Markov chain-based range coder

- rzip (from rsync): intra-file deduplication; bzip2 the remainder
[Figure: example compression throughput (MB/s) vs. compression factor (X) for gzip, bzip2, rzip, and 7z]
- The larger the window, the better the compression, but the slower the compressor.
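The two metrics defined above can be measured directly. The following is a minimal sketch using the compressors in Python's standard library (zlib, bz2, lzma) as stand-ins for gzip, bzip2, and 7z; the sample input and the helper name `measure` are assumptions made for illustration, and the numbers it prints depend on the machine and the data.

```python
import bz2
import lzma
import time
import zlib


def measure(name, compress, data):
    """Return (name, compression factor, throughput in MB/s) for one compressor."""
    start = time.perf_counter()
    compressed = compress(data)
    elapsed = time.perf_counter() - start
    cf = len(data) / len(compressed)                    # CF = original / compressed
    tput = len(data) / (1 << 20) / max(elapsed, 1e-9)   # original size / time
    return name, cf, tput


# Hypothetical sample input: repetitive text, roughly 1.3 MB.
data = b"migratory compression groups similar blocks together\n" * 25000

results = [
    measure("zlib (DEFLATE, as in gzip)", zlib.compress, data),
    measure("bz2", bz2.compress, data),
    measure("lzma (as in 7z/xz)", lzma.compress, data),
]
for name, cf, tput in results:
    print(f"{name:28s} CF = {cf:8.1f}x  throughput = {tput:8.1f} MB/s")
```

Such highly repetitive input flatters every compressor; realistic backup data would show much smaller factors.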
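7 Motivation: compress a single, large file
- Traditional compressors are unable to exploit redundancy across a large range of data (e.g., many GB).
[Figure: a file whose similar blocks (A B A C ... A B) are spread over GBs, beyond the reach of the gz and 7z windows. A standard compressor compresses the file as it is; mzip first transforms it so that similar blocks are adjacent (A A B B C) and then compresses it.]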
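The window limitation can be reproduced at small scale. The sketch below is an illustration under assumed sizes (8 KB blocks, 16 distinct blocks), not the mzip implementation: it builds a buffer in which every block occurs twice, 128 KB apart, and compares zlib's compression factor on that layout with the factor on a layout where the two copies sit next to each other.

```python
import random
import zlib

BLOCK = 8 * 1024      # block size (assumed for illustration)
DISTINCT = 16         # number of distinct blocks; 16 * 8 KB = 128 KB

rng = random.Random(0)
blocks = [bytes(rng.getrandbits(8) for _ in range(BLOCK)) for _ in range(DISTINCT)]

# Original layout: every block appears twice, but the copies are 128 KB apart,
# which is farther back than the DEFLATE history window reaches.
original = b"".join(blocks + blocks)

# Reorganized layout: the two copies of each block are adjacent (8 KB apart).
reorganized = b"".join(b + b for b in blocks)

cf_original = len(original) / len(zlib.compress(original))
cf_reorganized = len(reorganized) / len(zlib.compress(reorganized))
print(f"CF original layout:    {cf_original:.2f}x")
print(f"CF reorganized layout: {cf_reorganized:.2f}x")
```

Both layouts contain exactly the same bytes; only their order differs, which is the effect the slide attributes to the transform step.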
8-10 Motivation: better compression for long-term retention
- Data migration from the backup tier to the archive tier
- Observation: similar blocks could be scattered across compression regions (CRs) in the backup tier
  - CR: unit of data that is compressed together (e.g., 128 KB)
- Current practice: CRs are migrated as they are (backup tier CR1 = A B C, CR2 = A B C; archive tier CR1 = A B C, CR2 = A B C)
- Proposal (migratory compression): fill each CR with similar data (archive tier CR1 = A A B B, CR2 = C C)
11 Recap
- Migratory compression idea: improve compression by grouping similar blocks
- Use cases
  - mzip: compress a single, large file
  - Data migration for archival storage
- Considerations
  - How to identify similar blocks?
  - How to reorganize blocks efficiently?
  - Other tradeoffs (refer to the paper)
12-13 mzip Workflow
- Compression: File -> Segmentation -> Similarity detector -> Reorganizer -> Reorganized file -> Compressor -> Compressed file
  - Segmentation: partition into blocks, calculate similarity features
  - Similarity detector: group by content and identify duplicate and similar blocks; output migrate and restore recipes
  - Reorganizer: rearrange the input file
  - The reorganized file may exist only as a pipe into the compressor
- Decompression: Compressed file -> Decompressor -> Reorganized file -> Extract restore recipe -> Restorer -> File
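The workflow can be sketched end to end in a few lines. Everything specific here is an assumption made for illustration: the fixed 4 KB block size, the header layout that carries the recipe inside the compressed stream, the function names `mzip_compress` and `mzip_decompress`, and above all the stand-in similarity detector, which only groups identical blocks by strong hash rather than detecting similar ones.

```python
import hashlib
import struct
import zlib

BLOCK = 4096  # fixed block size, assumed for this sketch


def mzip_compress(data: bytes) -> bytes:
    # Segmentation: split the input into fixed-size blocks.
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    # Stand-in similarity detector: order blocks by strong hash so identical
    # blocks become adjacent. migrate[j] = original index of output block j.
    migrate = sorted(range(len(blocks)),
                     key=lambda i: hashlib.sha1(blocks[i]).digest())
    # Reorganizer: emit blocks in recipe order.
    reorganized = b"".join(blocks[i] for i in migrate)
    # Assumed header: block count, then the recipe, stored ahead of the data.
    header = struct.pack(f"<I{len(migrate)}I", len(migrate), *migrate)
    return zlib.compress(header + reorganized)


def mzip_decompress(blob: bytes) -> bytes:
    raw = zlib.decompress(blob)
    # Extract the recipe from the header.
    (count,) = struct.unpack_from("<I", raw, 0)
    migrate = struct.unpack_from(f"<{count}I", raw, 4)
    body = raw[4 + 4 * count:]
    # Only the last original block may be shorter than BLOCK.
    sizes = [BLOCK] * count
    if count:
        sizes[-1] = len(body) - BLOCK * (count - 1)
    # Restorer: put every block back at its original index.
    restored = [b""] * count
    pos = 0
    for original_index in migrate:
        size = sizes[original_index]
        restored[original_index] = body[pos:pos + size]
        pos += size
    return b"".join(restored)


sample = b"".join(bytes([i % 7]) * BLOCK for i in range(20)) + b"tail"
assert mzip_decompress(mzip_compress(sample)) == sample
```

Here the reorganized data is held in memory rather than piped into the compressor, which keeps the sketch short but would not scale to the multi-GB files the slides target.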
14-15 mzip Example
- Input file blocks: A B C A D B
- The similarity detector outputs a migrate recipe and a restore recipe, expressed in terms of block IDs
- Reorganize (migrate recipe): A B C A D B -> A A B B C D (reorganized file)
- Restore (restore recipe): A A B B C D -> A B C A D B (original file)
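One way to read the two recipes is as a permutation and its inverse. The sketch below works through the six-block file from the example; the function name `build_recipes`, the exact meaning given to each recipe, and the choice to order groups by first appearance are assumptions made for illustration, since the numeric recipes on the slide are not available here.

```python
def build_recipes(group_keys):
    """group_keys[i] is the group label of input block i.

    Returns (migrate, restore):
      migrate[j] = input position of the block placed at output position j
      restore[i] = output position where input block i ended up
    """
    first_seen = {}
    for position, key in enumerate(group_keys):
        first_seen.setdefault(key, position)
    # Stable sort: groups ordered by first appearance, members in input order.
    migrate = sorted(range(len(group_keys)),
                     key=lambda i: first_seen[group_keys[i]])
    restore = [0] * len(migrate)
    for out_pos, in_pos in enumerate(migrate):
        restore[in_pos] = out_pos
    return migrate, restore


blocks = ["A", "B", "C", "A", "D", "B"]        # the file from the example
migrate, restore = build_recipes(blocks)       # labels double as group keys
reorganized = [blocks[i] for i in migrate]
restored = [reorganized[j] for j in restore]

print(migrate)       # [0, 3, 1, 5, 2, 4]
print(restore)       # [0, 2, 4, 1, 5, 3]
print(reorganized)   # ['A', 'A', 'B', 'B', 'C', 'D']
print(restored)      # ['A', 'B', 'C', 'A', 'D', 'B']
```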
16 Similarity Detection
- Similarity features (based on [Broder 1997])
  - A strong hash for duplicate detection
  - Features: weak hashes that provide hints about similarity among blocks
  - Two blocks are similar when they share values for some weak hashes
- Mature technique
  - REBL: Redundancy Elimination at Block Level [Kulkarni et al. 2004]
  - WAN Optimized Replication [Shilane et al. 2012]
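The idea of a strong hash plus several weak-hash features can be sketched as follows. This is an illustration in the spirit of the cited technique, not the hashing used in the work: CRC32 over 16-byte windows, the four affine transforms, and taking the maximum value per transform are all assumptions, as are the names `strong_hash`, `features`, and `shared_features`.

```python
import hashlib
import random
import zlib

WINDOW = 16                                   # bytes per window (assumed)
# Hypothetical (multiplier, offset) pairs; each pair defines one weak hash.
TRANSFORMS = [(0x9E3779B1, 0x7F4A7C15), (0x85EBCA6B, 0xC2B2AE35),
              (0x27D4EB2F, 0x165667B1), (0xD6E8FEB9, 0x94D049BB)]
MASK = 0xFFFFFFFF


def strong_hash(block: bytes) -> bytes:
    """Equal strong hashes identify duplicate blocks."""
    return hashlib.sha1(block).digest()


def features(block: bytes) -> tuple:
    """One feature per transform: the maximum transformed hash over all windows."""
    window_hashes = [zlib.crc32(block[i:i + WINDOW])
                     for i in range(max(1, len(block) - WINDOW + 1))]
    return tuple(max((a * h + b) & MASK for h in window_hashes)
                 for a, b in TRANSFORMS)


def shared_features(x: bytes, y: bytes) -> int:
    """Number of features on which two blocks agree."""
    return sum(fx == fy for fx, fy in zip(features(x), features(y)))


rng = random.Random(1)
base = bytes(rng.getrandbits(8) for _ in range(4096))
edited = base[:2000] + bytes([base[2000] ^ 0xFF]) + base[2001:]  # one byte flipped
unrelated = bytes(rng.getrandbits(8) for _ in range(4096))

print("duplicate?", strong_hash(base) == strong_hash(edited))
print("features shared with edited copy:", shared_features(base, edited))
print("features shared with unrelated block:", shared_features(base, unrelated))
```

A one-byte edit changes the strong hash but disturbs only the few windows that overlap the edit, so most per-transform maxima, and therefore most features, usually survive.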
17-24 Data Reorganization
- Re-arrange the input file according to a recipe
- Reorganizer & Restorer options
  - Block-level: random IOs; fine for memory and SSD
[Animation: following the recipe, the block-level reorganizer copies one block at a time from the input A B C A D B, building the output A, A A, A A B, A A B B, A A B B C, A A B B C D.]
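Block-level reorganization amounts to one seek and one read per output block. The sketch below illustrates it on an in-memory file; the 4-byte block size, the function name `reorganize_block_level`, and the recipe format (a list of input block indices in output order) are assumptions made for the example.

```python
import io

BLOCK = 4  # tiny block size so the example is easy to read (assumed)


def reorganize_block_level(src, dst, migrate, block_size=BLOCK):
    """Write dst in recipe order; each output block costs one seek + read on src.

    migrate[j] is the input block index to place at output position j.
    """
    for in_index in migrate:
        src.seek(in_index * block_size)   # random IO on the input
        dst.write(src.read(block_size))   # sequential append to the output


src = io.BytesIO(b"AAAABBBBCCCCAAAADDDDBBBB")   # blocks A B C A D B
dst = io.BytesIO()
reorganize_block_level(src, dst, migrate=[0, 3, 1, 5, 2, 4])
print(dst.getvalue())   # b'AAAAAAAABBBBBBBBCCCCDDDD'
```

The seek per block is what makes this approach cheap in memory or on SSD and expensive on a hard disk.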
25-29 Data Reorganization: multipass
- Re-arrange the input file according to a recipe
- Reorganizer & Restorer options
  - Block-level: random IOs; fine for memory and SSD
  - Multipass (HDD): convert random IOs into sequential scans
[Animation: the 1st scan of the input A B C A D B fills the buffer with A A B, leaving C D B unread by the recipe; the 2nd scan fills the buffer with B C D.]
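The multipass option can be sketched as repeated sequential scans, each of which collects only the blocks that belong to the current buffer-sized slice of the output. The block size, the function name `reorganize_multipass`, and the recipe format are assumptions for illustration; the sketch also assumes the recipe is a permutation in which every input block appears exactly once.

```python
import io

BLOCK = 4  # tiny block size for readability (assumed)


def reorganize_multipass(src, dst, migrate, buffer_blocks, block_size=BLOCK):
    """Produce the block-level result using sequential scans only.

    migrate[j] is the input block index placed at output position j.
    Each pass fills an in-memory buffer holding `buffer_blocks` output blocks.
    Returns the number of scans over the input.
    """
    scans = 0
    for start in range(0, len(migrate), buffer_blocks):
        # Input blocks wanted in this pass, mapped to their slot in the buffer.
        wanted = {in_index: slot for slot, in_index
                  in enumerate(migrate[start:start + buffer_blocks])}
        buffer = [b""] * len(wanted)
        src.seek(0)
        for in_index in range(len(migrate)):      # one sequential scan
            block = src.read(block_size)
            if in_index in wanted:
                buffer[wanted[in_index]] = block
        dst.write(b"".join(buffer))               # sequential write
        scans += 1
    return scans


src = io.BytesIO(b"AAAABBBBCCCCAAAADDDDBBBB")   # blocks A B C A D B
dst = io.BytesIO()
scans = reorganize_multipass(src, dst, migrate=[0, 3, 1, 5, 2, 4], buffer_blocks=3)
print(scans, dst.getvalue())   # 2 b'AAAAAAAABBBBBBBBCCCCDDDD'
```

With a three-block buffer the first scan collects A A B and the second B C D, matching the animation. The number of scans grows as the buffer shrinks relative to the file, which is the price paid for avoiding seeks.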
30 Evaluation
- mzip
  - How much can compression be improved?
  - What is the complexity (runtime overhead)?
  - How does SSD or HDD affect data reorganization performance?
  - For HDD, does multipass help?
  - How does MC perform compared with delta compression? (refer to our paper)
  - What are the configuration options for MC? (refer to our paper)
- Data migration for archival storage
  - Compression and performance
31 Datasets
[Table: Type, Name, Size (GB), Dedupe Factor (X), and standalone (baseline) Compression Factor for gzip, bzip2, 7z, and rzip; rows: workstation backup (WS1), server backup (EXCHANGE2), VM image (Ubuntu-VM)]
* Only one dataset for each type is shown here. Refer to our paper for others.
32 Compression Factor vs. Compression Throughput: Workstation1
- MC improves both CF and compression throughput
  - Deduplication
  - Re-organization
[Figure: compression throughput (MB/s) vs. compression factor (X) for gzip, bzip2, 7z, and rzip, each with and without MC]
33 Compression Factor vs. Compression Throughput: Exchange2
- MC improves CF but slightly reduces compression throughput
[Figure: compression throughput (MB/s) vs. compression factor (X) for gzip, bzip2, 7z, and rzip, each with and without MC]
34 Maximal Compression
- Compressors can be run at various compression levels and window sizes, trading off compression against runtime
[Figure: Workstation1, compression throughput (MB/s) vs. compression factor (X) for 7z-DEF, 7z-MAX, 7z-DEF(MC), and 7z-MAX(MC)]
- MC benefits maximal compression
- Results for other compressors show the same trends
35 Compression Factor Breakdown
- MC improves compression to varying extents beyond dedup and gzip
[Figure: compression factor (X) for each dataset (EX2, WS1, UBUN), broken down into dedup, gzip, and reorg]
36 Data Reorganization
- SSD: reasonably good
- HDD: multipass helps
- Configurations
  - mem: input and output stored in tmpfs
  - SSD/HDD: used to store input; output to HDD; 8 GB mem
- UBUN: 7 G; multipass window: 4.8 G (2 scans)
[Figure: compression throughput (MB/s) on EX1, WS1, and UBUN for mem, ssd-block, hdd-multipass, and hdd-block]
37 Data Migration for Archival Storage
- DataDomain Filesystem (DDFS)
  - Backup tier: LZ (fast)
  - Archive tier: recompresses with GZ (2-44% better CF)
- With MC
  - Identify clusters of similar blocks and sort them by cluster size
  - Migrate in 3 stages: top third (largest clusters), middle third, bottom third (including distinct blocks)
- Datasets
  - WORKSTATIONS: many backups of several workstations
  - Exchange[123]: many backups of 3 Exchange servers
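The three-stage plan can be sketched as a sort followed by a split. The slide does not say whether the thirds are measured by number of clusters or by data volume; this sketch assumes the former, and the function name `plan_migration` and the sample clusters are hypothetical.

```python
def plan_migration(clusters):
    """clusters: list of lists of block IDs; singletons are distinct blocks.

    Returns three stages (top, middle, bottom), each a list of clusters.
    Assumption: thirds are counted in clusters, not in bytes.
    """
    ordered = sorted(clusters, key=len, reverse=True)   # largest clusters first
    third = -(-len(ordered) // 3)                       # ceiling division
    return ordered[:third], ordered[third:2 * third], ordered[2 * third:]


# Hypothetical clusters of similar blocks, identified by block ID.
clusters = [[7], [0, 3, 9, 11], [1, 5], [2], [4, 6, 8], [10]]
top, middle, bottom = plan_migration(clusters)
print(top)      # [[0, 3, 9, 11], [4, 6, 8]]
print(middle)   # [[1, 5], [7]]
print(bottom)   # [[2], [10]]
```

Migrating the largest clusters first front-loads the blocks with the most similar neighbors, which is where grouping pays off most.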
38 Effect of MC on Archival Migration
- MC improves CF [44% (EX3), 17% (WS)]
- Top 2/3 compresses very well
[Figure: compression factor (X) on WS, EX1, EX2, and EX3 for gzip, mc_total, top1/3, mid1/3, and end1/3]
39 Archival Migration Costs
- Migration runtime
  - Exchange1: 3X longer
- Read performance
  - As data is scattered in the file system, read performance suffers [Lillibridge 2013]
  - Entire EXCHANGE1: 1.3X longer
  - Final backup: 7X longer (1.24X longer if just the top third of similar clusters are reorganized)
- Memory overheads
40 Related Work
- Improving traditional compressors
  - LZMA algorithm with extra-large window (7z)
  - rzip: rolling hash to deduplicate with a large window
  - Burrows-Wheeler Transform to reorder data (bzip2)
- Delta compression (MC is slightly better in CF and throughput)
  - Similar to MC when limited to a single file
  - Bookkeeping complexities in a file system
- Similarity detection
  - WAN Optimized Replication: delta-compress similar chunks for replication
  - REBL: Redundancy Elimination at Block Level
  - (Never applied practically at scale, or in a self-contained file)
41 Conclusions
- Migratory compression preprocesses data to make it more compressible
  - Identify and cluster similar data
- mzip
  - Improves existing compressors in compressibility and, frequently, in runtime
  - Redraws the performance-compression curve [Figure: throughput vs. CF]
- Archival storage
  - MC reduces $/GB further
42 Thank you! Questions?
More informationBusiness Benefits of Policy Based Data De-Duplication Data Footprint Reduction with Quality of Service (QoS) for Data Protection
Data Footprint Reduction with Quality of Service (QoS) for Data Protection By Greg Schulz Founder and Senior Analyst, the StorageIO Group Author The Green and Virtual Data Center (Auerbach) October 28th,
More informationChapter 14: File-System Implementation
Chapter 14: File-System Implementation Directory Implementation Allocation Methods Free-Space Management Efficiency and Performance Recovery 14.1 Silberschatz, Galvin and Gagne 2013 Objectives To describe
More informationSo, what is data compression, and why do we need it?
In the last decade we have been witnessing a revolution in the way we communicate 2 The major contributors in this revolution are: Internet; The explosive development of mobile communications; and The
More informationCS3600 SYSTEMS AND NETWORKS
CS3600 SYSTEMS AND NETWORKS NORTHEASTERN UNIVERSITY Lecture 11: File System Implementation Prof. Alan Mislove (amislove@ccs.neu.edu) File-System Structure File structure Logical storage unit Collection
More informationPurity: building fast, highly-available enterprise flash storage from commodity components
Purity: building fast, highly-available enterprise flash storage from commodity components J. Colgrove, J. Davis, J. Hayes, E. Miller, C. Sandvig, R. Sears, A. Tamches, N. Vachharajani, and F. Wang 0 Gala
More informationA study of practical deduplication
A study of practical deduplication Dutch T. Meyer University of British Columbia Microsoft Research Intern William Bolosky Microsoft Research Why Dutch is Not Here A study of practical deduplication Dutch
More informationTechnology Insight Series
IBM ProtecTIER Deduplication for z/os John Webster March 04, 2010 Technology Insight Series Evaluator Group Copyright 2010 Evaluator Group, Inc. All rights reserved. Announcement Summary The many data
More information7. Archiving and compressing 7.1 Introduction
7. Archiving and compressing 7.1 Introduction In this chapter, we discuss how to manage archive files at the command line. File archiving is used when one or more files need to be transmitted or stored
More informationHYDRAstor: a Scalable Secondary Storage
HYDRAstor: a Scalable Secondary Storage 7th TF-Storage Meeting September 9 th 00 Łukasz Heldt Largest Japanese IT company $4 Billion in annual revenue 4,000 staff www.nec.com Polish R&D company 50 engineers
More informationDistributed File Systems II
Distributed File Systems II To do q Very-large scale: Google FS, Hadoop FS, BigTable q Next time: Naming things GFS A radically new environment NFS, etc. Independence Small Scale Variety of workloads Cooperation
More informationLossless Compression of Breast Tomosynthesis Objects Maximize DICOM Transmission Speed and Review Performance and Minimize Storage Space
Lossless Compression of Breast Tomosynthesis Objects Maximize DICOM Transmission Speed and Review Performance and Minimize Storage Space David A. Clunie PixelMed Publishing What is Breast Tomosynthesis?
More informationIEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 4, NO. X, XXXXX Boafft: Distributed Deduplication for Big Data Storage in the Cloud
TRANSACTIONS ON CLOUD COMPUTING, VOL. 4, NO. X, XXXXX 2016 1 Boafft: Distributed Deduplication for Big Data Storage in the Cloud Shengmei Luo, Guangyan Zhang, Chengwen Wu, Samee U. Khan, Senior Member,,
More informationMainframe Backup Modernization Disk Library for mainframe
Mainframe Backup Modernization Disk Library for mainframe Mainframe is more important than ever itunes Downloads Instagram Photos Twitter Tweets Facebook Likes YouTube Views Google Searches CICS Transactions
More informationBackup and Recovery Best Practices
Backup and Recovery Best Practices Session: 3 Track: ELA Services Skip Farmer Symantec 1 Backup System Infrastructure 2 Isolating Performance Issues 3 Virtual Machine Backups 4 Reporting - Opscenter Analytics
More informationHedvig as backup target for Veeam
Hedvig as backup target for Veeam Solution Whitepaper Version 1.0 April 2018 Table of contents Executive overview... 3 Introduction... 3 Solution components... 4 Hedvig... 4 Hedvig Virtual Disk (vdisk)...
More informationSystem recommendations for version 17.1
System recommendations for version 17.1 This article contains information about recommended hardware resources and network environments for version 17.1 of Sage 300 Construction and Real Estate. NOTE:
More informationEvaluating Cloud Storage Strategies. James Bottomley; CTO, Server Virtualization
Evaluating Cloud Storage Strategies James Bottomley; CTO, Server Virtualization Introduction to Storage Attachments: - Local (Direct cheap) SAS, SATA - Remote (SAN, NAS expensive) FC net Types - Block
More informationEI 338: Computer Systems Engineering (Operating Systems & Computer Architecture)
EI 338: Computer Systems Engineering (Operating Systems & Computer Architecture) Dept. of Computer Science & Engineering Chentao Wu wuct@cs.sjtu.edu.cn Download lectures ftp://public.sjtu.edu.cn User:
More informationChapter 11: Implementing File
Chapter 11: Implementing File Systems Chapter 11: Implementing File Systems File-System Structure File-System Implementation Directory Implementation Allocation Methods Free-Space Management Efficiency
More informationWhite paper ETERNUS CS800 Data Deduplication Background
White paper ETERNUS CS800 - Data Deduplication Background This paper describes the process of Data Deduplication inside of ETERNUS CS800 in detail. The target group consists of presales, administrators,
More information