Compression and Decompression of Virtual Disk Using Deduplication

Bharati Ainapure 1, Siddhant Agarwal 2, Rukmi Patel 3, Ankita Shingvi 4, Abhishek Somani 5
1 Professor, Department of Computer Engineering, MITCOE, Pune University, India
2,3,4,5 Students, Department of Computer Engineering, MITCOE, Pune University, India

Abstract: The basic goal of virtualization is to centralize administrative tasks while improving scalability and workload distribution. One of the biggest challenges for the data storage community is how to store data effectively without writing the same data again and again to different locations on the back-end servers. The answer offered by the data storage field is the technology known as data deduplication. Data deduplication is a method of reducing storage needs by eliminating redundant data: only one unique instance of the data is actually retained on the back-end server, and redundant data is replaced with a pointer to the unique copy. File deduplication eliminates duplicate files. In this paper we focus on how the deduplication method can be used for taking VM (virtual machine) backups. The paper also shows the use of compression and decompression in VM backup. Compression is another way to reduce the space requirements of a backup file. Decompression comes into the picture when the compressed backup data must be restored: the compressed data needs to be decompressed so that we get the data in its original form, as it was before compression. Our paper therefore models an efficient approach to VM backup using compression and decompression with deduplication.

Keywords: Deduplication, VM (Virtual Machine), Virtualization, Compression and Decompression, VD (Virtual Disk).

I. INTRODUCTION

Most enterprises are moving towards virtualization, so that they can run several virtual machines and store the data on a server. If a server ever loses the data of a virtual machine, or a virtual machine crashes, that data can be recovered and used on another virtual machine, provided it has been backed up. This paper discusses such a backup scheme for virtual machines.

Each file in the system is associated with an inode, which is identified by an integer number. Inodes store information about files and folders, such as file ownership, access mode (read, write, execute permissions), and file type. From the inode number, the file system driver portion of the kernel can access the contents of the inode, including the location of the file, allowing access to the file. An inode lists the file's data blocks.

In computing, data deduplication is a specialized data compression technique for eliminating coarse-grained redundant data. In the deduplication process, unique chunks of data, or files, are identified and stored during a process of analysis. As the analysis continues, other chunks are compared to the stored copy and, whenever a match occurs, the redundant chunk is replaced with a small reference that points to the stored chunk. Given that the same block or file pattern may occur dozens, hundreds, or even thousands of times, the amount of data that must be stored or transferred can be greatly reduced. Restoring data involves decompressing the compressed data and saving it back.

II. VIRTUAL DISK

A virtual disk is a file that appears as a physical disk drive to a guest operating system. The file may reside on the host or on a remote file system. The user can install a new operating system onto a virtual disk without repartitioning the physical disk or rebooting the host machine.
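As a concrete illustration (the paper does not name a particular image format or hypervisor; the file name and size below are hypothetical), a raw virtual disk image is simply an ordinary host file. A minimal Python sketch creating a sparse 1 GiB image that a hypervisor could attach as a virtual disk:

    import os

    GIB = 1024 * 1024 * 1024

    # Create a sparse 1 GiB raw image: the guest sees a 1 GiB disk, but on
    # most host file systems space is consumed only as blocks are written.
    with open("guest.img", "wb") as disk:   # hypothetical image file name
        disk.seek(GIB - 1)                  # jump to the last byte
        disk.write(b"\0")                   # writing it fixes the file size

    print(os.path.getsize("guest.img"))     # reports the full 1 GiB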
III. COMPRESSION

Compression is useful because it helps to reduce the consumption of expensive resources, such as hard disk space, and it has been one of the main enablers of the growth of stored information during the past two decades. There are two types of compression:

1. Lossless: Lossless compression algorithms usually exploit statistical redundancy in such a way as to represent the sender's data more concisely without error [2]. They allow the exact original data to be reconstructed from the compressed data and are used in cases where it is important that the original and the decompressed data be identical; typical examples are executable programs, text documents and source code (a small round-trip example follows this list).

2. Lossy: Lossy compression is possible if some loss of fidelity is acceptable, and it provides a way to obtain the best fidelity for a given amount of compression. It is a data encoding method that compresses data by discarding some of it, and it is most commonly used to compress multimedia data.
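A minimal sketch of a lossless round trip using Python's standard zlib module (an illustration only; the paper does not prescribe a particular algorithm):

    import zlib

    # Lossless round trip: the decompressed output is byte-identical to the
    # input, as required for executables, text documents and source code.
    original = b"backup data " * 1000          # highly redundant sample input
    compressed = zlib.compress(original, 9)    # 9 = highest compression level
    restored = zlib.decompress(compressed)

    assert restored == original                # exact reconstruction
    print(len(original), "->", len(compressed))

Redundant input compresses very well here; random or already-compressed input would not, which is exactly the limitation noted in Section VII.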

IV. DATA DEDUPLICATION

A deduplication system identifies and eliminates redundant blocks of data, significantly reducing the amount of disk needed to store that data [4]. It looks at the data on a sub-file (i.e. block) level and attempts to determine whether it has seen the data before. If it has not, it stores it; if it has, it ensures that the data is stored only once, and all other references to that data are merely pointers.

The most popular approach to detecting duplicates is to assign an identifier to a file or a chunk of data using a hash function that generates a unique ID for that block or file. This unique ID is then compared with a central index. If the ID already exists, the data has been stored before, so only a pointer to the previously saved data needs to be kept. If the ID is new, the block is unique: the ID is added to the index and the unique chunk is stored.

Deduplication can be done at:

1. File level: A checksum is computed for each file. If the calculated checksum is new, it is stored in a hash table along with the file's inode entry. If the checksum is already present in the hash table, the data block of this file is simply pointed at the data block of the inode previously saved for the same checksum [3].

2. Block level: The entire device whose backup is to be taken is divided into blocks of the same size, and checksums are calculated over these blocks [3].

3. Byte level: Data chunks are compared byte by byte, which detects redundant data even more accurately.

Examples of duplicate data that a deduplication system would store only once are:

1. The same file backed up from different servers.
2. A weekly full backup when only 5% of the data has changed.
3. A daily full backup of a database that does not support incremental backups.

Deduplication can be post-process or inline:

1. Post-process deduplication: New data is first stored on the storage device, and a process at a later time analyses the data looking for duplicates. The benefit is that there is no need to wait for the hash calculations and lookups to complete before storing the data, so ingest performance is not degraded. One potential drawback is that duplicate data may be stored unnecessarily for a short time, which can be an issue if the storage system is near full capacity [2].

2. Inline deduplication: The deduplication hash calculations are performed on the target device as the data enters it, in real time. If the device spots a block that is already stored on the system, it does not store the new block but only a reference to the existing block. The benefit over post-process deduplication is that less storage is required, since data is never duplicated. On the negative side, the hash calculations and lookups take time, so data ingestion can be slower, reducing the backup throughput of the device [2].

Fig. 1 File Level, Block Level, Byte Level Deduplication

Generally, block level deduplication is preferred over file level deduplication: at the file level, even if only a small portion of a file changes (say, its title), the entire file must be stored again, since its checksum value changes, so a large part of the same data may be stored again. This is not the case in block level deduplication, because checksums are evaluated on small individual blocks. File level deduplication, however, is the easiest to perform and requires less processing power, since per-file hash values are relatively easy to generate.
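The index-based scheme described above fits in a few lines of Python (a sketch under simplifying assumptions: an in-memory dictionary stands in for the central index, and SHA-256 is one possible choice of checksum):

    import hashlib

    def deduplicate(chunks, index):
        """Store each unique chunk once; return one checksum per input chunk.

        index maps checksum -> stored chunk (the central index above); the
        returned checksums act as pointers to the single stored copies.
        """
        refs = []
        for chunk in chunks:
            digest = hashlib.sha256(chunk).hexdigest()  # unique ID for the chunk
            if digest not in index:
                index[digest] = chunk                   # new ID: store the chunk
            refs.append(digest)                         # known ID: pointer only
        return refs

    index = {}
    chunks = [b"A" * 4096, b"B" * 4096, b"A" * 4096]    # third is a duplicate
    refs = deduplicate(chunks, index)
    print(len(index), "unique chunks for", len(refs), "references")  # 2 for 3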

V. IMPLEMENTATION OF COMPRESSION OF VIRTUAL DISKS

This can be done in two ways:

1. By mounting the file system on the host machine: We mount the file system of the guest machine whose backup is to be taken separately on the host machine. This gives us the file structure, with the inodes of the file system, so we can perform file level as well as block level deduplication.

2. Without mounting the file system on the host machine: We do not mount the guest machine's file system on the host machine. As a result, we do not get the file structure and the inodes of the file system, so file level deduplication cannot be used; only block level deduplication is possible in this method.

A. By Mounting the File System of the Guest Machine on the Host Machine

Implementation using file level deduplication:

1. Consider a host machine and a guest machine; we need to take a backup of the guest machine on the host machine. We will use snapshots of the guest machine here.
2. A snapshot is the state of the system at a particular point in time. To avoid down-time, high-availability systems may perform the backup on snapshots: read-only copies of the data set frozen at a point in time.
3. We mount the snapshots of the guest machine on the host machine.
4. We then perform file level deduplication on these mounted snapshots and store the compressed data on the back-end server.
5. While restoring, we mount this compressed data on the host machine again, perform decompression, and restore it on the guest machine.
6. Here, we have to write the deduplication algorithm for each file system separately [5].

Fig. 2 Implementing File Level Deduplication

Significance and disadvantages:
a. In file level deduplication, a change within a file causes the whole file to be saved again.
b. However, indexes for file level deduplication are significantly smaller, which takes less computational time when duplicates are being determined.
c. The reduction ratio in file level deduplication may be only 5:1 or less.
d. The backup process is less affected by the deduplication process, and reassembling the files is easier as they are fewer in number.

Implementation using block level deduplication (a code sketch follows this subsection):

1. We copy the snapshots of the guest machine onto the host machine.
2. We divide the snapshots as a whole into a number of fixed-size or variable-size blocks.
3. A checksum is calculated for each of these blocks individually. If a new checksum value is found, it is stored in the hash table; otherwise a pointer to the already stored hash table entry is saved.
4. The unique blocks are then compressed.
5. These compressed snapshots are then saved on the back end.
6. While restoring, we mount this compressed data on the host machine again, perform decompression, and restore it on the guest machine [5].

Fig. 3 Implementing Block Level Deduplication

Significance and disadvantages:
a. With block level deduplication, there is no need to write an algorithm for each different file system; the same algorithm can be used for any file system type.
b. Block based deduplication saves only the changed blocks between one version of a file and the next. The reduction ratio is found to be in the 20:1 to 50:1 range for stored data.
c. Block based deduplication requires reassembly of the chunks based on the master index that maps the unique segments and the pointers to unique segments.
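A minimal sketch of steps 2-5 above, assuming fixed-size blocks, SHA-256 checksums and an in-memory dictionary standing in for both the hash table and the back-end store (function and variable names are hypothetical):

    import hashlib
    import zlib

    BLOCK_SIZE = 4096   # fixed-size blocks; variable-size chunking is also possible

    def backup_snapshot(path, store):
        """Deduplicate a snapshot file block by block, compressing unique blocks.

        store maps checksum -> compressed unique block (steps 3 and 4); the
        returned recipe lists the checksums in order, so the snapshot can be
        reassembled on restore.
        """
        recipe = []
        with open(path, "rb") as snapshot:
            while True:
                block = snapshot.read(BLOCK_SIZE)
                if not block:
                    break                                   # end of snapshot
                digest = hashlib.sha256(block).hexdigest()  # step 3: checksum
                if digest not in store:
                    store[digest] = zlib.compress(block)    # step 4: keep unique block
                recipe.append(digest)                       # duplicate: pointer only
        return recipe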

B. Without Mounting the File System of the Guest Machine on the Host Machine

Implementation using block level deduplication without mounting the file system:

1. The snapshots whose backup is to be taken are saved directly, without mounting them on the host machine.
2. Consequently, we do not get the file structure and inodes of the file system.
3. We divide these snapshots into a number of fixed-size or variable-size blocks.
4. We then compute the checksum of each block.
5. If the checksum value is already present, we store only a pointer; otherwise we save the checksum value in the hash table.
6. After that, we compress the individual blocks and save them on the back-end server.
7. While restoring, we again use the hash table and the pointers to restore the blocks saved on the back-end server onto the guest machine (a restore sketch follows this subsection).

Significance and disadvantages:
a. With block level deduplication, there is no need to write an algorithm for each different file system; the same algorithm can be used for any file system type.
b. Block based deduplication saves only the changed blocks between one version of a file and the next; the reduction ratio is found to be in the 20:1 to 50:1 range for stored data.
c. Block based deduplication requires reassembly of the chunks based on the master index that maps the unique segments and the pointers to unique segments.
d. The file structure is not known to us.
e. The inode structure is not known to us.
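Restoration (step 7) is the mirror image of the backup: walk the saved pointers in order, fetch each unique block from the store, decompress it, and write it back. A sketch matching the backup_snapshot example above:

    import zlib

    def restore_snapshot(recipe, store, out_path):
        """Rebuild a snapshot from its ordered checksum list and the store.

        recipe is the ordered list of checksums saved at backup time;
        store maps checksum -> compressed unique block.
        """
        with open(out_path, "wb") as out:
            for digest in recipe:
                out.write(zlib.decompress(store[digest]))

Calling restore_snapshot(recipe, store, "restored.img") rebuilds a byte-identical copy of the original snapshot.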
VI. ADVANTAGES

1. Data deduplication is an effective way to reduce the storage space in a backup environment and can achieve compression ratios ranging from 10:1 to 50:1.
2. Deduplication eliminates redundant data and ensures that only one instance of the data is actually retained on the storage media.
3. Compression is a good choice for data that is uncompressed and unencrypted.
4. Compression is also useful for extending the life of older storage systems.
5. Snapshots are point-in-time copies of files, directories or volumes that are especially helpful in the context of backup.
6. Some systems save space by copying only the changes and using pointers to the original snapshots.

VII. LIMITATIONS

1. If the data is compressed or deduplicated, the process of data analysis will be slower, and a partially corrupted file may not be recoverable at all.
2. Deduplication that works at the file level compares whole files for duplicates. Since files can be large, this can adversely affect both the dedup ratio and the throughput.
3. Data that is already compressed does not compress well; in fact, the resulting data can actually be larger than the original data.
4. Both lossless and lossy compression behave best when the type of the data is understood. Results will be far from ideal if a compression algorithm is applied to the wrong type of data.

VIII. CONCLUSION

Compression and decompression combined with deduplication is thus an efficient way of taking backups. Deduplication is an effective approach to reducing storage demands in environments with large numbers of VM disk images: deduplication of VM disk images can save 80% or more of the space required to store the operating system and application environment, and it is particularly effective when disk images correspond to different versions of a single operating system. Snapshots additionally offer attractive Recovery Time Objectives and Recovery Point Objectives.

IX. FUTURE SCOPE

1. Deduplication is seen as central to the future of data storage, as it reduces the number of spinning drives.
2. This in turn reduces the data center footprint for storage and reduces power needs.
3. Effective methods should be proposed to estimate the data reduction opportunities of large-scale storage systems.
4. The challenge is to achieve the maximum dedup ratio with as little effect on throughput as possible.

REFERENCES

[1] W. Santos, T. Teixeira, C. Machado, W. Meira, A. S. Da Silva, D. R. Ferreira and D. Guedes, Universidade Federal de Minas Gerais, Belo Horizonte, "A Scalable Parallel Deduplication Algorithm".
[2] Ming-Bo Lin and Yung-Ti Chang, "A New Architecture of a Two-Stage Lossless Data Compression and Decompression Algorithm", IEEE Transactions on VLSI Systems, Vol. 17, No. 9, Sep. 2009.
[3] Jaehong Min, Daeyoung Yoon and Youjip Won, "Efficient Deduplication Techniques for Modern Backup Operation", IEEE Transactions on Computers, Vol. 60, No. 6, June 2011.
[4] Srivatsa Maddodi, Girja V. Attigeri and Karunakar A. K., "Data Deduplication Techniques and Analysis".
[5] Cornel Constantinescu, Joseph Glider and David Chambliss, "Mixing Deduplication and Compression on Active Data Sets", 2011 Data Compression Conference.
