A Comparative Survey on Big Data Deduplication Techniques for Efficient Storage System


Supriya Milind More, Sardar Patel Institute of Technology
Kailas Devadkar, Sardar Patel Institute of Technology

ABSTRACT - Nowadays, owing to the exponential growth in the use of emerging technologies such as cloud computing and big data, the rate of data growth is increasing rapidly. Every second, millions of data items are generated by new technologies such as IoT devices and sensors, so storing and handling such large amounts of data is very challenging. Many enterprise organizations invest heavily in storing this big data for backup and disaster recovery. Traditional backup solutions, however, provide no facility for preventing the system from storing duplicate data, which increases storage cost and backup time and, in turn, decreases system performance. Data deduplication is the solution to this problem: it is an emerging technique that eliminates duplicate or redundant data and stores only a unique copy, thereby reducing storage utilization and the cost of maintaining redundant data. In this paper, we study related research papers from the literature and summarize different storage utilization techniques, the stages in data deduplication, and the categorization of deduplication techniques according to different criteria.

KEYWORDS: Big data, Cloud computing, Data deduplication, Storage optimization, Stages in deduplication.

INTRODUCTION

Nowadays, owing to the exponential growth in the use of emerging technologies such as cloud computing and big data, the data growth rate is increasing rapidly. Every second, millions of data items are generated by new technologies such as IoT devices and sensors, making it very challenging to store and handle such large amounts of data. Many enterprise organizations invest heavily in storing this big data for backup and disaster recovery, but traditional backup solutions provide no facility for preventing the system from storing duplicate data, which increases storage cost and backup time and in turn decreases system performance. Data deduplication solves this problem: it eliminates duplicate or redundant data and stores only a unique copy, reducing storage utilization and the cost of maintaining redundant data.

Today, not only enterprises but also ordinary users need their data to be kept safe, and for that reason they store it in multiple places, which results in high storage utilization. Another problem is disaster, whether from natural or man-made causes; everyone wants their sensitive data to remain safe and secure, and this long-term concern cannot be underestimated, because sensitive data must be preserved. Data deduplication is a solution to these problems: it eliminates redundant data and stores only unique data. Deduplication is a compression technique for using storage space efficiently and handling duplicate data effectively. It breaks a file into a number of pieces, called chunks, and applies a hash algorithm to each chunk to generate a unique identifier. This identifier is then compared with the identifiers already stored in an index table.
If a match is found after the comparison, the piece is considered duplicate data; it is simply discarded and only a reference pointer to the existing unique identifier is stored, so that the retrieval operation remains straightforward [5].
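The stages just described can be summarized in a minimal sketch. This is an illustrative example, not code from any surveyed system; the 4 KB fixed chunk size, the SHA-256 fingerprint, and the in-memory index table are assumptions chosen for simplicity.

import hashlib

CHUNK_SIZE = 4 * 1024  # assumed fixed chunk size; the surveyed systems vary

def deduplicate(path, index, chunk_store):
    """Chunk the file, fingerprint each chunk, look each fingerprint up in
    the index table, and store only previously unseen chunks."""
    recipe = []                                # pointers that rebuild the file
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            fp = hashlib.sha256(chunk).hexdigest()
            if fp not in index:                # unique chunk: keep the data
                index.add(fp)
                chunk_store[fp] = chunk
            recipe.append(fp)                  # duplicate: store a pointer only
    return recipe

# Usage (paths are hypothetical): after backing up two identical files,
# the chunk store still holds each unique chunk exactly once.
# index, store = set(), {}
# deduplicate("backup1.img", index, store)
# deduplicate("backup1_copy.img", index, store)  # len(store) is unchanged

Retrieval simply follows the recipe: each fingerprint is looked up in the chunk store and the chunks are concatenated in order.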

In the rest of the paper, Section 2 describes different storage optimization techniques; Section 3 gives a brief overview of data deduplication, its stages, and a comparative study of current deduplication methods; Section 4 summarizes the different types of data deduplication; and Section 5 concludes the paper.

STORAGE OPTIMIZATION TECHNIQUES

Primary storage, also called tier 1 storage, is an expensive necessity in today's digital world, and it matters not only to enterprises but to home users as well. Suppliers offer different optimization techniques, such as thin provisioning, clones, snapshots, compression, and deduplication; which of these is best remains an open question for the IT sector. We briefly review the pros and cons of each (a short lossless-compression sketch follows this list).

1. Compression: Compression is one of the important storage optimization techniques. It stores data more efficiently, so that the maximum amount of data fits in limited storage, and it is also used for bandwidth optimization across the network. Compression removes binary-level redundancy from data blocks in order to save storage space. There are two kinds of compression: lossless and lossy. With lossless compression the quality of a compressed file is preserved, and decompression recovers the original file exactly; with lossy compression some data is removed permanently. Compression makes no attempt to detect duplicate data; it stores data regardless of duplication [7].

2. Thin provisioning: This technique allocates storage space effectively for saving data. It focuses on allocating disk space to multiple users more judiciously by considering the minimum storage requirement of each user. Thin provisioning works in a shared storage environment, allocating data blocks dynamically whenever more storage is needed or an out-of-storage condition arises; it is a purely on-demand process. By maintaining a pool of free space, it achieves a higher storage utilization ratio [1]. Traditional provisioning allocates extra storage capacity to each application, which no other application can then use, so physical storage is often wasted unnecessarily [7]. Thin provisioning removes this pre-allocated surplus and assigns exactly the amount of storage needed; when more space is required, it is added dynamically to the shared storage pool.

3. Snapshot: Snapshot technology is one of the popular data storage technologies today. Snapshots are read-only copies of data that are useful not only for data protection but also for replication [1]. Most vendors apply this technology at the operating-system level so that data can be accessed effectively at the application layer. A snapshot records the state of a storage device at a given point in time, so that state can be recovered after a failure. Snapshots can be implemented in different ways, and the method depends on the vendor and the environment: some vendors use read-only snapshots and some use writable ones. Copy-on-write, redirect-on-write, and split-mirror are some of the snapshot techniques. Copy-on-write takes a snapshot of the metadata of the original data, whereas redirect-on-write writes only the changed data instead of copying the complete original. Although snapshots provide data protection, performance can be an issue with this technology.

4. Clones: Clones and snapshots may look similar and can confuse vendors, but there is a difference between them. Cloning creates an exact copy of a virtual machine, whereas a snapshot creates a delta file that allows you to roll a virtual machine back. Snapshots and clones are similar in nature, but they have different attributes and modes of use [7]. A clone VM is an exact copy of the production VM, including its IP address, DNS name, and so on; a snapshot is a "quick revert" feature for when something goes wrong. In either case you still need conventional backups on separate storage.

All of the above storage optimization methods use different techniques to store large amounts of data efficiently in limited disk space, at low cost and with a small storage footprint. None of them, however, takes care of duplicate data: redundant data is stored as-is, which in turn requires more storage space. Data deduplication addresses exactly this gap.
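To make the lossless/lossy distinction in item 1 concrete, the following sketch compresses a redundant block and verifies that decompression restores it byte for byte. It is an illustrative example only; zlib and the sample payload are arbitrary choices, not tools used by the surveyed systems.

import zlib

data = b"storage optimization " * 1024      # highly redundant sample block

compressed = zlib.compress(data, level=9)   # lossless DEFLATE compression
restored = zlib.decompress(compressed)

assert restored == data                     # lossless: original recovered exactly
print(f"{len(data)} bytes -> {len(compressed)} bytes")

A lossy codec (for images or audio, say) would make the assertion fail by design, trading exact recovery for smaller output.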

DATA DEDUPLICATION

Data deduplication is an emerging technique that eliminates redundant data and stores only a unique copy of the data. It therefore yields better storage utilization and is an efficient way to handle similar data. For example, suppose a system holds 100 instances of a particular 1 MB file attachment. Backed up without deduplication, the system needs 100 MB of storage; with deduplication, only one instance is stored and every subsequent instance is replaced by a reference pointer to the saved original, reducing the demand for storage space from 100 MB to 1 MB.

There are different stages in deduplication [5]:
1. The chunking/blocking method divides a large data file into small pieces called chunks or blocks.
2. A hash algorithm is applied to generate a unique hash identifier for each data block.
3. When a new data block arrives for backup, its identifier is compared with the hash identifiers already stored.
4. If a match is found, a reference pointer is recorded and the duplicate data block is discarded; otherwise the new identifier and the data block are stored.

TYPES OF DATA DEDUPLICATION

Data deduplication techniques are categorized according to the following criteria:
A. Based on chunking method
B. Based on location of deduplication
C. Based on time of deduplication

A) Based on chunking method: The overall performance of data deduplication depends on one major key element, the blocking algorithm, and there are different methods of blocking. Depending on the chunking/blocking method, there are two types of deduplication:

1. File-level chunking: A file-level chunking algorithm does not divide the file into small blocks; it treats the whole file as a single chunk. Only one index value is generated per file, and this value is compared with the index values already stored. Because each file contributes a single index value, the index table has very few entries, which reduces storage overhead and allows more indexes to be kept in the table. File-level chunking fails, however, when there is even a slight change in the file data, because an index must be regenerated for the complete file rather than only for the changed data; this reduces the duplicate elimination ratio and the throughput of the system.

2. Block-level chunking: There are two types of block-level chunking algorithms:
a) Fixed-size chunking: A fixed-size chunking algorithm divides the data file into blocks or chunks of fixed size, with block boundaries at fixed offsets such as 4 KB, 8 KB, or 16 KB. This solves the problem of file-level chunking, since an index value is generated only for the changed part and not for the entire data file [5]. For large files, however, it creates a large number of small chunks, which requires more storage space for index values and metadata, and a byte-shifting problem can occur: inserting or deleting a single byte shifts every subsequent block boundary, so otherwise-identical data no longer matches.

b) Variable-size chunking: In this method the data file is divided into multiple small blocks of variable size, with the file broken according to the content of the data rather than any fixed size. This resolves the issue of fixed-size chunking: whereas fixed-size boundaries do not move even when the data changes, variable-size (content-defined) boundaries are determined by the data itself, so when a file is changed or part of it is deleted, only the boundaries near the change move and far fewer chunks need to be re-stored (see the sketch following this subsection).
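The following is a minimal sketch of content-defined (variable-size) chunking. The 48-byte rolling-hash window, the 12-bit boundary mask (roughly 4 KB average chunks), and the minimum/maximum chunk sizes are assumed parameters chosen for illustration; the surveyed papers do not prescribe them.

WINDOW = 48                     # sliding-window width in bytes (assumed)
MASK = (1 << 12) - 1            # boundary when the low 12 bits of the hash are zero
MIN_CHUNK, MAX_CHUNK = 1024, 64 * 1024  # guard rails on chunk size
BASE, MOD = 257, (1 << 61) - 1  # polynomial rolling-hash parameters
POW_OUT = pow(BASE, WINDOW - 1, MOD)    # weight of the byte leaving the window

def cdc_chunks(data: bytes):
    """Cut a chunk wherever a rolling hash over the last WINDOW bytes hits
    the mask, so boundaries depend on local content, not absolute offsets."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        if i - start < WINDOW:
            h = (h * BASE + b) % MOD                    # window still filling
        else:
            out = data[i - WINDOW]                      # byte sliding out
            h = ((h - out * POW_OUT) * BASE + b) % MOD  # byte sliding in
        size = i - start + 1
        if (size >= MIN_CHUNK and (h & MASK) == 0) or size >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0                         # next chunk starts fresh
    if start < len(data):
        chunks.append(data[start:])                     # trailing remainder
    return chunks

# Usage: insert one byte and observe that most chunks survive unchanged,
# whereas fixed-size chunking would shift every boundary after the edit.
import random
random.seed(0)
data = bytes(random.getrandbits(8) for _ in range(131072))
modified = data[:10000] + b"X" + data[10000:]
shared = set(cdc_chunks(data)) & set(cdc_chunks(modified))
print(len(shared), "of", len(cdc_chunks(data)), "chunks unchanged")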
B) Based on location: Depending on the location at which deduplication is performed, there are the following types [6]:

1. Source-based deduplication: Here, duplicate data is eliminated before it is transmitted to the target machine. Source-based deduplication reduces bandwidth use, since only unique data is transferred, and it has very modest hardware requirements, but it demands more processing resources at the source (client) side. A sketch of the source-side check appears below.
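The sketch below illustrates the source-side check. The server_has query and upload call are hypothetical placeholders invented for illustration, not an API from any surveyed system; a real client/server protocol would batch these round trips.

import hashlib

def backup_source_side(chunks, server_has, upload):
    """Transfer only chunks the target does not already hold; duplicates
    are identified at the source, before any data crosses the network."""
    sent = skipped = 0
    for chunk in chunks:
        fp = hashlib.sha256(chunk).hexdigest()
        if server_has(fp):
            skipped += 1            # duplicate: only a pointer needs to be sent
        else:
            upload(fp, chunk)
            sent += 1
    return sent, skipped

# Usage with an in-memory stand-in for the target server:
remote = {}
sent, skipped = backup_source_side(
    [b"a" * 4096, b"b" * 4096, b"a" * 4096],
    server_has=lambda fp: fp in remote,
    upload=remote.__setitem__,
)
print(sent, skipped)  # -> 2 1: the duplicate chunk never crosses the network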

2. Target-based deduplication: Here, the complete backup data is transferred to the target location, where redundant data is then eliminated. This increases bandwidth cost but gives good performance compared to source-based deduplication.

C) Based on time: Depending on the question of when to perform deduplication, there are the following types:

1. Inline deduplication: Deduplication occurs at the client side, i.e., the data is deduplicated before it is stored to disk [4], and only unique data is transferred to the target server. Inline deduplication reduces network overhead during data transfer, but it requires high processing power at the source side.

2. Post-process deduplication: Here, deduplication is applied at the server side. All incoming data is first stored on disk as-is; duplicate data is then removed and a unique copy is kept in storage. Post-process performance is higher than that of inline deduplication, but it needs more disk space to hold the raw data and requires a fast disk cache.

COMPARATIVE STUDY OF CURRENT RESEARCH ON DATA DEDUPLICATION

A great deal of research has been done in the data deduplication field. Traditional data deduplication uses a single global index table to store the hash values of files or data blocks, which imposes computational overhead and decreases performance [2]; for that reason, distributed and parallel deduplication is used. Table 1 summarizes different data deduplication research studies.

Table 1. Comparison of data deduplication research studies

1) Hadoop Based Scalable Cluster Deduplication for Big Data
   Authors: Qing Liu, Yinjin Fu, Guiqiang Ni
   Chunking method: Fixed-size blocking algorithm
   Technique used: MapReduce and HDFS
   Methodology: Uses MapReduce as a parallel deduplication framework; the index table is distributed across the nodes and stored in lightweight local MySQL databases.

2) Boafft: Distributed Deduplication for Big Data Storage in the Cloud
   Authors: Shengmei Luo, Guangyan Zhang, Chengwen Wu
   Chunking method: Block-level chunking (super-chunks)
   Technique used: Data routing algorithm based on a similarity index
   Methodology: Uses an efficient data routing algorithm based on data similarity, which reduces the network overhead of identifying the target storage location, and uses multiple storage data nodes for parallel deduplication.

3) Bucket Based Data Deduplication Technique for Big Data Storage System
   Authors: Naresh Kumar, Rahul Rawat, S. C. Jain
   Chunking method: Fixed-size blocking algorithm
   Technique used: Bucket-based technique
   Methodology: Buckets are used to store the hash values of blocks; MapReduce is applied to compare the hashes stored in a bucket with the hash of each incoming block.

4) Extreme Binning: Scalable, Parallel Deduplication for Chunk-based File Backup
   Authors: Deepavali Bhagwat, Kave Eshghi, Darrell D. E. Long
   Chunking method: File level
   Technique used: Broder's theorem, file similarity
   Methodology: Extreme Binning exploits file similarity: it chooses the minimum hash index value of a file as its characteristic fingerprint, using Broder's theorem, and then routes similar files to the same deduplication server to be deduplicated.

5) ADMAD: Application-Driven Metadata Aware Deduplication Archival Storage System
   Authors: Chuanyi Liu, Yingping Lu, Chunhui Shi, Guanlin Lu, David H. C. Du, Dong-Sheng Wang
   Chunking method: Variable-size chunking
   Technique used: Metadata information
   Methodology: Uses metadata from different levels of the I/O path so that more meaningful data chunks can be generated while partitioning files, in order to achieve inter-file-level deduplication.

6) Next Level Approach of Data Deduplication in the Era of Big Data
   Authors: Shamsher Singh, Ravinder Singh
   Chunking method: Fixed-size chunking
   Technique used: Both source- and target-level deduplication
   Methodology: If the file size is < 1 GB, deduplication occurs at the primary NameNode; if the file size is > 1 GB, the NameNode splits the file into chunks and transfers them to secondary data nodes, where deduplication is performed.

CONCLUSION

In this paper, we studied different data deduplication techniques and explored how data deduplication is used to handle duplicate data more efficiently. Depending on location, chunking type, and time, there are different deduplication techniques, which can be chosen according to a vendor's needs and expectations. We have presented a comparative survey of current data deduplication methods. To overcome the drawback of traditional deduplication, which relies on a single global index table, distributed or parallel data deduplication can be used. Many researchers have contributed to this area, but more research remains to be done to improve processing time, data retrieval efficiency, and throughput.

REFERENCES

[1] A. Brinkmann and S. Effert, "Snapshots and Continuous Data Replication in Cluster Storage Environments," Fourth International Workshop on Storage Network Architecture and Parallel I/O, IEEE, 2008.
[2] Q. Liu, Y. Fu, G. Ni, and R. Hou, "Hadoop Based Scalable Cluster Deduplication for Big Data," 2016 IEEE 36th International Conference on Distributed Computing Systems Workshops.
[3] N. Kumar, R. Rawat, and S. C. Jain, "Bucket Based Data Deduplication Technique," 5th International Conference on Reliability, Infocom Technologies and Optimization (ICRITO) (Trends and Future Directions), Noida, 2016.
[4] Z. Sun, J. Shen, and J. Yong, "A novel approach to data deduplication over the engineering-oriented cloud systems," Integrated Computer-Aided Engineering, vol. 20, no. 1.
[5] A. Venish and K. Siva Sankar, "Study of Chunking Algorithm in Data Deduplication," Springer India.
[6] R. Vikraman and A. S., "A Study on Various Data Deduplication Systems," International Journal of Computer Applications, vol. 94, no. 4.
[7] G. Crump, "Which Primary Storage Optimization is Best?" (2011, September 30). [Online].
[8] E. Manogar and S. Abirami, "A Study on Data Deduplication Techniques for Optimized Storage," 2014 Sixth International Conference on Advanced Computing (ICoAC), IEEE, 2014.
[9] R.-S. Chang, C.-S. Liao, K.-Z. Fan, and C.-M. Wu, "Dynamic Deduplication Decision in a Hadoop Distributed File System," International Journal of Distributed Sensor Networks, pp. 1-14, April 2014.
[10] M. Xu, Y. Zhu, P. P. C. Lee, and Y. Xu, "Even Data Placement for Load Balance in Reliable Distributed Storage Systems," in Proc. of the IEEE International Symposium on Quality of Service (IWQoS).
[11] Deepu S., Bhaskar, and Shylaja, "Performance Comparison of Deduplication Techniques for Storage in Cloud Computing Environment," Asian Journal of Computer Science and Information Technology, 4:5 (2014).
[12] A. Kaur and S. Sharma, "An Efficient Framework and Techniques of Data Deduplication in Cloud Computing," IJCST, vol. 8, April-June.
[13] S. Luo, G. Zhang, and C. Wu, "Boafft: Distributed Deduplication for Big Data Storage in the Cloud," IEEE Transactions on Cloud Computing, vol. 4, 2016.
[14] D. Bhagwat, K. Eshghi, and D. D. E. Long, "Extreme Binning: Scalable, Parallel Deduplication for Chunk-based File Backup," in Proc. IEEE Int. Symp. on Modelling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), 2009.
[15] C. Liu, Y. Lu, C. Shi, et al., "ADMAD: Application-driven metadata aware de-duplication archival storage system," in Proc. 5th IEEE Int. Workshop on Storage Network Architecture and Parallel I/Os, 2008.
