DEDUPLICATION AWARE AND DUPLICATE ELIMINATION SCHEME FOR DATA REDUCTION IN BACKUP STORAGE SYSTEMS
NIMMAGADDA SRIKANTHI, Department of Computer Science and Engineering, Institute of Aeronautical Engineering, Dundigal, Hyderabad
DR. G. RAMU, Professor, Department of Computer Science and Engineering, Institute of Aeronautical Engineering, Dundigal, Hyderabad
YERRAGUDIPADU SUBBARAYUDU, Assistant Professor, Department of Computer Science and Engineering, Institute of Aeronautical Engineering, Dundigal, Hyderabad

ABSTRACT
Data reduction is becoming increasingly important for storage systems in the big-data era. Data deduplication reduces storage requirements by eliminating redundant data. To achieve high deduplication efficiency, we design DARE, a low-overhead deduplication-aware resemblance detection and elimination scheme. In this system, deduplication is performed on the cloud. A file is first fragmented into chunks and then encrypted with the AES algorithm, which provides security. SHA-256 fingerprints are used to detect duplicate data. DARE performs deduplication at three levels: file level, block (chunk) level, and among adjacent chunks. DARE uses a Duplicate-Adjacency resemblance detection (DupAdj) scheme, which considers data blocks adjacent to duplicates to be similar (i.e., candidates for delta compression). By exploiting existing duplicate-adjacency information for similarity detection, our system achieves higher throughput and discovers most of the redundancy in the data.

Key Words - Data deduplication, duplicate adjacency, delta compression

1. INTRODUCTION
Redundancy in cloud storage is growing, so exploiting duplicate data can save space. It also reduces the time required to transfer data over low-bandwidth networks. Data reduction is the process of minimizing the amount of data that needs to be stored in a storage environment.
Data deduplication has become an important and economical way to remove redundant data segments, relieving the pressure of storing large volumes of data. Fingerprints are used to represent and identify identical data blocks when performing deduplication. Deduplication techniques are widely used by storage servers to avoid keeping multiple copies of the same data: they identify duplicate data segments before they are written and also remove duplicates already present in the store, yielding large cost savings. Two duplicate-checking strategies are common: 1) file-level duplicate checking and 2) chunk-level duplicate checking. In the first, only files that are exact duplicates are removed from storage; in the second, duplicate chunks across files are eliminated and only one copy of each chunk is kept.

In this paper, we introduce DARE, a deduplication-aware resemblance detection and elimination scheme. DARE integrates two techniques, deduplication and delta compression, to achieve high data-reduction efficiency at low overhead. A DupAdj approach is proposed that exploits the duplicate-adjacency information left behind by deduplication to locate nearly identical data blocks for delta compression. Specifically, because similar data in backup datasets exhibits locality, the non-duplicate blocks that are adjacent to duplicate ones are good delta-compression candidates for further data reduction. A conceptual and experimental study of the conventional super-feature approach shows that improved similarity detection for additional delta compression is possible when the aforementioned duplicate-adjacency information is missing or limited. An investigation into the restore performance of deduplicated backup data indicates that delta compression can improve the data-restore performance of deduplication-only systems by further removing redundancy after deduplication and thus enlarging the logical space of the restore cache.

2. LITERATURE SURVEY
Nowadays, increasing data storage capacity is one of the key challenges, owing to the growing demand for cloud services. Several approaches have been proposed to identify and remove duplicated data in virtual machines before sending their data to a shared storage resource. The storage process should therefore be efficient, and the method of locating data should be as intelligent as possible. However, among the various storage techniques, none can be expected to give the best bandwidth utilization in every case. One of the most useful techniques for fast and efficient data storage is deduplication. In this paper, we survey various deduplication approaches and consider their advantages and disadvantages.
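The chunk-level duplicate check introduced above can be sketched with a simple fingerprint index. This is an illustrative sketch only, not the paper's implementation; the function names (`dedup_store`, `restore`) and the in-memory dictionary index are our own assumptions.

```python
import hashlib

def dedup_store(chunks):
    """Keep one copy of each unique chunk, keyed by its SHA-256 fingerprint."""
    index = {}       # fingerprint -> chunk data (the chunk store)
    recipe = []      # ordered fingerprints needed to rebuild the stream
    duplicates = 0
    for chunk in chunks:
        fp = hashlib.sha256(chunk).hexdigest()
        if fp in index:
            duplicates += 1          # duplicate: keep only a reference
        else:
            index[fp] = chunk        # unique: store the chunk once
        recipe.append(fp)
    return index, recipe, duplicates

def restore(index, recipe):
    """Rebuild the original stream by following the recipe of fingerprints."""
    return b"".join(index[fp] for fp in recipe)
```

For example, storing the chunk sequence [b"aaa", b"bbb", b"aaa"] keeps only two unique chunks, while the recipe still restores the full stream.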
With latency-sensitive workloads, inline deduplication faces several challenges: fragmentation leading to extra disk seeks for reads, deduplication processing costs on the critical path, and additional I/Os for deduplication metadata management. To address these challenges, two observations emerge from studying real-world workloads: i) duplicated data exhibits significant spatial locality on disk, and ii) accesses to duplicated chunks exhibit temporal locality. First, spatial locality can be exploited to perform deduplication only when the duplicate chunks form long sequences on disk, thereby avoiding fragmentation. A large-scale study of primary data deduplication has been presented, and its findings used to drive the design of a new primary data deduplication system. File data was examined from 15 globally distributed file servers hosting data for over 2000 users in a large multinational enterprise. The findings were used to arrive at chunking and data-encoding decisions that increase deduplication savings while reducing the resulting metadata and producing a uniform chunk-size distribution. Scaling of deduplication processing with data size is achieved using a RAM-frugal chunk-hash index and data partitioning, so that memory, CPU, and disk-seek resources remain available to serve the I/O workload. The architecture of this new primary data deduplication system has been presented, along with an evaluation of its deduplication performance and chunking characteristics.

3. RELATED WORK
Data deduplication is gaining increasing traction in data-intensive storage systems as one of the most efficient data-reduction techniques of recent years. Fingerprint-based deduplication techniques remove duplicate blocks by comparing their secure fingerprints. One of the main challenges facing deduplication is how to locate and remove data redundancy in storage systems at low overhead and high throughput [1]. Resemblance detection with delta compression, an important approach to data reduction in storage systems, was presented more than 10 years ago but was eventually overshadowed by fingerprint-based deduplication because of its weak scalability. Similarity detection finds redundancy among similar data at the byte level, whereas duplicate detection finds exactly identical data at the chunk level, which makes the latter much more scalable than the former in large storage systems. REBL and DERD are typical super-feature-based similarity detection proposals for data reduction. They compute features of the data stream [6] (e.g., Rabin fingerprints), group the features into super-features to capture data similarity, and then delta-compress the similar data. All of these techniques require high computation and indexing overheads for similarity detection. As a result, the simpler and faster deduplication approach has become the more popular data-reduction technology in the last five years.
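The content-defined chunking that underlies these fingerprint-based schemes can be sketched as follows. This is a minimal sketch under our own assumptions: it uses a simplified rolling-style hash rather than a true Rabin fingerprint over a sliding window, and all parameter values are illustrative only.

```python
def cdc_chunks(data, mask=0x1FFF, min_size=2048, max_size=65536):
    """Split data at positions where a hash of the bytes seen so far matches
    a bit mask, so chunk boundaries depend on content rather than offsets.
    A shifted insertion then only disturbs the chunks around the edit."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + b) & 0xFFFFFFFF   # simplified rolling-style hash
        size = i - start + 1
        # declare a boundary on a hash match (past min_size) or at max_size
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])       # trailing partial chunk
    return chunks
```

The expected chunk size is governed by the mask: with a k-bit mask, a boundary fires on average once every 2^k positions, while min_size and max_size bound the chunk-size distribution.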
However, similarity detection is regaining traction in storage systems because of its ability to capture and remove redundancy among highly similar but non-duplicate data blocks, which effectively complements fingerprint-based deduplication.

4. FRAMEWORK
DARE is designed to make better use of similarity detection for further data reduction in deduplication-based storage systems. DARE follows four main steps:

4.1 Duplicate Detection
The data stream is first chunked using a content-defined chunking (CDC) technique [3], fingerprinted using the SHA-256 algorithm, and checked for duplicates; the chunks are then grouped into containers to preserve the locality of the backup stream [3].

4.2 Resemblance Detection
The DupAdj similarity detection module in DARE first detects duplicate-adjacent blocks within the containers formed in the previous step. An improved super-feature approach then detects similar blocks among the remaining non-duplicate, non-similar blocks that the DupAdj module may have missed where duplicate-adjacency information is missing.

4.3 Delta Compression
For each of the similar blocks found in Step 2, DARE reads its base chunk and then delta-encodes their differences.

4.4 Storage Management
The data that is not reduced, i.e., the non-similar chunks, is stored in containers on disk. The file mapping among duplicate chunks, similar chunks, and non-similar chunks is also recorded as file recipes to simplify future data-restore operations in DARE.

System Architecture: In Fig. 1, the user selects or uploads a file; the input may be a file or an image. The file is divided into chunks, and a fingerprint is generated for each chunk. To achieve deduplication, fingerprints are compared: if a fingerprint matches an existing one, the data is identified as a duplicate, the duplicate chunk is eliminated rather than stored again, and a reference link to the stored copy is provided. If the fingerprint does not match any existing fingerprint, it is treated as a new fingerprint. For each non-duplicate chunk, DARE first uses the DupAdj detection module to quickly decide whether it is a delta-compression candidate; adjacent chunks are recorded in a separate table, and the SHA-256 algorithm is applied to them again. For each similar chunk detected, DARE reads its base chunk and delta-encodes their differences. The data that is NOT reduced, i.e., the non-similar and delta chunks, is stored in containers on disk.

Fig 1: Block Diagram of Proposed System

Fig 2: DupAdj Concept
Fig. 2 shows that if chunks B2 and E2 are duplicates, the chunks around them are considered potentially similar (i.e., B1 & E1 and B3 & E3), and the similar chunks are migrated into one table so as to eliminate as much redundancy as possible. If similar chunks are detected, a link is provided; otherwise they are considered for delta compression, and only the difference is stored.

5. EXPERIMENTAL RESULTS
In this experiment, we upload a file to detect and remove duplicate data. After the file is uploaded, chunks are generated for it. DARE uses the SHA algorithm to check the uploaded file for duplicates: the SHA algorithm creates a SHA string for every chunk, and these SHA strings are used for the duplicate check.
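The DupAdj idea of Fig. 2 can be sketched as follows. This is an illustrative sketch under our own assumptions: the function name and data layout are hypothetical, and a set of known fingerprints stands in for the container-based index the paper describes.

```python
import hashlib

def find_delta_candidates(chunks, known_fingerprints):
    """DupAdj sketch: when a chunk is a duplicate of an already stored
    chunk, its non-duplicate neighbours are likely similar to the stored
    copy's neighbours, so they become delta-compression candidates
    instead of being stored in full."""
    fps = [hashlib.sha256(c).hexdigest() for c in chunks]
    is_dup = [fp in known_fingerprints for fp in fps]
    candidates = set()
    for i, dup in enumerate(is_dup):
        if dup:
            for j in (i - 1, i + 1):              # adjacent positions
                if 0 <= j < len(chunks) and not is_dup[j]:
                    candidates.add(j)             # mark for delta encoding
    return candidates
```

Each candidate would then be delta-encoded against a similar base chunk (e.g., with a tool such as xdelta), so that only the difference is stored, as described above.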
The duplicate-adjacency check is performed using the super-feature approach; in duplicate adjacency, we verify the similarity among the chunks. Finally, the total chunk size and the similar-chunk size are shown in a chart.

6. CONCLUSION
We conclude that in this paper we proposed a significant DARE strategy that improves on the super-feature approach. Using DupAdj, DARE exploits duplicate-adjacency information for efficient similarity detection in existing deduplication systems. DARE is an effective and efficient scheme for maximizing data reduction by further detecting similar data at low overhead. Delta compression is used to further remove redundancy and effectively enlarge the logical space of the restore cache.

7. REFERENCES
[1] W. Xia, H. Jiang, D. Feng, and Y. Hua, "SiLo: A similarity-locality based near-exact deduplication scheme with low RAM overhead and high throughput," in Proc. USENIX Annu. Tech. Conf., Jun. 2011.
[2] K. Eshghi and H. K. Tang, "A framework for analyzing and improving content-based chunking algorithms," Hewlett-Packard Labs., Palo Alto, CA, USA, Tech. Rep. HPL (R.1).
[3] E. Kruus, C. Ungureanu, and C. Dubnicki, "Bimodal content defined chunking for backup streams," in Proc. 7th USENIX Conf. File Storage Technol., 2010, p. 18.
[4] B. Romanski, Ł. Heldt, W. Kilian, K. Lichota, and C. Dubnicki, "Anchor-driven subchunk deduplication," in Proc. 4th Annu. Int. Syst. Storage Conf., May 2011.
[5] G. Lu, Y. Jin, and D. H. Du, "Frequency based chunking for data de-duplication," in Proc. IEEE Int. Symp. Model., Anal. Simul. Comput. Telecommun. Syst., Aug. 2010.
[6] M. Rabin, "Fingerprinting by random polynomials," Center Res. Comput. Techn., Aiken Comput. Lab., Harvard Univ., Cambridge, MA, USA, 1981.
[7] N. Jain, M. Dahlin, and R. Tewari, "TAPER: Tiered approach for eliminating redundancy in replica synchronization," in Proc. USENIX Conf. File Storage Technol., Mar. 2005.
[8] M. Fu, D. Feng, Y. Hua, X. He, Z. Chen, W. Xia, Y. Zhang, and Y. Tan, "Design tradeoffs for data deduplication performance in backup workloads," in Proc. 13th USENIX Conf. File Storage Technol., 2015.
[9] D. Meister and A. Brinkmann, "Multi-level comparison of data deduplication in a backup scenario," in Proc. Israeli Experimental Syst. Conf., May 2009.
[10] U. Manber, et al., "Finding similar files in a large file system," in Proc. USENIX Winter Tech. Conf., Jan. 1994.
Data Reduction Meets Reality What to Expect From Data Reduction Doug Barbian and Martin Murrey Oracle Corporation Thursday August 11, 2011 9961: Data Reduction Meets Reality Introduction Data deduplication
More informationData Deduplication using Even or Odd Block (EOB) Checking Algorithm in Hybrid Cloud
Data Deduplication using Even or Odd Block (EOB) Checking Algorithm in Hybrid Cloud Suganthi.M 1, Hemalatha.B 2 Research Scholar, Depart. Of Computer Science, Chikkanna Government Arts College, Tamilnadu,
More informationHPDedup: A Hybrid Prioritized Data Deduplication Mechanism for Primary Storage in the Cloud
HPDedup: A Hybrid Prioritized Data Deduplication Mechanism for Primary Storage in the Cloud Huijun Wu 1,4, Chen Wang 2, Yinjin Fu 3, Sherif Sakr 1, Liming Zhu 1,2 and Kai Lu 4 The University of New South
More informationSolidFire and Ceph Architectural Comparison
The All-Flash Array Built for the Next Generation Data Center SolidFire and Ceph Architectural Comparison July 2014 Overview When comparing the architecture for Ceph and SolidFire, it is clear that both
More informationSINGLE PASS DEPENDENT BIT ALLOCATION FOR SPATIAL SCALABILITY CODING OF H.264/SVC
SINGLE PASS DEPENDENT BIT ALLOCATION FOR SPATIAL SCALABILITY CODING OF H.264/SVC Randa Atta, Rehab F. Abdel-Kader, and Amera Abd-AlRahem Electrical Engineering Department, Faculty of Engineering, Port
More informationExecutive Summary. The Need for Shared Storage. The Shared Storage Dilemma for the SMB. The SMB Answer - DroboElite. Enhancing your VMware Environment
Executive Summary The Need for Shared Storage The Shared Storage Dilemma for the SMB The SMB Answer - DroboElite Enhancing your VMware Environment Ideal for Virtualized SMB Conclusion Executive Summary
More informationBusiness Benefits of Policy Based Data De-Duplication Data Footprint Reduction with Quality of Service (QoS) for Data Protection
Data Footprint Reduction with Quality of Service (QoS) for Data Protection By Greg Schulz Founder and Senior Analyst, the StorageIO Group Author The Green and Virtual Data Center (Auerbach) October 28th,
More informationOpendedupe & Veritas NetBackup ARCHITECTURE OVERVIEW AND USE CASES
Opendedupe & Veritas NetBackup ARCHITECTURE OVERVIEW AND USE CASES May, 2017 Contents Introduction... 2 Overview... 2 Architecture... 2 SDFS File System Service... 3 Data Writes... 3 Data Reads... 3 De-duplication
More informationFile System Implementation. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
File System Implementation Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Implementing a File System On-disk structures How does file system represent
More informationData Management Glossary
Data Management Glossary A Access path: The route through a system by which data is found, accessed and retrieved Agile methodology: An approach to software development which takes incremental, iterative
More informationParallel Processing for Data Deduplication
Parallel Processing for Data Deduplication Peter Sobe, Denny Pazak, Martin Stiehr Faculty of Computer Science and Mathematics Dresden University of Applied Sciences Dresden, Germany Corresponding Author
More informationMAPREDUCE FOR BIG DATA PROCESSING BASED ON NETWORK TRAFFIC PERFORMANCE Rajeshwari Adrakatti
International Journal of Computer Engineering and Applications, ICCSTAR-2016, Special Issue, May.16 MAPREDUCE FOR BIG DATA PROCESSING BASED ON NETWORK TRAFFIC PERFORMANCE Rajeshwari Adrakatti 1 Department
More informationDATA DEDUPLCATION AND MIGRATION USING LOAD REBALANCING APPROACH IN HDFS Pritee Patil 1, Nitin Pise 2,Sarika Bobde 3 1
DATA DEDUPLCATION AND MIGRATION USING LOAD REBALANCING APPROACH IN HDFS Pritee Patil 1, Nitin Pise 2,Sarika Bobde 3 1 Department of Computer Engineering 2 Department of Computer Engineering Maharashtra
More informationCompression and Decompression of Virtual Disk Using Deduplication
Compression and Decompression of Virtual Disk Using Deduplication Bharati Ainapure 1, Siddhant Agarwal 2, Rukmi Patel 3, Ankita Shingvi 4, Abhishek Somani 5 1 Professor, Department of Computer Engineering,
More informationUnderstanding Primary Storage Optimization Options Jered Floyd Permabit Technology Corp.
Understanding Primary Storage Optimization Options Jered Floyd Permabit Technology Corp. Primary Storage Optimization Technologies that let you store more data on the same storage Thin provisioning Copy-on-write
More informationUniversity of Cambridge Engineering Part IIB Module 4F12 - Computer Vision and Robotics Mobile Computer Vision
report University of Cambridge Engineering Part IIB Module 4F12 - Computer Vision and Robotics Mobile Computer Vision Web Server master database User Interface Images + labels image feature algorithm Extract
More informationBinary Encoded Attribute-Pairing Technique for Database Compression
Binary Encoded Attribute-Pairing Technique for Database Compression Akanksha Baid and Swetha Krishnan Computer Sciences Department University of Wisconsin, Madison baid,swetha@cs.wisc.edu Abstract Data
More informationRethinking Deduplication Scalability
Rethinking Deduplication Scalability Petros Efstathopoulos Petros Efstathopoulos@symantec.com Fanglu Guo Fanglu Guo@symantec.com Symantec Research Labs Symantec Corporation, Culver City, CA, USA 1 ABSTRACT
More informationReducing Replication Bandwidth for Distributed Document Databases
Reducing Replication Bandwidth for Distributed Document Databases Lianghong Xu Andrew Pavlo Sudipta Sengupta Jin Li Gregory R. Ganger Carnegie Mellon University, Microsoft Research Abstract With the rise
More informationFrequency Based Chunking for Data De-Duplication
2010 18th Annual IEEE/ACM International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems Frequency Based Chunking for Data De-Duplication Guanlin Lu, Yu Jin, and
More informationINTRODUCTION TO XTREMIO METADATA-AWARE REPLICATION
Installing and Configuring the DM-MPIO WHITE PAPER INTRODUCTION TO XTREMIO METADATA-AWARE REPLICATION Abstract This white paper introduces XtremIO replication on X2 platforms. XtremIO replication leverages
More informationComparative Analysis of Range Aggregate Queries In Big Data Environment
Comparative Analysis of Range Aggregate Queries In Big Data Environment Ranjanee S PG Scholar, Dept. of Computer Science and Engineering, Institute of Road and Transport Technology, Erode, TamilNadu, India.
More informationStorageCraft OneBlox and Veeam 9.5 Expert Deployment Guide
TECHNICAL DEPLOYMENT GUIDE StorageCraft OneBlox and Veeam 9.5 Expert Deployment Guide Overview StorageCraft, with its scale-out storage solution OneBlox, compliments Veeam to create a differentiated diskbased
More informationCONFIGURATION GUIDE WHITE PAPER JULY ActiveScale. Family Configuration Guide
WHITE PAPER JULY 2018 ActiveScale Family Configuration Guide Introduction The world is awash in a sea of data. Unstructured data from our mobile devices, emails, social media, clickstreams, log files,
More informationAn Efficient Web Cache Replacement Policy
In the Proc. of the 9th Intl. Symp. on High Performance Computing (HiPC-3), Hyderabad, India, Dec. 23. An Efficient Web Cache Replacement Policy A. Radhika Sarma and R. Govindarajan Supercomputer Education
More informationCHAPTER 7 CONCLUSION AND FUTURE SCOPE
121 CHAPTER 7 CONCLUSION AND FUTURE SCOPE This research has addressed the issues of grid scheduling, load balancing and fault tolerance for large scale computational grids. To investigate the solution
More informationContents Part I Traditional Deduplication Techniques and Solutions Introduction Existing Deduplication Techniques
Contents Part I Traditional Deduplication Techniques and Solutions 1 Introduction... 3 1.1 Data Explosion... 3 1.2 Redundancies... 4 1.3 Existing Deduplication Solutions to Remove Redundancies... 5 1.4
More informationWHITE PAPER PernixData FVP
WHITE PAPER PernixData FVP Technical White Paper 1 EXECUTIVE SUMMARY The last decade has seen virtualization become a mainstay in the enterprise data center. Enterprises are now looking to virtualize their
More informationScaling Without Sharding. Baron Schwartz Percona Inc Surge 2010
Scaling Without Sharding Baron Schwartz Percona Inc Surge 2010 Web Scale!!!! http://www.xtranormal.com/watch/6995033/ A Sharding Thought Experiment 64 shards per proxy [1] 1 TB of data storage per node
More informationConstruct a High Efficiency VM Disaster Recovery Solution. Best choice for protecting virtual environments
Construct a High Efficiency VM Disaster Recovery Solution Best choice for protecting virtual environments About NAKIVO Established in the USA since 2012 Provides data protection solutions for VMware, Hyper-V
More informationTHE COMPLETE GUIDE COUCHBASE BACKUP & RECOVERY
THE COMPLETE GUIDE COUCHBASE BACKUP & RECOVERY INTRODUCTION Driven by the need to remain competitive and differentiate themselves, organizations are undergoing digital transformations and becoming increasingly
More informationHigh Availability through Warm-Standby Support in Sybase Replication Server A Whitepaper from Sybase, Inc.
High Availability through Warm-Standby Support in Sybase Replication Server A Whitepaper from Sybase, Inc. Table of Contents Section I: The Need for Warm Standby...2 The Business Problem...2 Section II:
More informationECE 669 Parallel Computer Architecture
ECE 669 Parallel Computer Architecture Lecture 9 Workload Evaluation Outline Evaluation of applications is important Simulation of sample data sets provides important information Working sets indicate
More informationReducing Replication Bandwidth for Distributed Document Databases
Reducing Replication Bandwidth for Distributed Document Databases Lianghong Xu Andrew Pavlo Sudipta Sengupta Jin Li Gregory R. Ganger Carnegie Mellon University, Microsoft Research Long Research Paper
More information