Reducing The De-linearization of Data Placement to Improve Deduplication Performance


Yujuan Tan 1, Zhichao Yan 2, Dan Feng 2, E. H.-M. Sha 1,3
1 School of Computer Science & Technology, Chongqing University
2 School of Computer Science & Technology, Huazhong University of Science & Technology
3 Department of Computer Science, University of Texas at Dallas
{tanyujuan, jarod2046, edwinsha}@gmail.com, dfeng@hust.edu.cn

Abstract: Data deduplication is a lossless compression technology that replaces redundant data chunks with pointers to already-stored copies. Because of this intrinsic redundancy elimination, deduplication de-linearizes the data placement and forces the data chunks that belong to the same data object to be divided into multiple separate parts. Our preliminary study finds that this de-linearization of data placement weakens the data spatial locality that several deduplication approaches use to improve data read performance, deduplication throughput, and deduplication efficiency, and thus significantly degrades deduplication performance. In this paper, we first analyze the negative effect of the de-linearization of data placement on deduplication performance with examples and experimental evidence, and then propose an effective approach that reduces the de-linearization of data placement while sacrificing little compression ratio. Experimental evaluation driven by real-world datasets shows that our approach effectively reduces the de-linearization of data placement and enhances data spatial locality, which significantly improves deduplication throughput, deduplication efficiency, and data read performance at the cost of only a small loss in compression ratio.

I. INTRODUCTION

Data deduplication is a lossless compression technology that has been widely used in large-scale primary and secondary storage systems. It breaks data streams into series of data chunks and removes the redundant ones to save storage space. However, by removing redundant chunks, deduplication de-linearizes the data placement and deteriorates the data layout, which degrades deduplication performance. The data layout is the overall placement of the data chunks stored in a storage system. In deduplication storage systems there are two kinds of data chunks: the new unique chunks, which are written to disk sequentially, and the redundant chunks, which are removed and replaced with links to already-stored copies. The redundant chunks were written by preceding data streams and therefore occupy separate locations, so they cannot be stored together with the new unique chunks. This redundancy elimination de-linearizes the data placement and prevents the chunks from being stored in the same order in which they appear in the data stream, which significantly weakens spatial locality and affects data read performance, deduplication throughput, data availability, and so on. For example, when restoring data during disaster recovery, reconstruction requires many disk seeks because the data chunks that belong to the same file or directory are divided into multiple separate parts and cannot be retrieved together. In the worst case, every single chunk requires its own disk seek, which significantly degrades data read performance.
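To make the write path and the source of the de-linearization concrete, the following is a minimal sketch of inline chunk-level deduplication. It assumes fixed-size chunking and an in-memory fingerprint index (real systems such as DDFS use content-defined chunking and an on-disk index), and all names are illustrative.

    import hashlib

    CHUNK_SIZE = 4096  # illustrative; production systems typically use content-defined chunking

    def deduplicate(stream, index, disk):
        # Split the stream into chunks and store only chunks whose fingerprint is new.
        # index: fingerprint -> disk location; disk: an append-only list standing in
        # for sequential disk storage. Returns the recipe of locations needed to
        # reconstruct the stream.
        recipe = []
        for off in range(0, len(stream), CHUNK_SIZE):
            chunk = stream[off:off + CHUNK_SIZE]
            fp = hashlib.sha1(chunk).digest()   # chunk fingerprint
            loc = index.get(fp)
            if loc is None:                     # new unique chunk: written sequentially
                loc = len(disk)
                disk.append(chunk)
                index[fp] = loc
            # otherwise the chunk is redundant: no write, only a pointer, and since
            # loc refers to an older disk region, the recipe becomes non-contiguous
            recipe.append(loc)
        return recipe

Every recipe entry that points backward into an older disk region is exactly one fragment of the restored stream, which is where the de-linearization described above comes from.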
Moreover, the weakening of spatial locality due to the de-linearization of data placement has a significant negative effect on the deduplication approaches that use spatial locality to improve deduplication throughput and efficiency. Data deduplication faces a significant disk bottleneck: under limited RAM, it must fetch the chunk index from disk to RAM page by page for redundancy identification, which throttles the deduplication process and degrades deduplication throughput and data write performance. Existing well-known and widely used deduplication approaches, such as DDFS [1] and SiLo [2], mainly focus on exploiting data spatial locality (also called duplicate locality) to fetch the useful parts of the chunk index from disk to memory in batches to alleviate this disk bottleneck. SiLo further exploits spatial locality to improve deduplication throughput without giving up too much deduplication efficiency (i.e., compression ratio). To the best of our knowledge, most researchers are interested in exploiting the spatial locality that emerges in the backup stream to improve deduplication performance, but few have paid attention to the fact that this spatial locality grows weaker and weaker as the amount of deduplicated data (i.e., the new unique data chunks) increases and the data placement becomes more de-linearized, which in turn degrades deduplication performance. In this paper, we focus on reducing the de-linearization of data placement and enhancing spatial locality to improve deduplication performance, including deduplication throughput, deduplication efficiency, and data read performance. Our contributions are summarized in the following three key points.

Firstly, we analyze the negative effect of the de-linearization of data placement on the spatial locality that can be used to improve deduplication throughput and data read performance. Secondly, through an experimental study we find that the spatial locality gets worse as the amount of deduplicated data grows and the data placement becomes more de-linearized, which makes the existing deduplication approaches that exploit spatial locality to improve deduplication performance less effective. Finally, we propose an effective approach to reduce the de-linearization of data placement and enhance the spatial locality, which effectively improves the deduplication throughput, deduplication efficiency, and data read performance, as shown by the experimental results with real-world datasets.

The rest of this paper is organized as follows. In the next section we analyze the negative effect of the de-linearization of data placement on deduplication performance. In Section III, we propose an efficient approach to reduce the de-linearization of data placement. Section IV evaluates our proposed approach with real-world datasets, and Section V concludes the paper.

II. BACKGROUND AND MOTIVATION

During the deduplication process, the data chunks are divided into two categories: the deduplicated-data chunks (i.e., new unique data chunks) and the redundant data chunks. To maximize deduplication throughput and data write performance, the deduplicated-data chunks are written to disk sequentially. Therefore, if there were no redundant chunks to remove, all of the data chunks would be stored on disk in the same order in which they appear in the data stream. However, because redundant chunks are removed across multiple data objects (a data object here denotes a super-chunk composed of multiple data chunks, such as one data stream, one file directory, or a single file), the new unique data chunks are separated from the redundant data chunks, and the locations of the redundant data chunks are determined by the data objects that first wrote them to the deduplication storage system, thus de-linearizing the data placement and weakening the data spatial locality. Furthermore, as the amount of stored deduplicated-data chunks increases and the data placement becomes more de-linearized, the spatial locality that emerges in the backup stream grows weaker and weaker, which significantly affects the overall deduplication performance. In this section, we discuss how the de-linearization of data placement affects the spatial locality and what the negative effects of weakened spatial locality are.

A. The Spatial Locality and The De-linearization of The Data Placement

Spatial locality means that if a particular location is referenced at a particular time, then the nearby locations will likely be referenced in the near future.

Fig. 1: An Example of One File that Has N Data Fragments.

In deduplication storage systems, this concept of spatial locality is very useful both for the deduplication process during data writing and for data reconstruction during data reading. In the deduplication process, the spatial locality is also called duplicate locality, which denotes that the data chunks near a duplicate chunk are likely to be duplicates themselves with high probability [1]. By creating and maintaining on disk storage this duplicate locality that emerges in the data stream, the nearby duplicate data chunks can be fetched to RAM in advance for redundancy identification once one duplicate chunk is found, thus avoiding disk accesses and improving the deduplication throughput.
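DDFS [1] exploits duplicate locality in exactly this way through locality-preserved caching. The following is a minimal sketch of that idea under simplifying assumptions: chunks are grouped into containers, and an index hit prefetches the whole container's fingerprints into an LRU cache. The container layout and all names are illustrative, not DDFS's actual code.

    from collections import OrderedDict

    class LocalityCache:
        # Fingerprint cache exploiting duplicate locality: a miss that hits the
        # on-disk index prefetches all fingerprints of the matching container,
        # so the duplicates that usually follow are resolved without disk I/O.

        def __init__(self, capacity, on_disk_index, containers):
            self.capacity = capacity
            self.on_disk_index = on_disk_index  # fingerprint -> container id
            self.containers = containers        # container id -> list of fingerprints
            self.cache = OrderedDict()          # LRU: fingerprint -> container id

        def is_duplicate(self, fp):
            if fp in self.cache:
                self.cache.move_to_end(fp)      # cache hit: no disk access needed
                return True
            cid = self.on_disk_index.get(fp)    # one disk access for the index lookup
            if cid is None:
                return False                    # new unique chunk
            for neighbor in self.containers[cid]:
                self.cache[neighbor] = cid      # prefetch: neighbors are likely duplicates
                self.cache.move_to_end(neighbor)
            while len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict least recently used entries
            return True

The prefetch only pays off while chunks that were written together are still looked up together; once de-linearization scatters a stream across many containers, each container fetch yields fewer future hits, which is precisely the degradation measured in Section II-B.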
For data reconstruction in data reading, the spatial locality can be interpreted as follows: if one data chunk is read, the nearby chunks will likely be read in the very near future. If this spatial locality is available for data reading, the nearby chunks can be read in batches and many disk seeks can be eliminated, substantially improving the data read performance.

Unfortunately, for both the deduplication process and data reconstruction, the existence of such spatial locality anytime and anywhere is an ideal state in deduplication storage systems. For the deduplication process, the spatial locality can be created and maintained for the initial backup sessions. But when a subsequent backup session that shares some redundant chunks with the preceding ones arrives at the system, its redundant data chunks have to be removed and only its new unique data chunks are written to disk sequentially. Thus it is impossible to store together all the data chunks that compose this backup session, and impossible to fully create and maintain on disk the spatial locality that emerges in this data stream. Furthermore, as more data streams arrive at the system and the amount of deduplicated data increases, the spatial locality gets much weaker due to the de-linearization of data placement, so the deduplication throughput gained by exploiting spatial locality is gradually degraded. Section II-B1 presents experimental evidence of this outcome.

For data reading, the spatial locality used for data reconstruction shows the same tendency under the de-linearization of data placement as that used for redundancy identification. Taking Fig. 1 as an example (Fig. 1 depicts the data layout of a file with N separate parts stored on disk, with the chunk metadata in front and the actual chunk data behind), the file's first N-1 data parts, from Part.1 to Part.N-1, are all shared with other files and are not stored together with Part.N. Thus, reading this file from disk requires about N disk seeks, under the assumption that these separate parts are far apart from each other and no two of them can be read together by one disk seek.

The data reading time F(read) can be calculated as

F(read) = N × Time_seek + f_size / W_seq    (1)

where Time_seek denotes the time required for each disk seek, f_size represents the size of the file being read, and W_seq denotes the sequential read bandwidth. But if this file were not deduplicated and all of its data chunks were stored together with Part.N, its reading time F'(read) would equal just 1 × Time_seek + f_size / W_seq. Ignoring their common term f_size / W_seq, F(read) - f_size / W_seq is N times larger than F'(read) - f_size / W_seq. Moreover, as the amount of stored deduplicated data increases, newly arriving data objects share more redundant data with the preceding ones, so the logically contiguous data chunks are scattered across more and more disk locations; the spatial locality that can be used for data reading gets much worse, and so does the data read performance.
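For intuition, here is a worked instance of Equation (1) with assumed parameters (illustrative numbers, not measurements from our evaluation): Time_seek = 10 ms, f_size = 100 MB, W_seq = 100 MB/s, and N = 50 fragments:

F(read) = 50 × 0.01 s + 100 MB / (100 MB/s) = 1.5 s, versus F'(read) = 1 × 0.01 s + 1 s = 1.01 s.

Under these assumptions, fragmentation alone cuts the effective read bandwidth from about 99 MB/s to about 67 MB/s, and the gap grows linearly with N.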
B. Experimental Evidence

In existing deduplication storage systems, the exploitation of the spatial locality (i.e., duplicate locality) that emerges in backup streams mainly serves two research goals: one is to alleviate the disk bottleneck, as in DDFS and Sparse Indexing, and the other is to sacrifice a reasonable amount of deduplication efficiency while improving the deduplication throughput, as in SiLo. Unfortunately, the continual removal of redundant chunks and the resulting de-linearization of data placement weaken the duplicate locality, especially for long-term data backup and retention, which significantly affects the deduplication throughput and deduplication efficiency.

1) The Degradation of Deduplication Throughput: In our preliminary experimental study, we implemented the deduplication approach proposed by DDFS [1] and evaluated its deduplication throughput driven by real-world datasets. Fig. 2 depicts the average deduplication throughput obtained from 20 full backup generations of one author's file system, totaling about 647 GB of data.

Fig. 2: The degradation of the deduplication throughput.

As shown in this figure, the deduplication throughput decreases as the backup generations accumulate, from 213 MB/s for generation 1 to only 110 MB/s for generation 20, which is consistent with our intuition that the deduplication throughput degrades as the amount of deduplicated-data chunks increases.

2) The Degradation of Deduplication Efficiency: In addition to improving the deduplication throughput, some researchers exploit the duplicate locality in order not to lose too much deduplication efficiency while improving the deduplication throughput, as in SiLo [2]. SiLo groups the data chunks into segments, groups the segments into big blocks, and performs redundancy identification based on similar segments instead of a full chunk index under a limited RAM budget. When similar segments are found, it checks chunk redundancy not only among those similar segments but also within the blocks that those segments belong to, thus improving the deduplication efficiency. The number of redundant chunks that can be found among the blocks depends heavily on the duplicate locality of the redundant chunks in the blocks. The weakening of the duplicate locality reduces the amount of redundant data chunks found in these blocks and degrades the deduplication efficiency. Using a dataset of about 20 incremental backup generations of one author's file system, as in SiLo, we evaluated its deduplication efficiency, defined here as the redundant data removed by SiLo divided by the redundant data actually existing in the dataset.

Fig. 3: The degradation of the deduplication efficiency.

Fig. 3 shows the deduplication efficiency obtained from the 20 backup generations. As seen from the results, the deduplication efficiency decreases as the backup generations accumulate, which reveals that the duplicate locality is gradually weakened and that some redundant data chunks no longer reside in the nearby blocks where they could be found and removed.
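To illustrate the kind of similarity-based segment matching described above, here is a minimal sketch that signs each segment with its k smallest chunk fingerprints (a min-hash-style sample) and matches an incoming segment to the stored segment sharing the most signature fingerprints. The sampling rule and all names are assumptions for illustration, not SiLo's actual implementation.

    from collections import defaultdict

    def signature(segment_fps, k=4):
        # Sign a segment with its k smallest fingerprints (min-hash-style sample);
        # segments with many shared chunks tend to share small fingerprints.
        return sorted(segment_fps)[:k]

    class SimilarityIndex:
        def __init__(self):
            self.by_fp = defaultdict(set)  # signature fingerprint -> ids of stored segments

        def add(self, seg_id, segment_fps):
            for fp in signature(segment_fps):
                self.by_fp[fp].add(seg_id)

        def most_similar(self, segment_fps):
            # Vote: the stored segment sharing the most signature fingerprints wins.
            votes = defaultdict(int)
            for fp in signature(segment_fps):
                for seg_id in self.by_fp.get(fp, ()):
                    votes[seg_id] += 1
            return max(votes, key=votes.get) if votes else None

Because only the matched segments and their enclosing blocks are checked, any redundant chunk whose duplicate has drifted outside those blocks is missed, which is how weakened duplicate locality shows up as lost deduplication efficiency.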

III. REDUCING THE DE-LINEARIZATION OF DATA PLACEMENT

As analyzed in Section II, the removal of the redundant data and the de-linearization of the data placement weaken the data spatial locality that is used to improve the deduplication throughput, deduplication efficiency, and data read performance, which significantly affects the overall deduplication performance. Nevertheless, because removing redundant data to save storage space is the primary concern of data deduplication, and because the de-linearization of data placement cannot be avoided entirely, we focus on reducing the de-linearization of data placement instead of removing it. In this section, we propose an effective method, called DeFrag, which aims to reduce the de-linearization of data placement and enhance the data spatial locality to improve the deduplication performance.

A. Key Idea

The key idea of DeFrag is to write some redundant data to the disk storage instead of removing it, thus reducing the de-linearization of data placement. However, because writing redundant data consumes extra storage space and conflicts with the primary goal of data deduplication, the key to implementing DeFrag is to determine which data chunks should or should not be removed so as to enhance the spatial locality at the cost of only a small loss in compression ratio. Guided by the analysis in Section II that the degraded deduplication performance is mainly caused by the weak spatial locality due to the de-linearization of data placement, DeFrag selects the unremoved redundant chunks according to a metric called the Spatial Locality Level (SPL for short). In DeFrag, the Spatial Locality Level is used to measure the data spatial locality of chunk groups and can be dynamically calculated during the deduplication process. If the dynamically calculated Spatial Locality Level is lower than a preset value, DeFrag writes the corresponding chunks to the disk storage instead of removing them, thus reducing the de-linearization of the data placement and enhancing the spatial locality.

B. Design and implementation

DeFrag mainly concerns the selection of the redundant chunks that are not to be removed. It works after all the redundant data chunks, and the locations of their already-stored copies, have been identified for each data stream. For each incoming data stream, DeFrag breaks it into a series of chunks and groups multiple contiguous chunks into segments; each segment varies from 0.5 MB to 2 MB based on the chunk content. The segment is the processing unit for reading and writing data chunks. After finding all the redundant chunks and their correlated locations, DeFrag calculates the spatial locality for each segment that contains redundant chunks. In DeFrag, we define the Spatial Locality Level SPL(m,k) as

SPL(m,k) = |Seg_m ∩ Seg_k| / |Seg_m|    (2)

where Seg_m is the incoming segment, Seg_k is a segment that has already been stored on disk, and Seg_m ∩ Seg_k represents the data chunks that are shared by Seg_m and Seg_k. SPL(m,k) measures the spatial locality of the chunks in Seg_m with respect to Seg_k, i.e., the fraction of Seg_m that can be fetched together by one disk seek. If Seg_m ∩ Seg_k = Seg_m, then SPL(m,k) = 1, which means that Seg_m has strong spatial locality with respect to Seg_k and all of the chunks in Seg_m can be retrieved through one disk seek by reading Seg_k. If SPL(m,k) is smaller than a preset value α, which indicates that the corresponding spatial locality is weak, DeFrag will not remove the redundant chunks in Seg_m ∩ Seg_k and will write them to the disk storage together with the new unique chunks in Seg_m, thus reducing the de-linearization of Seg_m and enhancing the spatial locality. The preset value α can be adjusted and controlled to trade off the spatial locality improvement against the sacrificed compression ratio for different datasets. Due to space constraints, we do not depict the architecture and the data flow path of DeFrag in this paper.
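A minimal sketch of this SPL-based selection rule follows, assuming each segment is represented as the set of its chunk fingerprints and that the stored counterpart Seg_k has already been located by the preceding identification step; the function names and the default α = 0.1 (the setting used in Section IV) are illustrative.

    def spl(seg_m, seg_k):
        # Equation (2): the fraction of the incoming segment's chunks that are
        # shared with, and hence co-located in, the stored segment.
        return len(seg_m & seg_k) / len(seg_m)  # segments are assumed non-empty

    def chunks_to_write(seg_m, seg_k, alpha=0.1):
        # seg_m, seg_k: sets of chunk fingerprints of the incoming segment and
        # its already-stored counterpart. Returns the chunks to write to disk.
        if spl(seg_m, seg_k) < alpha:
            return seg_m          # weak locality: keep the redundant chunks too,
                                  # so the whole segment stays contiguous on disk
        return seg_m - seg_k      # strong locality: deduplicate as usual

Raising α rewrites more segments and trades storage space for locality; lowering it approaches conventional deduplication.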
IV. EXPERIMENTAL EVALUATION

We implemented DeFrag based on the deduplication approach proposed in DDFS and evaluated its deduplication throughput, deduplication efficiency, and data read performance. The dataset used for the evaluation was generated from 66 backups of the file systems of five graduate students in our research group, totaling about 1.72 TB of data. To assess DeFrag's benefits, we further compared the experimental results with those obtained from the DDFS and SiLo prototype systems, as shown in this section. As a side note, we refer to the DDFS and SiLo prototypes as DDFS-Like and SiLo-Like, since we implemented their deduplication approaches ourselves instead of borrowing their developed prototype systems for the evaluation, and we only show the experimental results of DeFrag with α set to 0.1 due to space restrictions.

A. Deduplication Throughput

As in deduplication approaches such as DDFS, the deduplication throughput is decreased by the de-linearization of the data placement and the weakening of the spatial locality. In this subsection, we focus on the benefits of DeFrag in improving the deduplication throughput by reducing the de-linearization of data placement. Fig. 4 compares the average deduplication throughput of DeFrag, DDFS-Like, and SiLo-Like obtained from the 66 backup generations. As shown in the results, the deduplication throughput of DDFS-Like is much lower than that of DeFrag; by reducing the de-linearization of data placement, DeFrag achieves deduplication throughput comparable to that of SiLo-Like. Moreover, when the backup stream has very good spatial locality, the deduplication throughput of DeFrag is sometimes even higher than that of SiLo-Like, as in backup generations 1, 2, 3, 4, 5, 41, and 42. This is because, when the spatial locality is very good, DeFrag can continually prefetch the nearby duplicate chunks into RAM for redundancy identification with one disk seek once a duplicate chunk is found, while SiLo wastes some disk seeks on irrelevant chunks, since its prefetching policy is based on similar segments and holds only with some probability.

Fig. 4: The comparison of the deduplication throughput.

Fig. 5: The comparison of the deduplication efficiency.

B. Deduplication Efficiency

To achieve high deduplication throughput, both DeFrag and SiLo leave some redundant data unremoved during the deduplication process. DeFrag forgoes removing some redundant data to reduce the de-linearization of data placement, while SiLo ignores, with some probability, the redundant data that exists among the less-similar segments. Fig. 5 plots the deduplication efficiency of DeFrag and SiLo obtained from the 66 backup generations, where the deduplication efficiency is defined as the redundant data removed by DeFrag or SiLo divided by the redundant data actually existing in the dataset. Moreover, to fairly compare the amount of redundant data kept by DeFrag and SiLo, we count only the redundant data in segments that share part of their redundant chunks with others, ignoring the redundant chunks in segments whose duplicates share all of their redundant chunks, since those would be removed by both DeFrag and SiLo. As seen from the results, the amount of redundant data kept by DeFrag is much less than that kept by SiLo. When the backup generation reaches 66, SiLo leaves 12% of the redundant data unremoved while DeFrag leaves only 4%, which indicates that DeFrag achieves higher deduplication efficiency than SiLo, even though their deduplication throughput is comparable, as shown in Fig. 4.

C. Data Read Performance

As analyzed in Section II, reducing the de-linearization of data placement enhances the spatial locality and has a direct impact on the data read performance. In our experiments, we compare the read performance of DeFrag and DDFS-Like by reconstructing the backup generations from 1 to 20. Fig. 6 shows the experimental results. As shown by the results, DeFrag's read performance is higher than that of DDFS-Like, which indicates the potential benefit of DeFrag in improving the data read performance by reducing the de-linearization of data placement.

Fig. 6: The comparison of data read performance.

V. CONCLUSION

Many researchers are interested in reducing the data fragmentation caused by redundancy elimination to improve the data read performance in deduplication storage systems [3], [4], [5]. In this paper, motivated by the analysis that the de-linearization of data placement has a negative effect on the spatial locality and on the deduplication performance, we propose DeFrag, which keeps some redundant data unremoved to reduce the de-linearization of data placement and to improve several aspects of deduplication performance rather than only the data read performance. As shown by our experimental results with real-world datasets, DeFrag effectively enhances the spatial locality and improves the deduplication performance, including the deduplication throughput, deduplication efficiency, and data read performance.

REFERENCES

[1] B. Zhu, K. Li, and H. Patterson, "Avoiding the disk bottleneck in the Data Domain deduplication file system," in FAST '08, Feb. 2008.
[2] W. Xia, H. Jiang, D. Feng, and Y. Hua, "SiLo: A Similarity-Locality based Near-Exact Deduplication Scheme with Low RAM Overhead and High Throughput," in USENIX ATC '11, Jun. 2011.
[3] K. Srinivasan, T. Bisson, G. Goodson, and K. Voruganti, "iDedup: Latency-aware, inline data deduplication for primary storage," in FAST '12, Feb. 2012.
[4] Y. J. Nam, D. Park, and D. Du, "Assuring Demanded Read Performance of Data Deduplication Storage with Backup Datasets," in MASCOTS '12, Aug. 2012.
[5] M. Kaczmarczyk, M. Barczynski, W. Kilian, and C. Dubnicki, "Reducing impact of data fragmentation caused by in-line deduplication," in SYSTOR '12, Jun. 2012.
