Parallelizing Inline Data Reduction Operations for Primary Storage Systems

Jeonghyeon Ma and Chanik Park
Department of Computer Science and Engineering, POSTECH, Pohang, South Korea
{doitnow0415,cipark}@postech.ac.kr

Abstract. Data reduction operations such as deduplication and compression are widely used to save storage capacity in primary storage systems. These operations are compute-intensive, and high-performance storage devices such as SSDs are now widely used in primary storage systems. As a result, data reduction operations become a performance bottleneck in SSD-based primary storage systems. In this paper, we propose a parallel data reduction technique for deduplication and compression that uses both multi-core CPU and GPU in an integrated manner. First, we introduce bin-based data deduplication, a parallel deduplication technique in which CPU-based parallelism is mainly applied and the GPU is used as a co-processor of the CPU. Second, we propose a parallel compression technique in which the main computation is done by the GPU while the CPU is responsible only for post-processing. Third, we propose a parallel technique that handles both deduplication and compression in an integrated manner and controls when and how to use the GPU. Experimental evaluation shows that our proposed techniques achieve 15.0%, 88.3%, and 89.7% better throughput than the CPU-only case for deduplication, compression, and integrated data reduction, respectively. Our proposed technique makes it easy to apply data reduction operations to SSD-based primary storage systems.

Keywords: Primary storage · Inline data reduction scheme · GPU

1 Introduction

Data reduction operations such as deduplication and compression are widely used to save storage capacity on primary storage systems.
In recent years, however, the move from HDD-based to SSD-based primary storage systems has exposed the computational overhead of data reduction operations, making them difficult to apply to storage systems. One way to conceal this overhead is to store all of the data on the storage system first and then perform data reduction in the background when the system is idle. However, this generates more write I/O than a system without data reduction, so it is not applicable to SSD-based storage systems because of write endurance problems. A way to increase the lifetime of SSD-based storage systems is to apply data reduction operations on the critical I/O path, but doing so can significantly degrade I/O performance. One way to improve the throughput of data reduction is to take advantage of GPUs, which are designed for compute-intensive workloads. However, depending on the workload, CPU-based parallel data reduction operations may perform better than GPU-based techniques.

In this paper, we propose inline parallel data reduction operations based on multi-core CPU and GPU for primary storage systems. To do this, we design parallel deduplication and compression methods that take the multi-core CPU and GPU architectures into account, and finally we show how to integrate CPU-based and GPU-based data reduction operations.

Springer International Publishing AG 2017. V. Malyshkin (Ed.): PaCT 2017, LNCS 10421, pp , DOI: / _29

2 Background

Data reduction operations such as deduplication and compression are widely used to save storage capacity on primary storage systems. This section describes the basic tasks of data reduction operations and their performance bottlenecks.

Deduplication is performed in four stages: chunking, hashing, indexing, and destaging. Chunking breaks a data stream into chunks, which are the base unit for checking data redundancy. Hashing calculates the hash value of each chunk; the hash value is used as the identifier of the chunk. Indexing compares the hash value of each chunk with the hash values of already stored chunks to determine whether the chunk is a duplicate. If the chunk is found to be unique, a destaging step stores the chunk on the storage device. Of these stages, hashing and indexing are the main performance bottlenecks in deduplication systems. Previous work [1] has also attempted to address these two bottlenecks.

Among compression algorithms, LZ-based algorithms are widely used in primary storage systems because of their simplicity and effectiveness [2]. LZ compression uses a history buffer and a look-ahead buffer.
If the same character sequence is found in both the history buffer and the look-ahead buffer, the sequence in the look-ahead buffer is replaced by a pointer to the sequence in the history buffer. This string matching is the performance bottleneck.

3 Design and Implementation

3.1 Parallel Data Deduplication on Multi-core CPU and GPU

There is no data dependency between chunks when the hash value of a chunk is calculated in the hashing phase, so the hashes of multiple chunks can easily be calculated in parallel. Parallelizing indexing is more complicated, because the hash table used to determine a chunk's redundancy is globally shared across all computing threads. This section therefore describes how indexing is parallelized on the multi-core CPU and the GPU, and how it applies to the primary storage system.

(1) How to Parallelize Indexing on the CPU: we divide the hash table into several small hash tables called bins, so that multiple computing threads can check chunks against different hash tables at the same time without a locking mechanism. This technique was commonly used in existing DHT-based systems. We call this operation bin-based indexing. In addition, to avoid disk accesses that significantly degrade performance, hash table entries are kept only in memory, not on disk. Because of this index management policy, the deduplication module cannot find some duplicate data. However, this is not a serious limitation in practice. Assuming that the storage capacity is 4 TB, the chunk size is 8 KB, and the index entry size is 32 bytes, including the hash (SHA1, 20 bytes) and other metadata, the storage system requires 16 GB of memory for the index. That is, when primary storage is the target, the index does not require an excessive amount of memory. Memory consumption can be reduced further by removing the prefix of each hash entry. If the prefix is n bytes, the deduplication system keeps only 20-n bytes for each hash value. With a 2-byte prefix, we can save 1 GB of memory in this way.

(2) How to Parallelize Indexing on the GPU: parallelizing indexing on the GPU needs to take the architectural characteristics of the GPU into account. First, the GPU is connected to the system memory via the PCI interface, and the data used for the calculation must be transferred from the system memory to the GPU device memory. Second, GPU threads in the same workgroup execute the same instruction regardless of branching, even though each thread has its own execution path. Many branch operations can therefore degrade computational performance, which means the GPU code has to be kept simple. Third, GPUs have many computing cores and large memory bandwidth, so they can process large amounts of data at a time. Allocating data to all computing cores and choosing the data layout is therefore critical to taking full advantage of the GPU resources. The GPU also performs bin-based indexing, just as the CPU does.
However, considering the GPU architecture characteristics above, we organize each bin as a linear table rather than a tree. This contiguous data layout is useful when utilizing the GPU's local memory, because copying data from GPU global memory to local memory happens naturally when threads access the data contiguously. It also avoids multiple branch operations. The GPU can check the redundancy of data by comparing against a single hash table. In addition, only the hash values are kept in GPU memory, while the other chunk metadata is maintained in system memory. With this split, updating the GPU-side table amounts to a plain data transfer, so there is no additional hash table update overhead on the GPU. The GPU therefore returns, for each lookup, a pair consisting of an index number and hit/miss information. The metadata structure in system memory then uses these GPU results.

(3) When to Use the GPU for Indexing: to decide how to apply the GPU to indexing, we compare CPU and GPU indexing performance. For a fair comparison, the number of hash table entries used for indexing is the same on the CPU and the GPU. Preliminary experiments show that CPU performance is 4.16 to 5.45 times better than GPU performance in terms of execution time. GPU indexing has a fixed execution-time floor because of the unavoidable GPU kernel launch time. This means that even with high-performance GPUs, there is a limit to optimizing indexing on the GPU. Therefore, we use the GPU only when the CPU is fully utilized and there is still indexing work to do.
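
The bin-based indexing of item (1) and its memory arithmetic can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `BinIndex`, `deduplicate`, and `index_memory_bytes` are hypothetical, and two points are assumptions rather than statements from the text: that the stripped hash prefix is what selects the bin, and that each worker thread owns a disjoint set of bins (bin number modulo the number of workers). Python threads are used only to show the lock-free structure, not to demonstrate a speedup.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

PREFIX_BYTES = 2                    # the 2-byte prefix example from the text
NUM_BINS = 1 << (8 * PREFIX_BYTES)  # assumption: one bin per prefix value


def index_memory_bytes(capacity, chunk_size, bytes_per_chunk):
    """Bytes needed when every chunk on the device costs bytes_per_chunk."""
    return (capacity // chunk_size) * bytes_per_chunk


class BinIndex:
    """Hash table split into bins; only the hash suffix is stored per entry."""

    def __init__(self):
        self.bins = [set() for _ in range(NUM_BINS)]

    def lookup_or_insert(self, digest):
        """Return True if the digest is already indexed, otherwise record it."""
        bin_id = int.from_bytes(digest[:PREFIX_BYTES], "big")
        suffix = digest[PREFIX_BYTES:]  # 20 - n bytes are kept
        if suffix in self.bins[bin_id]:
            return True
        self.bins[bin_id].add(suffix)
        return False


def deduplicate(chunks, index, workers=4):
    """Return one boolean per chunk: True where the chunk is a duplicate."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Hashing: chunks are independent, so they can be hashed in parallel.
        digests = list(pool.map(lambda c: hashlib.sha1(c).digest(), chunks))

        # Indexing: route each chunk to the single worker that owns its bin,
        # so no two workers ever touch the same bin and no lock is needed.
        batches = [[] for _ in range(workers)]
        for pos, digest in enumerate(digests):
            bin_id = int.from_bytes(digest[:PREFIX_BYTES], "big")
            batches[bin_id % workers].append((pos, digest))

        is_dup = [False] * len(chunks)

        def probe(batch):
            for pos, digest in batch:  # stream order is kept within a batch
                is_dup[pos] = index.lookup_or_insert(digest)

        list(pool.map(probe, batches))
    return is_dup


TB, KB, GB = 2**40, 2**10, 2**30
print(index_memory_bytes(4 * TB, 8 * KB, 32) // GB)            # 16
print(index_memory_bytes(4 * TB, 8 * KB, PREFIX_BYTES) // GB)  # 1
print(deduplicate([b"a" * 8192, b"b" * 8192, b"a" * 8192], BinIndex()))
# [False, False, True]
```

The first two printed values reproduce the figures in the text: 4 TB / 8 KB gives 2^29 chunks, which at 32 bytes per entry is 16 GB, and dropping 2 bytes per entry saves 1 GB. Identical chunks always hash to the same bin and therefore to the same worker, which is why the second copy of a chunk is reported as a duplicate without any locking.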

3.2 Parallel Data Compression on Multi-core CPU and GPU

In this section, we focus on how to parallelize the LZ compression schemes that are commonly used in primary storage systems.

(1) How to Parallelize Compression on the CPU: as with hashing, there is no data dependency between chunks, so compression can run independently on each chunk. CPU-based compression algorithms have been well studied. We therefore parallelize compression on the CPU by assigning to each chunk a computing thread that runs an existing compression algorithm.

(2) How to Parallelize Compression on the GPU: Ozsoy et al. [3] introduced a parallel compression algorithm for the GPU. This algorithm divides the data into several sub-blocks, calculates the compression result for each sub-block, and merges the results on the CPU. It has a weakness when applied as a compression algorithm for primary storage systems: it assumes that the data to be compressed is large enough to take full advantage of the GPU resources, so it does not work well for small data. The chunk size is 4 KB, and only a small number of computing cores can be allocated to compress a single 4 KB chunk. We therefore design a compression algorithm that computes the compression results of many chunks at once. The GPU allocates multiple threads to each chunk. Each thread performs its own LZ compression with its own history buffer and look-ahead buffer. Adjacent threads inspect regions that overlap by the size of the history buffer. For performance reasons, the GPU's compression results are not refined on the GPU, so the CPU must refine them. This step is called post-processing.

(3) How to Use the GPU for Compression: we compare the compression performance of the CPU and the GPU to determine when to use the GPU.
Experimental results show that GPU performance is 88.3% better than CPU performance in terms of execution time (see Sect. 4). Because the performance gap is large, the GPU performs compression and the CPU is used for refinement.

3.3 Putting It All Together

This section describes how to integrate the two parallel data reduction operations, deduplication and compression. First, we need to determine the order in which the operations are applied. Based on the result of [5], we adopt the deduplication-before-compression order for a higher data reduction ratio. Second, we add a bin buffer structure to the data deduplication algorithm. The bin buffer temporarily stores the hashes of each bin before they are moved to the GPU memory and the bin tree. When the buffer is full, the hashes are immediately flushed from the buffer to the storage. This produces sequential writes that are well suited to the SSD.

Figure 1 shows a workflow that incorporates deduplication and compression operations on the CPU and GPU. GPU indexing is performed if the GPU is available, and CPU indexing is performed if duplicate hashes are not found. On the CPU indexing path, the bin buffer is checked first, because recently updated chunks can reside in the bin buffer and, due to temporal locality, chunks are more likely to find duplicates there. If there are no duplicates in the bin buffer, the bin tree, which stores most of the hash table entries, is checked. If no duplicate is found, the chunk is regarded as a unique chunk and becomes a compression target. After the data is compressed, the bin buffer is updated because the chunk is unique. If the bin buffer becomes full, it is flushed to the storage, and the GPU bins in GPU memory are updated accordingly. Currently, a random replacement policy is applied.

Fig. 1. An integrated workflow of deduplication and compression proposed for data reduction operations

4 Evaluation

This section evaluates the throughput of the parallel data reduction operations on the CPU and GPU. Our test machine is equipped with an Intel i k, a Radeon HD 7970, and 16 GB of main memory. The vdbench [4] is used to generate the dataset. The size of the data stream is about 2 GB. The deduplication and compression ratios are set to 2.0, which is a common ratio for primary storage systems. We compare our schemes with the throughput of a Samsung SSD 830. In this section, the Samsung SSD 830 is simply referred to as the SSD.

(1) Parallel data deduplication: the GPU performs indexing for only a small portion of the chunks. The workflow for integrated CPU and GPU indexing is the same as in Fig. 1, except for the compression phase. Experimental results show that the GPU-supported data deduplication scheme improves throughput by 15% over the CPU-only data deduplication scheme. In addition, it achieves three times the throughput of the SSD.

(2) Parallel data compression: the proposed technique uses the GPU for compression and the CPU for post-processing of compression. Due to the nature of the compression technique, the throughput is high when the compression ratio is high.
The CPU-based compression method has lower performance (about 50 K IOPS) than the SSD (about 80 K IOPS) when the compression ratio is low, but the GPU-based parallel compression method reaches 100 K IOPS even when the compression ratio is low. It always shows higher performance than the SSD throughput.

(3) Putting it all together, parallelizing both data deduplication and compression: in an environment where both CPU and GPU are available, there are several options for integrating the two data reduction operations, deduplication and compression. The first option is to use the GPU for both data reduction operations. The second option is to use the GPU for only one data reduction operation. The last option is to use the GPU for neither operation, which may be useful when the performance of the GPU is poor. Figure 2 shows the throughput of these options.

Fig. 2. Throughput comparison of integration methods

Allocating the GPU to compression is the best choice among the integration methods. This is because data compression, which gains the most from the GPU, then has exclusive use of the GPU. However, because hardware specifications differ across platforms, we cannot guarantee that this integration is always the right one. Therefore, before assigning processors to each data reduction operation, the performance of these integration methods is compared using dummy I/O to determine the one with the best throughput. In this way, we can ensure the best performance even if the target platform is different.

5 Related Works

Much previous research has investigated ways to improve the throughput of data reduction operations.

Some studies exploit parallelism in data deduplication systems. Xia et al. [6] proposed a multicore-based parallel data deduplication approach. However, they did not consider the indexing operation, which is known to be the main bottleneck in data deduplication [1]. Kim et al. [7] proposed a GPU-based data deduplication approach for primary storage. However, they did not consider utilizing the CPU, which performs better than the GPU for the indexing operation.

Other studies exploit GPU parallelism for compression. Ozsoy et al. [3] introduced parallel compression algorithms for the GPU. However, the data to be compressed must be quite large to utilize the GPU resources fully.
This does not match primary storage systems, which compress a number of 4 KB chunks. Moreover, there are studies that introduce CPU-parallel algorithms for compression. Klein and Wiseman [8] introduce an algorithm in which compression is executed using a tree-structured hierarchy. Navarro and Raffinot [9] introduce an algorithm that divides the data stream into several small subsets and allocates a thread to each subset. Even though these approaches parallelize compression on the CPU, our GPU-based approach is better by about 88.3%.

One study analyzes the effect of mixing the two data reduction operations, deduplication and compression. Constantinescu et al. [5] analyze the data reduction ratio when deduplication and compression are applied together. However, that work focuses only on the data reduction ratio, not on throughput.

6 Conclusion

Throughput is becoming more important as data reduction operations are applied to save space on SSD-based primary storage systems. To address this problem, we proposed parallel data reduction operations using multi-core CPU and GPU. We also showed how to integrate deduplication and compression on multi-core CPUs and GPUs. Our parallel approach to deduplication achieves three times the SSD's throughput. For compression, the throughput of the GPU-supported parallel compression method is 88.3% better than the average throughput of parallel QuickLZ. Finally, GPU-supported integration shows a performance improvement of 89.7% over parallel data reduction operations using only the CPU (deduplication ratio 2.0, compression ratio 2.0). This means that our proposed technique makes it easy to apply data reduction operations to SSD-based primary storage systems.

References

1. Guo, F., Efstathopoulos, P.: Building a high-performance deduplication system. In: USENIX Annual Technical Conference (2011)
2. De Agostino, S.: Lempel-Ziv data compression on parallel and distributed systems. Algorithms 4, (2011)
3. Ozsoy, A., Swany, M., Chauhan, A.: Pipelined parallel LZSS for streaming data compression on GPGPUs. In: Parallel and Distributed Systems, pp (2012)
4. Berryman, A., Calyam, P., Honigford, M., Lai, A.M.: Vdbench: a benchmarking toolkit for thin-client based virtual desktop environments. In: Cloud Computing Technology and Science, pp (2010)
5. Constantinescu, C., Glider, J., Chambliss, D.: Mixing deduplication and compression on active data sets. In: Data Compression Conference, pp (2011)
6. Xia, W., Jiang, H., Feng, D., Tian, L., Fu, M., Wang, Z.: P-dedupe: exploiting parallelism in data deduplication system. In: Networking, Architecture and Storage, pp (2012)
7. Kim, C., Park, K.W., Park, K.H.: GHOST: GPGPU-offloaded high performance storage I/O deduplication for primary storage system. In: Proceedings of the International Workshop on Programming Models and Applications for Multicores and Manycores, pp (2012)
8. Klein, S.T., Wiseman, Y.: Parallel Lempel Ziv coding (extended abstract). In: Amir, A. (ed.) CPM LNCS, vol. 2089, pp Springer, Heidelberg (2001). doi: / X_2
9. Navarro, G., Raffinot, M.: Practical and flexible pattern matching over Ziv-Lempel compressed text. J. Discrete Algorithms 2, (2004)


More information

Optimizing Flash-based Key-value Cache Systems

Optimizing Flash-based Key-value Cache Systems Optimizing Flash-based Key-value Cache Systems Zhaoyan Shen, Feng Chen, Yichen Jia, Zili Shao Department of Computing, Hong Kong Polytechnic University Computer Science & Engineering, Louisiana State University

More information

GPUfs: Integrating a file system with GPUs

GPUfs: Integrating a file system with GPUs GPUfs: Integrating a file system with GPUs Mark Silberstein (UT Austin/Technion) Bryan Ford (Yale), Idit Keidar (Technion) Emmett Witchel (UT Austin) 1 Traditional System Architecture Applications OS CPU

More information

SoftNAS Cloud Performance Evaluation on AWS

SoftNAS Cloud Performance Evaluation on AWS SoftNAS Cloud Performance Evaluation on AWS October 25, 2016 Contents SoftNAS Cloud Overview... 3 Introduction... 3 Executive Summary... 4 Key Findings for AWS:... 5 Test Methodology... 6 Performance Summary

More information

Deduplication File System & Course Review

Deduplication File System & Course Review Deduplication File System & Course Review Kai Li 12/13/13 Topics u Deduplication File System u Review 12/13/13 2 Storage Tiers of A Tradi/onal Data Center $$$$ Mirrored storage $$$ Dedicated Fibre Clients

More information

Caching and Buffering in HDF5

Caching and Buffering in HDF5 Caching and Buffering in HDF5 September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 1 Software stack Life cycle: What happens to data when it is transferred from application buffer to HDF5 file and from HDF5

More information

LevelDB-Raw: Eliminating File System Overhead for Optimizing Performance of LevelDB Engine

LevelDB-Raw: Eliminating File System Overhead for Optimizing Performance of LevelDB Engine 777 LevelDB-Raw: Eliminating File System Overhead for Optimizing Performance of LevelDB Engine Hak-Su Lim and Jin-Soo Kim *College of Info. & Comm. Engineering, Sungkyunkwan University, Korea {haksu.lim,

More information

Lazy Exact Deduplication

Lazy Exact Deduplication Lazy Exact Deduplication JINGWEI MA, REBECCA J. STONES, YUXIANG MA, JINGUI WANG, JUNJIE REN, GANG WANG, and XIAOGUANG LIU, College of Computer and Control Engineering, Nankai University 11 Deduplication

More information

Online Version Only. Book made by this file is ILLEGAL. Design and Implementation of Binary File Similarity Evaluation System. 1.

Online Version Only. Book made by this file is ILLEGAL. Design and Implementation of Binary File Similarity Evaluation System. 1. , pp.1-10 http://dx.doi.org/10.14257/ijmue.2014.9.1.01 Design and Implementation of Binary File Similarity Evaluation System Sun-Jung Kim 2, Young Jun Yoo, Jungmin So 1, Jeong Gun Lee 1, Jin Kim 1 and

More information

Job Re-Packing for Enhancing the Performance of Gang Scheduling

Job Re-Packing for Enhancing the Performance of Gang Scheduling Job Re-Packing for Enhancing the Performance of Gang Scheduling B. B. Zhou 1, R. P. Brent 2, C. W. Johnson 3, and D. Walsh 3 1 Computer Sciences Laboratory, Australian National University, Canberra, ACT

More information

Functional Partitioning to Optimize End-to-End Performance on Many-core Architectures

Functional Partitioning to Optimize End-to-End Performance on Many-core Architectures Functional Partitioning to Optimize End-to-End Performance on Many-core Architectures Min Li, Sudharshan S. Vazhkudai, Ali R. Butt, Fei Meng, Xiaosong Ma, Youngjae Kim,Christian Engelmann, and Galen Shipman

More information

To Use or Not to Use: CPUs Cache Optimization Techniques on GPGPUs

To Use or Not to Use: CPUs Cache Optimization Techniques on GPGPUs To Use or Not to Use: CPUs Optimization Techniques on GPGPUs D.R.V.L.B. Thambawita Department of Computer Science and Technology Uva Wellassa University Badulla, Sri Lanka Email: vlbthambawita@gmail.com

More information

Operating System Supports for SCM as Main Memory Systems (Focusing on ibuddy)

Operating System Supports for SCM as Main Memory Systems (Focusing on ibuddy) 2011 NVRAMOS Operating System Supports for SCM as Main Memory Systems (Focusing on ibuddy) 2011. 4. 19 Jongmoo Choi http://embedded.dankook.ac.kr/~choijm Contents Overview Motivation Observations Proposal:

More information

Identifying Performance Bottlenecks with Real- World Applications and Flash-Based Storage

Identifying Performance Bottlenecks with Real- World Applications and Flash-Based Storage Identifying Performance Bottlenecks with Real- World Applications and Flash-Based Storage TechTarget Dennis Martin 1 Agenda About Demartek Enterprise Data Center Environments Storage Performance Metrics

More information

FuxiSort. Jiamang Wang, Yongjun Wu, Hua Cai, Zhipeng Tang, Zhiqiang Lv, Bin Lu, Yangyu Tao, Chao Li, Jingren Zhou, Hong Tang Alibaba Group Inc

FuxiSort. Jiamang Wang, Yongjun Wu, Hua Cai, Zhipeng Tang, Zhiqiang Lv, Bin Lu, Yangyu Tao, Chao Li, Jingren Zhou, Hong Tang Alibaba Group Inc Fuxi Jiamang Wang, Yongjun Wu, Hua Cai, Zhipeng Tang, Zhiqiang Lv, Bin Lu, Yangyu Tao, Chao Li, Jingren Zhou, Hong Tang Alibaba Group Inc {jiamang.wang, yongjun.wyj, hua.caihua, zhipeng.tzp, zhiqiang.lv,

More information

Understanding Primary Storage Optimization Options Jered Floyd Permabit Technology Corp.

Understanding Primary Storage Optimization Options Jered Floyd Permabit Technology Corp. Understanding Primary Storage Optimization Options Jered Floyd Permabit Technology Corp. Primary Storage Optimization Technologies that let you store more data on the same storage Thin provisioning Copy-on-write

More information

NetApp Data Compression, Deduplication, and Data Compaction

NetApp Data Compression, Deduplication, and Data Compaction Technical Report NetApp Data Compression, Deduplication, and Data Compaction Data ONTAP 8.3.1 and Later Karthik Viswanath, NetApp February 2018 TR-4476 Abstract This technical report focuses on implementing

More information

An Analysis on Empirical Performance of SSD-Based RAID

An Analysis on Empirical Performance of SSD-Based RAID An Analysis on Empirical Performance of SSD-Based RAID Chanhyun Park, Seongjin Lee and Youjip Won Abstract In this paper, we measure the I/O performance of five filesystems EXT4, XFS, BTRFS, NILFS2, and

More information

COS 318: Operating Systems. NSF, Snapshot, Dedup and Review

COS 318: Operating Systems. NSF, Snapshot, Dedup and Review COS 318: Operating Systems NSF, Snapshot, Dedup and Review Topics! NFS! Case Study: NetApp File System! Deduplication storage system! Course review 2 Network File System! Sun introduced NFS v2 in early

More information

PowerVault MD3 SSD Cache Overview

PowerVault MD3 SSD Cache Overview PowerVault MD3 SSD Cache Overview A Dell Technical White Paper Dell Storage Engineering October 2015 A Dell Technical White Paper TECHNICAL INACCURACIES. THE CONTENT IS PROVIDED AS IS, WITHOUT EXPRESS

More information

Nowadays data-intensive applications play a

Nowadays data-intensive applications play a Journal of Advances in Computer Engineering and Technology, 3(2) 2017 Data Replication-Based Scheduling in Cloud Computing Environment Bahareh Rahmati 1, Amir Masoud Rahmani 2 Received (2016-02-02) Accepted

More information

Multi-level Byte Index Chunking Mechanism for File Synchronization

Multi-level Byte Index Chunking Mechanism for File Synchronization , pp.339-350 http://dx.doi.org/10.14257/ijseia.2014.8.3.31 Multi-level Byte Index Chunking Mechanism for File Synchronization Ider Lkhagvasuren, Jung Min So, Jeong Gun Lee, Jin Kim and Young Woong Ko *

More information

IBM V7000 Unified R1.4.2 Asynchronous Replication Performance Reference Guide

IBM V7000 Unified R1.4.2 Asynchronous Replication Performance Reference Guide V7 Unified Asynchronous Replication Performance Reference Guide IBM V7 Unified R1.4.2 Asynchronous Replication Performance Reference Guide Document Version 1. SONAS / V7 Unified Asynchronous Replication

More information

JOURNALING techniques have been widely used in modern

JOURNALING techniques have been widely used in modern IEEE TRANSACTIONS ON COMPUTERS, VOL. XX, NO. X, XXXX 2018 1 Optimizing File Systems with a Write-efficient Journaling Scheme on Non-volatile Memory Xiaoyi Zhang, Dan Feng, Member, IEEE, Yu Hua, Senior

More information

LETTER Solid-State Disk with Double Data Rate DRAM Interface for High-Performance PCs

LETTER Solid-State Disk with Double Data Rate DRAM Interface for High-Performance PCs IEICE TRANS. INF. & SYST., VOL.E92 D, NO.4 APRIL 2009 727 LETTER Solid-State Disk with Double Data Rate DRAM Interface for High-Performance PCs Dong KIM, Kwanhu BANG, Seung-Hwan HA, Chanik PARK, Sung Woo

More information

I-CASH: Intelligently Coupled Array of SSD and HDD

I-CASH: Intelligently Coupled Array of SSD and HDD : Intelligently Coupled Array of SSD and HDD Jin Ren and Qing Yang Dept. of Electrical, Computer, and Biomedical Engineering University of Rhode Island, Kingston, RI 02881 (rjin,qyang)@ele.uri.edu Abstract

More information

Parallel LZ77 Decoding with a GPU. Emmanuel Morfiadakis Supervisor: Dr Eric McCreath College of Engineering and Computer Science, ANU

Parallel LZ77 Decoding with a GPU. Emmanuel Morfiadakis Supervisor: Dr Eric McCreath College of Engineering and Computer Science, ANU Parallel LZ77 Decoding with a GPU Emmanuel Morfiadakis Supervisor: Dr Eric McCreath College of Engineering and Computer Science, ANU Outline Background (What?) Problem definition and motivation (Why?)

More information

ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation

ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation Weiping Liao, Saengrawee (Anne) Pratoomtong, and Chuan Zhang Abstract Binary translation is an important component for translating

More information

PROFILING BASED REDUCE MEMORY PROVISIONING FOR IMPROVING THE PERFORMANCE IN HADOOP

PROFILING BASED REDUCE MEMORY PROVISIONING FOR IMPROVING THE PERFORMANCE IN HADOOP ISSN: 0976-2876 (Print) ISSN: 2250-0138 (Online) PROFILING BASED REDUCE MEMORY PROVISIONING FOR IMPROVING THE PERFORMANCE IN HADOOP T. S. NISHA a1 AND K. SATYANARAYAN REDDY b a Department of CSE, Cambridge

More information

Design and Implementation of a Random Access File System for NVRAM

Design and Implementation of a Random Access File System for NVRAM This article has been accepted and published on J-STAGE in advance of copyediting. Content is final as presented. IEICE Electronics Express, Vol.* No.*,*-* Design and Implementation of a Random Access

More information

vsan 6.6 Performance Improvements First Published On: Last Updated On:

vsan 6.6 Performance Improvements First Published On: Last Updated On: vsan 6.6 Performance Improvements First Published On: 07-24-2017 Last Updated On: 07-28-2017 1 Table of Contents 1. Overview 1.1.Executive Summary 1.2.Introduction 2. vsan Testing Configuration and Conditions

More information

CPU-GPU hybrid computing for feature extraction from video stream

CPU-GPU hybrid computing for feature extraction from video stream LETTER IEICE Electronics Express, Vol.11, No.22, 1 8 CPU-GPU hybrid computing for feature extraction from video stream Sungju Lee 1, Heegon Kim 1, Daihee Park 1, Yongwha Chung 1a), and Taikyeong Jeong

More information

Reducing Solid-State Storage Device Write Stress Through Opportunistic In-Place Delta Compression

Reducing Solid-State Storage Device Write Stress Through Opportunistic In-Place Delta Compression Reducing Solid-State Storage Device Write Stress Through Opportunistic In-Place Delta Compression Xuebin Zhang, Jiangpeng Li, Hao Wang, Kai Zhao and Tong Zhang xuebinzhang.rpi@gmail.com ECSE Department,

More information

X10 specific Optimization of CPU GPU Data transfer with Pinned Memory Management

X10 specific Optimization of CPU GPU Data transfer with Pinned Memory Management X10 specific Optimization of CPU GPU Data transfer with Pinned Memory Management Hideyuki Shamoto, Tatsuhiro Chiba, Mikio Takeuchi Tokyo Institute of Technology IBM Research Tokyo Programming for large

More information

An Efficient Snapshot Technique for Ext3 File System in Linux 2.6

An Efficient Snapshot Technique for Ext3 File System in Linux 2.6 An Efficient Snapshot Technique for Ext3 File System in Linux 2.6 Seungjun Shim*, Woojoong Lee and Chanik Park Department of CSE/GSIT* Pohang University of Science and Technology, Kyungbuk, Republic of

More information

Enhance Data De-Duplication Performance With Multi-Thread Chunking Algorithm. December 9, Xinran Jiang, Jia Zhao, Jie Zheng

Enhance Data De-Duplication Performance With Multi-Thread Chunking Algorithm. December 9, Xinran Jiang, Jia Zhao, Jie Zheng Enhance Data De-Duplication Performance With Multi-Thread Chunking Algorithm This paper is submitted in partial fulfillment of the requirements for Operating System Class (COEN 283) Santa Clara University

More information

Real-time processing for intelligent-surveillance applications

Real-time processing for intelligent-surveillance applications LETTER IEICE Electronics Express, Vol.14, No.8, 1 12 Real-time processing for intelligent-surveillance applications Sungju Lee, Heegon Kim, Jaewon Sa, Byungkwan Park, and Yongwha Chung a) Dept. of Computer

More information

I/O Buffering and Streaming

I/O Buffering and Streaming I/O Buffering and Streaming I/O Buffering and Caching I/O accesses are reads or writes (e.g., to files) Application access is arbitary (offset, len) Convert accesses to read/write of fixed-size blocks

More information

FGDEFRAG: A Fine-Grained Defragmentation Approach to Improve Restore Performance

FGDEFRAG: A Fine-Grained Defragmentation Approach to Improve Restore Performance FGDEFRAG: A Fine-Grained Defragmentation Approach to Improve Restore Performance Yujuan Tan, Jian Wen, Zhichao Yan, Hong Jiang, Witawas Srisa-an, Baiping Wang, Hao Luo Outline Background and Motivation

More information

Parallel graph traversal for FPGA

Parallel graph traversal for FPGA LETTER IEICE Electronics Express, Vol.11, No.7, 1 6 Parallel graph traversal for FPGA Shice Ni a), Yong Dou, Dan Zou, Rongchun Li, and Qiang Wang National Laboratory for Parallel and Distributed Processing,

More information

Improving Throughput in Cloud Storage System

Improving Throughput in Cloud Storage System Improving Throughput in Cloud Storage System Chanho Choi chchoi@dcslab.snu.ac.kr Shin-gyu Kim sgkim@dcslab.snu.ac.kr Hyeonsang Eom hseom@dcslab.snu.ac.kr Heon Y. Yeom yeom@dcslab.snu.ac.kr Abstract Because

More information

Next-Generation Cloud Platform

Next-Generation Cloud Platform Next-Generation Cloud Platform Jangwoo Kim Jun 24, 2013 E-mail: jangwoo@postech.ac.kr High Performance Computing Lab Department of Computer Science & Engineering Pohang University of Science and Technology

More information

Towards Breast Anatomy Simulation Using GPUs

Towards Breast Anatomy Simulation Using GPUs Towards Breast Anatomy Simulation Using GPUs Joseph H. Chui 1, David D. Pokrajac 2, Andrew D.A. Maidment 3, and Predrag R. Bakic 4 1 Department of Radiology, University of Pennsylvania, Philadelphia PA

More information

Deploy a High-Performance Database Solution: Cisco UCS B420 M4 Blade Server with Fusion iomemory PX600 Using Oracle Database 12c

Deploy a High-Performance Database Solution: Cisco UCS B420 M4 Blade Server with Fusion iomemory PX600 Using Oracle Database 12c White Paper Deploy a High-Performance Database Solution: Cisco UCS B420 M4 Blade Server with Fusion iomemory PX600 Using Oracle Database 12c What You Will Learn This document demonstrates the benefits

More information

White Paper Features and Benefits of Fujitsu All-Flash Arrays for Virtualization and Consolidation ETERNUS AF S2 series

White Paper Features and Benefits of Fujitsu All-Flash Arrays for Virtualization and Consolidation ETERNUS AF S2 series White Paper Features and Benefits of Fujitsu All-Flash Arrays for Virtualization and Consolidation Fujitsu All-Flash Arrays are extremely effective tools when virtualization is used for server consolidation.

More information

Operating Systems. Lecture File system implementation. Master of Computer Science PUF - Hồ Chí Minh 2016/2017

Operating Systems. Lecture File system implementation. Master of Computer Science PUF - Hồ Chí Minh 2016/2017 Operating Systems Lecture 7.2 - File system implementation Adrien Krähenbühl Master of Computer Science PUF - Hồ Chí Minh 2016/2017 Design FAT or indexed allocation? UFS, FFS & Ext2 Journaling with Ext3

More information

SolidFire and Ceph Architectural Comparison

SolidFire and Ceph Architectural Comparison The All-Flash Array Built for the Next Generation Data Center SolidFire and Ceph Architectural Comparison July 2014 Overview When comparing the architecture for Ceph and SolidFire, it is clear that both

More information

MySQL Performance Optimization and Troubleshooting with PMM. Peter Zaitsev, CEO, Percona

MySQL Performance Optimization and Troubleshooting with PMM. Peter Zaitsev, CEO, Percona MySQL Performance Optimization and Troubleshooting with PMM Peter Zaitsev, CEO, Percona In the Presentation Practical approach to deal with some of the common MySQL Issues 2 Assumptions You re looking

More information

CS3600 SYSTEMS AND NETWORKS

CS3600 SYSTEMS AND NETWORKS CS3600 SYSTEMS AND NETWORKS NORTHEASTERN UNIVERSITY Lecture 11: File System Implementation Prof. Alan Mislove (amislove@ccs.neu.edu) File-System Structure File structure Logical storage unit Collection

More information

GPU ACCELERATED DATABASE MANAGEMENT SYSTEMS

GPU ACCELERATED DATABASE MANAGEMENT SYSTEMS CIS 601 - Graduate Seminar Presentation 1 GPU ACCELERATED DATABASE MANAGEMENT SYSTEMS PRESENTED BY HARINATH AMASA CSU ID: 2697292 What we will talk about.. Current problems GPU What are GPU Databases GPU

More information

Part II: Data Center Software Architecture: Topic 2: Key-value Data Management Systems. SkimpyStash: Key Value Store on Flash-based Storage

Part II: Data Center Software Architecture: Topic 2: Key-value Data Management Systems. SkimpyStash: Key Value Store on Flash-based Storage ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part II: Data Center Software Architecture: Topic 2: Key-value Data Management Systems SkimpyStash: Key Value

More information

SLM-DB: Single-Level Key-Value Store with Persistent Memory

SLM-DB: Single-Level Key-Value Store with Persistent Memory SLM-DB: Single-Level Key-Value Store with Persistent Memory Olzhas Kaiyrakhmet and Songyi Lee, UNIST; Beomseok Nam, Sungkyunkwan University; Sam H. Noh and Young-ri Choi, UNIST https://www.usenix.org/conference/fast19/presentation/kaiyrakhmet

More information

Software and Tools for HPE s The Machine Project

Software and Tools for HPE s The Machine Project Labs Software and Tools for HPE s The Machine Project Scalable Tools Workshop Aug/1 - Aug/4, 2016 Lake Tahoe Milind Chabbi Traditional Computing Paradigm CPU DRAM CPU DRAM CPU-centric computing 2 CPU-Centric

More information

SmartMD: A High Performance Deduplication Engine with Mixed Pages

SmartMD: A High Performance Deduplication Engine with Mixed Pages SmartMD: A High Performance Deduplication Engine with Mixed Pages Fan Guo 1, Yongkun Li 1, Yinlong Xu 1, Song Jiang 2, John C. S. Lui 3 1 University of Science and Technology of China 2 University of Texas,

More information

HiTune. Dataflow-Based Performance Analysis for Big Data Cloud

HiTune. Dataflow-Based Performance Analysis for Big Data Cloud HiTune Dataflow-Based Performance Analysis for Big Data Cloud Jinquan (Jason) Dai, Jie Huang, Shengsheng Huang, Bo Huang, Yan Liu Intel Asia-Pacific Research and Development Ltd Shanghai, China, 200241

More information

Isilon Performance. Name

Isilon Performance. Name 1 Isilon Performance Name 2 Agenda Architecture Overview Next Generation Hardware Performance Caching Performance Streaming Reads Performance Tuning OneFS Architecture Overview Copyright 2014 EMC Corporation.

More information

DUE to the explosive growth of the digital data, data

DUE to the explosive growth of the digital data, data 1162 IEEE TRANSACTIONS ON COMPUTERS, VOL. 64, NO. 4, APRIL 2015 Similarity and Locality Based Indexing for High Performance Data Deduplication Wen Xia, Hong Jiang, Senior Member, IEEE, Dan Feng, Member,

More information

WAN Optimized Replication of Backup Datasets Using Stream-Informed Delta Compression

WAN Optimized Replication of Backup Datasets Using Stream-Informed Delta Compression WAN Optimized Replication of Backup Datasets Using Stream-Informed Delta Compression Philip Shilane, Mark Huang, Grant Wallace, & Windsor Hsu Backup Recovery Systems Division EMC Corporation Introduction

More information

Cascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching

Cascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching Cascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching Kefei Wang and Feng Chen Louisiana State University SoCC '18 Carlsbad, CA Key-value Systems in Internet Services Key-value

More information

Using Transparent Compression to Improve SSD-based I/O Caches

Using Transparent Compression to Improve SSD-based I/O Caches Using Transparent Compression to Improve SSD-based I/O Caches Thanos Makatos, Yannis Klonatos, Manolis Marazakis, Michail D. Flouris, and Angelos Bilas {mcatos,klonatos,maraz,flouris,bilas}@ics.forth.gr

More information

FLASHARRAY//M Business and IT Transformation in 3U

FLASHARRAY//M Business and IT Transformation in 3U FLASHARRAY//M Business and IT Transformation in 3U TRANSFORM IT Who knew that moving to all-flash storage could help reduce the cost of IT? FlashArray//m makes server and workload investments more productive,

More information

Azor: Using Two-level Block Selection to Improve SSD-based I/O caches

Azor: Using Two-level Block Selection to Improve SSD-based I/O caches Azor: Using Two-level Block Selection to Improve SSD-based I/O caches Yannis Klonatos, Thanos Makatos, Manolis Marazakis, Michail D. Flouris, Angelos Bilas {klonatos, makatos, maraz, flouris, bilas}@ics.forth.gr

More information

Chapter 9 Memory Management

Chapter 9 Memory Management Contents 1. Introduction 2. Computer-System Structures 3. Operating-System Structures 4. Processes 5. Threads 6. CPU Scheduling 7. Process Synchronization 8. Deadlocks 9. Memory Management 10. Virtual

More information