A Novel Buffer Management Scheme for SSD


Qingsong Wei (Data Storage Institute, A-STAR, Singapore), Bozhao Gong (National University of Singapore), Cheng Chen (Data Storage Institute, A-STAR, Singapore)

Abstract — Random writes significantly limit the application of flash memory in enterprise environments due to their poor latency, negative impact on lifetime, and high garbage collection overhead. Several buffer management schemes for flash memory have been proposed to overcome this issue, operating either at page granularity or at block granularity. Traditional page-based buffer management schemes leverage temporal locality to pursue cache hit ratio improvement without considering the sequentiality of flushed data. Current block-based buffer management schemes exploit spatial locality to improve the sequentiality of the write accesses passed to the flash memory, at the cost of low buffer utilization. None of them achieves both a high cache hit ratio and good sequentiality at the same time, which are the two critical factors determining the efficiency of buffer management for flash memory. In this paper, we propose a novel hybrid buffer management scheme that divides the buffer space into a page region and a block region to make full use of both the temporal and the spatial locality among accesses. The scheme dynamically balances our two design objectives of cache hit ratio and sequentiality for different workloads. It efficiently improves performance and shapes the I/O requests so that more sequential accesses are passed to the flash memory. The scheme has been extensively evaluated under various enterprise workloads. Our benchmark results conclusively demonstrate that it achieves up to 84% performance improvement and 85% garbage collection overhead reduction compared to existing buffer management schemes. Keywords — flash memory; buffer management; hybrid; cache hit ratio; write sequentiality I. 
INTRODUCTION Flash memory is rapidly becoming a promising technology for next-generation storage due to a number of strong technical merits, including (i) low access latency, (ii) low power consumption, (iii) higher resistance to shocks, (iv) light weight, and (v) better endurance. As an emerging technology, flash memory has received strong interest in both academia and industry [4-7]. Flash memory has traditionally been used in portable devices. More recently, as prices drop and capacities increase, this technology has made huge strides into the personal computer and server storage space in the form of the Solid State Drive (SSD), with the intention of replacing traditional hard disk drives (HDDs). In fact, two leading on-line search engine service providers, google.com and baidu.com, have both announced plans to migrate their existing hard disk based storage systems to platforms built on SSDs [3]. However, SSDs suffer from a random write problem when applied in enterprise environments. Firstly, write performance on an SSD is highly correlated with the access pattern. The electrical properties of flash cells result in random writes being much slower than sequential writes. The performance optimizations inside an SSD, including striping, interleaving and prefetching, are no longer effective for random writes because less sequential locality is left for them to exploit. Secondly, NAND flash memory can endure only a finite number of erases for a given physical block due to the nature of the technology. Therefore, increased erase operations due to random writes shorten the lifetime of an SSD. Experiments in [7] show that a random write intensive workload can make flash memory wear out over a hundred times faster than a sequential write intensive workload. Finally, random writes result in higher garbage collection overhead than sequential writes. 
If the incoming writes are randomly distributed over the logical block address space, sooner or later all physical flash memory blocks will become fragmented, which has a significant impact on garbage collection and performance. Therefore, random write is a critical problem for both performance and lifetime, and it restricts SSDs' widespread acceptance in datacenters [5]. An SSD controller uses a part of its memory as a buffer to overcome this issue. In current practice, there are several major efforts in buffer management, which can be classified into either page-based or block-based buffer management schemes. Most of them rely on the existence of locality in access patterns, either temporal locality or spatial locality. However, none of them is able to achieve a high cache hit ratio and good sequentiality simultaneously, which are the two critical factors determining the efficiency of buffer management for flash memory. Page-based buffer management schemes adopt cache hit ratio improvement as their sole objective by exploiting the temporal locality of data accesses. A large number of disk-based algorithms have been proposed, such as LRU, CLOCK [6], WOW [7], Q [9] and ARC [8]. All these algorithms focus only on how to better utilize temporal locality, so that they can better predict the pages to be accessed and try to minimize the page fault rate []. However, direct application of these algorithms is inappropriate for SSDs because spatial locality is unfortunately ignored. Block-based buffer management schemes such as [3], [], and LB-CLOCK [3] exploit spatial locality to change the access pattern and provide more sequential writes for flash memory. Resident pages in the cache are grouped on
the basis of their logical block associations. When a logical page cached in the RAM buffer is accessed, all pages in the same block are placed at the head of the list, based on the assumption that all pages in this block have the same recency. Because hundreds of pages exist in an erase block, this assumption may not hold true, especially for random dominant workloads. Because a block's position in the list is determined by the temporal locality of partial page accesses, most pages in the block may not be accessed in the near future, which pollutes the buffer space. To make free space in the buffer, all pages in the victim block are removed simultaneously. However, some pages in the victim block may be accessed in the near future. This strategy trades temporal locality for spatial locality, which results in low buffer space utilization and a low cache hit ratio. Since buffer size is very critical to the energy consumption and cost of an SSD, it is important to improve buffer space utilization. In this paper, we propose a novel hybrid buffer management scheme that adopts a high cache hit ratio and good sequentiality as its two objectives by exploiting both the temporal and the spatial locality among access patterns. The scheme divides the buffer space into a page region and a block region and manages them in hybrid form. In the page region, buffer data is managed and sorted at page granularity to improve buffer space utilization, while the block region manages data at block granularity. To achieve both objectives, we give preference to random access pages for staying in the page region, while sequential access pages in the block region are replaced first. Buffer data in the page region is dynamically migrated to the block region if the number of pages in a block reaches a threshold, which is adaptive to different workloads. 
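The page-to-block association that both regions rely on can be sketched as follows. This is a minimal illustrative model, not the paper's implementation; the function names and the 4-pages-per-block geometry (matching the example in Table I) are our own assumptions.

```python
# Illustrative sketch: map logical page numbers (LPNs) to erase-block numbers
# and group a reference stream by erase block. Assumed geometry: 4 pages/block.
PAGES_PER_BLOCK = 4

def block_of(lpn: int) -> int:
    """Erase block that logical page `lpn` belongs to."""
    return lpn // PAGES_PER_BLOCK

def group_by_block(lpns):
    """Group a reference stream by erase block, preserving access order."""
    groups = {}
    for lpn in lpns:
        groups.setdefault(block_of(lpn), []).append(lpn)
    return groups
```

A fully populated group (all pages of one erase block cached) is exactly the kind of sequential block the scheme prefers to keep in the block region.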
Leveraging hybrid management and dynamic migration, our scheme not only improves performance and extends SSD lifetime, but also significantly reduces the internal fragmentation and garbage collection overhead associated with random writes. The rest of this paper is organized as follows. Section II provides an overview of the background and our motivation. In Section III, we present the details of our hybrid buffer management scheme. Evaluation and measurement results are presented in Section IV. In Section V, we give a brief study of related work in the literature; conclusions and possible future work are summarized in Section VI. II. BACKGROUND AND MOTIVATION In this section, we present three basic concepts that are essential to our work, and our motivation. A. Flash Memory Technology There are two types of flash memory, NOR and NAND [6]. NOR flash memory supports random accesses in bits and is mainly used for storing code. NAND flash memory is designed for data storage with denser capacity and only allows access in units of sectors. Most SSDs available on the market are based on NAND flash memory. In this paper, flash memory refers to NAND flash memory specifically. NAND flash memory can be classified into two categories, Single-Level Cell (SLC) and Multi-Level Cell (MLC) NAND. An SLC flash memory cell stores only one bit, while an MLC flash memory cell can store two bits or even more. For both SLC and MLC NAND, a flash memory package is composed of one or more dies (chips). Each die within a package contains multiple planes. A typical plane consists of thousands of blocks and one or two registers of the page size as an I/O buffer. Each block in turn consists of 64 to 8 pages. Each page has a KB or 4KB data area and a metadata area (e.g. 8 bytes) for storing identification, page state and Error Correcting Code (ECC) information. Flash memory supports three major operations: read, write, and erase. Read and write are performed in units of pages. 
A unique requirement of flash memory is that flash blocks must be erased before they can be reused, and the erase operation must be conducted at block granularity [7]. In addition, each block can be erased only a finite number of times. A typical MLC flash memory endures roughly an order of magnitude fewer erase cycles than an SLC flash memory. After wearing out, flash memory cells can no longer store data. B. SSD SSDs use flash memory as their storage medium. The Flash Translation Layer (FTL) [8,9], a critical firmware component implemented in the SSD controller, allows operating systems to access flash memory devices in the same way as conventional disk drives. The FTL plays a key role in the SSD, and many sophisticated mechanisms are adopted to optimize SSD performance. It provides address mapping, wear leveling and garbage collection. Logical Block Mapping. Generally, FTL schemes can be classified into three groups depending on the granularity of address mapping: page-level, block-level, and hybrid-level FTL schemes [9]. In the page-level FTL scheme, a logical page number (LPN) can be mapped to any physical page number (PPN) in flash memory. This mapping approach is efficient and shows great garbage collection efficiency, but it requires a large amount of RAM space to store the mapping table. On the other hand, block-level FTL is space efficient but requires an expensive read-modify-write operation when writing only part of a block. To overcome these disadvantages, the hybrid-level FTL scheme was proposed. A hybrid-level FTL uses block-level mapping to manage most data blocks and uses page-level mapping to manage a small set of log blocks, which work as a buffer to accept incoming write requests []. Hybrid-level FTLs show high garbage collection efficiency and require a small-sized mapping table. However, they incur expensive full merges for random write dominant workloads. 
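The page-level mapping and out-of-place update described above can be sketched as follows. This is a minimal illustrative model under our own assumptions (the class name and the naive append-only allocator are not from the paper): each write goes to a fresh physical page and the previous copy, if any, is marked invalid for later garbage collection.

```python
# Illustrative sketch of a page-level FTL: LPN -> PPN mapping with
# out-of-place updates. The append-only allocator is a simplification.
class PageLevelFTL:
    def __init__(self):
        self.mapping = {}     # LPN -> PPN
        self.invalid = set()  # physical pages holding stale data
        self.next_free = 0    # naive append-only page allocator

    def write(self, lpn):
        """Write `lpn` out-of-place; return the new physical page number."""
        if lpn in self.mapping:
            self.invalid.add(self.mapping[lpn])  # old copy becomes stale
        ppn = self.next_free
        self.next_free += 1
        self.mapping[lpn] = ppn
        return ppn

    def read(self, lpn):
        """Return the physical page currently holding `lpn`, if cached."""
        return self.mapping.get(lpn)
```

The growing `invalid` set is what garbage collection later reclaims, block by block.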
Garbage Collection. Since data in flash memory cannot be updated in place, the FTL simply writes the data to another clean page and marks the previous page as invalid. When running out of clean blocks, a garbage collection module scans flash memory blocks and recycles invalidated pages. If page-level mapping is used, the valid pages in the scanned block are copied out and condensed into a new block. For block-level and hybrid-level mappings, the valid pages need to be merged together with the updated pages in the same block. Wear Leveling. Due to the locality in most workloads, writes are often performed over a subset of blocks. Thus some flash memory blocks may be frequently overwritten and tend to
wear out earlier than other blocks. FTLs usually employ a wear leveling algorithm to ensure that equal use is made of all the available write cycles for each block [3]. C. Buffer Management in SSD Many SSD controllers use a part of RAM as a read buffer or write buffer. Different buffer cache management policies have been proposed to improve performance and extend the lifetime of flash memory. One problem of SSDs is that background garbage collection and wear leveling compete for internal resources with foreground user accesses. If most foreground user accesses hit in the buffer cache, this mutual interference is significantly reduced. A high cache hit ratio can significantly reduce direct accesses to flash memory, which helps to achieve low latency for foreground user accesses and saves resources for background tasks. On the other hand, the sequentiality of the write accesses passed to flash memory is critical because random writes have the following negative impacts on an SSD. 1) Shortened SSD Lifetime For an SSD, the more random the writes are, the more erase operations are needed. Due to the nature of the technology, NAND flash memory can incur only a finite number of erases for a given physical block. Therefore, increased erase operations due to random writes make flash storage wear out much faster than under sequential writes. 2) High Garbage Collection Overhead Random writes result in higher garbage collection overhead than sequential writes. For an SSD adopting a hybrid FTL, the more random the writes are, the more merge operations [] are needed. In the worst case, each individual page in a log block would belong to a different mapping unit and correspondingly need an expensive full merge operation []. In addition, random write operations are most likely to trigger garbage collection. These internal operations running in the background may compete for resources with incoming foreground requests and cause increased latency. 
3) Internal Fragmentation Flash memory does not support in-place updates. Therefore, if the incoming writes are randomly distributed over the logical block address space, sooner or later every physical flash memory block may contain an invalid page, which is called internal fragmentation [6]. Such internal fragmentation has a significant impact on garbage collection and performance. First, the cleaning efficiency drastically drops. Second, after fragmentation, each write becomes excessively expensive, and the bandwidth of sequential writes collapses to well below that of a regular laptop disk [7]. Finally, the prefetching mechanism inside the SSD is no longer effective, since logically continuous pages are not physically continuous. This causes the bandwidth of sequential reads to drop close to that of random reads. 4) Little Chance for Performance Optimization SSDs leverage striping and interleaving to improve performance based on sequential locality [5,7]. If a write is sequential, the data can be striped and written across different dies or planes in parallel. Interleaving is used to hide the latency of costly operations. A single multi-page read or write can be efficiently interleaved, while multiple single-page reads or writes can only be performed separately. While the above optimizations can dramatically improve performance for workloads with more sequential locality, their ability to deal with random writes is very limited because less sequential locality is left for them to exploit. Therefore, cache hit ratio and sequentiality are both critical factors determining the efficiency of buffer management for flash memory. D. Motivation The typical workload in an enterprise system is a mixture of random and sequential accesses, which expose temporal locality and spatial locality respectively. 
Both page-based and block-based buffer management schemes fail to utilize both the temporal and the spatial locality present in enterprise workloads to improve cache hit ratio and sequentiality for SSDs. To illustrate the limitations of current buffer management schemes, let us consider an example reference stream mixed with sequential and random accesses, shown in Table I. In this example, page-based LRU achieves 6 hits, more than block-based LRU, while block-based LRU produces one more sequential flush than page-based LRU. Hybrid LRU achieves 3 cache hits and one sequential flush, combining the advantages of both page-based LRU and block-based LRU. Since the buffer is positioned at a level higher than the flash memory and receives I/O requests directly from the host, we are motivated to design a novel buffer management scheme that fully utilizes both temporal and spatial localities to achieve a high cache hit ratio and good sequentiality for SSDs. III. HYBRID BUFFER MANAGEMENT Since both cache hit ratio and sequentiality affect the efficiency of buffer management in terms of performance, garbage collection overhead and lifetime, we adopt improvement of both as our design objectives. Our rationale is that one objective cannot be achieved at the cost of sacrificing the other. To this end, we propose a novel buffer management scheme that exploits both pages and blocks to manage the buffer space in hybrid form. By utilizing both the temporal and the spatial locality among accesses, our scheme pursues a high cache hit ratio and good sequentiality for different enterprise workloads. A. Hybrid Management Several previous studies [,] claimed that request frequency and file size are inversely correlated, i.e. the most popular files are typically small, while large files are relatively unpopular. Most files are small and most file accesses are small; e.g., [4] reports that 8% of file accesses
are to files of less than KB, and the locality type of each request is deeply related to its size. Random accesses are small and popular, and thus have high temporal locality. Page-based buffer management is good at exploiting temporal locality to achieve a high cache hit ratio. Sequential accesses are large and unpopular, and have high spatial locality. Block-based buffer management can effectively exploit spatial locality. To fully utilize both the temporal and the spatial locality in enterprise workloads, we divide the buffer space into a page region and a block region, as shown in Figure 1. In the page region, buffer data is managed and sorted at page granularity to improve buffer space utilization. The block region operates at the logical block granularity, where a logical block has the same size as the erasable block in the NAND flash memory. A page is either in the page region or in the block region. Both regions serve incoming requests. 

TABLE I. COMPARISON OF PAGE-LEVEL LRU, BLOCK-LEVEL LRU AND HYBRID LRU. CACHE SIZE IS 8 PAGES AND AN ERASE BLOCK CONTAINS 4 PAGES. HYBRID LRU MAINTAINS THE BUFFER AT PAGE AND BLOCK GRANULARITY. ONLY FULL BLOCKS ARE MANAGED AT BLOCK GRANULARITY AND SELECTED AS VICTIMS. IN THIS EXAMPLE, WE USE [] TO DENOTE BLOCK BOUNDARIES.

Access  | Page-Level LRU: Cache(8) / Flush / Hit? | Block-Level LRU: Cache(8) / Flush / Hit? | Hybrid LRU: Cache(8) / Flush / Hit?
,,,3    | 3,,, — Miss                             | [,,,3] — Miss                            | [,,,3] — Miss
5,9,,4  | 4,,9,5,3,,, — Miss                      | [4],[9,],[5],[,,,3] — Miss               | 4,,9,5,[,,,3] — Miss
7       | 7,4,,9,5,3,, — Miss                     | [5,7],[4],[9,],[,,,3] — Miss             | 7,4,,9,5,[,,,3] — Miss
3       | 3,7,4,,9,5,, — Hit                      | [3],[5,7],[4],[9,] — Miss                | 3,7,4,,9,5 — Miss
        | ,3,7,4,9,5,, — Hit                      | [9,],[3],[5,7],[4] — Hit                 | ,3,7,4,9,5 — Hit
        | ,,3,7,4,9,5, — Hit                      | [,3],[9,],[5,7],[4] — Miss               | ,,3,7,4,9,5 — Miss
4       | 4,,,3,7,9,5, — Hit                      | [4],[,3],[9,],[5,7] — Hit                | 4,,,3,7,9,5 — Hit
        | ,4,,,3,7,9,5 — Hit                      | [,,3],[4],[9,],[5,7] — Miss              | ,4,,,3,7,9,5 — Miss
        | ,,4,,,3,7,9 / 5 — Miss                  | [9,,],[,,3],[4] / [5,7] — Miss           | ,,4,,,3,7,9 / 5 — Miss
7       | 7,,,4,,,3,9 — Hit                       | [7],[9,,],[,,3],[4] — Miss               | 7,,,4,,,3,9 — Hit
Sequential flush / Cache hit: 6 3
Pages in the page region are organized as a page LRU list. When a page cached in the page region is accessed (read or write), only that page is placed at the head of the page LRU list. Blocks in the block region are organized and sorted on the basis of block popularity. Block popularity is defined as the block access frequency, including reads and writes of any pages of the block. When a logical page of a block is accessed (including a read miss), we increase the block popularity by one. Sequentially accessing multiple pages of a block is treated as one block access instead of multiple accesses. Thus, a block with sequential accesses will have a low popularity value, while a block with random accesses has a high popularity value. Blocks with the same popularity are sorted by the number of pages in the block. Therefore, the temporal locality among random accesses and the spatial locality among sequential accesses can be fully exploited by page-based and block-based buffer management respectively. 

Figure 1. Hybrid Buffer Management. Buffer space is divided into two regions, a page region and a block region. In the page region, buffer data is managed and sorted at page granularity (page LRU list), while the block region manages data at block granularity (block popularity list). Blocks in the block region are selected as victims for replacement. 

B. Servicing Both Read and Write Operations In real applications, read and write accesses are mixed. Usage patterns exhibit block-level temporal locality: the pages in the same logical block are likely to be accessed (read or write) again in the near future. Separately servicing read and write accesses in different buffer spaces may destroy the original locality present in the access sequences. Moreover, servicing reads helps to reduce the load on the flash data channel, which is shared by both read and write operations [3]. 
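The block-popularity rule above can be sketched as follows. This is an illustrative reading of the rule, under our own assumptions: an access to a page bumps its block's popularity by one, except that a sequential run (consecutive page numbers inside the same block) is collapsed into a single block access. Function names and the 4-pages-per-block geometry are not from the paper.

```python
# Illustrative sketch of the block-popularity counter: sequential runs within
# one erase block count as a single block access. Assumed: 4 pages/block.
PAGES_PER_BLOCK = 4

def popularity(lpns):
    """Return a dict mapping block number -> popularity for an access stream."""
    pop = {}
    prev = None
    for lpn in lpns:
        blk = lpn // PAGES_PER_BLOCK
        sequential = (prev is not None
                      and lpn == prev + 1
                      and blk == prev // PAGES_PER_BLOCK)
        if not sequential:  # a new block access starts here
            pop[blk] = pop.get(blk, 0) + 1
        prev = lpn
    return pop
```

As the text argues, a sequentially accessed block ends up with a low popularity value, while a randomly re-accessed block accumulates a high one.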
By servicing foreground read operations, the flash data channel's bandwidth can also be saved for the background garbage collection task, which helps to reduce their interference with each other. The scheme in [] is designed only as a write buffer. It uses page padding to improve the sequentiality of flushed data at the cost of additional reads, which impacts overall performance. For random dominant workloads, it needs to read a large number of additional pages, as can be seen in our experiments. Unlike that scheme, we leverage the block-level temporal locality among read and write accesses to naturally form sequential blocks. Our scheme treats reads and writes as a whole to make full use of the locality of accesses, and groups both dirty and clean pages belonging to the same erase block into a logical block in the block region.

For read requests, our scheme attempts to fetch data from the page region and the block region. If the read request does not hit in the buffer, the scheme fetches the data from flash memory, and a copy of the data is placed in the buffer as a reference for future requests. Upon the arrival of a write request, the scheme places it into the buffer cache instead of synchronously writing it into flash memory. For both read-miss data and written data, placement into either the page region or the block region works as follows. If the corresponding block exists in the block region, the data is placed in the block region and the block list is reordered with the updated popularity. Otherwise, the data is placed at the beginning of the page region and the Block B+tree is updated, which will be discussed in subsection D. C. Replacement Policy This paper views the negative impacts of random writes on SSD lifetime and performance as a penalty. The cost of a sequential write miss is much lower than that of a random write. Popular data will be frequently updated. When replacement happens, unpopular data should be replaced instead of popular data. Keeping popular data in the buffer as long as possible minimizes the penalty. For this purpose, we give preference to random access pages for staying in the page region, while sequential access pages in the block region are replaced first. The least popular block in the block region is selected as the victim. If more than one block has the same least popularity, the block having the largest number of cached pages is selected as the victim. Once a block is selected as the victim, there are two cases to deal with. (i) If there are dirty pages in the block, both the dirty pages and the clean pages of the block are sequentially flushed into flash memory. This policy guarantees that logically continuous pages are physically placed onto continuous pages, so as to avoid internal fragmentation. 
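The victim-selection rule above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation; the per-block metadata layout (a popularity / page-count pair) is our own assumption.

```python
# Illustrative sketch of victim selection: evict the least popular block;
# among equally unpopular blocks, prefer the one caching the most pages.
def select_victim(blocks):
    """`blocks` maps block number -> (popularity, cached page count).
    Returns the block to evict: lowest popularity first, ties broken
    by the largest number of cached pages."""
    return min(blocks, key=lambda b: (blocks[b][0], -blocks[b][1]))
```

The tie-break toward the fullest block maximizes the length of the sequential flush produced by each eviction.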
By contrast, the scheme in [3] flushes only the dirty pages in the victim block and discards all the clean pages without considering the sequentiality of the flushed data. (ii) If there are no dirty pages in the block, all the clean pages of the block are discarded. Only when the block region is empty do we select the least recently used page in the page region as the victim. The pages belonging to the same block as this victim page are replaced and flushed together. This policy tries to avoid single-page flushes, which have a high impact on garbage collection and internal fragmentation. Through the filtering effect of the cache, we shape the I/O requests from the host so that more sequential page requests and fewer random page requests are passed to the flash memory. The flash memory is then able to process the requests, with their stronger spatial locality, more efficiently. D. Threshold-based Migration Buffer data in the page region is migrated to the block region if the number of pages in a block reaches the threshold, as shown in Figure 2. How the threshold value is determined will be discussed in subsection E. With the filtering effect of the threshold, random pages stay in the page region, while sequential blocks reside in the block region. Therefore, the temporal locality among random pages and the spatial locality among sequential blocks can be fully utilized in the hybrid buffer management. 

Figure 2. Threshold-based Migration. Buffer data in the page region is migrated to the block region only if the number of pages in a block reaches the threshold. Grey boxes denote that a block is found and migrated to the block region. An erase block consists of 4 pages. 

To implement threshold-based migration, we use a Block B+tree to assemble sequential blocks in the page region, as shown in Figure 3. The Block B+tree uses the block number as its key. 
A data structure called a block node is introduced to describe a block in terms of its popularity, the number of pages in the block (both clean and dirty), and a pointer array. The pointers in the array point to the pages belonging to the block. Upon the arrival of an access in the page region, we proceed as follows. We use the logical page number to calculate the corresponding block number. Then we search the Block B+tree using the block number. If the block exists in the Block B+tree, we update the block node, including the block popularity, the number of pages and the pointers. If the number of pages in the block reaches the threshold, all the pages in the block are migrated to the block region; the block node is deleted and the B+tree is reconstructed. The migrated block is inserted into the corresponding location of the block list in the block region according to its popularity. If the access is new, we allocate a new block node and add it into the Block B+tree. The functions of the Block B+tree are as follows. Firstly, it manages the blocks for migration. Secondly, if we have to replace pages from the page region to free space, it is used to quickly find the block to which the victim page belongs.
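The migration logic above can be sketched as follows. For brevity this illustrative sketch uses a plain dictionary as the per-block index standing in for the paper's Block B+tree; the class and method names and the geometry are our own assumptions, not the paper's data structures.

```python
# Illustrative sketch of threshold-based migration, with a dict standing in
# for the Block B+tree index over the page region. Assumed: 4 pages/block.
PAGES_PER_BLOCK = 4

class PageRegion:
    def __init__(self, thr_migrate):
        self.thr = thr_migrate   # THR_migrate: pages needed to trigger migration
        self.index = {}          # block number -> list of cached LPNs

    def access(self, lpn):
        """Record an access in the page region. If the block now holds enough
        pages, remove it from the index and return (block, pages) to migrate;
        otherwise return None."""
        blk = lpn // PAGES_PER_BLOCK
        pages = self.index.setdefault(blk, [])
        if lpn not in pages:
            pages.append(lpn)
        if len(pages) >= self.thr:  # block is sequential enough: migrate it
            del self.index[blk]
            return blk, pages
        return None
```

A real B+tree keyed by block number serves the same lookups in sorted order with bounded fan-out; the dict keeps the sketch short.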

Figure 3. Block B+tree. The key is the block number; each block node records the number of pages, the block popularity and a pointer array. The tree consists of a root node, interior nodes and leaf nodes, with the leaves pointing into the page LRU list. 

By using the Block B+tree, the pages belonging to the same block can be quickly searched and located. Meanwhile, the space overhead of the Block B+tree is limited. As shown in Figure 3, the Block B+tree generally includes three parts: block nodes, leaf nodes and interior nodes (including the root node). To analyze the space overhead, we first make the following assumptions: (i) integer and pointer types consume 4 bytes each; (ii) the fill factor of the Block B+tree is at least 5% [36]; (iii) the total size of the interior nodes is half the total size of the leaf nodes (in practice, the number of interior nodes is much smaller than the number of leaf nodes; the fan-out of a B+tree is 33 on average [36], and even when the fan-out is equal to two, there are fewer interior nodes than leaf nodes); (iv) every block node consumes a fixed number of bytes when its pointer array includes only one pointer, in which case the number of block nodes is maximal. When the length of the page list is L, the number of block nodes is at most L (assumption iv), and the total sizes of the block nodes, the leaf nodes (one unit corresponding to one block node in a leaf node consumes 8 bytes) and the interior nodes (assumption iii) each grow linearly with L. Therefore, the space overhead of the Block B+tree is a small percentage of the size of the buffer pages. E. Dynamic Threshold Our objective is to improve the sequentiality of the accesses passed to flash memory while maintaining high buffer space utilization. The threshold is used to balance these two objectives, and its value is critical to the efficiency of our proposed buffer management scheme. To investigate the proper threshold value, we tested the effects of different threshold values through repeated experiments over a set of workloads. 
However, we found in our experiments that it is difficult to find a single well-performing value for all types of workloads. Different threshold values must be set to achieve optimal results even for the same workload. A statically set threshold value cannot adapt to enterprise workloads with complex features and interleaved I/O requests. We realized that the proper threshold value is highly dependent on workload features. For a random dominant workload, a small value is suitable because it is difficult to form sequential blocks. For a sequential dominant workload, a large value is desirable, because with a small threshold a lot of partial blocks, instead of full blocks, would be migrated from the page region to the block region. The dynamic threshold is realized with the following heuristic. We use THR_migrate to denote the threshold. The value of THR_migrate ranges from 1 to the total number of pages in a block. N_block and N_total represent the size of the block region and the total buffer space, respectively. We use γ to denote the ratio between N_block and N_total. The value of γ is kept under the following constraint, which is used to control whether to enlarge or reduce the threshold:

γ = N_block / N_total ≤ θ    (1)

where θ is a parameter configured based on the buffer size. Since the block region is used to store block candidates whose pages are sequential enough for replacement and flush, the size of the block region should be much smaller than the size of the page region, so θ is set to a small percentage, chosen according to the buffer size. Initially, THR_migrate is set to a small value. If γ grows larger than θ, the size of the block region breaks the above constraint. To reduce the size of the block region, a larger value of THR_migrate is required to make page migration from the page region to the block region harder. The value of THR_migrate is therefore doubled until γ is less than θ. 
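The doubling-and-halving heuristic can be sketched as follows. This is an illustrative sketch under our own assumptions: the bounds (1 and the pages-per-block maximum) follow the text, but the function name and the per-call update granularity are ours.

```python
# Illustrative sketch of the dynamic-threshold heuristic: double THR_migrate
# while the block region's share of the buffer exceeds theta; halve it when
# the block region runs empty. Assumed geometry: 4 pages/block.
def adjust_threshold(thr, n_block, n_total, theta, pages_per_block=4):
    """One adjustment step. `thr` is THR_migrate; `n_block`/`n_total` give
    gamma, the block region's share of the total buffer space."""
    gamma = n_block / n_total
    if gamma > theta:                          # block region too large:
        thr = min(thr * 2, pages_per_block)    # make migration harder
    elif n_block == 0:                         # block region empty:
        thr = max(thr // 2, 1)                 # make migration easier
    return thr
```

Repeated calls implement the "doubled until γ is less than θ" behavior in the text, while the clamps keep the threshold within its valid range.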
On the other hand, the value of THR_migrate is halved if the block region is empty. This dynamic method properly adjusts the threshold value according to the workload features. In our experiments, the dynamic threshold performs well across the various buffer sizes imposed by different application sets. IV. EVALUATION A. Experiment Setup 1) Trace-driven Simulator We built a trace-driven buffer cache simulator by interfacing with a modified version of DiskSim 4. [] and its SSD extension [5]. We implemented the following buffer cache schemes in the SSD code: our scheme and the schemes of [] and [3]. Block Associative Sector Translation (BAST) [3] is implemented as the FTL in the SSD. The configuration values of the SSD listed in Table II are taken from [5]. 2) Workload Traces We use a mixture of real-world and synthetic traces to study the efficiency of the different buffer management schemes on a wide spectrum of enterprise-scale workloads. Table III presents the salient features of our workload traces. We employ a read-dominant I/O trace from an OLTP application running at a financial institution [] made available by the Storage Performance Council (SPC),

henceforth referred to as the Financial trace.

TABLE II. SPECIFICATION OF SSD CONFIGURATION

  Page Read to Register: 5 μs
  Page Program (Write) from Register: μs
  Block Erase: .5 ms
  Serial Access to Register (Data bus): μs
  Die Size: GB
  Block Size: 56 KB
  Page Size: 4 KB
  Data Register: 4 KB
  Erase Cycles: K
  SSD Capacity: 3 GB

We also employ a write-dominant I/O trace called CAMWEBDEV, which was collected by Microsoft [3] and made available by the Storage Networking Industry Association (SNIA). Besides read- and write-dominant workloads, we also want to assess the behavior of the different buffer management schemes under mixed workloads. For this purpose we use the MSNFS and Exchange traces, which were likewise collected by Microsoft and made available by SNIA [3]. Finally, we use a synthetic trace, referred to as the Syn trace, to study behavior under a sequential-dominant workload. The five traces used in our experiments cover a wide range of workload characteristics, from random to sequential and from read-dominant to write-dominant.

TABLE III. SPECIFICATION OF WORKLOADS

  Workload | Avg. Req. Size (KB) | Write (%) | Seq. (%) | Avg. Req. Interarrival Time (ms)
  Financial
  MSNFS
  Exchange
  CAMWEBDEV
  Syn

3) Evaluation Metrics

In this study we characterize the behavior of the different buffer management schemes using (i) response time as seen at the I/O driver (the sum of the device service time and the time spent waiting in the driver's queue), (ii) cache hit ratio, (iii) number of erases (an indicator of garbage collection overhead), and (iv) the distribution of write length (which indicates the sequentiality of the write accesses passed to flash memory).

B. Experiment Results

Figures 4 through 7 show the average response time, cache hit ratio, number of erases and distribution of write length of the HBM, FAB and BPLRU caching schemes for the four workloads as we vary the buffer size.
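For metric (iv), the write-length CDF can be computed directly from the log of writes flushed to flash. The sketch below is our own illustration of how such a curve is derived, not part of the simulator; the function name and the 64-page cap are assumptions.

```python
def write_length_cdf(write_lengths, max_pages=64):
    """Given the sizes (in pages) of all writes passed to flash,
    return cdf[i] = fraction of writes of size <= i+1 pages."""
    total = len(write_lengths)
    hist = [0] * (max_pages + 1)
    for w in write_lengths:
        hist[min(w, max_pages)] += 1  # cap at one block
    cdf, count = [], 0
    for size in range(1, max_pages + 1):
        count += hist[size]
        cdf.append(count / total)
    return cdf
```

A scheme that favors sequential flushing shows a CDF that rises late, i.e. most written pages belong to large writes.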
To indicate the write-length distribution, we use CDF curves showing the percentage (on the Y-axis) of written pages whose sizes are less than a certain value (on the X-axis). Because of space limitations, we present CDF curves only for the MB buffer in Figures 4 through 7; CDF curves for a 6 MB buffer under the different traces are given in Figure 9. The following observations are made from the results.

1) Financial trace

Figure 4 shows that HBM outperforms both FAB and BPLRU in average response time, cache hit ratio, number of erases and number of sequential writes under this completely read-dominant trace. With a memory size of MB, the average response time of HBM is . msecond, versus .75 msecond for the weaker baseline (see Figure 4(a)): an 84% improvement in average response time, with a corresponding 85% hit ratio increase (see Figure 4(b)) and an 85% erase reduction (see Figure 4(c)). HBM is also 76% faster than the other baseline for a MB buffer, with 46% more cache hits and 8% fewer erases. Figure 4(d) shows that the percentages of -page writes for the two baselines are 56% and %; by contrast, HBM has only 5% small writes, better than both. Further, HBM provides far more large writes: almost 3% of HBM's writes are larger than 4 pages, while the baselines have only 4% and % such writes. Figure 9 likewise shows that the HBM algorithm is very effective at increasing the sequentiality of write accesses across traces. These results indicate that HBM's performance gain comes from two sources: a high cache hit ratio and reduced garbage collection. HBM exploits both pages and blocks to manage the buffer space in a hybrid way, taking both temporal and spatial locality into account, and thereby improves the cache hit ratio while increasing the portion of sequential writes.

2) MSNFS trace

The MSNFS trace is a random workload in which reads are about 34% more frequent than writes.
Careful analysis of this workload reveals that it exhibits a very high degree of both spatial and temporal locality. Figure 5 shows that HBM is up to 8% faster than one baseline for cache sizes up to 3 MB, with 39% more cache hits and 78% fewer erases; against the other baseline, HBM performs up to 63%, 63% and 38% better in average response time, cache hit ratio and number of erases, respectively. Beyond 3 MB, HBM's advantage over FAB and BPLRU narrows, because the cache becomes large enough to accommodate most accesses. HBM is thus clearly preferable for workloads of this nature: the block-based FAB and BPLRU trade temporal locality for spatial locality, whereas HBM efficiently leverages both.

3) Exchange trace

The Exchange trace is a random workload in which writes are about 44% more frequent than reads. For a memory size of MB, the average response time of HBM is .73 msecond, versus 3.4 msecond for the weaker baseline (see Figure 6(a)): a 45% improvement in average response time, with a corresponding 6% hit ratio increase (see Figure 6(b)) and a % erase reduction (see Figure 6(c)).

[Figure 4. Financial Trace: (a) response time, (b) cache hit ratio, and (c) number of erases as buffer size varies; (d) distribution of write length when the buffer is MB.]
[Figure 5. MSNFS Trace: (a) response time, (b) cache hit ratio, and (c) number of erases as buffer size varies; (d) distribution of write length when the buffer is MB.]

[Figure 6. Exchange Trace: (a) response time, (b) cache hit ratio, and (c) number of erases as buffer size varies; (d) distribution of write length when the buffer is MB.]
[Figure 7. CAMWEBDEV Trace: (a) response time, (b) cache hit ratio, and (c) number of erases as buffer size varies; (d) distribution of write length when the buffer is MB.]

HBM is also 3% faster than the other baseline for a MB buffer, with 8% more cache hits and 5% fewer erases. The percentages of small writes (fewer than pages) for the two baselines are 44% and 9%; by contrast, HBM has 4% small writes, better than both (see Figure 6(d)). Further, HBM produces far more sequential writes: almost 38% of its writes are larger than 4 pages, while the baselines have only % and .% such writes.

4) CAMWEBDEV trace

The CAMWEBDEV trace is a completely random, write-dominant workload. Figure 7 shows that HBM performs 7% and 4% faster than the two baselines for a buffer size of MB; accordingly, it achieves an 8% and a 65% higher hit ratio and a 48% and a % reduction in erases. We observed that HBM is also effective at reducing small writes and increasing sequential writes for this write-intensive workload. Figure 7(d) shows that the baselines produce 8% and 98% small writes (fewer than 4 pages); by contrast, HBM has 74% small writes, better than both. HBM also provides % large writes (larger than 8 pages), whereas the baselines have only 7% and .% large writes. Figure 9(d) further shows that with a buffer size of 6 MB, the baselines produce 8% and 4% small writes (fewer than 4 pages), while HBM has only 8% small writes; HBM also provides 33% large writes (larger than 8 pages), against only 8% and 5% for the baselines. As the buffer size increases from MB to 6 MB, HBM's improvement is much larger than the baselines', indicating that the HBM algorithm is more effective at increasing the sequentiality of write accesses across buffer sizes. The results further show that the distribution of write length is directly correlated with garbage collection overhead and performance: with a buffer size of MB, HBM produces 7% writes larger than 4 pages, compared to 8% and % for the baselines (see Figure 7(d)); accordingly, HBM achieves a 48% and a % garbage collection overhead reduction (see Figure 7(c)).
Consequently, HBM's performance is improved by 7% and 4% over the two baselines (see Figure 7(a)). This correlation clearly indicates that write length is a critical factor affecting SSD performance and garbage collection overhead.

5) Effect of workloads

We observed that the efficiency of HBM differs across workload traces. With a buffer size of MB, HBM achieves 84%, 8%, 45% and 7% performance improvement over one baseline for the Financial, MSNFS, Exchange and CAMWEBDEV traces, respectively, and 76%, 63%, % and 4% over the other.

[Figure 8. Average response time of HBM, FAB and BPLRU under the Syn trace.]

The results indicate that HBM outperforms FAB and BPLRU for different types of random workloads in an enterprise system. For the sequential write-dominant trace, we show only the performance results (Figure 8) due to space limitations. HBM still performs better than FAB and BPLRU, but its advantage is less pronounced, because this workload offers them more spatial locality to exploit than the random workloads above.

6) Additional overhead

To study the overhead of the different buffer management schemes under different workloads, Figure 10 presents the total pages read while replaying the traces. The results show that BPLRU performs a large number of read operations. Take Figure 10(a) as an example: with a buffer size of MB, BPLRU reads 368% and 57% more pages than HBM and FAB, respectively, and accordingly its average response time is 53% and 48% slower (see Figure 4(a)). This is because BPLRU uses page padding to increase the number of sequential writes; for the completely random workloads common in enterprise environments, BPLRU must read a large number of additional pages, which hurts overall performance. By contrast, our proposed HBM achieves better performance without additional reads: HBM treats reads and writes as a whole and leverages block-level temporal locality among read and write accesses to form sequential blocks naturally.
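The read overhead of padding can be made concrete with a simple count: to turn a partially cached block into one full sequential write, a padding scheme must first fetch every page of the block that is not already buffered. A small sketch (our own illustration; 64 pages per block assumed):

```python
PAGES_PER_BLOCK = 64  # assumed block size in pages

def padding_reads(cached_pages_per_block):
    """Given, for each flushed block, how many of its pages are already
    in the buffer, return the extra page reads a padding scheme issues
    to write every block out in full."""
    return sum(PAGES_PER_BLOCK - c for c in cached_pages_per_block)
```

For a random workload that flushes four blocks holding a single dirty page each, this is 4 × 63 = 252 extra page reads, which illustrates why padding becomes expensive when writes are scattered.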
7) Effect of Threshold

To investigate how the threshold value affects the efficiency of the proposed HBM, we tested HBM with static thresholds and with the dynamic threshold for the different traces, as shown in Figure 11. Take Figure 11(c) as an example: with a memory size of 6 MB, the average response time of HBM is .79 msecond, .65 msecond and .73 msecond when the threshold value is , 4, and 8, respectively. By contrast, the average response time with the dynamic threshold is .55 msecond, considerably better than with any static threshold. We further observed that no single threshold achieves optimal performance across workloads. Figure 11(a) shows that with a buffer size of 6 MB, HBM performs

better when the threshold is set to 64 for the Financial trace than with the other static thresholds. However, with a buffer size of 6 MB and a threshold of 64, the average response time of HBM is .67 msecond for the CAMWEBDEV trace, worse than with thresholds of , 4, and 8 (see Figure 11(d)). By contrast, the results in Figure 11 show that the dynamic threshold achieves the best performance for the Financial, MSNFS, Exchange and CAMWEBDEV traces alike. The variation in the performance curves in Figure 11 clearly indicates that the threshold value has a significant impact on the efficiency of our proposed HBM. Statically setting the threshold cannot achieve optimal performance.

[Figure 9. Distribution of write length of HBM, FAB and BPLRU when the buffer size is 6 MB: (a) Financial, (b) MSNFS, (c) Exchange, (d) CAMWEBDEV.]
[Figure 10. Total pages read by HBM, FAB and BPLRU under the 4 traces: (a) Financial, (b) MSNFS, (c) Exchange, (d) CAMWEBDEV.]
[Figure 11. Effect of the threshold on HBM, comparing static threshold values (THR) with the dynamic threshold: (a) Financial, (b) MSNFS, (c) Exchange, (d) CAMWEBDEV.]

Dynamically adjusting

the threshold for enterprise workloads makes the proposed HBM workload-adaptive.

V. RELATED WORK

In this section we survey related work in the literature.

A. Buffer Cache Management

1) Disk Buffer Cache Management

One of the most active research areas in improving disk I/O performance is buffer caching. Over the years, numerous replacement algorithms have been proposed to reduce actual disk accesses. The oldest and still most widely adopted is the Least Recently Used (LRU) algorithm. The popularity of LRU comes from its simple and effective exploitation of temporal locality: a block accessed recently is likely to be accessed again in the near future. Many other algorithms have also been proposed, such as CLOCK [6], 2Q [9], MQ [33], ARC [8], and LIRS [34]. Exploiting the temporal locality of data accesses, all of these replacement algorithms take cache hit ratio improvement as their sole objective in order to minimize disk activity. For an SSD, however, hit ratio alone can be a misleading metric: as discussed in Section II, the sequentiality of the write accesses passed to the SSD significantly influences performance, lifetime, internal fragmentation and garbage collection overhead. These replacement algorithms are not effective for SSDs because they ignore spatial locality. The DULO [] scheme introduces spatial locality into page replacement decisions, making the replacement algorithm aware of page placement on the disk; it exploits both temporal and spatial locality in buffer management for hard disks, where sequential access is far more efficient than random access. However, DULO cannot be applied directly to SSDs because it considers the hard-disk layout rather than the SSD layout.

2) SSD Buffer Cache Management

Existing caching algorithms for flash memory include CFLRU [35], FAB, BPLRU, and LB-CLOCK [3], discussed below.
Clean-First LRU (CFLRU) [35] is a page-based buffer cache management algorithm for flash storage. It divides the host buffer space into a working region and an eviction region, and selects victim buffer pages from the eviction region. To exploit the asymmetric performance of flash read and write operations, it tries to choose a clean page as the victim rather than a dirty page; CFLRU thus reduces the number of writes by performing more reads.

The Flash-Aware Buffer policy (FAB) [3] is a block-based buffer cache management policy for flash memory. In FAB, the buffer pages belonging to the same erasable block are grouped together, and the number of resident cached pages in a block is the sole criterion for selecting a victim: FAB evicts the block with the largest number of cached pages, falling back to LRU order in case of a tie. All dirty pages in the victim group are flushed, and all clean pages in it are discarded. This policy may result in internal fragmentation, which significantly impacts garbage collection efficiency and performance. The main application of FAB is portable media players, whose write access pattern is sequential.

Block Padding LRU (BPLRU) [] is another block-based buffer cache management scheme, designed for flash memory writes only. BPLRU uses block-level LRU, page padding, and LRU compensation to establish a desirable write pattern with RAM buffering. However, it does not consider read requests, and for completely random workloads it incurs a large number of additional reads, which significantly impact overall performance.

The Large Block CLOCK (LB-CLOCK) [3] algorithm considers recency and block space utilization when making cache management decisions, dynamically varying the priority between the two metrics to adapt to changes in workload characteristics.

B. Flash Memory

Several studies have examined the performance of random writes on flash storage at various levels of the storage hierarchy [4,5,6,7].
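FAB's victim selection described above can be sketched as follows (our own sketch with hypothetical data structures: each block group carries its cached-page count and a last-access timestamp for LRU tie-breaking):

```python
def select_fab_victim(blocks):
    """blocks maps block_id -> (cached_page_count, last_access_time).
    Evict the block with the most cached pages; on a tie, prefer the
    least recently used block (smallest last_access_time)."""
    return max(blocks, key=lambda b: (blocks[b][0], -blocks[b][1]))
```

Evicting the fullest block maximizes the pages freed per eviction and tends to flush near-complete blocks, but as noted above it can leave partially filled blocks behind and cause internal fragmentation.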
Work on FTLs tries to improve performance and to address high garbage collection overhead. BAST exclusively associates a log block with a data block; in the presence of small random writes, this scheme suffers from increased garbage collection cost. FAST [9] keeps a single sequential log block dedicated to sequential updates, while the other log blocks are used for random writes. The Superblock FTL scheme [] exploits block-level spatial locality in workloads by combining consecutive logical blocks into a superblock, and maintains page-level mappings within the superblock to exploit temporal locality by separating hot and cold data. The Locality-Aware Sector Translation (LAST) scheme [4] tries to alleviate the shortcomings of BAST and FAST by exploiting both temporal locality and sequential locality in workloads; it further separates the random log blocks into hot and cold regions to reduce garbage collection cost. Unlike the currently predominant hybrid FTLs, the Demand-based Flash Translation Layer (DFTL) [] is purely page-mapped: it exploits the temporal locality in enterprise-scale workloads to store the most popular mappings in limited SRAM, while the rest are maintained on the flash device itself. MFT [8], a block-device-level solution, translates random writes into sequential writes between the file system and the SSD; FlashLite [7] does so between the application and the file system, with an idea similar to P2P file sharing. Griffin [4] uses a log-structured HDD as a write cache to improve the sequentiality of the write accesses to the SSD.

This paper differs from the above studies in a number of ways. First, HBM is a hybrid buffer management scheme that exploits both pages and blocks to manage the buffer space, and considers both cache hit ratio and sequentiality as design metrics. A second major feature of this study is that


Reducing Solid-State Storage Device Write Stress Through Opportunistic In-Place Delta Compression

Reducing Solid-State Storage Device Write Stress Through Opportunistic In-Place Delta Compression Reducing Solid-State Storage Device Write Stress Through Opportunistic In-Place Delta Compression Xuebin Zhang, Jiangpeng Li, Hao Wang, Kai Zhao and Tong Zhang xuebinzhang.rpi@gmail.com ECSE Department,

More information

Cooperating Write Buffer Cache and Virtual Memory Management for Flash Memory Based Systems

Cooperating Write Buffer Cache and Virtual Memory Management for Flash Memory Based Systems Cooperating Write Buffer Cache and Virtual Memory Management for Flash Memory Based Systems Liang Shi, Chun Jason Xue and Xuehai Zhou Joint Research Lab of Excellence, CityU-USTC Advanced Research Institute,

More information

Threshold-Based Markov Prefetchers

Threshold-Based Markov Prefetchers Threshold-Based Markov Prefetchers Carlos Marchani Tamer Mohamed Lerzan Celikkanat George AbiNader Rice University, Department of Electrical and Computer Engineering ELEC 525, Spring 26 Abstract In this

More information

Chapter 2: Memory Hierarchy Design Part 2

Chapter 2: Memory Hierarchy Design Part 2 Chapter 2: Memory Hierarchy Design Part 2 Introduction (Section 2.1, Appendix B) Caches Review of basics (Section 2.1, Appendix B) Advanced methods (Section 2.3) Main Memory Virtual Memory Fundamental

More information

80 IEEE TRANSACTIONS ON COMPUTERS, VOL. 60, NO. 1, JANUARY Flash-Aware RAID Techniques for Dependable and High-Performance Flash Memory SSD

80 IEEE TRANSACTIONS ON COMPUTERS, VOL. 60, NO. 1, JANUARY Flash-Aware RAID Techniques for Dependable and High-Performance Flash Memory SSD 80 IEEE TRANSACTIONS ON COMPUTERS, VOL. 60, NO. 1, JANUARY 2011 Flash-Aware RAID Techniques for Dependable and High-Performance Flash Memory SSD Soojun Im and Dongkun Shin, Member, IEEE Abstract Solid-state

More information

Delayed Partial Parity Scheme for Reliable and High-Performance Flash Memory SSD

Delayed Partial Parity Scheme for Reliable and High-Performance Flash Memory SSD Delayed Partial Parity Scheme for Reliable and High-Performance Flash Memory SSD Soojun Im School of ICE Sungkyunkwan University Suwon, Korea Email: lang33@skku.edu Dongkun Shin School of ICE Sungkyunkwan

More information

A Hybrid Solid-State Storage Architecture for the Performance, Energy Consumption, and Lifetime Improvement

A Hybrid Solid-State Storage Architecture for the Performance, Energy Consumption, and Lifetime Improvement A Hybrid Solid-State Storage Architecture for the Performance, Energy Consumption, and Lifetime Improvement Guangyu Sun, Yongsoo Joo, Yibo Chen Dimin Niu, Yuan Xie Pennsylvania State University {gsun,

More information

I/O CANNOT BE IGNORED

I/O CANNOT BE IGNORED LECTURE 13 I/O I/O CANNOT BE IGNORED Assume a program requires 100 seconds, 90 seconds for main memory, 10 seconds for I/O. Assume main memory access improves by ~10% per year and I/O remains the same.

More information

Lecture 15: Caches and Optimization Computer Architecture and Systems Programming ( )

Lecture 15: Caches and Optimization Computer Architecture and Systems Programming ( ) Systems Group Department of Computer Science ETH Zürich Lecture 15: Caches and Optimization Computer Architecture and Systems Programming (252-0061-00) Timothy Roscoe Herbstsemester 2012 Last time Program

More information

Understanding the Relation between the Performance and Reliability of NAND Flash/SCM Hybrid Solid- State Drive

Understanding the Relation between the Performance and Reliability of NAND Flash/SCM Hybrid Solid- State Drive Understanding the Relation between the Performance and Reliability of NAND Flash/SCM Hybrid Solid- State Drive Abstract: A NAND flash memory/storage-class memory (SCM) hybrid solid-state drive (SSD) can

More information

Pseudo SLC. Comparison of SLC, MLC and p-slc structures. pslc

Pseudo SLC. Comparison of SLC, MLC and p-slc structures. pslc 1 Pseudo SLC In the MLC structures, it contains strong pages and weak pages for 2-bit per cell. Pseudo SLC (pslc) is to store only 1bit per cell data on the strong pages of MLC. With this algorithm, it

More information

VSSIM: Virtual Machine based SSD Simulator

VSSIM: Virtual Machine based SSD Simulator 29 th IEEE Conference on Mass Storage Systems and Technologies (MSST) Long Beach, California, USA, May 6~10, 2013 VSSIM: Virtual Machine based SSD Simulator Jinsoo Yoo, Youjip Won, Joongwoo Hwang, Sooyong

More information

Memory Technology. Chapter 5. Principle of Locality. Chapter 5 Large and Fast: Exploiting Memory Hierarchy 1

Memory Technology. Chapter 5. Principle of Locality. Chapter 5 Large and Fast: Exploiting Memory Hierarchy 1 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface Chapter 5 Large and Fast: Exploiting Memory Hierarchy 5 th Edition Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic

More information

NAND Flash-based Storage. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

NAND Flash-based Storage. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University NAND Flash-based Storage Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Today s Topics NAND flash memory Flash Translation Layer (FTL) OS implications

More information

CS3350B Computer Architecture

CS3350B Computer Architecture CS335B Computer Architecture Winter 25 Lecture 32: Exploiting Memory Hierarchy: How? Marc Moreno Maza wwwcsduwoca/courses/cs335b [Adapted from lectures on Computer Organization and Design, Patterson &

More information

[537] Flash. Tyler Harter

[537] Flash. Tyler Harter [537] Flash Tyler Harter Flash vs. Disk Disk Overview I/O requires: seek, rotate, transfer Inherently: - not parallel (only one head) - slow (mechanical) - poor random I/O (locality around disk head) Random

More information

ParaFS: A Log-Structured File System to Exploit the Internal Parallelism of Flash Devices

ParaFS: A Log-Structured File System to Exploit the Internal Parallelism of Flash Devices ParaFS: A Log-Structured File System to Exploit the Internal Parallelism of Devices Jiacheng Zhang, Jiwu Shu, Youyou Lu Tsinghua University 1 Outline Background and Motivation ParaFS Design Evaluation

More information

2 Improved Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers [1]

2 Improved Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers [1] EE482: Advanced Computer Organization Lecture #7 Processor Architecture Stanford University Tuesday, June 6, 2000 Memory Systems and Memory Latency Lecture #7: Wednesday, April 19, 2000 Lecturer: Brian

More information

p-oftl: An Object-based Semantic-aware Parallel Flash Translation Layer

p-oftl: An Object-based Semantic-aware Parallel Flash Translation Layer p-oftl: An Object-based Semantic-aware Parallel Flash Translation Layer Wei Wang, Youyou Lu, and Jiwu Shu Department of Computer Science and Technology, Tsinghua University, Beijing, China Tsinghua National

More information

CS356: Discussion #9 Memory Hierarchy and Caches. Marco Paolieri Illustrations from CS:APP3e textbook

CS356: Discussion #9 Memory Hierarchy and Caches. Marco Paolieri Illustrations from CS:APP3e textbook CS356: Discussion #9 Memory Hierarchy and Caches Marco Paolieri (paolieri@usc.edu) Illustrations from CS:APP3e textbook The Memory Hierarchy So far... We modeled the memory system as an abstract array

More information

Advanced Memory Organizations

Advanced Memory Organizations CSE 3421: Introduction to Computer Architecture Advanced Memory Organizations Study: 5.1, 5.2, 5.3, 5.4 (only parts) Gojko Babić 03-29-2018 1 Growth in Performance of DRAM & CPU Huge mismatch between CPU

More information

Chapter 8. Virtual Memory

Chapter 8. Virtual Memory Operating System Chapter 8. Virtual Memory Lynn Choi School of Electrical Engineering Motivated by Memory Hierarchy Principles of Locality Speed vs. size vs. cost tradeoff Locality principle Spatial Locality:

More information

Workload-Aware Elastic Striping With Hot Data Identification for SSD RAID Arrays

Workload-Aware Elastic Striping With Hot Data Identification for SSD RAID Arrays IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 36, NO. 5, MAY 2017 815 Workload-Aware Elastic Striping With Hot Data Identification for SSD RAID Arrays Yongkun Li,

More information

Chapter 2: Memory Hierarchy Design Part 2

Chapter 2: Memory Hierarchy Design Part 2 Chapter 2: Memory Hierarchy Design Part 2 Introduction (Section 2.1, Appendix B) Caches Review of basics (Section 2.1, Appendix B) Advanced methods (Section 2.3) Main Memory Virtual Memory Fundamental

More information

Improving Performance of Solid State Drives in Enterprise Environment

Improving Performance of Solid State Drives in Enterprise Environment University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Computer Science and Engineering: Theses, Dissertations, and Student Research Computer Science and Engineering, Department

More information

Design and Implementation for Multi-Level Cell Flash Memory Storage Systems

Design and Implementation for Multi-Level Cell Flash Memory Storage Systems Design and Implementation for Multi-Level Cell Flash Memory Storage Systems Amarnath Gaini, K Vijayalaxmi Assistant Professor Department of Electronics VITS (N9), Andhra Pradesh Sathish Mothe Assistant

More information

Lecture: Large Caches, Virtual Memory. Topics: cache innovations (Sections 2.4, B.4, B.5)

Lecture: Large Caches, Virtual Memory. Topics: cache innovations (Sections 2.4, B.4, B.5) Lecture: Large Caches, Virtual Memory Topics: cache innovations (Sections 2.4, B.4, B.5) 1 More Cache Basics caches are split as instruction and data; L2 and L3 are unified The /L2 hierarchy can be inclusive,

More information

Second-Tier Cache Management Using Write Hints

Second-Tier Cache Management Using Write Hints Second-Tier Cache Management Using Write Hints Xuhui Li University of Waterloo Aamer Sachedina IBM Toronto Lab Ashraf Aboulnaga University of Waterloo Shaobo Gao University of Waterloo Kenneth Salem University

More information

Design Considerations for Using Flash Memory for Caching

Design Considerations for Using Flash Memory for Caching Design Considerations for Using Flash Memory for Caching Edi Shmueli, IBM XIV Storage Systems edi@il.ibm.com Santa Clara, CA August 2010 1 Solid-State Storage In a few decades solid-state storage will

More information

CSE 431 Computer Architecture Fall Chapter 5A: Exploiting the Memory Hierarchy, Part 1

CSE 431 Computer Architecture Fall Chapter 5A: Exploiting the Memory Hierarchy, Part 1 CSE 431 Computer Architecture Fall 2008 Chapter 5A: Exploiting the Memory Hierarchy, Part 1 Mary Jane Irwin ( www.cse.psu.edu/~mji ) [Adapted from Computer Organization and Design, 4 th Edition, Patterson

More information

CHAPTER 6 Memory. CMPS375 Class Notes (Chap06) Page 1 / 20 Dr. Kuo-pao Yang

CHAPTER 6 Memory. CMPS375 Class Notes (Chap06) Page 1 / 20 Dr. Kuo-pao Yang CHAPTER 6 Memory 6.1 Memory 341 6.2 Types of Memory 341 6.3 The Memory Hierarchy 343 6.3.1 Locality of Reference 346 6.4 Cache Memory 347 6.4.1 Cache Mapping Schemes 349 6.4.2 Replacement Policies 365

More information

Announcements. ECE4750/CS4420 Computer Architecture L6: Advanced Memory Hierarchy. Edward Suh Computer Systems Laboratory

Announcements. ECE4750/CS4420 Computer Architecture L6: Advanced Memory Hierarchy. Edward Suh Computer Systems Laboratory ECE4750/CS4420 Computer Architecture L6: Advanced Memory Hierarchy Edward Suh Computer Systems Laboratory suh@csl.cornell.edu Announcements Lab 1 due today Reading: Chapter 5.1 5.3 2 1 Overview How to

More information

Memory. From Chapter 3 of High Performance Computing. c R. Leduc

Memory. From Chapter 3 of High Performance Computing. c R. Leduc Memory From Chapter 3 of High Performance Computing c 2002-2004 R. Leduc Memory Even if CPU is infinitely fast, still need to read/write data to memory. Speed of memory increasing much slower than processor

More information

AMC: an adaptive multi-level cache algorithm in hybrid storage systems

AMC: an adaptive multi-level cache algorithm in hybrid storage systems CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE Concurrency Computat.: Pract. Exper. (5) Published online in Wiley Online Library (wileyonlinelibrary.com)..5 SPECIAL ISSUE PAPER AMC: an adaptive multi-level

More information

Lecture notes for CS Chapter 2, part 1 10/23/18

Lecture notes for CS Chapter 2, part 1 10/23/18 Chapter 2: Memory Hierarchy Design Part 2 Introduction (Section 2.1, Appendix B) Caches Review of basics (Section 2.1, Appendix B) Advanced methods (Section 2.3) Main Memory Virtual Memory Fundamental

More information

Lecture 14: Cache Innovations and DRAM. Today: cache access basics and innovations, DRAM (Sections )

Lecture 14: Cache Innovations and DRAM. Today: cache access basics and innovations, DRAM (Sections ) Lecture 14: Cache Innovations and DRAM Today: cache access basics and innovations, DRAM (Sections 5.1-5.3) 1 Reducing Miss Rate Large block size reduces compulsory misses, reduces miss penalty in case

More information

Memory Hierarchies &

Memory Hierarchies & Memory Hierarchies & Cache Memory CSE 410, Spring 2009 Computer Systems http://www.cs.washington.edu/410 4/26/2009 cse410-13-cache 2006-09 Perkins, DW Johnson and University of Washington 1 Reading and

More information

10/1/ Introduction 2. Existing Methods 3. Future Research Issues 4. Existing works 5. My Research plan. What is Data Center

10/1/ Introduction 2. Existing Methods 3. Future Research Issues 4. Existing works 5. My Research plan. What is Data Center Weilin Peng Sept. 28 th 2009 What is Data Center Concentrated clusters of compute and data storage resources that are connected via high speed networks and routers. H V A C Server Network Server H V A

More information

Middleware and Flash Translation Layer Co-Design for the Performance Boost of Solid-State Drives

Middleware and Flash Translation Layer Co-Design for the Performance Boost of Solid-State Drives Middleware and Flash Translation Layer Co-Design for the Performance Boost of Solid-State Drives Chao Sun 1, Asuka Arakawa 1, Ayumi Soga 1, Chihiro Matsui 1 and Ken Takeuchi 1 1 Chuo University Santa Clara,

More information

Page 1. Memory Hierarchies (Part 2)

Page 1. Memory Hierarchies (Part 2) Memory Hierarchies (Part ) Outline of Lectures on Memory Systems Memory Hierarchies Cache Memory 3 Virtual Memory 4 The future Increasing distance from the processor in access time Review: The Memory Hierarchy

More information

Virtual Memory. Reading. Sections 5.4, 5.5, 5.6, 5.8, 5.10 (2) Lecture notes from MKP and S. Yalamanchili

Virtual Memory. Reading. Sections 5.4, 5.5, 5.6, 5.8, 5.10 (2) Lecture notes from MKP and S. Yalamanchili Virtual Memory Lecture notes from MKP and S. Yalamanchili Sections 5.4, 5.5, 5.6, 5.8, 5.10 Reading (2) 1 The Memory Hierarchy ALU registers Cache Memory Memory Memory Managed by the compiler Memory Managed

More information

Reducing Hit Times. Critical Influence on cycle-time or CPI. small is always faster and can be put on chip

Reducing Hit Times. Critical Influence on cycle-time or CPI. small is always faster and can be put on chip Reducing Hit Times Critical Influence on cycle-time or CPI Keep L1 small and simple small is always faster and can be put on chip interesting compromise is to keep the tags on chip and the block data off

More information

CSF Improving Cache Performance. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005]

CSF Improving Cache Performance. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] CSF Improving Cache Performance [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] Review: The Memory Hierarchy Take advantage of the principle of locality to present the user

More information

PowerVault MD3 SSD Cache Overview

PowerVault MD3 SSD Cache Overview PowerVault MD3 SSD Cache Overview A Dell Technical White Paper Dell Storage Engineering October 2015 A Dell Technical White Paper TECHNICAL INACCURACIES. THE CONTENT IS PROVIDED AS IS, WITHOUT EXPRESS

More information

Chapter 12 Wear Leveling for PCM Using Hot Data Identification

Chapter 12 Wear Leveling for PCM Using Hot Data Identification Chapter 12 Wear Leveling for PCM Using Hot Data Identification Inhwan Choi and Dongkun Shin Abstract Phase change memory (PCM) is the best candidate device among next generation random access memory technologies.

More information

Using Synology SSD Technology to Enhance System Performance Synology Inc.

Using Synology SSD Technology to Enhance System Performance Synology Inc. Using Synology SSD Technology to Enhance System Performance Synology Inc. Synology_WP_ 20121112 Table of Contents Chapter 1: Enterprise Challenges and SSD Cache as Solution Enterprise Challenges... 3 SSD

More information

Chapter 5A. Large and Fast: Exploiting Memory Hierarchy

Chapter 5A. Large and Fast: Exploiting Memory Hierarchy Chapter 5A Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) Fast, expensive Dynamic RAM (DRAM) In between Magnetic disk Slow, inexpensive Ideal memory Access time of SRAM

More information

Memory Management! How the hardware and OS give application pgms:" The illusion of a large contiguous address space" Protection against each other"

Memory Management! How the hardware and OS give application pgms: The illusion of a large contiguous address space Protection against each other Memory Management! Goals of this Lecture! Help you learn about:" The memory hierarchy" Spatial and temporal locality of reference" Caching, at multiple levels" Virtual memory" and thereby " How the hardware

More information

Cache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals

Cache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals Cache Memory COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline The Need for Cache Memory The Basics

More information

The levels of a memory hierarchy. Main. Memory. 500 By 1MB 4GB 500GB 0.25 ns 1ns 20ns 5ms

The levels of a memory hierarchy. Main. Memory. 500 By 1MB 4GB 500GB 0.25 ns 1ns 20ns 5ms The levels of a memory hierarchy CPU registers C A C H E Memory bus Main Memory I/O bus External memory 500 By 1MB 4GB 500GB 0.25 ns 1ns 20ns 5ms 1 1 Some useful definitions When the CPU finds a requested

More information

Architecture Tuning Study: the SimpleScalar Experience

Architecture Tuning Study: the SimpleScalar Experience Architecture Tuning Study: the SimpleScalar Experience Jianfeng Yang Yiqun Cao December 5, 2005 Abstract SimpleScalar is software toolset designed for modeling and simulation of processor performance.

More information

Lecture: Cache Hierarchies. Topics: cache innovations (Sections B.1-B.3, 2.1)

Lecture: Cache Hierarchies. Topics: cache innovations (Sections B.1-B.3, 2.1) Lecture: Cache Hierarchies Topics: cache innovations (Sections B.1-B.3, 2.1) 1 Types of Cache Misses Compulsory misses: happens the first time a memory word is accessed the misses for an infinite cache

More information

SHRD: Improving Spatial Locality in Flash Storage Accesses by Sequentializing in Host and Randomizing in Device

SHRD: Improving Spatial Locality in Flash Storage Accesses by Sequentializing in Host and Randomizing in Device SHRD: Improving Spatial Locality in Flash Storage Accesses by Sequentializing in Host and Randomizing in Device Hyukjoong Kim 1, Dongkun Shin 1, Yun Ho Jeong 2 and Kyung Ho Kim 2 1 Samsung Electronics

More information

Systems Programming and Computer Architecture ( ) Timothy Roscoe

Systems Programming and Computer Architecture ( ) Timothy Roscoe Systems Group Department of Computer Science ETH Zürich Systems Programming and Computer Architecture (252-0061-00) Timothy Roscoe Herbstsemester 2016 AS 2016 Caches 1 16: Caches Computer Architecture

More information

!! What is virtual memory and when is it useful? !! What is demand paging? !! When should pages in memory be replaced?

!! What is virtual memory and when is it useful? !! What is demand paging? !! When should pages in memory be replaced? Chapter 10: Virtual Memory Questions? CSCI [4 6] 730 Operating Systems Virtual Memory!! What is virtual memory and when is it useful?!! What is demand paging?!! When should pages in memory be replaced?!!

More information

Adapted from instructor s supplementary material from Computer. Patterson & Hennessy, 2008, MK]

Adapted from instructor s supplementary material from Computer. Patterson & Hennessy, 2008, MK] Lecture 17 Adapted from instructor s supplementary material from Computer Organization and Design, 4th Edition, Patterson & Hennessy, 2008, MK] SRAM / / Flash / RRAM / HDD SRAM / / Flash / RRAM/ HDD SRAM

More information

Cache Performance (H&P 5.3; 5.5; 5.6)

Cache Performance (H&P 5.3; 5.5; 5.6) Cache Performance (H&P 5.3; 5.5; 5.6) Memory system and processor performance: CPU time = IC x CPI x Clock time CPU performance eqn. CPI = CPI ld/st x IC ld/st IC + CPI others x IC others IC CPI ld/st

More information

Flash Trends: Challenges and Future

Flash Trends: Challenges and Future Flash Trends: Challenges and Future John D. Davis work done at Microsoft Researcher- Silicon Valley in collaboration with Laura Caulfield*, Steve Swanson*, UCSD* 1 My Research Areas of Interest Flash characteristics

More information